USAGE
uweb [OPTIONS] [input.uw] > output.html
-h
,--help
-
Show this help message.
-t template-directory
,--template-dir template-directory
-
Specify the directory containing the ERB template files to use. By default, `uweb will use the template files located next to the program itself.
-o file
,--output file
-
The file to redirect the standard output to.
INPUT
If no input file or the special -
input file is given, then uweb
will read
the standard input.
The input is a valid XHTML (or HTML if you are sloppy). This allows the input
file to be maintained by any X/HTML editor. To achieve this, uweb
uses a
specific form of normal X/HTML elements for its commands. The commands are
detected using regular expressions, so they are a bit fussy about the format;
each command must appear in a line of its own and the order of attributes must
be exactly as described below. However, you can be free with white spaces, the
quote character you use and the case (upper or lower).
All the command tags are empty (have no text content), and uweb will accept
all possible ways of expressing them - `<tag …>
, <tag … />
, and <tag
…></tag>
. Choose the form you prefer according to whether you are using
XHTML or HTML, and the tools you use (some are fussier than others).
<link rel='source' href='relative path'>
Scans the specified file for source chunks. These chunks can then be embedded
into the generated documentation. Note that no output is generated at this
point, and uweb
will silently ignore source file lines that are not contained
in any chunk. It is good practice to surround the whole source file with a
top-level source chunk, to ensure all the code is properly documented.
A chunk is designated by surrounding comment lines. Different systems use
different ways to mark chunks (or foldable regions). Vim and Emacs recognize
regions using comments containing {{{ name
and ending with }}} [name]
.
These are typically surrounded by arbitrary non-alphanumeric comment
indicators such as //
, /* … */
, --
etc., which are ignored. In
contrast, Visual Studio doesn’t use comments; instead it accepts the directives
#region name
and #endregion [name]
.
All these forms are recognized by uweb
. Adding additional ones is a simple
matter of modifying the regular expression used to detect chunk comments. Note
that specifying the name after the end of the chunk is optional, but it if is
specified it must match the name given at the beginning of the chunk.
Lines can be excluded from being scanned for uweb
comment lines by using the
special chunk name !uweb
. In this case specifying the name after the end of
the chunk is mandatory.
If uweb
encounters several chunks with the same name - in any of the scanned
source files - it expects them to contain exactly the same content (up to
indentation). This is the closest uweb
can come to actually reusing the same
chunk.
All scan commands must appear before any chunk embedding commands, to allow
uweb
to calculate all cross references. Hence uweb
uses a <link>
tag
which must appear in the <head> section of the X/HTML document, before any of
the <body>
content.
<a rel='include' name='relative path'>
Includes the content of the specified file at this point, as if it was part of the original input. This allows breaking up the input to several files for easier maintenance.
<a rel='chunk' name='chunk name'>
Embed the specified named chunk at this point of the generated documentation. This command may only appear after the last <link rel='chunk'> was processed. This should automatically be the case as <link> tags may only appear in the <head> section. The chunk name must match one of the chunks detected when scanning the linked source files.
If all the chunk’s text lines are indented by some amount, then it is stripped from the generated documentation lines. Nested chunks are converted to hyperlinks; they need to be explicitly embedded elsewhere in the documentation.
OUTPUT
The uweb
output is an expanded valid XHTML (or HTML if you are sloppy)
documentation file. It is basically a copy of the input file, with <link
rel='source'>
tags removed, and <a rel='include'>
or <a rel='chunk'>
tags
expanded to the appropriate content.
A common practice is to pipe the generated X/HTML through htmltoc, hypertoc or
a similar program to automatically generate a table of contents based on the
standard X/HTML header tags (<h1>
, <h2>
, …).
The formatting of embedded chunks is determined by a two ERB files that are located in the template directory (by default, next to the program itself). Customizing or overriding these files allows to control the generated X/HTML documentation.
uweb.chunk
This file contains an ERB template is used to convert the <a rel='chunk'>
tag
into X/HTML documentation. Variables accessible to ERB are:
name
-
Of the embedded chunk.
refers_to
-
List of names of chunks included by this one.
nested_in
-
List of names of chunks that include this one.
lines
-
List of source lines of this chunk, without the stripped indentation and without the terminating line break characters. Nested chunks are already converted to hyperlinks.
locations
-
List of the locations of this chunk in the scanned source files. Typically a chunk only appears in a single source file. However, some chunks are "reused" across several source files. Each entry is an object with the following fields:
path
-
Of the source file this chunk appears in.
first_line
-
Index of the first chunk source line in the file.
last_line
-
Index of the last chunk source line in the file.
All indices are one-based. Apply the String.idify
method to names when using
them in anchors.
uweb.nest
This template is used to collapse included chunks into hyperlinks to the embedded nested chunk. Variables accessible to ERB are:
indent
-
The additional indentation spaces stripped from the nested chunk, compared to the containing chunk.
from
-
The name of the containing chunk.
to
-
The name of the nested chunk.
Again, apply the String.idify
method to names when using them in anchors.
"Inverse" Literate Programming
Literate programming is a concept invented by Knuth, where the program is written as a single source file called a "web" (this was 1981, before there was an "Internet", never mind the "World Wide Web"). This "web" is "weaved" to generate the human readable documentation and "tangled" to to generate both program’s machine readable source code. The "web" consists of a linear presentation of the program (an article, a manual or even a book), containing embedded code "chunks". Thus one needs to "tangle" the web, reordering and combining the chunks into one or more source files that are then compiled as usual. In contrast, "weaving" is mainly concerned with annotating the chunks with cross-references, generating indices and a table of contents, pretty-printing the code and other formatting issues.
The key insight of literate programming is overcoming the gap between the best human presentation order, and the structural requirements imposed by the programming language(s) used. The classical example at the time was allowing to define a C function once in the narrative, then generating both a '.h' declaration and a '.c' definition for it.
Literate programming has never become mainstream, especially with the introduction of advanced IDEs with features such as intelligent auto-completion and refactoring. However the notion of automatically generating documentation from a source file (through "structured comments") has gained popularity with JavaDoc and its innumerable clones for other languages.
These popular tools, however, give up on the key insight of creating a linear narrative for optimal presentation of the program. Instead their documentation structure closely follows the physical code structure (typically a loosely coupled collection of classes). This makes them ideally suited for generating random-access library reference manuals. In contrast, literate programming excels at describing "read input, run algorithm, write output" programs.
The use of "inverse" literate programming allows generating a "classical" literate programming style document from arbitrary source files, without giving up on the use of IDEs, build systems etc. It even allows to retrofit such documentation to any existing project with minimal disruption to the existing source files.
The key idea (which uweb
shamelessly lifts from ProgDoc
is to invert the "tangle" step and incorporate it into the "weaving". That is,
instead of generating the source from chunks, extract the chunks from the
source files, and embed them into placeholders specified in the documentation.
This is how most code-related papers or articles are written in practice.
Note the term "inverse literate programming" was used by Heiko Kirschke to describe what seems to be a "structured comments" tool for LISP. Also, the term "reverse literate programming" was used by Markus Knasmuller in a different meaning then the given above. ProgDoc implements "inverse" literate programming but does not use a special term for it.
There are other tools implementing "inverse literate programming" in the sense used here, for example Codnar and Antiweb.
Why uweb
There are many literate programming tools, each with its own set of trade-offs.
The uweb
tool goals are, in descending order of importance:
Inverse literate programming
By providing an inverted form of literate programming (shamelessly lifted
from ProgDoc), uweb
allows the source files to be maintained using your
favorite tools such as build systems or advanced syntax-highlighting
auto-completing refactoring wizard-infested IDEs. This also makes it possible
to retrofit literate programming documentation to existing projects.
Focus on HTML
The uweb
input file is a valid X/HTML file (if you want it to be). This
allows any X/HTML editor to be used to maintain it, though you might have to
be careful about the formatting of the uweb
commands.
If you want to focus on printed documentation, uweb
is probably not the best
tool. It is of course possible to generate PDF (or even LaTeX) from the
X/HTML file, but you would get a much greater degree of control over the
output using many other literate programming tools.
Simplicity
The uweb
tool uses a minimal set of commands and options. It is implemented
as a single Ruby file accompanied by two ERB template files, which you can
drop anywhere in your path. The whole script is about 400 lines of code
accompanied by roughly 300 lines of documentation.
Language independence
This means no code pretty printing, automatic indexing of identifiers, or any similar language-specific advanced features. This trade-off is common to many literate programming tools, though several allow for language-specific plug-ins to implement some of these features.
Customization
By editing or overriding the template files bundled with uweb
, it is easy to
customize the generated X/HTML documentation format. It is of course also
possible to customize the appearance of the result using CSS.
For more advanced features (e.g. indices, automatically numbered table of
contents, PDF generation), you can adapt the uweb
templates to work with a
different input documentation format, such as DocBook (or even LaTeX).
In this case, however, you would probably also want to tweak the format of
the uweb
commands as well, by modifying the regular expressions used for this
purpose (isolated from the rest of the code and clearly marked at the end of
the uweb
script).
Portability
The uweb
implementation is a pure Ruby file. It should work "out of the box"
on any Windows or UNIX platform. If you don’t want to install Ruby, it should
be possible to convert it to a standalone executable using the rubyscript2exe
compiler.
AUTHOR
VERSION
This is uweb version 0.3. For the latest version see github.
LICENSE
Copyright © 2008, 2016 Oren Ben-Kiki
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with uweb. If not, see http://www.gnu.org/licenses/.