uweb

This document describes the internals of the uweb program. It also serves as an example of how to write documents using uweb. See the uweb manual page for a description of the command-line flags, input format, etc.

Requirements

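The program requires only a few libraries from the Ruby distribution: set, getoptlong and, most notably, erb. It can also be compiled into a standalone binary using rubyscript2exe. The require of rubyscript2exe is wrapped in a rescue LoadError, so the program still runs as a plain script when that library is not available.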

Requirements
Location(s): uweb.rb (306 - 315)
require 'set'
require 'getoptlong'
require 'erb'
begin
  require 'rubyscript2exe'
rescue LoadError
  # Not using rubyscript2exe.
end

Main Program

The main program is pretty boring. It parses the command line, loads the templates, and processes the input. Unless warnings are disabled, it then verifies that every chunk was used.

Main program
Location(s): uweb.rb (325 - 369)
def main
  parse_options
  load_templates
  $did_embed_chunks = false
  process_file($input)
  Chunk.verify_all_used if $warnings
end

def die(message)
  abort("#{$in_path}(#{$at_line_number}): #{message}")
end

def parse_options
  $template_dir = defined?(RUBYSCRIPT2EXE) ? RUBYSCRIPT2EXE.exedir \
                                           : File.dirname($0)
  $warnings = true
  GetoptLong.new([ '--help',          '-h', GetoptLong::NO_ARGUMENT ],
                 [ '--template-dir',  '-t', GetoptLong::REQUIRED_ARGUMENT ],
                 [ '--output-file',   '-o', GetoptLong::REQUIRED_ARGUMENT ],
                 [ '--no-warnings',   '-n', GetoptLong::NO_ARGUMENT ]) \
            .each do |option, argument|
    case option
      when '--help'
        usage
      when '--template-dir'
        $template_dir = argument
      when '--output-file'
        $stdout = File.open(argument, 'w')
      when '--no-warnings'
        $warnings = false
      else
        abort("Unknown option \"#{option}\"")
    end
  end
  case ARGV.length
  when 0
    $input = '-'
  when 1
    $input = ARGV[0]
  else
    usage
  end
end
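parse_options uses getoptlong, which Ruby 3.4 moved from the default gems to the bundled gems. As a hedged alternative, here is a sketch of the same four flags using the standard OptionParser library. The name parse_uweb_options is hypothetical. Unlike parse_options, it reads an explicit argv array and returns a hash instead of setting globals.

```ruby
require 'optparse'

# Hypothetical OptionParser version of parse_options: same flags, but it
# reads an explicit argv array and returns a hash instead of setting globals.
def parse_uweb_options(argv)
  options = { help: false, template_dir: nil, output_file: nil, warnings: true }
  parser = OptionParser.new do |opts|
    opts.on('-h', '--help') { options[:help] = true }
    opts.on('-t', '--template-dir DIR') { |dir| options[:template_dir] = dir }
    opts.on('-o', '--output-file FILE') { |file| options[:output_file] = file }
    opts.on('-n', '--no-warnings') { options[:warnings] = false }
  end
  rest = parser.parse(argv) # non-destructive; returns the non-option arguments
  raise ArgumentError, 'expected at most one input file' if rest.length > 1
  options[:input] = rest.first || '-' # '-' means standard input, as in uweb
  options
end
```

Returning a hash keeps the sketch easy to test. uweb's globals make the same information available to the rest of the program without passing it around.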

Usage

Since RDoc::usage is no longer with us, we provide our own version, which simply prints the $usage text and exits. That text is valid AsciiDoc, so it can also be passed to asciidoctor to generate a manual page or an HTML file.

Usage
Location(s): uweb.rb (317 - 323)
# Since RDoc::usage is no more.
def usage
  print $usage
  exit(0)
end

Loading templates

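We let erb do the heavy lifting for us here. Two templates are loaded from the template directory. uweb.chunk renders an embedded chunk, and uweb.nest renders a reference to a nested chunk. A missing template file is a fatal error.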

Loading templates
Location(s): uweb.rb (371 - 382)
def load_templates
  $chunk_template = load_template('uweb.chunk')
  $nest_template = load_template('uweb.nest')
end

def load_template(path)
  path = $template_dir + '/' + path
  abort("Missing template file \"#{path}\"") unless File.exist?(path)
  return ERB.new(IO.read(path))
end
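The templates see the local variables of the method that runs them because ERB#result (and ERB#run) evaluate the template against the binding passed in. Below is a minimal sketch. It uses a made-up one-line template in place of uweb.nest, whose real contents are not shown in this document.

```ruby
require 'erb'

# Hypothetical template text standing in for the uweb.nest file.
NEST_TEMPLATE = ERB.new("<%= indent %>&lt;<%= to %>&gt;\n")

# Like nest_chunk: the template reads `indent` and `to` straight from the
# binding. Method parameters are local variables, so they are captured too.
def nest_reference(indent, to)
  NEST_TEMPLATE.result(binding).chomp
end

nest_reference('  ', 'Usage') # => "  &lt;Usage&gt;"
```

embed_chunk and nest_chunk assign their locals twice (name = name = ...). This is presumably meant to keep Ruby's -w mode from warning that they are assigned but unused, since the parser cannot see the template reading them.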

Processing input

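We simply scan the input one line at a time and match each line against the input patterns. A <link rel='source'> line causes a source file to be scanned. An <a rel='chunk'> line embeds a chunk, and an <a rel='include'> line recursively processes another input file. Any other line is copied to the output unchanged. While reading a file, each_file_line keeps the current path and line number in globals for error messages, and saves and restores them around nested includes.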

Processing input
Location(s): uweb.rb (384 - 457)
def process_file(file)
  each_file_line(file) do |line|
    case line
    when $source_pattern
      scan_source($source_pattern.extract(line))
    when $chunk_pattern
      embed_chunk($chunk_pattern.extract(line))
    when $include_pattern
      process_file($include_pattern.extract(line))
    else
      puts line
    end
  end
end

def each_file_line(path)
  $in_path ||= nil
  save_path = $in_path
  $in_path = path
  $at_line_number ||= nil
  save_line_number = $at_line_number
  $at_line_number = 0
  in_file = path == '-' ? $stdin : File.open(path, 'r')
  in_file.each_line do |line|
    line.chomp!
    $at_line_number += 1
    yield line
  end
  $in_path = save_path
  $at_line_number = save_line_number
end
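The save-and-restore in each_file_line is what keeps error positions correct when one file includes another. The sketch below reproduces it with in-memory files and a made-up <include ...> line syntax. Both are hypothetical, so the sketch runs without any input files. Unlike uweb, it restores the globals in an ensure clause, so they are also restored if processing raises.

```ruby
require 'stringio'

# Hypothetical in-memory files so the sketch runs without touching disk.
FILES = {
  'main.html'  => "line one\n<include inner.html>\nline three\n",
  'inner.html' => "inner line\n"
}.freeze

$in_path = nil
$at_line_number = nil

# Like each_file_line: publish the current path and line number in globals
# (for error messages) and restore the caller's values afterwards.
def each_line_of(path)
  save_path, save_line = $in_path, $at_line_number
  $in_path, $at_line_number = path, 0
  StringIO.new(FILES.fetch(path)).each_line do |line|
    $at_line_number += 1
    yield line.chomp
  end
ensure
  $in_path, $at_line_number = save_path, save_line
end

# A toy version of process_file using the made-up <include ...> syntax.
def process(path, log)
  each_line_of(path) do |line|
    if (match = line.match(/\A<include (.+)>\z/))
      process(match[1], log)
    else
      log << "#{$in_path}(#{$at_line_number}): #{line}"
    end
  end
end

log = []
process('main.html', log)
puts log
# main.html(1): line one
# inner.html(1): inner line
# main.html(3): line three
```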

Input patterns

The patterns we look for in the input X/HTML files are:

Input patterns
Location(s): uweb.rb (664 - 741)
$source_pattern =
  Pattern.new("<link rel='source'> tag",
              / <
                \s* link
                \s .* rel
                \s* =
                \s* ['"]? source ['"]?
              /ix,
              / ^
                \s* <
                \s* link
                \s+ rel
                \s* =
                \s* ['"] source ['"]
                \s+ href
                \s* =
                \s* ['"] (.+) ['"]
                \s* \/?
                \s* >
                (?: \s* <
                    \s* \/
                    \s* link
                    \s* > )?
                \s* $
              /ix)

$include_pattern =
  Pattern.new("<a rel='include'> tag",
              / <
                \s* a
                \s .* rel
                \s* =
                \s* ['"] include ['"]
              /ix,
              / ^
                \s* <
                \s* a
                \s+ rel
                \s* =
                \s* ['"] include ['"]
                \s+ name
                \s* =
                \s* ['"] (.+) ['"]
                \s* \/?
                \s* >
                (?: \s* <
                    \s* \/
                    \s* a
                    \s* > )?
                \s* $
              /ix)

$chunk_pattern =
  Pattern.new("<a rel='chunk'> tag",
              / <
                \s* a
                \s .* rel
                \s* =
                \s* ['"] chunk ['"]
              /ix,
              / ^
                \s* <
                \s* a
                \s+ rel \s* =
                \s* ['"] chunk ['"]
                \s+ name
                \s* =
                \s* ['"] (.+) ['"]
                \s* \/?
                \s* >
                (?: \s* <
                    \s* \/
                    \s* a
                    \s* > )?
                \s* $
              /ix)
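Each pattern pairs a lenient detection regexp with a strict extraction regexp. The sketch below copies the $chunk_pattern regexps, leaving out the optional trailing closing-tag group, to show the three possible outcomes for a line. classify is a hypothetical helper. Its :invalid result corresponds to the case where uweb dies.

```ruby
# Copies of the $chunk_pattern regexps, minus the optional trailing
# closing-tag group (the /x flag makes whitespace insignificant).
CHUNK_DETECT  = / < \s* a \s .* rel \s* = \s* ['"] chunk ['"] /ix
CHUNK_EXTRACT = / ^ \s* < \s* a \s+ rel \s* = \s* ['"] chunk ['"]
                  \s+ name \s* = \s* ['"] (.+) ['"] \s* \/? \s* > \s* $ /ix

# Mirrors the dispatch in process_file for this one pattern: lines that do
# not look like chunk tags are copied; lines that look like one but do not
# match the strict form are errors.
def classify(line)
  return :copied unless CHUNK_DETECT =~ line
  match = CHUNK_EXTRACT.match(line)
  match ? [:embed, match[1]] : :invalid
end

classify(%q{<a rel='chunk' name='Main program'/>})      # => [:embed, "Main program"]
classify(%q{<p>See <a rel="chunk" name="Usage"/>.</p>}) # => :invalid
classify('<p>Plain text</p>')                            # => :copied
```

So a chunk tag must be alone on its line. A tag embedded in running text is detected but not extracted.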

Scanning sources

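We also scan any sources the X/HTML input links to. Lines between a begin and an end chunk marker become the content of that chunk, and a chunk that begins inside another is recorded as nested in it. Inside a section named !uweb, chunk markers are not interpreted as chunk boundaries. All sources must be linked before the first chunk is embedded: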

Scanning sources
Location(s): uweb.rb (417 - 456)
Nested in: Processing input
$is_in_unscanned_chunk = false

def scan_source(path)
  die("<link rel='source'> after <a rel='chunk'>") if $did_embed_chunks
  instances = []
  each_file_line(path) do |line|
    if $end_pattern === line
      name = $end_pattern.extract(line)
      if not $is_in_unscanned_chunk
        die('End chunk outside any chunk') unless instances.size > 0
        die('End chunk "' + name \
          + '" does not match start chunk "' + instances.last.chunk.name + '"') \
          if name and name != instances.last.chunk.name
        instances.last.end_scan
        instances.pop
      elsif name == '!uweb'
        $is_in_unscanned_chunk = false
      end
      next
    end
    if !$is_in_unscanned_chunk and $begin_pattern === line
      name = $begin_pattern.extract(line)
      $is_in_unscanned_chunk = name == '!uweb'
      unless $is_in_unscanned_chunk
        die("Chunk #{name} contains itself") \
          if instances.any? { |instance| instance.chunk.name == name }
        parent = instances.last
        instance = Chunk.begin_scan(name)
        instances.push(instance)
        parent.add(instance) if parent
      end
      next
    end
    instances.last.add(line) if instances.last
  end
  die("Missing end of chunk \"#{instances.last.chunk.name}\"") \
    if instances.size > 0
end
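The core of scan_source is a stack of open chunk instances. Below is a simplified sketch of that approach. It assumes only {{{ name and }}} markers, no !uweb sections, and each chunk appearing once. It also records a nested chunk in its parent as a placeholder "<name>" line instead of an Instance object. scan_chunks is a hypothetical name.

```ruby
# Simplified sketch of scan_source's stack discipline (see assumptions above).
def scan_chunks(lines)
  chunks = {} # chunk name => content lines
  stack = []  # names of the chunks currently open, innermost last
  lines.each do |line|
    if (match = line.match(/\{\{\{\s+(.+?)\s*\z/))
      name = match[1]
      raise "Chunk #{name} contains itself" if stack.include?(name)
      chunks[stack.last] << "<#{name}>" unless stack.empty? # reference in parent
      chunks[name] ||= []
      stack.push(name)
    elsif line.include?('}}}')
      raise 'End chunk outside any chunk' if stack.empty?
      stack.pop
    elsif !stack.empty?
      chunks[stack.last] << line # content of the innermost open chunk
    end
    # Lines outside any chunk fall through and are ignored.
  end
  raise %(Missing end of chunk "#{stack.last}") unless stack.empty?
  chunks
end

source = ['require "x"', '# {{{ outer', 'a', '  # {{{ inner', '  b', '  # }}}', 'c', '# }}}']
scan_chunks(source)['outer'] # => ["a", "<inner>", "c"]
scan_chunks(source)['inner'] # => ["  b"]
```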

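The patterns we look for in the source files recognize {{{ / }}} fold markers and # region / # endregion lines. Non-word characters may precede a fold marker or follow the chunk name, so the markers can sit inside the comment syntax of the source language: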

Source patterns
Location(s): uweb.rb (743 - 769)
$begin_pattern =
  Pattern.new('Begin chunk',
              / (?: \{\{\{
                  | \# \s* region )
              /ix,
              / ^
                (?: \W* \{\{\{
                  | \s* \# \s* region )
                \s+ ( .+? )?
                (?: \s* \W+ )?
                \s* $
              /ix)

$end_pattern =
  Pattern.new('End chunk',
              / (?: \}\}\}
                  | \# \s* endregion )
              /ix,
              / ^
                (?: \W* \}\}\}
                  | \s* \# \s* endregion )
                (?: \s+ ( .+? ) )?
                (?: \s* \W+ )?
                \s* $
              /ix)
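To see how these regexps handle different comment syntaxes, here are copies of the two extraction regexps applied to sample lines. The copy of the end pattern assumes its second alternative is meant to read endregion, which matches its detection regexp.

```ruby
# Copies of the extraction regexps from $begin_pattern and $end_pattern
# (the end pattern's second alternative spelled `endregion`, an assumption).
BEGIN_EXTRACT = / ^ (?: \W* \{\{\{ | \s* \# \s* region )
                  \s+ ( .+? )? (?: \s* \W+ )? \s* $ /ix
END_EXTRACT   = / ^ (?: \W* \}\}\} | \s* \# \s* endregion )
                  (?: \s+ ( .+? ) )? (?: \s* \W+ )? \s* $ /ix

# The leading \W* accepts any comment prefix, and the trailing \W+ strips
# a closing comment delimiter such as "-->".
[
  '  // {{{ Main program',
  '<!-- {{{ Usage -->',
  '#region Tracking chunks'
].each { |line| puts BEGIN_EXTRACT.match(line)[1] }
# prints Main program, Usage, Tracking chunks (one per line)

END_EXTRACT.match('// }}}')[1]             # => nil (the name is optional)
END_EXTRACT.match('# endregion')[1]        # => nil
END_EXTRACT.match('// }}} Main program')[1] # => "Main program"
```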

Detecting Patterns

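A pattern is just a glorified regular expression, or rather a pair of them. A lenient one detects lines that look like they are meant to match, and a strict one must match the whole line and extracts the name. The only trickiness is that extract consults the global $is_in_unscanned_chunk flag to decide whether to complain about a "bad" line or silently ignore it. This is what implements the special !uweb section.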

Detecting patterns
Location(s): uweb.rb (613 - 631)
class Pattern
  def initialize(name, detect_regexp, extract_regexp)
    @name = name
    @detect_regexp = detect_regexp
    @extract_regexp = extract_regexp
  end

  def ===(line)
    return line =~ @detect_regexp
  end

  def extract(line)
    match = @extract_regexp.match(line)
    die('Invalid ' + @name) unless $is_in_unscanned_chunk or match
    return match && match[1]
  end
end
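Defining === is what lets process_file use a pattern directly in a `when` clause (when $source_pattern). A stripped-down copy shows both that and the effect of the global flag. Here die is replaced by raise, and TODO_PATTERN and describe_line are made up for illustration.

```ruby
# Stripped-down copy of Pattern; `die` is replaced by `raise`.
$is_in_unscanned_chunk = false

class MiniPattern
  def initialize(name, detect_regexp, extract_regexp)
    @name = name
    @detect_regexp = detect_regexp
    @extract_regexp = extract_regexp
  end

  # `case line when pattern` calls pattern === line, so defining ===
  # lets a pattern object appear directly in a `when` clause.
  def ===(line)
    line =~ @detect_regexp
  end

  # Strict extraction; a failure is fatal unless the global flag says we
  # are inside a section whose markers should be ignored.
  def extract(line)
    match = @extract_regexp.match(line)
    raise "Invalid #{@name}" unless $is_in_unscanned_chunk || match
    match && match[1]
  end
end

TODO_PATTERN = MiniPattern.new('TODO marker', /TODO/, /\ATODO:\s+(.+)\z/)

def describe_line(line)
  case line
  when TODO_PATTERN then "todo: #{TODO_PATTERN.extract(line)}"
  else 'text'
  end
end

describe_line('TODO: write tests') # => "todo: write tests"
describe_line('plain line')        # => "text"
```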

Embedding chunks

The whole point of the program is to embed chunks of sources into the generated X/HTML:

Embedding chunks
Location(s): uweb.rb (459 - 486)
def embed_chunk(name)
  $did_embed_chunks = true
  chunk = Chunk.by_name(name)
  chunk.is_used = true
  # TRICKY: Captured by the binding.
  name = name = chunk.name
  refers_to = refers_to = chunk.refers_to.size == 0 ? nil : chunk.refers_to.entries.sort
  nested_in = nested_in = chunk.nested_in.size == 0 ? nil : chunk.nested_in.entries.sort
  locations = locations =
    chunk.instances.entries \
         .map { |instance| instance.location } \
         .sort
  lines = lines = \
    chunk.instances.entries[0].content \
          .map { |content| content.class == Instance ? nest_chunk(chunk, content) \
                                                     : content.escape_xhtml }
  $chunk_template.run(binding)
end

def nest_chunk(from_chunk, to_instance)
  # TRICKY: Captured by the binding.
  indent = indent = to_instance.indentation || ''
  from = from = from_chunk.name
  to = to = to_instance.chunk.name
  return $nest_template.result(binding).chomp
end
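embed_chunk only escapes plain source lines, while nested chunks are rendered through the nest template instead. Below is a sketch of that mapping. It uses a hypothetical NestedRef struct in place of Instance and a fixed string format in place of the uweb.nest template.

```ruby
# Hypothetical stand-in for a nested Instance: just a name and indentation.
NestedRef = Struct.new(:name, :indentation)

# Same replacements as String#escape_xhtml. '&' must be replaced first;
# otherwise the '&' introduced by '&lt;' and '&gt;' would be escaped again.
def escape_xhtml(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Mirrors how embed_chunk builds `lines`: plain lines are escaped and
# nested chunks become references.
def render_lines(content)
  content.map do |item|
    if item.is_a?(NestedRef)
      "#{item.indentation}&lt;#{item.name}&gt;"
    else
      escape_xhtml(item)
    end
  end
end

render_lines(['if a < b && c', NestedRef.new('Body', '  '), 'end'])
# => ["if a &lt; b &amp;&amp; c", "  &lt;Body&gt;", "end"]
```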

Tracking chunks

To support the above, we need to track all the chunks we extracted in the source files:

Tracking chunks
Location(s): uweb.rb (488 - 535)
class Chunk
  def Chunk.begin_scan(name)
    @@chunk_by_name ||= {}
    return Instance.new(@@chunk_by_name[name] ||= Chunk.new(name))
  end

  def Chunk.by_name(name)
    @@chunk_by_name ||= {}
    chunk = @@chunk_by_name[name]
    die("Unknown chunk \"#{name}\"") unless chunk
    return chunk
  end

  def Chunk.verify_all_used
    @@chunk_by_name ||= {} # Empty if no source was scanned.
    exist_unused = false
    @@chunk_by_name.keys.sort.each do |name|
      next if @@chunk_by_name[name].is_used
      $stderr.puts("Chunk \"#{name}\" was not used")
      exist_unused = true
    end
    exit(1) if exist_unused
  end

  attr_reader :name, :instances, :refers_to, :nested_in
  attr_accessor :is_used

  def initialize(name)
    @name = name
    @instances = Set.new
    @refers_to = Set.new
    @nested_in = Set.new
    @is_used = false
  end

  def add(new_instance)
    old_instance = @instances.entries.last
    abort('Chunk "' + @name \
        + '" instance at file "' + new_instance.location.path \
        + '" line ' + new_instance.location.first_line.to_s \
        + ' has different content than instance at ' \
        + 'file "' + old_instance.location.path \
        + '" line ' + old_instance.location.first_line.to_s) \
      if old_instance and not old_instance.has_same_content?(new_instance)
    @instances.add(new_instance)
  end
end
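The class-level table follows a find-or-create pattern. Chunk.begin_scan returns an instance of the existing chunk for a name, and creates the chunk only the first time that name is seen. Here is a minimal sketch of the same idea, using hypothetical Registry and Entry classes.

```ruby
# Find-or-create registry in the style of Chunk.begin_scan and
# Chunk.verify_all_used: one entry per name, plus a "used" flag.
class Registry
  Entry = Struct.new(:name, :is_used)

  def initialize
    @by_name = {} # initialized eagerly, so lookups never see an unset table
  end

  def fetch_or_create(name)
    @by_name[name] ||= Entry.new(name, false)
  end

  def unused_names
    @by_name.keys.sort.reject { |name| @by_name[name].is_used }
  end
end

registry = Registry.new
first = registry.fetch_or_create('Main program')
again = registry.fetch_or_create('Main program')
registry.fetch_or_create('Usage')
first.is_used = true
first.equal?(again)   # => true: both calls return the same object
registry.unused_names # => ["Usage"]
```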

To generate nicer output and better error messages, we also need to track, for each chunk instance, the location it appeared in the source: the file path and its first and last line numbers:

Tracking locations
Location(s): uweb.rb (537 - 555)
class Location
  attr_reader :path, :first_line, :last_line

  def initialize
    @path = $in_path
    @first_line = $at_line_number
  end

  def done
    @last_line = $at_line_number
  end

  def <=>(other)
    return @path == other.path ? @first_line <=> other.first_line \
                               : @path <=> other.path
  end
end
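Sorting locations relies on <=>, which orders by path and then by first line within the same path. Here is a sketch using a hypothetical Loc struct.

```ruby
# Hypothetical Struct with the same ordering as Location#<=>.
Loc = Struct.new(:path, :first_line) do
  def <=>(other)
    path == other.path ? first_line <=> other.first_line : path <=> other.path
  end
end

locations = [Loc.new('b.rb', 3), Loc.new('a.rb', 10), Loc.new('a.rb', 2)]
locations.sort.map { |l| "#{l.path}:#{l.first_line}" }
# => ["a.rb:2", "a.rb:10", "b.rb:3"]
```

Line numbers are compared as integers, so line 10 sorts after line 2. Comparing arrays gives the same order, so the body could also be written as [path, first_line] <=> [other.path, other.first_line].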

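Since a chunk can appear in multiple locations, we have a distinct notion of a chunk instance. Chunk#add requires all instances of the same chunk to have the same content, so the chunk's text is well defined: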

Tracking chunk instances
Location(s): uweb.rb (557 - 611)
class Instance
  attr_reader :chunk, :location, :content, :indentation

  def initialize(chunk)
    @chunk = chunk
    @location = Location.new
    @content = []
    @indentation = nil
  end

  def add(content)
    @content.push(content)
    return unless content.class == Instance
    @chunk.refers_to.add(content.chunk.name)
    content.chunk.nested_in.add(@chunk.name)
  end

  def end_scan
    @indentation = @content.map { |c| c.indentation }.compact.min.clone || ''
    @location.done

    @content.each do |content|
      if content.class == Instance
        content.indentation.sub!(@indentation.clone, '') \
          if @indentation.size > 0
      else
        content.chomp!
        content.sub!(@indentation, '') if @indentation.size > 0
      end
    end
    @chunk.add(self)
  end

  def has_same_content?(other)
    return false unless @chunk == other.chunk
    return false unless @location.first_line - @location.last_line \
                     == other.location.first_line - other.location.last_line
    return false unless @content.size == other.content.size
    @content.each_index do |i|
      content = @content[i]
      other_content = other.content[i]
      return false unless content.class == other_content.class
      if content.class == Instance
        return false unless content.has_same_content?(other_content)
        return false unless (content.indentation == '') \
                         == (other_content.indentation == '')
      else
        return false unless content == other_content
      end
    end
    return true
  end
end
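end_scan strips the common leading whitespace from a chunk instance's lines, so a chunk reads the same wherever it is nested. Below is a sketch of that step for plain lines only; leading_whitespace and dedent are hypothetical names. Like the original, it picks the common indentation by taking min over the indentation strings. That yields the shortest prefix when indentation is all spaces or all tabs. It uses String#delete_prefix (Ruby 2.5+) rather than sub.

```ruby
# Leading whitespace of a line; empty lines do not constrain the result.
def leading_whitespace(line)
  line.empty? ? nil : line[/\A\s*/]
end

def dedent(lines)
  common = lines.map { |line| leading_whitespace(line) }.compact.min || ''
  # delete_prefix only removes text at the start of the line, whereas
  # String#sub with a string pattern replaces the first match anywhere.
  lines.map { |line| line.delete_prefix(common) }
end

dedent(['    def f', '      1', '', '    end'])
# => ["def f", "  1", "", "end"]
```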

Monkey patching

To support all the above, we monkey-patch the Array and String classes, adding useful methods to them:

Monkey patching
Location(s): uweb.rb (633 - 660)
class Array
  def map_with_index!
    each_with_index do |entry, index|
      self[index] = yield(entry, index)
    end
  end

  def map_with_index(&block)
    dup.map_with_index!(&block)
  end
end

class String
  def indentation
    return nil if self == ''
    return /^\s*/.match(self)[0]
  end

  def escape_xhtml
    return self.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
  end

  def idify
    return self.gsub(/\W+/, '-')
  end
end
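If reopening core classes is a concern, the same helpers can be written as plain module functions. Here is a sketch of two of them, using a hypothetical Helpers module.

```ruby
# Module-function versions of two of the patched helpers, so this sketch
# does not reopen String or Array.
module Helpers
  module_function

  # Same as String#idify: collapse each run of non-word characters to '-'.
  def idify(text)
    text.gsub(/\W+/, '-')
  end

  # Same result as Array#map_with_index, built from core methods.
  def map_with_index(array, &block)
    array.each_with_index.map(&block)
  end
end

Helpers.idify('Tracking chunk instances')              # => "Tracking-chunk-instances"
Helpers.idify('Input patterns (X/HTML)')               # => "Input-patterns-X-HTML-"
Helpers.map_with_index(%w[a b]) { |e, i| "#{i}:#{e}" } # => ["0:a", "1:b"]
```

Core Ruby also provides array.map.with_index { |entry, index| ... }, which returns the same result as map_with_index.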