This describes the internals of the uweb program. It also serves as an example of how to write files using it. See the uweb manual page for a description of the command line flags, input format, etc.
The program requires very few libraries, most notably erb. It can also be compiled into a standalone binary using rubyscript2exe.
require 'set'
require 'getoptlong'
require 'erb'

begin
  require 'rubyscript2exe'
rescue LoadError
  # Not using rubyscript2exe.
end
The main program is pretty boring: parse command line, load templates, process input, done.
def main
  parse_options
  load_templates
  $did_embed_chunks = false
  process_file($input)
  Chunk.verify_all_used if $warnings
end

def die(message)
  abort("#{$in_path}(#{$at_line_number}): #{message}")
end

def parse_options
  $template_dir = defined?(RUBYSCRIPT2EXE) ? RUBYSCRIPT2EXE.exedir \
                                           : File.dirname($0)
  $warnings = true
  GetoptLong.new([ '--help', '-h', GetoptLong::NO_ARGUMENT ],
                 [ '--template-dir', '-t', GetoptLong::REQUIRED_ARGUMENT ],
                 [ '--output-file', '-o', GetoptLong::REQUIRED_ARGUMENT ],
                 [ '--no-warnings', '-n', GetoptLong::NO_ARGUMENT ]) \
  .each do |option, argument|
    case option
    when '--help'
      usage
    when '--template-dir'
      $template_dir = argument
    when '--output-file'
      $stdout = File.open(argument, 'w')
    when '--no-warnings'
      $warnings = false
    else
      abort("Unknown option \"#{option}\"")
    end
  end
  case ARGV.length
  when 0
    $input = '-'
  when 1
    $input = ARGV[0]
  else
    usage
  end
end
Since RDoc::usage is no longer with us, we provide our own version. The output is in valid AsciiDoc format and can be passed to asciidoctor to generate a manual page or an HTML file.
# Since RDoc::usage is no more.
def usage
  print $usage
  exit(0)
end
We let erb do the heavy lifting for us here.
def load_templates
  $chunk_template = load_template('uweb.chunk')
  $nest_template = load_template('uweb.nest')
end

def load_template(path)
  path = $template_dir + '/' + path
  abort("Missing template file \"#{path}\"") unless File.exist?(path)
  return ERB.new(IO.read(path))
end
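As a minimal sketch of what erb does for us, the following compiles a template from a string and renders it against a binding. The template text and the `render_example` helper are made up for illustration; they are not the contents of the real uweb.chunk file.

```ruby
require 'erb'

# Hypothetical template text, standing in for the contents of a template file.
TEMPLATE_TEXT = "<h4><%= name %></h4>\n<% lines.each do |line| %><%= line %>\n<% end %>"

# Hypothetical helper: the template sees this method's local variables
# (name and lines) through the binding passed to ERB#result.
def render_example(name, lines)
  ERB.new(TEMPLATE_TEXT).result(binding)
end

html = render_example('main', ['a', 'b'])
puts html
```

The template only ever refers to plain local names, so the code that prepares the data stays in Ruby and the markup stays in the template file.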
We simply scan the input one line at a time, comparing each line with the pre-established patterns.
def process_file(file)
  each_file_line(file) do |line|
    case line
    when $source_pattern
      scan_source($source_pattern.extract(line))
    when $chunk_pattern
      embed_chunk($chunk_pattern.extract(line))
    when $include_pattern
      process_file($include_pattern.extract(line))
    else
      puts line
    end
  end
end

def each_file_line(path)
  $in_path ||= nil
  save_path = $in_path
  $in_path = path
  $at_line_number ||= nil
  save_line_number = $at_line_number
  $at_line_number = 0
  in_file = path == '-' ? $stdin : File.open($in_path, 'r')
  in_file.each_line do |line|
    line.chomp!
    $at_line_number += 1
    yield line
  end
  # Close files we opened ourselves once they are fully read.
  in_file.close unless in_file.equal?($stdin)
  $in_path = save_path
  $at_line_number = save_line_number
end

Scanning sources
The patterns we look for in the input X/HTML files are:
# Each extraction pattern also accepts an optional separate closing tag.
$source_pattern = Pattern.new("<link rel='source'> tag",
  / < \s* link \s .* rel \s* = \s* ['"]? source ['"]? /ix,
  / ^ \s* < \s* link
      \s+ rel \s* = \s* ['"] source ['"]
      \s+ href \s* = \s* ['"] (.+) ['"]
      \s* \/? \s* >
      (?: \s* < \s* \/ \s* link \s* > )?
      \s* $ /ix)
$include_pattern = Pattern.new("<a rel='include'> tag",
  / < \s* a \s .* rel \s* = \s* ['"] include ['"] /ix,
  / ^ \s* < \s* a
      \s+ rel \s* = \s* ['"] include ['"]
      \s+ name \s* = \s* ['"] (.+) ['"]
      \s* \/? \s* >
      (?: \s* < \s* \/ \s* a \s* > )?
      \s* $ /ix)
$chunk_pattern = Pattern.new("<a rel='chunk'> tag",
  / < \s* a \s .* rel \s* = \s* ['"] chunk ['"] /ix,
  / ^ \s* < \s* a
      \s+ rel \s* = \s* ['"] chunk ['"]
      \s+ name \s* = \s* ['"] (.+) ['"]
      \s* \/? \s* >
      (?: \s* < \s* \/ \s* a \s* > )?
      \s* $ /ix)
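To see what such a pattern accepts, here is a small sketch that applies a simplified copy of the chunk extraction regexp to a few lines. `CHUNK_TAG` and `chunk_name` are names invented for this example, and the simplified regexp covers only the self-closing form of the tag.

```ruby
# Simplified copy of the chunk extraction regexp (self-closing form only).
CHUNK_TAG = /
  ^ \s* < \s* a \s+ rel \s* = \s* ['"] chunk ['"]
  \s+ name \s* = \s* ['"] (.+) ['"]
  \s* \/? \s* > \s* $
/ix

# Hypothetical helper: returns the chunk name, or nil if the line is not
# a well-formed chunk tag standing alone on its line.
def chunk_name(line)
  match = CHUNK_TAG.match(line)
  match && match[1]
end

p chunk_name(%q{<a rel='chunk' name='Main program'/>})
p chunk_name(%q{  <A REL="chunk" NAME="x" />})
p chunk_name('<p>plain text</p>')
```

Note how strict the regexp is: the tag must be alone on its line, and the rel attribute must come before the name attribute.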
We also scan any sources the X/HTML input links to:
$is_in_unscanned_chunk = false

def scan_source(path)
  die("<link rel='source'> after <a rel='chunk'>") if $did_embed_chunks
  instances = []
  each_file_line(path) do |line|
    if $end_pattern === line
      name = $end_pattern.extract(line)
      if not $is_in_unscanned_chunk
        die('End chunk outside any chunk') unless instances.size > 0
        die('End chunk "' + name \
          + '" does not match start chunk "' + instances.last.chunk.name + '"') \
          if name and name != instances.last.chunk.name
        instances.last.end_scan
        instances.pop
      elsif name == '!uweb'
        $is_in_unscanned_chunk = false
      end
      next
    end
    if !$is_in_unscanned_chunk and $begin_pattern === line
      name = $begin_pattern.extract(line)
      $is_in_unscanned_chunk = name == '!uweb'
      unless $is_in_unscanned_chunk
        die("Chunk #{name} contains itself") \
          if instances.any? { |instance| instance.chunk.name == name }
        parent = instances.last
        instance = Chunk.begin_scan(name)
        instances.push(instance)
        parent.add(instance) if parent
      end
      next
    end
    instances.last.add(line) if instances.last
  end
  die("Missing end of chunk \"#{instances.last.chunk.name}\"") \
    if instances.size > 0
end
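The core of the scan is a stack of open chunks. The sketch below shows the same idea in isolation, under simplifying assumptions: the markers are reduced to a bare `{{{ name` and `}}}`, errors are raised as exceptions instead of calling die, and repeated chunks simply accumulate content. `scan_chunks` and `SAMPLE` are hypothetical names.

```ruby
# Collects the content of each chunk, keyed by chunk name. A nested chunk
# is recorded in its parent as [:nested, name].
def scan_chunks(lines)
  chunks = {}
  stack = []                      # names of the currently open chunks
  lines.each_with_index do |line, index|
    case line
    when /^\s*\{\{\{\s+(\S+)/
      name = $1
      raise ArgumentError, "line #{index + 1}: chunk #{name} contains itself" \
        if stack.include?(name)
      chunks[stack.last] << [:nested, name] unless stack.empty?
      chunks[name] ||= []
      stack.push(name)
    when /^\s*\}\}\}/
      raise ArgumentError, "line #{index + 1}: end chunk outside any chunk" \
        if stack.empty?
      stack.pop
    else
      # Lines outside any chunk are ignored.
      chunks[stack.last] << line unless stack.empty?
    end
  end
  raise ArgumentError, "missing end of chunk #{stack.last}" unless stack.empty?
  chunks
end

SAMPLE = ['preamble', '{{{ outer', 'a', '  {{{ inner', 'b', '  }}}', 'c', '}}}']
p scan_chunks(SAMPLE)
```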
The patterns we look for in the source files are:
$begin_pattern = Pattern.new('Begin chunk',
  / (?: \{\{\{ | \# \s* region ) /ix,
  / ^ (?: \W* \{\{\{ | \s* \# \s* region )
      \s+ ( .+? )?
      (?: \s* \W+ )?
      \s* $ /ix)
$end_pattern = Pattern.new('End chunk',
  / (?: \}\}\} | \# \s* endregion ) /ix,
  / ^ (?: \W* \}\}\} | \s* \# \s* endregion )
      (?: \s+ ( .+? ) )?
      (?: \s* \W+ )?
      \s* $ /ix)
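As an illustration, the sketch below runs the begin-chunk extraction regexp over a few comment styles. `BEGIN_CHUNK` and `begin_chunk_name` are names invented for this example; the regexp itself is copied from the pattern above.

```ruby
# Copy of the begin-chunk extraction regexp.
BEGIN_CHUNK = /
  ^ (?: \W* \{\{\{ | \s* \# \s* region )
  \s+ ( .+? )?
  (?: \s* \W+ )? \s* $
/ix

# Hypothetical helper: returns the chunk name, or nil if the line does
# not start a chunk.
def begin_chunk_name(line)
  match = BEGIN_CHUNK.match(line)
  match && match[1]
end

p begin_chunk_name('// {{{ Parse options')
p begin_chunk_name('/* {{{ Parse options */')
p begin_chunk_name('  # region Helpers')
p begin_chunk_name('x = 1')
```

The leading `\W*` absorbs whatever comment opener the source language uses, and the trailing `\W+` drops a comment closer such as `*/`, so the same markers work across languages.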
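A pattern is just a glorified regular expression: a loose one to detect candidate lines, and a strict one to extract the interesting part. The one subtlety is that extraction consults global state to decide whether to complain about or ignore "bad" lines; this is used to implement the special !uweb section.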
class Pattern

  def initialize(name, detect_regexp, extract_regexp)
    @name = name
    @detect_regexp = detect_regexp
    @extract_regexp = extract_regexp
  end

  def ===(line)
    return line =~ @detect_regexp
  end

  def extract(line)
    match = @extract_regexp.match(line)
    die('Invalid ' + @name) unless $is_in_unscanned_chunk or match
    return match && match[1]
  end

end
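The point of having two regular expressions is that a line that merely looks like a directive is reported as an error, rather than silently passed through as text. Here is a stripped-down sketch of the idea, with no global state. The `TwoStagePattern` class, the `@include` directive, and `classify` are all made up for this example.

```ruby
# Hypothetical two-stage pattern: loose detection, strict extraction.
class TwoStagePattern
  def initialize(name, detect_regexp, extract_regexp)
    @name = name
    @detect_regexp = detect_regexp
    @extract_regexp = extract_regexp
  end

  # Defining === lets the pattern be used directly in a case/when.
  def ===(line)
    @detect_regexp.match?(line)
  end

  def extract(line)
    match = @extract_regexp.match(line)
    raise ArgumentError, "Invalid #{@name}: #{line}" unless match
    match[1]
  end
end

INCLUDE = TwoStagePattern.new('include directive',
                              /@include/, /^@include\s+(\S+)$/)

def classify(line)
  case line
  when INCLUDE then [:include, INCLUDE.extract(line)]
  else [:text, line]
  end
end

p classify('@include foo.txt')
p classify('hello')
```

A malformed line such as a bare `@include` is detected by the loose regexp, fails the strict one, and raises instead of being treated as ordinary text.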
The whole point of the program is to embed chunks of sources into the generated X/HTML:
def embed_chunk(name)
  $did_embed_chunks = true
  chunk = Chunk.by_name(name)
  chunk.is_used = true
  # TRICKY: Captured by the binding.
  name = name = chunk.name
  refers_to = refers_to = chunk.refers_to.size == 0 ? nil : chunk.refers_to.entries.sort
  nested_in = nested_in = chunk.nested_in.size == 0 ? nil : chunk.nested_in.entries.sort
  locations = locations = chunk.instances.entries \
                               .map { |instance| instance.location } \
                               .sort
  lines = lines = \
    chunk.instances.entries[0].content \
         .map { |content| content.class == Instance ? nest_chunk(chunk, content) \
                                                    : content.escape_xhtml }
  $chunk_template.run(binding)
end

def nest_chunk(from_chunk, to_instance)
  # TRICKY: Captured by the binding.
  indent = indent = to_instance.indentation || ''
  from = from = from_chunk.name
  to = to = to_instance.chunk.name
  return $nest_template.result(binding).chomp
end
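Note that one function calls run and the other calls result: ERB#run prints the rendered text to $stdout, while ERB#result returns it as a string, which is what we need when the nested text becomes a line of the enclosing chunk. The sketch below shows the difference; the template and helper names are invented for the example. (The doubled assignments in the code above are presumably there to silence Ruby's warning about variables that are assigned but only used through the binding.)

```ruby
require 'erb'
require 'stringio'

# Hypothetical one-line template; the real templates live in separate files.
template = ERB.new('<b><%= name %></b>')

def render_inline(template, name)
  template.result(binding)        # returns the rendered text
end

def render_to_stdout(template, name)
  template.run(binding)           # prints the rendered text to $stdout
end

# Hypothetical helper that temporarily redirects $stdout to a string.
def capture_stdout
  saved = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = saved
end

returned = render_inline(template, 'main')
printed = capture_stdout { render_to_stdout(template, 'main') }
puts returned == printed
```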
To support the above, we need to track all the chunks we extracted in the source files:
class Chunk

  def Chunk.begin_scan(name)
    @@chunk_by_name ||= {}
    return Instance.new(@@chunk_by_name[name] ||= Chunk.new(name))
  end

  def Chunk.by_name(name)
    @@chunk_by_name ||= {}
    chunk = @@chunk_by_name[name]
    die("Unknown chunk \"#{name}\"") unless chunk
    return chunk
  end

  def Chunk.verify_all_used
    # The input may not have linked to any source at all.
    @@chunk_by_name ||= {}
    exist_unused = false
    @@chunk_by_name.keys.sort.each do |name|
      next if @@chunk_by_name[name].is_used
      $stderr.puts("Chunk \"#{name}\" was not used")
      exist_unused = true
    end
    exit(1) if exist_unused
  end

  attr_reader :name, :instances, :refers_to, :nested_in
  attr_accessor :is_used

  def initialize(name)
    @name = name
    @instances = Set.new
    @refers_to = Set.new
    @nested_in = Set.new
    @is_used = false
  end

  def add(new_instance)
    old_instance = @instances.entries.last
    abort('Chunk "' + @name \
        + '" instance at file "' + new_instance.location.path \
        + '" line ' + new_instance.location.first_line.to_s \
        + ' has different content than instance at ' \
        + 'file "' + old_instance.location.path \
        + '" line ' + old_instance.location.first_line.to_s) \
      if old_instance and not old_instance.has_same_content?(new_instance)
    @instances.add(new_instance)
  end

end
To generate a nice output and better error messages, we also need to track, for each chunk, the location(s) it appeared in the source:
class Location

  attr_reader :path, :first_line, :last_line

  def initialize
    @path = $in_path
    @first_line = $at_line_number
  end

  def done
    @last_line = $at_line_number
  end

  def <=>(other)
    return @path == other.path ? @first_line <=> other.first_line \
                               : @path <=> other.path
  end

end
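Defining `<=>` is all that sort needs. As a sketch of the resulting order, here is an equivalent formulation that compares `[path, first_line]` pairs; the `Loc` struct is a stand-in invented for this example, not the class above.

```ruby
# Hypothetical stand-in for Location: order by path, then by first line.
Loc = Struct.new(:path, :first_line) do
  def <=>(other)
    [path, first_line] <=> [other.path, other.first_line]
  end
end

locs = [Loc.new('b.rb', 3), Loc.new('a.rb', 10), Loc.new('a.rb', 2)]
sorted = locs.sort.map { |loc| "#{loc.path}:#{loc.first_line}" }
p sorted
```

Line numbers are compared as integers, so line 2 sorts before line 10 within the same file.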
Since a chunk can appear in multiple locations, we have a distinct notion of a chunk instance:
class Instance

  attr_reader :chunk, :location, :content, :indentation

  def initialize(chunk)
    @chunk = chunk
    @location = Location.new
    @content = []
    @indentation = nil
  end

  def add(content)
    @content.push(content)
    return unless content.class == Instance
    @chunk.refers_to.add(content.chunk.name)
    content.chunk.nested_in.add(@chunk.name)
  end

  def end_scan
    @indentation = @content.map { |c| c.indentation }.compact.min.clone || ''
    @location.done
    @content.each do |content|
      if content.class == Instance
        content.indentation.sub!(@indentation.clone, '') \
          if @indentation.size > 0
      else
        content.chomp!
        content.sub!(@indentation, '') if @indentation.size > 0
      end
    end
    @chunk.add(self)
  end

  def has_same_content?(other)
    return false unless @chunk == other.chunk
    return false unless @location.first_line - @location.last_line \
                     == other.location.first_line - other.location.last_line
    return false unless @content.size == other.content.size
    @content.each_index do |i|
      content = @content[i]
      other_content = other.content[i]
      return false unless content.class == other_content.class
      if content.class == Instance
        return false unless content.has_same_content?(other_content)
        return false unless (content.indentation == '') \
                         == (other_content.indentation == '')
      else
        return false unless content == other_content
      end
    end
    return true
  end

end
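The effect of end_scan on plain lines is to strip the indentation common to the whole chunk, so that a chunk taken from deep inside a function is displayed flush left. A minimal sketch of that step follows. `strip_common_indentation` is a hypothetical helper, and it picks the shortest prefix explicitly, whereas the code above takes the minimum of the prefixes as strings (the two agree when the indentation uses spaces only).

```ruby
# Removes the indentation shared by all non-empty lines.
def strip_common_indentation(lines)
  indents = lines.reject(&:empty?).map { |line| line[/^\s*/] }
  common = indents.min_by(&:size) || ''
  lines.map { |line| line.delete_prefix(common) }
end

chunk_lines = ['    def f', '      1', '', '    end']
p strip_common_indentation(chunk_lines)
```

Empty lines are skipped when computing the common prefix, which mirrors the way String#indentation returns nil for an empty string.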
To support all the above, we monkey-patch the Array and String classes, adding useful methods to them:
class Array

  def map_with_index!
    each_with_index do |entry, index|
      self[index] = yield(entry, index)
    end
  end

  def map_with_index(&block)
    dup.map_with_index!(&block)
  end

end

class String

  def indentation
    return nil if self == ''
    return /^\s*/.match(self)[0]
  end

  def escape_xhtml
    return self.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
  end

  def idify
    return self.gsub(/\W+/, '-')
  end

end
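The order of the substitutions when escaping matters: the ampersand must be replaced first, or the ampersands introduced by the other two replacements would themselves be escaped. A standalone sketch, written as a plain function (hypothetical name `escape_xhtml_text`) rather than a String method:

```ruby
# Escapes the three characters that are special in X/HTML text.
# The ampersand is handled first so that the entities produced by the
# later replacements are left alone.
def escape_xhtml_text(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

puts escape_xhtml_text('if a < b && c > d')
```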