Monday, 5 September 2011

Htmlification with Lua

Most developers hate typing HTML; we know it well, but will go on elaborate detours to avoid actually writing it.

The normal way to generate HTML dynamically is to use some template engine. Like any modern language, Lua has a number of these, reflecting the fact that template syntax is a matter of taste; Cosmo is probably the most well thought-out one. However, there is another approach which is to use the flexible data syntax of Lua itself to generate HTML.

The word htmlify is used by the Orbit web framework to describe generating HTML using Lua code. This example is taken from the LuaNova introductory article:

 -- html1.lua
 require 'orbit'

 function generate()
     return html {
         head{title "HTML Example"},
         body{
             h2{class="head","Here we go again!"}
         }
     }
 end

 orbit.htmlify(generate)
 print(generate())

 C:\basic>lua html1.lua
 <html ><head ><title >HTML Example</title></head>
 <body ><h2 class="head">Here we go again!</h2></body></html>

You will notice that this script has no declarations for tags like html, and the orbit module does not define them either: instead, the function environment of generate is modified with htmlify so that any unknown symbol is converted into a function which generates the text for an HTML element. Again, Lua's flexible table syntax is used to its best advantage; we don't need parentheses for functions that have a single string or table argument, and these tables can have both array-like and hash-like sections. Any key-value pair becomes an attribute of the element, and the array items become the contents of the element.
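
The mechanism can be sketched in plain Lua without Orbit. This is not Orbit's implementation, only a minimal illustration under stated assumptions: element, env and run_htmlified are hypothetical names, and the chunk gets its environment through setfenv on Lua 5.1 or through the env argument of load on 5.2 and later.

```lua
-- Hypothetical helper: build a function that renders one element as text.
local function element(name)
    return function(arg)
        -- a bare string argument is treated as a single content item
        if type(arg) ~= 'table' then arg = {arg} end
        -- hash part: string keys become attributes (sorted for stable output)
        local keys = {}
        for k in pairs(arg) do
            if type(k) == 'string' then keys[#keys + 1] = k end
        end
        table.sort(keys)
        local attrs = {}
        for i, k in ipairs(keys) do
            attrs[i] = string.format(' %s="%s"', k, tostring(arg[k]))
        end
        -- array part: items become the element's contents
        local contents = {}
        for i, item in ipairs(arg) do contents[i] = tostring(item) end
        return string.format('<%s%s>%s</%s>',
            name, table.concat(attrs), table.concat(contents), name)
    end
end

-- Any name looked up in this environment yields a tag function.
local env = setmetatable({}, {
    __index = function(_, name) return element(name) end
})

-- Hypothetical runner: compile a chunk so its globals resolve through env.
local function run_htmlified(src)
    local chunk
    if setfenv then                      -- Lua 5.1 / LuaJIT
        chunk = assert(loadstring(src))
        setfenv(chunk, env)
    else                                 -- Lua 5.2 and later
        chunk = assert(load(src, 'htmlified', 't', env))
    end
    return chunk()
end

print(run_htmlified [[
    return html {
        head { title "HTML Example" },
        body { h2 { class = "head", "Here we go again!" } }
    }
]])
```

Note that this sketch shares the first weakness mentioned below: a misspelt tag name is accepted as happily as a real one.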

There are two obvious issues with this implementation; first, it will cheerfully generate HTML with misspelt tags, and second, the output is not very readable, and needs to go through a prettifier.

Another way of getting the same results is to treat (X)HTML as XML and use XML tools to generate and pretty-print it. The LuaExpat binding defines a standard called LOM for expressing XML as Lua data structures:

 <abc a1="A1" a2="A2">inside tag `abc'</abc>

is expressed as:

 {
     attr = {
         [1] = "a1",
         [2] = "a2",
         a1 = "A1",
         a2 = "A2",
     },
     tag = "abc",
     "inside tag `abc'"
 }

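Reading a LOM node back into text shows what each part of the table is for. What follows is a minimal sketch, not the LuaExpat or LuaExpatUtils code; lom_to_string is a hypothetical helper, and it makes no attempt to escape special characters or to indent.

```lua
-- Hypothetical helper: serialize a LOM node (no escaping, no indentation).
local function lom_to_string(node)
    if type(node) == 'string' then return node end   -- text child
    local parts = {'<', node.tag}
    -- the array part of attr records attribute order; the hash part holds values
    for _, key in ipairs(node.attr or {}) do
        parts[#parts + 1] = string.format(' %s="%s"', key, node.attr[key])
    end
    parts[#parts + 1] = '>'
    -- the array part of the node holds child nodes and text, in document order
    for _, child in ipairs(node) do
        parts[#parts + 1] = lom_to_string(child)
    end
    parts[#parts + 1] = '</' .. node.tag .. '>'
    return table.concat(parts)
end

local abc = {
    tag = "abc",
    attr = { "a1", "a2", a1 = "A1", a2 = "A2" },
    "inside tag `abc'",
}
print(lom_to_string(abc))
```
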
LuaExpat does not provide a pretty-printer, so LuaExpatUtils came into being, adapted from stanza.lua from the Prosody IM server by Matthew Wild and Waqas Hussain. That's the joy of Open Source; with a little care about licensing and giving everyone their due, there is rarely need to write library code ab initio.

LuaExpatUtils provides a single module `lxp.doc':

 local doc = require 'lxp.doc'
 local lom = require 'lxp.lom'
 local d = lom.parse '<abc a1="A1" a2="A2"><ef>hello</ef></abc>'
 print(doc.tostring(d,'','  '))
 <abc a1='A1' a2='A2'>
   <ef>hello</ef>
 </abc>

It also provides what we might call 'xmlification', directly inspired by Orbit:

 local children,child,toy,name,age = doc.tags 'children,child,toy,name,age'
 d2 = children {
     child {name 'alice', age '5', toy {type='fluffy'}},
     child {name 'bob', age '6', toy {type='squeaky'}}
 }

The key difference in usage is that there is no function environment magic going on; the doc.tags function gives us an explicit list of tag constructor functions. The big implementation difference is that d2 is a LOM document, not a string, and so we can flexibly convert into good-looking XML text later.
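
To make the contrast concrete, here is a rough sketch of what explicit tag constructors that build LOM tables could look like. It is an illustration under assumptions, not the lxp.doc source: make_tags is a hypothetical stand-in for doc.tags, and it only handles the argument shapes used above.

```lua
-- Hypothetical stand-in for doc.tags: returns one constructor per listed name.
local function make_tags(list)
    local ctors = {}
    for name in list:gmatch('[^,%s]+') do
        ctors[#ctors + 1] = function(arg)
            -- a string, or a single node passed directly, becomes the only child
            if type(arg) ~= 'table' or arg.tag then arg = {arg} end
            local node = { tag = name, attr = {} }
            for k, v in pairs(arg) do
                if type(k) == 'string' then
                    -- hash part: an attribute; its name also goes in attr's array part
                    node.attr[#node.attr + 1] = k
                    node.attr[k] = v
                end
            end
            -- array part: children, kept in order
            for i, v in ipairs(arg) do node[i] = v end
            return node
        end
    end
    return (table.unpack or unpack)(ctors)
end

local child, name, toy = make_tags 'child,name,toy'
local d = child { name 'alice', toy { type = 'fluffy' } }
print(d.tag, d[1].tag, d[1][1], d[2].attr.type)
```

Because the result is a table rather than a string, it can be inspected or transformed before anything is written out.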

Here, I'm going to be less strict, and do the kind of global namespace modification which should always come with a mental health warning. It's just as easy to 'monkey-patch' Lua as it is Ruby, but the community feels it should not be used by libraries. Any global modification changes the environment that all code sees, and breaks the fragile set of expectations that users of the language share. And considering the maintenance nightmares that Ruby programmers routinely inflict on each other, this seems a sensible attitude.

But here I just want to write standalone Lua scripts that generate HTML without too much ceremony:

 doc = require 'lxp.doc'
 setmetatable(_G,{
   __index = function(t,name)
     _G[name] = doc.tags(name)
     return _G[name]
   end
 })

Everything is stored in tables in Lua, and _G is the global table. We want to catch any undefined global symbols, and this is exactly what the __index metamethod does for us here. Any undefined name such as 'div' will be made into a tag constructor, which we store back in the global table so that we don't create a new instance each time (that is, once 'div' has been encountered, this metamethod will not be fired for any subsequent div). That's basically it. There are some details, of course: table and select are existing Lua globals, for instance, so we explicitly pull these tags in as table_ and select_.
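
The caching behaviour is easy to check in isolation. This sketch uses an ordinary table ns in place of _G so that it can be run without altering the global environment; make_tag is a hypothetical stand-in for doc.tags, and misses is only there to count how often the metamethod fires.

```lua
-- Stand-in for doc.tags(name): returns a constructor for a LOM-like node.
local function make_tag(name)
    return function(arg)
        if type(arg) ~= 'table' then arg = {arg} end
        arg.tag = name
        return arg
    end
end

local misses = 0
local ns = setmetatable({}, {
    -- called only when a key is absent from ns
    __index = function(t, name)
        misses = misses + 1
        local ctor = make_tag(name)
        rawset(t, name, ctor)   -- cache, so the next lookup is a plain table hit
        return ctor
    end
})

-- a name that clashes with an existing global gets an explicit alias
ns.table_ = make_tag 'table'

local first, second = ns.div, ns.div
print(first == second, misses)
print(ns.table_ { 'cell' }.tag)
```

The two lookups of div return the same function, and the explicit alias never goes through the metamethod at all.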

 -- test2.lua
 require 'html'
 html {
     body { title 'Test' },
     h2 'title',
     p 'first para',
     p 'second para'
 }

html is specialized; it constructs the html tag but also writes out the pretty-printed result to a file called test2.html. Finding the name of the current script is straightforward; this is passed as the zero-th element of the global arguments array arg, and the name can then be extracted from the path.

 if arg then -- not interactive
   name = arg[0]:match('([%w_%-]+)%.lua')
 end
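
To see what that pattern extracts, it can be tried on a few sample paths. The helper script_name and the paths are made up for illustration; only the pattern itself comes from the code above.

```lua
-- Hypothetical helper wrapping the pattern used above.
local function script_name(path)
    -- capture a run of letters, digits, '_' or '-' that is followed by ".lua"
    return path:match('([%w_%-]+)%.lua')
end

print(script_name('test2.lua'))
print(script_name('/home/user/pages/test2.lua'))
print(script_name([[C:\basic\my-page.lua]]))
print(script_name('notes.txt'))
```

Directory separators and drive letters are not in the character class, so they are skipped over, and a path with no .lua in it gives nil.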

And test2.html looks like this:

  <p>first para</p>
  <p>second para</p>

You may consider a little script like this to be another form of HTML template. It is less redundant and easier to type, and editor-friendly (editors like SciTE are good at finding matching braces). It is trivial to parameterize this code. And (nota bene) we have the full power of the language available to make shortcuts for common constructions.

For instance, HTML lists are ubiquitous and follow a very simple pattern:

 function list (t)
     for i = 1,#t do
         t[i] = li(t[i])
     end
     return ul(t)
 end

So list{'one','two','three'} will make an unordered list.
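
One thing to be aware of is that list overwrites the items of the table it is given. That is harmless for a literal like the one above, but if the table is reused elsewhere, a variant that builds a fresh table may be preferable. A sketch, with tag as a hypothetical stand-in for the real constructors and list_copy as a made-up name:

```lua
-- Stand-in for the tag constructors (assumed to return LOM-like tables).
local function tag(name)
    return function(arg)
        if type(arg) ~= 'table' then arg = {arg} end
        arg.tag = name
        return arg
    end
end
local ul, li = tag 'ul', tag 'li'

-- Like list, but builds a new table and leaves the caller's table alone.
local function list_copy(t)
    local items = {}
    for i = 1, #t do
        items[i] = li(t[i])
    end
    return ul(items)
end

local names = {'one', 'two', 'three'}
local l = list_copy(names)
print(l.tag, #l, l[1].tag, l[1][1], names[1])
```
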

Obviously this is a job for a library, and Orbiter provides orbiter.html that gives us more sophisticated versions of the above function and others like HTML tables. But that project is sufficiently interesting to need its own article.

The code for html.lua is here.