Imaginary/Real: Documenting Lua

Why is Documentation so Lacking?

As Barbie the Software Engineer might say: Documentation is Hard. It's not intrinsically fun like coding, and often gets written badly as an afterthought. In open source projects, where fun is so often the guiding principle, it can get badly neglected.

There are basically two kinds of comment: one aimed at other developers of the code, and the other aimed at the users of a library. The latter is particularly hard and important, since we are building abstractions for other people to use, and if they have to actually read the code to work out how to use the library, that abstraction has broken. This is a common problem with language bindings to existing libraries; they will be announced as a 'thin wrapper around libfoo' and it's assumed that you can work out how to use the bindings from the libfoo documentation or from libfoo examples in C. Well, not everyone can (or needs to) read C well enough to make that practical. We know the classic rebuke, "RTFM!", but that assumes that there is a manual; it's often "RTFS!" these days.

Testing has become a minor religion in the last ten years, and mostly it's a good thing, especially when using dynamic languages. Tests are supposed to be the best kind of documentation, because they can automatically be shown to be correct. But if the scope of the tests is too narrow, then they can be harder to read than the source. So also write examples that can serve as tests.

We cannot ascribe poor documentation to laziness, since these projects are otherwise well-constructed and may have exhaustive tests. The developers clearly don't think documentation correlates with wide use by the community; perhaps communication is not a priority. After all, research mathematicians write papers and do not immediately think how to explain them to undergraduates; they are writing for peers.

This might seem an indirect criticism, but it isn't intended as such. (I resisted the urge to point to some representative Github projects for this reason.) The Open Source universe needs all kinds, from researchers to popularizers. In the 'social coding' universe, fork has become a verb of respect, so documenting an existing project and doing a pull request is a polite way to contribute to the usefulness of a project.

You might think that elaborate conventions and markup could get in the way of documentation. The Go standards simply require you to preface the package and any exported items with a plain comment; just about the only convention is that indented lines are shown preformatted. And still those plain comments are often not written. Part of the problem is a variant on the "increment i by one" kind of comment, which is just an awkward paraphrase of the code. So a Stack.Push method might end up with the blindingly obvious "Push an element onto the stack". Anybody with any self-insight is going to find that silly, and so they don't do it. I have noticed that coding libraries and documenting their interfaces are modal activities; it's useful to approach documentation with a fresh head (much as writing and proof-reading are separate but essential steps in preparing an article).

The key thing is to care enough to return to the code with "beginner's mind", and finish the job.

The xDoc Tradition of API Documentation

APIs are rarely as self-documenting as their creators might think. Javadoc is probably the grand-daddy of a per-function style of API documentation, and it has inspired numerous xDoc tools for other languages. Here is an example of LuaDoc markup:

 --- foostr stringifies a list of Foo objects.
 -- @param foolist a list of <i>Foo</i> objects
 -- @param delimiter string, as in table.concat
 -- @return a string
 -- @see Foo, table.concat
 function foostr(foolist,delimiter)

Limited code inference occurs, so that we do not have to specify the function name, but otherwise everything is explicit. Dynamic languages pose a particular challenge here because we're forced to specify the parameter types in an informal way. Projects may define their own conventions, but there's no standard way to write 'list of Foo'. You may use see-references to other documented entities but they have to be in a @see section.

People have recognized for a while the limitations of this way of documenting APIs, which focuses on the individual pieces and can be silent on how to put them together, but it's much better than nothing. So the debate is often about making these comments easier to write. For instance, in a comparison between Doxygen and JavaDoc, Gushie says:

The most important problem with the Javadoc comment in the comparison is how much I need to concentrate on formatting issues while writing it. When writing Javadoc I am constantly thinking about what should and shouldn't be linked, whether the list will look right, etc. This is frustrating because, while I do want to document my code well I also want to focus on coding.

And that's it: a tool that forces you out of code flow is not going to make you happy about documentation.

One of the problems I have with LuaDoc is that HTML is awkward to type and ugly to look at, especially when you want to provide something as simple as a list. So one of the main requirements was that Markdown could be used instead. I try to be a good boy and not reinvent too many wheels, but LuaDoc proved fairly resistant to this change, since it helpfully stripped out the blank lines and line feeds from the comments at an early stage. But the tipping point was when I moved the Penlight codebase away from using the module function; LuaDoc loves the module function and it's often necessary to put it in a comment so that it will recognize something as a module. So ultimately the issue was that LuaDoc wanted code to be written in a particular way. Furthermore, we all know that 'Lua does not have classes' but it's straightforward to work in an OOP style; again, there was no provision for such a choice. The guiding principle should be: a documentation tool should not limit a developer's choices.

So, wheel-reinvention happened, leading to LDoc. In a few days, I had something that did the job equally well, for myself. Seasoned campaigners will know of course that this is the easy bit; the hard bit is getting a tool that will work on other people's codebases. Thanks to some painstaking testing by Lorenzo Donatti and others, LDoc is now at the point where it's ready for prime time testing.
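To make the module issue concrete, here is a minimal sketch of a module written without the module function: it builds a plain table and names itself to the documentation tool with an explicit @module tag. The module name `strutil` and its `join` function are hypothetical, invented only for this illustration.

```lua
--- String helpers (hypothetical module, for illustration only).
-- @module strutil

local strutil = {}

--- Join the string forms of a list of values.
-- @param list a list-like table of values
-- @param delimiter string, as in table.concat
-- @return a string
function strutil.join(list, delimiter)
  local parts = {}
  for i, v in ipairs(list) do
    parts[i] = tostring(v)
  end
  return table.concat(parts, delimiter)
end

print(strutil.join({'a', 'b', 'c'}, ','))

-- A real module file would end with: return strutil
```

Nothing here depends on how the table is created or exported, which is the point: the explicit tag carries the information that the tool would otherwise have to infer from a call to module.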

The lack of inline entity references in LuaDoc means that your references have to be footnotes to your comments. This style felt better:

 --- foostr stringifies a list of Foo objects.
 -- @param foolist a list of @{Foo} objects
 -- @param delimiter string, as in @{table.concat}
 -- @return a string
 function foostr(foolist,delimiter)

Using Markdown helps formatting lists:

 --------------
 -- creates a new @{Boo} object.
 -- @param spec a table of:
 --
 --  - `age` initial age of object (optional, defaults to 0)
 --  - `name` descriptive name used in caption
 --  - `foolist` list of @{Foo}
 --
 -- @return @{Boo}
 function Boo:create(spec)

(The blank lines are somewhat irritating, but I chose eventually to work with standard Markdown here; it is certainly less awful than the equivalent HTML.)

This kind of 'ad-hoc' table structure is a common pattern in Lua, and I've argued before that it's a good idea to have named types so that documentation can refer to them. An in-between option is to create a dummy type which just exists for documentation purposes.
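A sketch of the dummy-type idea, assuming LDoc's @table and @field tags are used to give the spec a name. `BooSpec` is a hypothetical name, and the body of `Boo:create` is invented so that the example runs on its own.

```lua
--- Specification table accepted by Boo:create.
-- Nothing with this name exists at runtime; the comment is there purely
-- so that other documentation can refer to it.
-- @field age initial age of object (optional, defaults to 0)
-- @field name descriptive name used in caption
-- @field foolist list of @{Foo}
-- @table BooSpec

local Boo = {}
Boo.__index = Boo

--------------
-- creates a new @{Boo} object.
-- @param spec a @{BooSpec} table
-- @return @{Boo}
function Boo:create(spec)
  local obj = {
    age = spec.age or 0,   -- apply the documented default
    name = spec.name,
    foolist = spec.foolist,
  }
  return setmetatable(obj, self)
end

local b = Boo:create{name = 'example', foolist = {}}
print(b.name, b.age)
```

The function comment shrinks to a single reference, and any other function that takes the same kind of table can link to the same description.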

Plain and Simple

Not everyone likes this 'tag soup' style of documentation. For instance, this is the style followed in the Lua manual:

 ---
 -- Receives zero or more integers. Returns a string with length equal to
 -- the number of arguments, in which each character has the internal numerical
 -- code equal to its corresponding argument.
 -- Note that numerical codes are not necessarily portable across platforms.
 function string.char(...) end

This can be very effective with documenters who have a good clear prose style.

LDoc can support this by making the usual 'Summary' and 'Parameters/Returns' sections in the HTML template optional.

It was always awkward to configure LuaDoc, so I felt that LDoc needed a configuration file. As is often the case, Lua itself is an excellent format for such files.

 -- config.ld for Lua libraries
 project = 'Lua'
 description = 'Lua Standard Libraries'
 full_description = [[
 These are the built-in libraries of Lua 5.1

 Plus documentation for lpeg and luafilesystem.
 ]]
 file = {'builtin',exclude = {'builtin/globals.lua'}}
 format = 'discount' -- use lua-discount for formatting
 -- switch off the top summary and 'Parameters/Returns' in the template
 no_summary = true
 no_return_or_parms = true

(You can in fact even define your own template and/or stylesheet and reference them in the config.ld file.)
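As a hedged sketch of what that might look like: the fragment below assumes the configuration fields are called `template` and `style` and that each names a directory holding the customized file. Both the field names and the directory paths are assumptions to be checked against the LDoc manual.

```lua
-- config.ld fragment (a sketch; the directories are hypothetical)
project = 'Lua'
template = 'doc/templates'  -- assumed to hold a customized ldoc.ltp
style = 'doc/styles'        -- assumed to hold a customized ldoc.css
```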

And here is the result, thanks to the API files from mitchell's TextAdept editor project.

The principle followed here is: allow the documenter as much control as possible, but make it straightforward for end-users to build the documentation. So a simple invocation of 'ldoc .' in the right directory will find the configuration file and do the rest.

Type Annotations

On the other end of the scale, it's useful to have type annotations for Lua functions. This is a requirement for full IDE support in the Eclipse Koneki project. Fabien Fleutot provided support for tag modifiers, so that one could say things like @param[type=string] name. I have provided convenient alias tags so one can say:

 --------------
 -- list available servers providing a service
 -- @tparam string service name
 -- @treturn {Server,...} list of servers
 function list_available(service)
 ...
 end

Beyond the built-in Lua types ('string', 'number', etc.) there is no agreed-upon way to specify types, so some invention is necessary. The convention is that {T,...} is a list-like table containing T objects (see the documentation for further discussion). LDoc will attempt to do lookup on all identifiers within such a type specification, so that Server becomes a link to that type's definition.
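For comparison, here is the same function with the parameter written in the long modifier form and the return value in the alias form. The `Server` table and the `registry` data are hypothetical, made up only so that the sketch is self-contained and runnable.

```lua
-- Hypothetical Server type and service registry, for illustration only.
local Server = {}
Server.__index = Server

local function new_server(host)
  return setmetatable({host = host}, Server)
end

local registry = {
  web = { new_server('alpha'), new_server('beta') },
}

--------------
-- list available servers providing a service
-- @param[type=string] service name
-- @treturn {Server,...} list of servers
local function list_available(service)
  -- an unknown service gives an empty list rather than nil
  return registry[service] or {}
end

for _, server in ipairs(list_available('web')) do
  print(server.host)
end
```

The annotations are only comments as far as Lua is concerned, so nothing checks them at runtime; their value is in the generated links and in what tools can read from them.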

Examples, Readmes and Command-Line Help

Per-function API documentation is often not enough. In particular, good examples can clarify the intended use, and narrative documentation can introduce the concepts and fill in the background. Complementary 'modalities' of explanation and extra redundancy allow end-users of different backgrounds and thinking styles to have various ways of understanding the topic.

So LDoc allows integrating the API documentation with examples and Markdown text as linked documentation. The best example I currently have is the winapi documentation. This is actually a C Lua extension, which is another LDoc feature. Here is the config.ld:

 file = "winapi.l.c"
 output = "api"
 title = "Winapi documentation"
 project = "winapi"
 readme = "readme.md"
 examples = {'examples', exclude = {'examples/slow.lua'}}
 description = [[
 A minimal but useful binding to the Windows API.
 ]]
 format = 'markdown'

Those of us who live in the command-line like to look up documentation quickly without messing around with a browser. Generating man-pages is an interesting and very do-able goal for LDoc, but man is showing its age and is pretty Unix-centric. Some great discussions with Norman Clarke inspired an LDoc feature where it looks up libraries on the module path and parses doc information.

So if you have an installed library with documentation, then it's easy to look up a particular function:

 $ ldoc -m pl.pretty.dump

 function        dump(t, ...)
 Dump a Lua table out to a file or stdout.

 t        {table} The table to write to a file or stdout.
 ...      {string} (optional) File name to write to. Defaults to writing
                 to stdout.

Thanks to mitchell's work at luadoc-ifying the Lua manual, this works for the standard Lua libraries (plus lfs and lpeg) as well:

 $ ldoc -m lfs.currentdir

 function        currentdir()
 Returns a string with the current working directory or nil plus an error
  string.

(ldoc -m lfs would list all the available functions.)

Further Work

It's possible to process the inner representation generated by LDoc using a custom Lua module, allowing multi-purpose pipelines. We're still working on use cases for this, but the basic machinery is there.
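A rough sketch of the kind of processing such a module might do. The shape of the data is an assumption made for illustration (a list of modules, each with a `name`, a `summary` and a list of `items`), and the sample table is invented rather than real LDoc output.

```lua
-- Turn a list of documented modules into plain summary lines.
-- Assumed fields (illustrative only): mod.name, mod.summary, mod.items,
-- and for each item: item.type, item.name.
local function summarize(modules)
  local lines = {}
  for _, mod in ipairs(modules) do
    lines[#lines + 1] = mod.name .. ': ' .. mod.summary
    for _, item in ipairs(mod.items) do
      lines[#lines + 1] = '  ' .. item.type .. ' ' .. item.name
    end
  end
  return lines
end

-- Made-up sample data standing in for the inner representation.
local sample = {
  {
    name = 'strutil',
    summary = 'String helpers.',
    items = { { type = 'function', name = 'join' } },
  },
}

for _, line in ipairs(summarize(sample)) do
  print(line)
end
```

Because the input is an ordinary Lua table, the same walk could just as well emit an index, a tags file or a check for undocumented items.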

Other output formats could be supported; the program is meant to be extendable and I think generating LaTeX could be an excellent way of getting good-quality output.

Lua is surprisingly flexible at expressing concepts like modules and classes, so simple lexical code inference is probably not going to scale. Eventually we will have to use full static code analysis like David Manura's LuaInspect.

Wednesday, 21 December 2011