Thursday 5 January 2012

Building Programs with Lake

Carpenters Bitching About Tools

One of the irritating things about programming is the set of available tools for building software. A lot of the appeal of dynamic languages comes from the simple fact that you don't build, you just run. A great deal of the fuss is accidental complexity, which is Fred Brooks' term for the gap between the complexity of the task and the actual complexity of the solution.

In the beginning, there was Stu Feldman's make. People have subsequently wondered what deep reason there was behind the need for tabs, but the referenced quote from The Art Of Unix Programming shows us that it was an unhappy accident:

Why the tab in column 1? Yacc was new, Lex was brand new. I hadn't tried either, so I figured this would be a good excuse to learn. After getting myself snarled up with my first stab at Lex, I just did something simple with the pattern newline-tab. It worked, it stayed. And then a few weeks later I had a user population of about a dozen, most of them friends, and I didn't want to screw up my embedded base. The rest, sadly, is history.

make is a fantastic tool, firmly in the Unix tradition of doing one thing well. It runs commands when any of their inputs are more recent than their output. Its power comes from all the other powerful little tools that come with a traditional Unix environment. (This naturally becomes an issue on Windows, where you basically have to mimic that environment closely enough, for instance MSYS.) Its pain comes from having to cope with all the weirdness of different systems, even within the POSIX world, with tools of inadequate expressiveness and power. In other words, it's a bad programming language.
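The one thing make does well can be sketched in a few lines of Lua. This is only an illustration of the staleness rule described above; the mtimes table is a hypothetical stand-in for asking the file system, and out_of_date is a name invented for this sketch.

```lua
-- Sketch of make's core rule: a target is stale when it is missing
-- or when any of its inputs has a newer modification time.
-- `mtimes` is a hypothetical table of file name -> timestamp.
local function out_of_date(target, inputs, mtimes)
  local target_time = mtimes[target]
  if target_time == nil then return true end   -- target not built yet
  for _, input in ipairs(inputs) do
    local input_time = mtimes[input]
    -- a missing input is treated as stale here; real make would
    -- complain that it has no rule to make it
    if input_time == nil or input_time > target_time then
      return true
    end
  end
  return false
end

local mtimes = { ['hello.c'] = 100, ['hello.o'] = 120 }
print(out_of_date('hello.o', { 'hello.c' }, mtimes))  -- false
mtimes['hello.c'] = 130                               -- simulate an edit
print(out_of_date('hello.o', { 'hello.c' }, mtimes))  -- true
```

Everything else in a build tool is bookkeeping around this comparison.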

There are many alternative build systems which came out of this frustration, and there is also an entertainingly profane plea on Reddit to simply stop building new ones.

Human nature being what it is, it did not stop me. I have no particular great hopes of it gaining any fans, but Lake does provide a working example of how Lua is suited for embedded Domain Specific Languages (DSLs).

Using Lake to cope with the C

Say we have the canonical first program, hello.c. For the lake command to work as expected, there must be a lakefile:

 c.program 'hello'

and then

 $> lake
 gcc -c -O1 -Wall -MMD  hello.c
 gcc hello.o  -o hello.exe

Running lake again will give the message 'lake: up to date'. If hello.c changes (or we deleted hello.o) then things will rebuild.

This seems fairly underwhelming at first, but then we knew this was a trivial program to build in the first place. Now, if Lake finds the Microsoft command-line compiler cl.exe on the path, then it changes its tune:

 $> lake
 cl /nologo -c /O1 /WX /showIncludes  hello.c
 link /nologo hello.obj  /OUT:hello.exe

(This is what will happen on Windows if you execute Lake inside a Visual Studio Command Prompt.)

The whole idea about lakefiles is that they express the build on a higher level, and let the tool decide on the incantation. This is particularly useful if you aren't familiar with cl.exe, for instance.
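To make "let the tool decide on the incantation" concrete, here is a rough sketch of how one high-level request can be turned into compiler-specific commands. The compilers table and build_commands function are invented for this illustration and are not Lake's internals; the flags are simply copied from the two transcripts above.

```lua
-- Hypothetical per-compiler templates; flags copied from the transcripts
-- shown earlier, not from Lake's source.
local compilers = {
  gcc = { compile = 'gcc -c -O1 -Wall -MMD %s.c',
          link    = 'gcc %s -o %s.exe',
          obj     = '.o' },
  cl  = { compile = 'cl /nologo -c /O1 /WX /showIncludes %s.c',
          link    = 'link /nologo %s /OUT:%s.exe',
          obj     = '.obj' },
}

-- Return the compile and link commands for a one-file program.
local function build_commands(cc, name)
  local c = assert(compilers[cc], 'unknown compiler: ' .. cc)
  return {
    c.compile:format(name),
    c.link:format(name .. c.obj, name),
  }
end

for _, cmd in ipairs(build_commands('cl', 'hello')) do
  print(cmd)
end
```

The lakefile only ever says "a C program called hello"; which row of the table applies is the tool's problem.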

But there is more. This simple lakefile provides:

  • cross-platform, compiler-agnostic builds (On Unix it knows to drop off the '.exe')
  • an automatically generated 'clean' target, so lake clean will do its job
  • a debug build by saying lake -g or lake DEBUG=1

I do work on embedded Linux sometimes. If I wanted my hello to work on a Gumstix then the incantation would simply be lake PREFIX=arm-linux and the correct compiler and linker would be invoked.

OK, let's get more fancy. The hello program has a second file utils.c and a shared header common.h. The lakefile now looks like this:

 c.program{'hello', src = 'hello utils'}

Please note the curly braces: program is a function of one argument, which is here a Lua table constructor. You can put parentheses around the table, but it isn't required. The sources are provided as a simple space-separated list of names; Lake already knows that the extension must be .c.
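For readers new to Lua, this calling convention is easy to try out in isolation. The program function below is a stand-in written for this sketch, not Lake's implementation; it accepts either a bare string or a table, and splits the space-separated src list.

```lua
-- A stand-in for c.program, to show the calling convention only.
local function program(args)
  -- program 'hello' passes a string; program{...} passes a table
  if type(args) == 'string' then args = { args } end
  local name = args[1]                 -- the positional entry
  local sources = {}
  -- src defaults to the program name; each word gets the .c extension
  for word in (args.src or name):gmatch('%S+') do
    sources[#sources + 1] = word .. '.c'
  end
  return { name = name, sources = sources }
end

local p = program{'hello', src = 'hello utils'}
print(p.name .. ': ' .. table.concat(p.sources, ' '))
-- prints: hello: hello.c utils.c

local q = program 'hello'              -- same function, string argument
print(q.name .. ': ' .. table.concat(q.sources, ' '))
-- prints: hello: hello.c
```

Lua lets the parentheses be dropped when a function is called with a single string literal or table constructor, which is what makes lakefiles read like declarations.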

This build works as desired; if either of the two C files changes then it will be recompiled, and the program relinked. Lake knows that hello.exe depends on hello.o and utils.o, and it knows that these in turn depend on the corresponding source files. But it even knows that compiling the source files depends on the shared header, so that editing common.h (or just updating its timestamp with touch) will cause both of them to be rebuilt. Both of these compilers can be told to show what include files they depend on, using the -MMD and /showIncludes flags respectively. Lake uses this output to add extra dependencies to the compile rule for the files. Managing the dependencies manually is irritating, and easy to get wrong.
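With gcc, -MMD writes a small makefile fragment next to the object file (hello.d for hello.c), of the form 'hello.o: hello.c common.h'. A minimal sketch of reading such a fragment follows; parse_depfile is a hypothetical helper written for this article, not the code Lake uses, and it does not handle paths containing spaces or colons.

```lua
-- Parse the text of a gcc .d file into a target and its dependencies.
local function parse_depfile(text)
  -- long dependency lists are wrapped with a backslash before the newline
  text = text:gsub('\\\r?\n', ' ')
  local target, rest = text:match('^%s*(%S+)%s*:%s*(.*)$')
  if not target then return nil end    -- not a dependency line
  local deps = {}
  for dep in rest:gmatch('%S+') do
    deps[#deps + 1] = dep
  end
  return target, deps
end

local target, deps = parse_depfile('hello.o: hello.c \\\n common.h\n')
print(target .. ' <- ' .. table.concat(deps, ', '))
-- prints: hello.o <- hello.c, common.h
```

Each name in the list becomes an extra input to the compile rule, so touching common.h makes hello.o out of date.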

Say utils.c had a reference to sqrt. The lakefile should now be:

 c.program{'hello', src = 'hello utils', needs = 'math'}

Now for Unix builds, the math library will be linked in with -lm; on Windows, this is unnecessary since the runtime already includes the math library.

Everyone has Needs

Lake uses this idea of needs for builds to specify their requirements on a higher level.

pkg-config is a marvelous utility that provides exactly what is required here. Unfortunately, it is not used widely (or consistently) enough to be a one-stop shop for providing the gory details about every library. But Lake will try to use pkg-config if it is available to match needs. So a simple GTK+ C program can be built like so:

 c.program{'button',needs = 'gtk+-2.0'}

Lake resolves needs like this: first, whether it is built-in (like 'socket' or 'lua'), second, whether there are pkg-config aliases like 'gtk' available, and third, whether suitable global variables have been defined. So in resolving the unknown need 'foo' Lake will see if the globals FOO_INCLUDE_DIR, FOO_LIB_DIR and FOO_LIBS have been defined and point to existing directories. (Thereafter, it will try pkg-config.)

A lakefile is ultimately just a Lua script, and can have code that sets these variables explicitly.

 FOO_LIBS = 'foo3'
 if WINDOWS then
     FOO_DIR = 'c:\\foolib'
 else
     FOO_INCLUDE_DIR = '/usr/include/foo3'
 end
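The global-variable step of that lookup might look roughly like the following. This is a sketch under assumptions: need_from_globals is an invented name, the environment is passed in as a table rather than read from the real globals, and the check that the directories actually exist is left out.

```lua
-- Hypothetical lookup of NAME_DIR, NAME_INCLUDE_DIR, NAME_LIB_DIR and
-- NAME_LIBS for a need; `env` stands in for the lakefile's globals.
local function need_from_globals(need, env)
  local prefix = need:upper() .. '_'
  local found, any = {}, false
  for _, key in ipairs { 'DIR', 'INCLUDE_DIR', 'LIB_DIR', 'LIBS' } do
    local value = env[prefix .. key]
    if value ~= nil then
      found[key] = value
      any = true
    end
  end
  if any then return found end
  return nil                           -- nothing defined; try elsewhere
end

local env = { FOO_LIBS = 'foo3', FOO_INCLUDE_DIR = '/usr/include/foo3' }
local foo = need_from_globals('foo', env)
print(foo.LIBS .. ' ' .. foo.INCLUDE_DIR)
-- prints: foo3 /usr/include/foo3
print(need_from_globals('bar', env))
-- prints: nil
```

Returning nil for an unknown need is what would let a later step, such as pkg-config, have its turn.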

Explicit Rules: running Tests

Say I have a little Lua C extension. That's straightforward because Lake knows about Lua:

 mylib = c.shared{'mylib',needs = 'lua'}

Now I wish to run some Lua test files against the generated mylib.so or mylib.dll. For this, we make an explicit rule that connects Lua source files with an output file like so:

 lt = rule('.lua','.output','lua $(INPUT) > $(TARGET)')
 -- populate the rule with targets; it depends on mylib
 lt ('test/*',mylib)
 -- the default target depends on both the library and the test targets
 default{mylib,lt}

One important take-home here is that Lake works with targets in a very similar way to Make; the first target defined in a lakefile becomes the default, but if there are multiple targets then we have to define a dummy target that depends on these targets.

Now, maybe there is also a requirement that tests can always be run directly using lake tests. So we have to create a target dependent on the test targets, which first resets the tests by deleting the fake targets:

 target.tests {
   action(utils.remove, '*.output'),
   lt
 }

Depending on an unconditional action does the job. (However, this is not entirely satisfactory, since in an ideal world the order of dependencies being resolved should not matter, but this will do for now.)

Making the World a Better Place, one Semicolon at a Time

I remember an entertaining article by the famous Verity Stob on what to do when encountering C++ errors. One of the options was to write a Perl script to unmangle the errors so that they could be read by humans. She was writing satire, but like most good humour it was more than just a joke.

For instance, here is a wrong C++ program. Not terribly wrong, in fact almost competent:

 // errors.cpp
 #include <iostream>
 #include <string>
 #include <list>
 using namespace std;
 int main()
 {
   list<string> ls;
   ls.append("hello");
   cout << "that's all!" << endl;
   return 0;
 }

The response is pretty scary:

 errors.cpp:9: error: 'class std::list<std::basic_string<char,
 std::char_traits<char>, std::allocator<char> >,
 std::allocator<std::basic_string<char, std::char_traits<char>,
 std::allocator<char> > > >' has no member named
 'append'

Seasoned C++ programmers learn to filter their error messages mentally, but this is the kind of initial experience that drives kids to sniffing glue.

lake provides the ability to filter the output of a compiler, and reduce irrelevant noise. Here is the lakefile:

 if CC ~= 'g++' then quit 'this filter is g++ specific' end
 lake.output_filter(cpp,function(line)
   return line:gsub('std::',''):
     gsub('basic_string%b<>','string'):
     gsub(',%s+allocator%b<>',''):
     gsub('class ',''):gsub('struct ','')
 end)
 cpp.program {'errors'}

And now the error is reduced to:

 errors.cpp:9: error: 'list<string >' has no member named 'append'

And another case of rampant template trickery gone bad has been tamed, and our hypothetical beginner gets to Nirvana quicker.
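The substitutions themselves are plain Lua string patterns, so they can be tried outside Lake. The sketch below wraps the same gsub chain in a function called simplify (a name chosen for this example) and feeds it the g++ message from above, joined onto one line.

```lua
-- The same chain of substitutions as in the lakefile, as a plain function.
local function simplify(line)
  -- %b<> matches a balanced <...> group, nested brackets included;
  -- the outer parentheses drop gsub's second result (the match count)
  return (line:gsub('std::', '')
    :gsub('basic_string%b<>', 'string')
    :gsub(',%s+allocator%b<>', '')
    :gsub('class ', ''):gsub('struct ', ''))
end

local message = "errors.cpp:9: error: 'class std::list<std::basic_string<char, "
  .. "std::char_traits<char>, std::allocator<char> >, "
  .. "std::allocator<std::basic_string<char, std::char_traits<char>, "
  .. "std::allocator<char> > > >' has no member named 'append'"

print(simplify(message))
-- prints: errors.cpp:9: error: 'list<string >' has no member named 'append'
```

The order matters: basic_string has to be collapsed before the allocator argument is stripped, otherwise the allocator pattern would have to cope with the nested template itself.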

This was, incidentally, an accidental feature. I needed to parse the output of cl.exe to get the header dependency information (it is not written to a .d file like with gcc) so a postprocessing hook was needed.

What Next?

Naturally, this is not a new idea in the Lua universe. PrimeMover is similar in concept, and also has a bootstrap stage to construct a completely self-contained interpreter, which is definitely a strategy worth emulating.

I haven't dealt with topics like dependency-based programming because this is not intended as a manual (which is to be found here). This article is more about showing the advantages of a higher-level, needs-based build system based on a real programming language, which is compact enough that a fully self-contained Lake executable would be less than 300K.

A number of kind people have pointed out that 2,500 lines of code is a bit much for a single script, which is true, and of course I know better. Unfortunately I have too many projects and they keep me awake at night, demanding to be fed; the next evolution of Lake will have to take its turn.

A single file does make installing Lake easier; it just needs Lua and LuaFileSystem (known as the lua5.1 and liblua5.1-filesystem0 packages in the Debian/Ubuntu world) and for the script to be made executable and put on the path. If you have installed LuaRocks (also available on Debian/Ubuntu) then installing Lake is as simple as sudo luarocks install lake.

The priority is a sound system that is flexible enough to meet working programmers' needs, and that gets the right balance between declarative/dependency-driven and imperative. It is already possible to provide new needs for Lake by defining Lua modules that look like 'lake.needs.NAME', which can then be easily installed by LuaRocks or some more ad-hoc delivery system.

Using all the processing cores that developers have available is also a priority, which requires some interesting work in a language that does not do the necessary concurrency out of the box. The best cross-platform candidate would be Lua Lanes, which provides a non-shared concurrency model with explicit data messaging using 'Lindas'.