Saturday, 16 May 2015

shmake - A Shell-based Build Tool

Beyond Make

Make is a marvelous tool in the Unix tradition: it does its one thing well. But modern software leans on build tools much more heavily, for conditional configuration and deployment, as well as for only rebuilding files when the files they depend on change. If Make was sufficient, then it would not be constantly being reinvented.

Make has always assumed that all the other Unix tools would be available, which makes cross-platform hard. So other build systems have appeared, either native build file generators (like CMake generating makefiles and MSBuild files) or directly tracking dependencies and running the compiler, like SCons or my own Lake.

shmake does not try to be cross-platform; its expressive power comes from the Unix shell. In fact, shmakefiles are shell scripts, except they are not executed standalone - rather they are run from within shmake itself. What it does is use shell as its DSL for expressing builds, rather than extending Make with shell-like constructs (like GNU Make) or inventing its own language (like CMake).

Assume a program consists of two source files and a header; here is a shmakefile for building it:

 . /tmp/

 COMPILE='gcc -c @(INPUT) -o @(TARGET)'
 LINK='gcc @(DEPS) -o @(TARGET)'

 T hello.o hello.c common.h "$COMPILE"
 T common.o common.c "$COMPILE"
 T hello hello.o common.o "$LINK"
 all hello

shmake thinks very much like Make; there are targets, but they're explicitly indicated by T: the target hello.o is the result of applying a compile command to hello.c, with common.h as an extra dependency. The object files are linked to give a target hello, and finally there's a special 'all' target which depends on hello. The commands use @() for variable expansion; for the first target, TARGET is hello.o, INPUT is hello.c and DEPS would be 'hello.c common.h'. Once the shmakefile is run within shmake, the target dependencies are checked and the tools invoked when a target is older than any of its inputs.

By itself, this isn't an improvement over Make. In particular, manually tracking dependencies (that hello.o depends also on common.h) is tedious and hard to get right. The point of this "hello build' is to emphasize that shmake is not limited to building programs, and is good for general dependency-based programming.

Convenient Compilation

For this simple program, the following shmakefile will do all the above, and more:

 simple$ cat shmakefile
 . /tmp/

 C hello hello.c common.c

 simple$ shmake
 compiling common.o
 compiling hello.o
 linking hello
 simple$ touch common.h
 simple$ shmake
 compiling hello.o
 linking hello

Note that shmake has detected the dependency of hello.c on common.h!

By default, shmake is not a chatty program (and the -q flag will make it even less chatty). To see the executed commands, use -v for verbose:

 simple$ shmake clean
 simple$ shmake -v
 gcc -c -Wall -MMD -O2 common.c -o common.o
 gcc -c -Wall -MMD -O2 hello.c -o hello.o
 gcc common.o hello.o -Wl,-s -o hello
 simple$ shmake clean
 simple$ shmake -v -g
 gcc -c -Wall -MMD -g common.c -o common.o
 gcc -c -Wall -MMD -g hello.c -o hello.o
 gcc common.o hello.o  -o hello

These are fairly generic flags (never compile C without -Wall; C's warnings are what most other languages would consider errors!) except for -MMD. This asks gcc to write a .d file, which shmake reads to build up target dependencies.

 simple$ cat hello.d
 hello.o: hello.c common.h

So 'C hello hello.c common.c' is a useful shortcut - under the hood, it expands to exactly the first explicit version. Note that since shmake is tracking all created targets, it knows how to construct a default 'clean' target, and the '-g' flag causes a non-optimized debug build to be made. If you pass a target name with extension .a, then C will build a static language; if you pass .so, it will build a shared library.

The implicit FAQ at this point might be:

  • OK, that's convenient. But can these shortcuts be composed?
  • This is a trivial program. How to make it link against libraries?
  • What if I have my own special flags?

First, they are composable because they create targets, and targets can be made to depend on them, just as in Make.

 C first first/*.c
 C second second/*.c
 all first second

Second, the 'C' command supports the usual -I, -L -l flags which are passed to the compile or the link stage. But the 'S' command is more flexible and easier to read. 'S cflags' and 'S lflags' can be used to set custom compiler and link flags, which answers the third point. 'shmake CC=clang' works as expected - name-value pairs become environment variables passed to the script.

Here is a complete shmake file for building Lua - compare to the original makefile.

 . /tmp/

 S exports true
 S libs readline
 S libs m

 echo "Building Lua for $PLAT platform"

 case $PLAT in
     S defines LUA_USE_LINUX
     S defines LUA_USE_LINUX
     S libs dl
     S defines LUA_ANSI
     S defines LUA_USE_POSIX
     S libs dl
     S defines LUA_USE_MACOSX
     echo "unsupported platform $PLAT. One of freebsd,Linux,Darwin,posix,ansi,solaris"
     exit 1

 C $LUA_LIB *.c -x 'lua.c luac.c'
 C lua lua.c $LUA_LIB
 C luac luac.c $LUA_LIB

 all lua luac

This is definitely easier to read than the original. One reason is the 'C' commands support wildcards, and the very useful -x,--exclude flag. Another is how the 'S' command makes conditional use of libraries and includes easier. The original makefile had to shell out make for each case; here we actually have a case statement since we have a real scripting language.

It is straightforward to get a debug build, just clean and use -g. It is also easier to modify. If the first line is replaced with:

 if [ -n "$BUILD_SO" ]; then
     S exports true

then 'shmake BUILD_SO=1' will make you a shared library. (It will fail to build luac, which needs to be statically linked, but everything else works.)

Scripting with C

Our computers have been steadily getting more memory and faster I/O, even if individual core speeds have stalled. However, C has not been getting bigger. So on modern machines, C programs compile & link very quickly, even without tcc. Of course tcc is convenient because it can skip the irritating compile/link cycle. There is still that irritating boilerplate before you can write code. When testing llib, I found this shmakefile useful:

 c-script$ cat shmakefile
 . /tmp/

 # simple C script!
 S quiet true


 T  $cfile $sfile "
 cat >@(TARGET) <<block
 #include <stdio.h>
 #include <stdlib.h>
 #include <llib/all.h>
 #include <llib/template.h>

 int main(int argc, char **argv)
 #line 1 \"@(INPUT)\"
 cat @(INPUT) >> @(TARGET)
 cat >>@(TARGET) <<block
     return 0;

 # llib needs C99 support
 C99 -g $P $cfile -n llib

 T all $P "./$P $A"

 c-script$ cat hello.s
 FOR (i,argc) printf("hello, world %s\n",argv[i]);
 c-script$ shmake P=hello A='one two three'
 hello, world ./hello
 hello, world one
 hello, world two
 hello, world three

Again, the shell script is doing all the work - 'hello.c' is a target which is generated by inserting 'hello.s' into some boilerplate. The compile step then creates the target 'hello' from 'hello.c', and finally the default target 'all' causes the program to be compiled and run. On this modest laptop, which most of you would scorn as a development platform, this compiles and executes practically instantly.

What is this '-n llib'? It means that the program needs llib. This is a useful concept borrowed from Lake. shmake resolves the need 'llib' by checking in turn if 'llib.need' or '~/.shmake/llib.need' exist. A need file is a simple config file that allows for expansions in the style of pkg-config's .pc files:

 c-script$ cat llib.need
 libs=-L${prefix}/llib -lllib

If shmake cannot find llib.need, it invokes pkg-config.

To make this accessible everywhere, you could have a wrapper called 'crun' that looks like 'shmake -f ~/mystuff/cscript/shmakefile P=$1' and ensure that llib.need sits in ~/.shmake, with actual prefix of installed llib.

Needs are a useful solution to finding libraries, in the same style as pkg-config, but not requiring it to be present. (pkg-config is not universal, and many packages forget to use it even if it's around.) Even within the Linux ghetto, packagers cannot agree where to put things. For instance, on Debian the tcl headers are in /usr/include/tcl8.6, the Lua headers are in /usr/include/lua5.2; other distros may simply drop them in /usr/include or elsewhere.

Short and Sweet

shmake came out of a need to scratch an itch ('why is building software still so irritating?') and the need to test llib. To make things easier, llib is bundled with the tarball. There is no special install required - just copy shmake onto your path. It was written using the time-honoured Barbarian-in-a-Hurry (BIAH) methodology so there isn't much care about memory leaks. It may occaisionally try to eat your homework.

If nothing else, it is a concrete example of using shell as an embedded scripting language. Which is the the point I want to emphasize; these are shell scripts. There are many good resources on how to learn shell, and shell has many uses outside the niche of building, unlike gmake. If you have limited time to master things (and this is usually true these days) then it is better to master shell than gmake, precisely because it is more universally useful, and shmake makes that choice easier.

Monday, 4 May 2015

Pragmatic Modern C++ Survival Guide

Compiling: Brute Force Helps

C++ has a reputation for being the slowest mainstream language to compile, leading to light sabre office duels. A certain creative boredom can set in, when people start chasing the dream of the Suceessor to C++: "The creation myth of Go is something like this: Rob Pike, Ken Thompson, and Robert Griesemer were waiting for a particularly long C++ build to take place when they decided to theorize a new language" Zack Hubert. It was also a reason why I blew several years of my life on an interactive C++ interpreter.

The fact is, that C++ compilers are not slow. It is the incredible amount of inlined code that the compiler must digest before it can give you the proverbial Hello World application. The six or so lines of code pull in more than 17,000 lines of include files. The culprit here is actually C++ hanging onto an outdated compile/build cycle that was already quaint in 1973.

So the best way to improve build times is to get the best system you can afford, with particular attention to fast disk I/O, like keeping the compiler with its headers and libraries on a SSD. And remember that you will pay for heavily optimized builds; better to test unoptimized with debug information, since then your crash traces can make sense in a debugger. But mainly, the less included code a file needs to bring in, the better.

Embrace the Future

As the quote goes, the future is already here, it's just badly distributed. The modern standard (2011) is only now being widely available on all platforms. On Windows, Visual Studio 2013 is pretty much there and the Community Edition is available free with fairly liberal licensing; GCC 4.9 is available from the excellent TDM-GCC project. On Linux, Ubuntu 14.04 has GCC 4.8, and Clang 3.4 is easily available from the repositories; on OS X Clang is now standard.

I mention these choices because it's good to get a second opinion if you're struggling with weird errors; in particular, Clang gives better diagnostics than GCC, even if you would still be using GCC as your 'production' compiler. C++ is a standardized language with a standard library, and you can use that fact.

The new language is much more fun. The standard way to iterate over a container was:

 for (vector<int>::iterator it = vi.begin(); it != vi.end(); ++it) {
     int ival = *it;

Now there is auto, much relief.

 for (auto it = vi.begin(); it != vi.end(); ++it) {
     int ival = *it;

And finally, range-based for-loops.

 for (auto ival: vi) {

You can even do this, thanks to std::initializer_list:

 for (auto i : {10,20,30,40}) {
   cout << i << endl;

And the cost? Totally nada - the short form is completely equivalent to the first verbose form, and will work with any type that defines begin and end.

This new form of for loop is interesting, because it's the first piece of basic C++ syntax to be based on the underlying standard library. A big step, because the language does not provide batteries, just the means to create batteries. std::string is not hard-wired into the language, neither is std::map. The Go language decided to implement them in as primitives, just as Java allowed System.String to participate in operator overloading, which is otherwise not allowed. Both of these languages are reactions to C++, which achieve simplicity at the cost of generality. The unusually awkward error messages generated by C++ stem mainly from this not-baked-in philosophy, rampant function overloading, plus a possibily excessive reliance on generic programming.

auto represents a very cool idea, local type inference. It is an interesting idea that the new langauges of this century lean on this heavily, so that (for instance) idiomatic Go code has relatively few explicit type annotations. Meanwhile, the dynamic language people are adding type annotations, not necessarily for performance but rather for maintenance and better tooling. auto comes with a big gotcha ; since C++ is a value-based language, auto v = expression makes v the value type and does a copy; if you want a reference to that value, say auto& v = expression. (This burned me more than once, so I feel I ought to pass on the warning).

Move semantics are a powerful new idea. The concrete result of this is that you no longer need to worry about a function returning a std::vector, since the vector constructed in the function will be moved to the returned vector; basically its data (a pointer to some values) is moved to the new vector and zeroed out in the old vector. But the old advice to pass const &T if you just want to access a large object still stands, since C++ still copies across by value.

Parts of a system may want to keep objects from another part. This can be tricky, because we don't have a garbage collector cleaning up when the last 'owner' of the object is gone. Say a GUI object needs to keep a a persistent store object as part of its state; what must it do when it dies? It can ignore it, and assume that the persistent store people will free it one day. But how do the PSP know that their object isn't still being used? More machinery needs to be added, and so forth. (No wonder big C++ systems are like Heath Robinson/Rube Goldberg cartoons). Better to use a well-proven idea that has a standard implementation - shared ownership through smart pointers.

 class B {
     shared_ptr<A> a;

     B(shared_ptr<A> a) : a(a) { }

     int method() {
         // this is the underlying pointer
         cout << a.get() << endl;

         // otherwise behaves just like a regular pointer
        return a->method();

 void test() {
     shared_ptr<A> pa (new A());
     B b1(pa);   //b1 shares pa
     B b2(pa);   //b2 shared pa

     // exactly the same operation, since B's share an A

 } // b1 and b2 are now dead. No one left sharing pa, so it is deleted.
This is easier to show than explain - it is a way for objects to share an object, in such a way that when the last sharer goes, the shared object may be safely deleted. It works because the default destructor for B works by destructing its members, releasing the shared pointers. Smart pointers can be put into containers; when the container is destroyed, the shared pointers are again released. A note on Names: readability can be helped by typedefs:
 typedef std::vector<std::shared_ptr<A>> SharedVectorA;

Here's the new generalized typedef in action:

 template <class T>
 using SharedVector = std::vector<std::shared_ptr<T>>;

which can be used like so: SharedVector<A> my_a_list.

Dealing with Errors from the Compiler and Its Community

I must emphasize that C++ compilers, however stubborn and irritating they can be, do not embody active malice. They merely wish to share their understanding of the problem, in a verbose and pedantic way, like a person with no social skills. So the thing to do is concentrate on the first error and ignore the spew.

For example, a naive but hopeful person might think that iostreams can dump out vectors directly. Here are the first lines of the error (clang 3.4)

 for1.cpp:11:9: error: invalid operands to binary expression
 ('ostream' (aka 'basic_ostream<char>') and 'vector<int>')
 cout << vi << endl;
 ~~~~ ^  ~~

That's not too bad! Notice that the compiler uses the terminology of compilers, not humans. You have to know what a 'binary expression' is and that it has 'operands'. These terms are not part of C++, but part of how people talk about C++.

But the compiler will now go on for another ninety lines, telling you how it could not match vector<int> against any of the many overloads of ostream& << TYPE. These can be safely ignored, once you know that you can't dump out a vector directly.

GCC 4.8 makes a complete hash of it, however.

 for1.cpp:11:12: error: cannot bind 'std::ostream {aka std::basic_ostream<char>}' lvalue to 'std::basic_ostream<char>&&'
     cout << vi << endl;

Visual C++ 2010 is relatively clear, but thereafter degenerates into irrevalent confusion:

 for1.cpp(11) : error C2679: binary '<<' : no operator found which takes a right-
 hand operand of type 'std::vector<_Ty>' (or there is no acceptable conversion)

And so an important point: there is more than one compiler in the world, and some of them are more sensible than others.

We move on to the attitude of the community to errors. For instance, unlike many introductory texts, I don't bother to prefix standard types with std:: because the result is easier to read and type. You ought in any case know the commonly-used contents of the std namespace; if there is an ambiguity, then the compiler will tell you about it. The one thing you must not do is say using namespace std in a header file, because you will truly be Polluting the Global Namespace for everyone else. Anyway, this is one of the little things that can generate a lot of hot air on the Internet - it is a red flag to a certain kind of bull who believes that stylistic quirks represent major heresies. I'm reminded of Henry Higgins in My Fair Lady when he sings "An Englishman's way of speaking/Absolutely classifies him/The moment he talks/He makes some other Englishmen despise him/One common language I'm afraid we'll never get".

Like with compiler messages beyond the statement of the original error, it is a good idea to ignore random online opinion. Much better to read the classics - Dr Stroustrup himself is refreshingly pragmatic. The term 'Modern C++' has been redefined several times in the last twenty years, so remember that much earnest wisdom and advice has expired. It may be an unfashionable position, but reading blog posts and looking at Stackoverflow answers is not the way to learn a programming language like C++ properly. Go read a good book, basically. You're no longer forced to use paper.

Use a Good IDE

I suspect this will upset some bulls, because they themselves are super productive in the editors originally created by the Unix gods. Well, great for them, but I can't help feeling that the anecdotal experiences of a few very talented and focussed people does not represent universal good advice, and certainly isn't data. Personally I find the Eclipse CDT makes me more productive, and in my Windows days found Visual Studio equally effective. But I will qualify this: as a professional you should not be dependent on an environment that you cannot code outside - that's when you need a good general-purpose code editor in addition. For instance, when a program is first taking shape, a plain editor is less distracting; I don't write these articles in Word because it it is far too busy, and fails to undertand the creative writing process: first write, then edit; then format.

There is a learning curve with big IDEs, and you will have to read the manual. For instance, it is straightforward to bring a makefile-based project into Eclipse, but then you have to tell it what your include paths and defines are - it cannot deduce this from your makefile, (which is a task which many people find difficult anyway). Once it knows where to find the includes, the errors start disappearing, and the program becomes a live document, in the sense that any symbol becomes a link to its definition using ctrl-click, and so forth. If you type nonsense, the environment will start emitting yellow ink and finally red. This takes some getting used to, since it will often complain before you've finished an edit. Again, it's about filtering out irrelevant criticism. The benefits go beyond 'hyper-linking' to code completion, where ctrl-enter will complete functions and methods for you, and safe global renaming, which is useful for us who can never get names exactly right the first time. Having constant feedback about errors means that often your builds will be correct, first time.

So right tool for the job at hand, and knowing when a tool is appropriate. Investing some learning time in your tools always pays off. For instance, learn the keyboard shortcuts of your editor/IDE and you will not waste too much time with pointing your mouse around like a tourist trying to buy things in a foreign market. Learn your debugger well, because exploring the live state of a program is the best way to track down most bugs. C++ can be frustrating to debug, but GDB can be taught to display standard library objects in a clear way.

There seems to be some contradiction in what I'm saying: first I say that rich environments are too busy and distracting, and then say that they are enormously helpful. The trick is again to know what tool to use for what phase of a project.

For instance, I encourage you to write little test programs to get a feeling for language features - for this an IDE is irritating because setting up a new project is tedious. Better then to have a code editor which can run the compiler and indicate the errors. But for larger programs, you need to navigate the code base efficiently and perform incremental changes.

Good Design and Modularity

Of course, C++ does not have modules. They're likely to arrive in the coming years for the 2017 standard, but currently all we have are 'compilation units', aka 'files'. A class consists of a header file and an implementation file? Not necessarily. I think people are encouraged to keep classes separate to avoid source files getting too large. If a group of classes hunt in a pack then there's no reason why they can't live in the same file, and if nobody outside a file refers to a class, then it doesn't need to be defined in a header. Finally, if a class is just a collection of static functions, then that class is acting like a namespace and should be expressed as such - this is not Java.

Code that needs to see a class obviously needs to see its definition. A weakness of the current C++ compilation model is that such client code also needs to see non-public parts of the class definition, mostly because to create an object, whether directly on the stack or using new, it needs know how big that class is.

I am not a paranoid person so I don't worry about 'others' seeing the 'insides' of my class definition; but it does irritate me that client code depends on private members, and all classes that these members refer to. So any change in the implementaton of a class requires a rebuild of all code that refers to it, and all that code needs to pull in the private dependencies. One way around this irritation is the so-called 'Pointer to Implementation' pattern (PIMPL).

 // pimpl.h
 class Test {
    struct Data;
    Data *self;

    Test(double x, double y);
    double add();

Notice the private struct that isn't fully defined! So the first thing the implementation does is define that struct. Thereafter, the methods all refer to the hidden implementation pointer.

 // pimpl.cpp
 #include "pimpl.h"

 struct Test::Data {
    double x;
    double y;

 Test::Test(double x, double y) {
    self = new Data();
    self->x = x;
    self->y = y;

 Test::~Test() {
    delete self;

 double Test::add() {
    return self->x + self->y;

The cost is some extra allocation and redirection when accessing object state. But now imagine that you wish to expose a simple interface to a complicated object, then PIMPL will simplify the public interface and any pure implementation changes will not require a rebuild.

Another method which uses redirection is to define an interface:

 // Encoder.h
 #include <inttypes.h>

 class Encoder {
     virtual int encoder(uint8_t *out, const uint8_t *in, size_t insize) = 0;
     virtual int decode(uint8_t *out, const uint8_t *in, size_t insize) = 0;

The derived subclasses do all the work:

 // simple-encoder.cpp
 #include "encoder.h"

 class SimpleEncoder: public Encoder {
     int encoder(uint8_t *out, const uint8_t *in, size_t insize) {

     int decode(uint8_t *out, const uint8_t *in, size_t insize) {

 Encoder *SimpleEncoder_new() {
     return SimpleEncoder();

Then the client code can get the proper subclass with a XXX_new function. This is not the most elegant way of creating subclasses, but it works and it's easy to understand; you add a new Encoder and add a reference to the builder in the file that's responsible for making new encoders. There is a little cost involved in the virtual method dispatch, but usually this will be dwarfed by the actual payload. (Again, premature optimization...).

The beauty of this decoupling is that the actual details of a particular encoder are kept within a single file, if it fits.

Good design is dividing larger systems into subsystems that are responsible for a single aspect of the problem. Networking, databases, GUI, etc are separate concerns and you make your later life easier by separating them out. And this makes for better builds on average, since modifying a subsystem implementation requires only compiling the affected files.

Monday, 27 April 2015

llib - a Set of Useful C Libraries

C: Why still Bother?

"Programmers waste enormous amounts of time thinking about ... the speed of non-critical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

Donald Knuth

I'll start with the classic Knuth quote in context: even when a program is CPU-intensive, it turns out that usually there's a small part of that program that actually needs to be fast and efficient. This has implications for how we use more low-level languages like C.

It's common to come across the postion that C has been superceded by C++. Since modern C is a kind-of subset of C++, the feeling is that one can write 'better C' and cherry-pick the convenient parts of C++, like strings and type-safe containers. Alas, but then you move instantly from writing good C to writing bad C++, at least according to the guardians of good C++ style. But other things being equal, C++ has definitely got the edge when it comes to easy-to-use containers and strings, and I've enjoyed that style enough to want it in C as well.

Personally, I enjoy C, I enjoy the fast compiles and small executables that don't have burdensome dynamic dependencies. Call me impatient, but waiting half-a-minute for a program to compile tends to clear out my active mental cache. After all, we have to work within constraints, including our own cognitive limitations. Arguing with a C++ compiler sometimes feels like wrestling with a pig, anyway. The lack of abstractive power in C becomes a strength when debugging code, while common C++ abstractions become hard to inspect.

However, C has a small standard library, so any non-trivial project tends to collect ad-hoc utility functions. Half-baked re-invented wheels litter the landscape of these systems. The clean narrative of C programs is often disturbed by outbreaks of pointer gymnastics, hard to read and maintain. This is because of not understanding Knuth's point - the 97% really doesn't have to be fast - and because higher-level coding in C always involves manual resource management. For instance, if I write a function to split a string using a delimiter into an array of strings, I will have to provide a function which cleans that array up. Any non-trivial C object will have a named destructor. For instance, in the mosquitto messaging API, we have mosquitto_sub_topic_tokenize and the corresponding mosquitto_sub_topc_tokens_free. POSIX regular expressions haveregcomp and regfree, and so forth.

Automatic storage (arrays allocated on the stack) have a powerful appeal since they're fast (no overhead!) and forgiving (they go away!). But how big is your stack? I don't know, and with recursion no-one knows. This is classic micro-optimization and is a source of joy for crackers who know how easy it is to smash the stack for fun and profit. Not to say that they are evil (I think we should reserve this term for mass murderers and the inventors of BASIC), just over-used without care. They are fast, but because they're in shoot-yourself-in-the-foot territory they should be used when code needs to be fast.

The llib project provides a set of BSD-licensed, useful libraries built around a simple object system. meant to be statically linked with your program. A good contrast would be Gnome's GLIB or Apache's APR, which are big awkward dynamic dependencies. Since this is now officially the 21st Century (many historians believe the 20th started in 1914) - it is modern C. Since we live in an imperfect world but still believe in cross-platform compatibilty, it can be compiled as C++, making it available to Visual Studio users. This will help to explain why the code is unnecessarily pedantic (like explicitly needing to cast from void* to T*. The source is also available as zipball.

Of course, deep embedded programmers, who will work with 4K RAM without obvious signs of distress, play another game altogether. This is not for them.

C with Objects

A llib 'object' is a pointer allocated with a header 'behind' the data - the block is allocated, header written, and the returned pointer is just past the header. There are no smart pointers in C, only dumb pointers with hidden secrets:

 typedef struct ObjHeader_ {  // 64 bits
     unsigned int type:14;
     unsigned int is_array:1;
     unsigned int is_ref_container:1;
     unsigned int _ref:16;
     unsigned int _len:32;
 } ObjHeader;

These objects are reference counted, so they may be easily shared by other objects - we replace the problem of ownership with the problem of sharing. For instance, llib strings are C strings which always have a terminating NUL-byte, but they have type information and must be freed using obj_unref:

 #include <llib/obj.h>
 char *s = str_new("hello dolly");
 printf("%d %d\n", strlen(s), array_len(s));  // -> 11 11
 printf("type %s is_array %d\n", obj_typename(s), obj_is_array(s));
 // -> type char is_array 1
 obj_unref(s);   // decrement the reference count

You may safely use these strings in any context expecting a C string, as long as you don't free them. Their header has type information, reference count and array size. This is useful if you have strings that contain embbedded NUL characters. The reference count starts as one, and when it becomes zero, the pointer is deallocated - this is called disposal.

 char *a = str_new("hello world");
 char **s = array_new(char*,2);
 a[0] = obj_ref(s);  // rc -> 2
 a[1] = obj_ref[s); // rc -> 3
 obj_unref(s);  // rc -> 2
 // string s is still very much alive

Stroustrup says "In C, arrays don't even know their size". These arrays do, however, which simplifies APIs that use them. For instance, str_split will return an array of strings, and array_len will tell you how big that array is. But how do we solve the pesky problem of freeing that array together with its strings?

 char **ra = array_new_ref(char*,2);
 char *t = str_new("frodo");
 ra[0] = ref(t);   // informal shortcut name for obj_ref
 ra[1] = ref(t);
 // all dead!
 char **strs = str_split("hello world"," ");
 assert(array_len(strs) == 2);
 // the array and its two strings are no more...

ra is a reference array which will unref all its elements when it is disposed. If str\_split was done in the old-fashioned way, I would also have to provide 'strings_free(for cleanup) andstrings_len` (perhaps depending on a NULL-terminated array convention). So a little more intelligence in objects means less API cognitive load.

Virtual Destructors

To llibify an object is straightforward, although more tedious than C++

 typedef struct {
     const char *name;
     int age;
 } Person;
 static Person_dispose(Person *p) {
 Person *person_new(const char *name, int age) {
     Person *res = obj_new(Person, Person_dispose);
     res->name = str_ref(name);
     res->age = age;
     return res
 Person *mother = person_new("Helga",52);
 char *name = str_new("Bob");
 Person *father = person_new(name,56);
 assert(obj_refcount(father->name) == 2);
 // name is still alive with rc == 1

So (a) objects share other objects explicitly using refcounting (b) they may have a disposal function which is called when they are disposed; they can then hand the objects back to the system. But obviously that disposal function can do much more, like freeing resources, so this is pretty much what C++ calls a virtual destructor.

Why str_ref? Because we have no guarantee that the string will be one of ours. If it's a plain C string, statically or dynamically allocated, then we'll have to make a refcounted copy.

You still have to manage the lifetime of your shared objects, although replacing free with unref gets you half way. The other half is remembering to increment the reference count when a function or object is 'borrowing' a pointer.

It would be most cool if the reference to an object would automatically be decremented when it went out of scope. This can't be done using standard C99 but GCC has a variable attribute cleanup which will call a function when that variable goes out of scope. Clang implements this as well (as does Intel C on Linux) so this is too good not to use. llib wraps this up as a macro:

    obj_scoped Person *john = person_new("john",23);
 } // john is gone!

Since objects may have custom disposal functions, this is effectively C++'s RAII pattern: people are obsessed with avoiding memory leaks, but resource leaks can be a bigger problem. It's also a way to avoid the single-return pattern that larger C functions adopt, since we now have a disposal guarantee for objects - multiple returns don't have to explicitly unref the objects falling out of scope.

This is not a new idea, of course; see "implementing smart pointers for the C programming language". I don't foreground it as much in llib because it is a non-standard feature which I cannot (yet) find a way to do in C++ (the fallback position) without resorting to actual smart pointers.

Resizable Arrays and Values

llib provides doubly-linked lists and binary-tree maps because, well, that's what we learned at school. These will aways be awkward in C; either they must store untyped pointers for you, or your types must have the necessary next/previous and left/right pointers in them. It turns out that resizable arrays are flexible and easy to make strongly-typed.

In llib, such arrays are called 'seqs' (inspired by my brief flirtation with Nim). They are essentially pointers to arrays, with a capacity, and they resize using the familar doubling method, like std::vector.

 int **ss = seq_new(int);
 FOR(i,100) {
     seq_add(ss, 10*(i+1));
 int *a = *ss;  // can always access the array
 assert(a[0] == 10 && a[1] == 20);
 // now take ownership of the array
 a = seq_array_ref(ss);
 // ss is now dead;  a has rc == 1

A common idiom is to build up an array using a seq, and then use seq_array_ref to provide you with the final array, sized to fit the actual number of elements. There are also functions to insert and remove elements, although of course these may be inefficient for large arrays. But as always, "when in doubt, use brute force", as Ken Thompson said.

Associative arrays are just one of those very cool things you absolutely need. But you don't necessarily need binary search. In llib 'simple maps' are arrays of pointers, where the even elements point to strings and the odd elements point to your data. Combining this with seqs results in the smap_* family of functions.

Now it turns out with these two concepts, you can do dynamic data, or as it is pronounced in Computer Science, Lisp. For instance, JSON data maps neatly into arrays of maps of arrays and so forth. To complete the picture we simply need to 'box' primitive values as llib objects, which is a well-known strategy in languages like Java and C# which are semi-dynamic. So value_float will make a typed object out of a double value, the predicate value_is_float can be used to test the type, and value_as_float will 'unbox' the double value. The representation is very simple, just a pointer to a double, plus llib type information. With this, one can construct, read and write arbitrary JSON-style data.

I suspect that many C programmers would regard this as the kind of thing that keeps them from Java . But, this is not an essential part of using llib, just a flexible way of looking at data. You do not (for instance) need to box primitive types in a normal homogenous array, only in arrays that may contain any kind of data. The JSON parser will unbox such an array for you, if it finds that all its values are numbers.

 double *xx = array_new(double,3);
 xx[0] = 1;
 xx[1] = 2;
 xx[2] = 3;
 // VAS is short for Value Array of Strings - tolerantly accepts any objects as well
 PValue* p = VAS("one","two","three",xx);
 // ==> ["one","two","three",[1,2,3]]
 // the other way
 PValue obj = json_parse_string("[10,20,30,40]");
 if (obj_is_instance(obj,"double")) { // check the type!
    double *aa = obj;
    FOR(i,array_len(aa)) printf("%f ",aa[i]);
 // ==> 10.000000 20.000000 30.000000 40.000000

Values provide another solution to an old C problem: how to return errors. Everyone agrees that errno is kind of icky, but we can only return one value directly.

 const char *str (int i) {
     if (i == 0)
         return value_errorf("number is bad %d",i);
     char buff[20];
     return str_new(buff);
 const char *res = str(0);
 if (value_is_error(res)){
     fprintf("error: %s\n",res);
 } else { // ok!
 obj_unref(res); // in both cases

str always returns a llib string, but an error is a string with a different dynamic type, so that value_is_error(res) is true.

What do Programs Need?

I don't want to give a full overview of the llib API, since there's The Documentation - (I've already been taken to task for writing artcles longer than the average attention span.)

Suffice to say that a major obsession of llib is string manipulation and common file format support, since the original C API is very basic. We start with an analysis of what non-trivial programs need in library support:

  • parsing command-line arguments arg
  • reading configuration files config
  • file and path operations file
  • flexible and more expressive string handling str
  • reading tabular data table
  • reading and writing XML and JSON xml, json
  • creating documents from templates template
  • parsing structured documents using a lexical scanner scan

It would be an unusual program that used all of this, but the beauty of static linking is "you don't pay for what you don't use". XML/JSON is part of the age we live in, but (particularly for the former) I don't claim these libraries are industrial - their niche is a program needing to read simple XML configuration files, or needing to speak JSON to external programs.

The other beauty of static linking is building executables with no unusual dependencies.

What's the Damage?

This all has a cost since each object must have a eight-byte hidden header. So the memory overhead for lots of small objects can be significant.

Most C programmers (incuding myself) don't like framework fascism, You may safely use parts of llib inside a C application but you must at least use obj_unref instead of free to clean up. It may be necessary to copy strings and arrays if they later might be freed.

The main discipline needed is to religiously balance every ref with an unref; be aware that reference arrays (and other reference containers) decrement the reference count of all their elements. But adding an object to such a container doesn't automatically increase its reference count, since that must be explicit. My experience is that it's hard to do this as an afterthought - write careful leak-free code from the start if the code is going to be part of a larger application. Apparently this was the story when folks tried to package Git as a library - Mr Torvalds had written the fastest code he could, and didn't bother to clean up memory since the process wouldn't last long enough for it to matter.

Currently, llib isn't thread-safe - the allocations (obj_new) access and modify static program state. unref can bite you if you're unlucky (and eventually everyone is unlucky with threading). Because of this static state, if both a program and a shared library use llib, then they must dynamically link to the core of llib. Fortunately, this is only a few Kb.

A tricky problem remains: of reliably detecting whether a pointer is 'one of ours'. This is principally needed when deciding to make a llib copy of a passed C string - otherwise it would make these libraries awkward to use, if you had to explicitly call str_new (or simply obj_ref) when passing strings. No solution seems entirely satisfactory; on Linux I keep track of high- and low- allocated memory pointers, which appears to work, but it would be easily confused by pointers allocated with just plain malloc. On Windows the multiple heaps used by the runtime make such a simple heuristic impossible, so I resort to keeping track of every allocation. Eventually, this becomes inefficient.

A Longer Example

The utility pkg-config is a descendent of the original gtk-config, which was needed to make building GTK+ applications less painful. As you can see from the end of the linked man page, this program has been re-written a few times. Unfortunately, it has picked up some awkward dependencies along the way, especially GLIB. I suspect its last author was having too much fun, so I decided to do another rewrite in the same spirit of indulgence.

pkg-config --flags gtk+-2.0 will give you all the C compile flags needed for compiling a GTK+ program, and --libs gives all the linker flags. It works by looking up files with extension .pc, which tells you what build flags that 'package' needs, plus what packages it further depends on, or requires. It's a classic file and text heavy program that any Python programmer would regard as unsuited to the old dog C. But (sorry lads) it has to be fast, because it's been shelled out a lot in makefiles all over the Gnomosphere.

It's a bit long to include in an article, so I invite you to read it here. I've tried to write it in as literate a style as possible, So it is deliberately kept as a single 563 line program.

Although a little larger (and it's hard to find 51K large), pkgconfig has no dynamic dependencies other than libc (I've excluded Linux-specific stuff here for clarity; it does no fancy stuff so easy to recompile for Windows and so forth as well.)

 pkgconfig$ ll pkgconfig
 -rwxrwxr-x 1 user user 51268 Apr  5 13:35 pkgconfig*
 pkgconfig$ ldd pkgconfig => /lib/i386-linux-gnu/ (0xb75f1000)
 pkgconfig$ ll $(which pkg-config)
 -rwxr-xr-x 1 root root 38592 Jul 14  2013 /usr/bin/pkg-config*
 pkgconfig$ ldd $(which pkg-config) => /lib/i386-linux-gnu/ (0xb7625000) => /lib/i386-linux-gnu/ (0xb7618000) => /lib/i386-linux-gnu/ (0xb7468000) => /lib/i386-linux-gnu/ (0xb742a000)

Wednesday, 2 October 2013

C++: Some Consequences of a Design Decision

Top Dog

C++ has a very solid position as the programming language which makes the least performance compromises while providing good abstraction mechanisms that allow code to be written at a high level. In many ways, it's an easier and safer language to learn than C, if you stick to the same imperative style but use std::string and the provided containers.

Both of these statements are of course open to debate. The first statement is true, if we look at the usage of this language in performance-critical applications. The second is often challenged. To quote Andrei Alexandrescu's comment in this Reddit comment:

In my opinion C++ is a language for experts and experts only [...] It has enough pitfalls in virtually all of its core constructs to make its use by benevolent amateurs virtually impossible except for the most trivial applications.

Ouch! That's fair enough; we were comparing it to C anyway (which is definitely not for sissies). It is not really a programming language for civilians, and not a good first language for anyone other than a would-be professional. (In fact I'd say you would get a better all-round education in C, even if later you turn in relief to non-military languages; learning C++ is mostly good for ... programming in C++.)

Error, Error on the -Wall

The big hurdle is the first one, and that's making sense of the error messages:

 #include <iostream>
 #include <string>
 #include <list>
 using namespace std; // so sue me
 int main()
     list<string> ls;
     string s = "hello";
     ls.append(" world");
     cout << ls << endl;
     return 0;

This isn't a bad attempt at a C++ program at all, leaving aside the pedantic belief that using namespace std is bad. (I take the pragmatic view that anybody is free to inject whatever namespaces they care to within their files, and not take away this freedom from others by injecting namespaces within header files. C++ is very good at resolving ambiguous name references and everyone should know the contents of std anyway.)

We get nearly four hundred lines of error messages, full of implementation details. In this case, the abstractions are leaking all over the user!

Verity Stob once suggested that the thing to do was to write a Perl script to parse the error output. This was very funny and true, but using Perl would increase the number of problems. My practical way of realising Verity's joke is to use lake and a suitable plugin:

 C:\Users\steve\dev\dev>lake -L filter.cpp error.cpp
 g++ -c -O2 -Wall -MMD  error.cpp -o error.o
 error.cpp: In function 'int main()':
 error.cpp:10:8: error: 'list<string >' has no member named 'append'
      ls.append(" world");
 error.cpp:11:10: error: no match for 'operator<<' (operand types are 'ostream {a
 ka ostream}' and 'list<string >')
      cout << ls << endl;
 lake: failed with code 1

Now our noob has a fighting chance, and can now go to the reference and actually find the appropriate method.

Templates Considered Harmful

The real issue is that the C++ standard libraries over-use generics. std::string and std::stream could be plain classes, as they once were. At this point, there will be someone suggesting that I am a plain ASCII bigot and forgetting the need for wstring and so forth. Fine, let them be plain classes as well. An incredible amount of ingenuity went into making templated string types work, and the library designers could have made their life easier by using a low-tech solution. Generally, we should not pander to library designers and their desires, since they chose the hard road: their job is to use the right level of abstraction and not complicate things unnecessarily.

C++'s standard generic containers are fantastically useful, but their design is overcomplicated by being also parameterized by an allocator. This is a useful feature for those that need it, but there could be two versions of (say) std::list overloaded by template parameters, which can be done in C++11 with variadic templates. This makes life a bit harder for library implementers, but they are precisely the people who can manage complexity better than users.

The Standard is the Standard, no point in moaning. But let's do an experiment to see what the consequences of a simplified standard library. I emphasize that tinycpp is an experiment, not a proposal (modest or otherwise). It originally was done for the UnderC project, since the interpreter could not cope with the full STL headers, and I've since filled in a few gaps. Here it's purpose is allow some numbers to be generated, since qualitative opinion is all too common.

These simplified 'fake' classes directly give us better error messages, especially if the compile bombs out on the first error. (Often after the first error the compiler is merely sharing its confusion.)

 $ g++ -Wfatal-errors -Itiny error.cpp tiny/iostream.o tiny/string.o
 error.cpp: In function 'int main()':
 error.cpp:10:8: error: 'class std::list<string>' has no member named 'append'
      ls.append(" world");
 compilation terminated due to -Wfatal-errors.

It's easy to forget the initial difficulty of learning to ride the bicycle, and to scorn training wheels as a useful means to that end.

Templates Slow you Down

People say 'C++ compiles slowly' but this not really true. A little C++ program will involve in about 20Kloc of headers being processed, a lot of that being template code. Using the tinycpp library that goes down to 1.4Kloc.

The three compilers tested here are mingw 4.8 on Windows 7 64-bit, MSVC 2010 on the same machine, and gcc 4.6 in a Linux Mint 32-bit VM.

Here is a comparison of build times for standard vs tinycpp:

  • mingw 0.63 -> 0.33
  • gcc 0.60 -> 0.20
  • msvc 0.82 -> 0.17

As always, gcc works better on Linux, even in a VM, and it's no longer slower than MSVC. In all cases the tinycpp version compiles significantly faster.

C++ programmers can get a bit defensive about compile times, and often end up suggesting throwing hardware at the problem. There seems to be a "You're in the Marines now boy!" macho attitude that wanting to build faster is a sign of civilian weakness and poor attention span. This attitude is off-putting and gets in the way of better engineering solutions. Most people just suck it up and play with light sabres.

Templates Make you Fat

With small programs, these compilers produce small executables when they are allowed to link dynamically to the C++ library. This is not considered a problem on Linux, because obviously everyone has upgraded to the latest distro. But if you want to chase cool new C++11 features you may find that most of your users don't have the cool new libstdc++ needed to run your program.

It is (curiously enough) easier to get a new shiny GCC for Windows precisely because it's not considered part of the system. Executables rule in Windows, so it's alarming to find that a small program linked statically against libstc++ is rather large, nearly 600kb for Windows. And since libstc++ is not part of Windows you (again) have to suck it up. (And this is definitely what Alexandrescu would consider a 'trivial application'.)

You can get down to 174Kb using the fake tinycpp libraries, which suggests that an up-to-date and properly engineered version of std-tiny would be useful for delivering executables, not just for speed and noob-friendliness.

MSVC does static linking much more efficiently; the numbers are 170Kb (std) and 95Kb (tiny). The resulting executables have no C runtime dependencies whatsoever. Which suggests that MSVC is (at least) a good choice for building releases for distribution. Using a cross-platform compiler-aware tool like CMake or Lake can make that less painful. Not an ideologically comfortable recommendation to accept, true, but whatever works best. (The command-line version of MSVC 2010 is freely available.)

This preoccupation with executable sizes seems last-century by now (after all, Go users are fine with megabyte executables since they see that as the price of no other runtime dependencies.) And large executables are not slower, providing the actual code that's executing at any point is compact enough to be cache-friendly. So perhaps I'm just showing my age at this point, although please note that resource-limited devices are much more common than desktop computers.

No Free Lunches

C++ programmers like the phrase 'abstraction overhead' because C++ is very good at reducing this to zero in terms of run-time. Often this is at the cost of executable size, compile time and confusing errors. This may be an acceptable price, but it is not free.

C++ is what it is; it is unlikely to change that much, except get even slower to compile as the Boost libraries move into the Standard library. But I think that there are some lessons to be learned for new languages:

  • keep the standard library as simple as possible: library developers should not have too much fun (They should write cool applications that use their libraries instead to get excess cleverness out of their system.)
  • error messages should not burden the user with implementation details; this means that the abstraction is leaking badly.
  • compile time still matters. Perhaps the people who use C++ more regularly are more likely to be those who like to think upfront (like embedded programmers) but this is not the only cognitive style that flourishes in programming. It is a mistake to think that long build times are a necessary evil, since with C++ they largely come from an outdated compilation model. New languages can do better than that.

Monday, 2 September 2013

Nimrod: The Return of Pascal

Why learn Another Language?

The first answer is: because it's fun. Just as a botanist is excited to find a new plant, programming language nerds like trying out new languages. Secondly, any new language uses new strategies for dealing with the basic problems of communicating algorithms to computers and intents to other programmers. So it is the most sincere form of criticism: a working implementation to constrast with the approaches taken by other languages. There's far too much armchairing and bikeshedding involved in discussions about languages, and you have to admire a guy who has spent a sizeable chunk of his life trying something new like Nimrod's author, Andreas Rumpf.
If you're not a language nerd, a new language might provide a solution to an actual computing problem you are facing. (Who would have guessed?)

Hello, Nimrod

For this exercise, I'm assuming a Unix-like system, but pre-compiled installers for Nimrod on Windows are available.
First, download and build Nimrod from here. It only takes a few minutes, and after making the suggested symlink nimrod will be on your path. In that directory, you will find a most useful examples folder, and the documentation is doc/manual.html for the manual,doc/tut1.html for the tutorial,2doc/lib.html for the standard library.

Here is a slightly non-trivial Hello-world application, just to test the compiler:

 # hello.nim: Hello, World!
 var name = "World"
 echo("Hello " & name & '!')

Compiling involves the simple incantation nimrod c hello.nim, which will generate a very chatty record of the compilation, and an executable hello. This has no external dependencies apart from libc and comes at about 130Kb; with nimrod c -d:release hello.nim the compiler agressively removes unneeded code and we are down to 39Kb.

This is the first take-home about Nimrod; it compiles into native code using the available C compiler and so can take advantage of all the optimization work that's gone into beasts like GCC and MSVC. There is no special runtime, so these executables can be shared with your colleagues without fuss. In the library documention doc/lib.html, 'pure' libraries will not introduce extra dependencies. Whereas (for instance) the re regular expression library currently implies an external dependency on PCRE.
The verbosity is interesting the first few times, and thereafter becomes tedious. I've defined these bash aliases to get cleaner output:

 $ alias nc='nimrod c --verbosity:0'
 $ alias ncr='nimrod c -d:release --verbosity:0'

Training a programmer's editor to present Nimrod code nicely is not difficult; using Python highlighting works well since the languages share many keywords. The main thing to remember is that Nimrod does not like tabs (thank the Gods!) here are some SciTE property definitions which you can put into your user configuration file (Options|Open User Options File); now F5 means 'compile if needed and run' and F7 just means 'compile'.

After a few invocations to get all the tools in memory, this compilation takes less than 200ms on this rather elderly machine. So the second take-home is that the compiler is fast (although not as fast as Go) and definitely faster than C++ or Scala. In particular, syntax errors will be detected very quickly.

A First Look

This code looks very much like a typical 'scripting' language, with hash-comments, explicitly-declared variables and string operations like concatenation (&). (A separate concatenation operator is a good decision, by the way; Javascript has a number of famous ambiguities that come from + meaning two very different things.)

However, this is not dynamic typing:

 # types.nim
 var s = "hello"
 var i = 10
 s = i
 $ nc types
 examples/types.nim(4, 5) Error: type mismatch: got (int) but expected 'string'

So s is statically-typed as 'string', i is typed as 'int', and no sane conversion should ever make an integer into a string implicitly. Nimrod does local type inference which examines the expression on the right-hand side and uses that type for the declared variable, just like the same statement would do in Go. Another good thing, since a variable cannot change type underneath you and you really need as many errors to happen at compile-time. The resulting code is also much more efficient than dynamically-typed code.

The next program looks very much like Python:

 # args.nim
 import os
 for i in 0..ParamCount():
 $ nc args
 Hint: operation successful (14123 lines compiled; 0.374 sec total; 12.122MB) [SuccessX]
 $ ./args one two three

But beware of surface resemblences; sharks and orcas look much the same, but are very different animals. The language that Nimrod reminds me of here is Rodrigo 'Bamboo' de Oliveira's Boo, the second-greatest programming language to come from Brazil. His comment is "We also love the Monty Python TV show! - but Boo is not Python". So Pythonistas should not assume that they can automatically skip the first semester with Nimrod. The first difference to note is that import brings all functions from the module into the current scope.

Apart from basic syntax, built-in functions like len and repr work mostly as you would expect from Python. Slicing is supported, but note the different syntax:

 var S = "hello dolly"
 var A = [10,20,30,40]
 var B = A[1..2]
 echo(len(A)," ",len(B)," ",len(S))
 for x in B: echo(x)
 # --->
 4 2 11
 [10, 20, 30, 40]

Type inference is fine and dandy, but is not letting us have the full picture. The 'lists' in square brackets are arrays, and they are fixed size.

The Return of Pascal

To a first approximation, an orca is a wolf in shark's clothing. Simularly, the language that Nimrod most matches in nature is Pascal:

 # pascal.nim
     TChars = range['A'..'C']
     TCharArray = array[TChars,int]
 var ch: TCharArray
 for c in 'A'..'C':
     ch[c] = ord(c)
 for c in low(ch)..high(ch):
     echo("char ",c,' ',ch[c])
 # --->
 char A 65
 char B 66
 char C 67

Paws have become flippers (= instead of :=, no semicolons or begin..end blocks) but this is classic Pascal typing, with subranges and array types declared over arbitrary ordinal types. So accessing ch['Z'] is a compile error 'index out of bounds'. Also, 'Z' is of type char and "Z" is of type string - quite distinct as they are in C as well. Like Pascal, arrays are always bounds checked, but this can be disabled by pragmas. The T convention for naming types should be familiar with anyone who was a Borland fan.

Please note that variables are case-insensitive! Underscores are ignored as well. (This may well change.)

Another indication that Nimrod comes from the Niklas Wirth school is that functions are called procedures, whether they return something or not.

 # proc.nim
 proc sqr(x: float): float = x*x
 # -->

You should not assume that float means 32-bits; the manual says "the compiler chooses the processor's fastest floating point type" and this usually is float64; there is also float32 if you wish to be explicit, just as in Go. (The usual conversions between integers and floats are allowed, since they are widening.) In a similar way, int always has the size of a pointer on the system (which is not true for C), and there is intXX where XX is 8,16,32 or 64.
Also as with Pascal, arguments may be passed by reference:

 # var.nim
 proc modifies (i: var int) =
     i += 1
 var i = 10
 for k in 1..3:
 # --->

This is a procedure that returns nothing. Every language draws a line in the sand somewhere and says "I don't think you should do that, Dave". One of Nimrod's rules is that you cannot just discard the results of a function that returns a value, unless you use the keyword discard before it like discard fun(), rather as we say (void)fun(); in C.

There is fairly standard exception handling. A cool novelty is that finally or except can be used as standalone statements:

 proc throws(msg: string) =
     raise newException(E_base, msg)
 proc blows() =
     finally:   echo "got it!"
     echo "pre"
     throws("blew up!")
     echo "post"
 # --->
 got it!
 Traceback (most recent call last)
 finally.nim(10)          finally
 finally.nim(7)           blows
 finally.nim(2)           throws
 Error: unhandled exception: blew up! [E_Base]

This is very similar in effect to Go's defer mechanism, and allows for deterministic cleanup.

Tuples are Structs

It's often better to take the Python strategy and return multiple results using a tuple

 # tuple.nim
     TPerson = tuple [name: string, age:int]
 proc result(): TPerson = ("Bob",42)
 var r = result()
 echo(r)             # default stringification
 echo (       # access by field name
 var (name,age) = r  # tuple unpacking
 echo (name,"|",age)
 # --->
 (name: Bob, age: 42)

Different tuple-types are equivalent if they have the same fields and types in order ('structural equivalence'). Nimrod tuples are mutable, and you should think of them more as akin to C's struct.
Functions defined over a type have a most curious and interesting property. Contining with tuple.nim we write a silly accessor function:

 proc name_of(t: TPerson): string =
 # --->

That last line is something to think about: we've got something like 'methods' by just using the function after the dot, as if it were a field; in fact you typically leave off the () in this case and have something very much like a read-only property.

'List' was a Bad Name Anyway...

I mentioned that [10,20] is a fixed-size array, which is the most efficient representation. Sequences can be extended, like C++'s vector or Python's List types.

 # Using seq constructor and append elements
 var ss = @["one","two"]
 # using newSeq, allocate up front
 var strings : seq[string]
 newSeq(strings, 3)
 strings[0] = "The fourth"
 strings[1] = "assignment"
 strings[2] = "would crash"
 #strings[3] = "out of bounds"

Using sequences of strings and the parseopt module, here is a simple implementation of the BSD head utility. The release executable is 58Kb, which is an order of magnitude smaller than the equivalent Go stripped executable. It's only 54 lines, but a little big to be an inline example. The case statement is very Pascal-like:

 case kind
 of cmdArgument:
 of cmdLongOption, cmdShortOption:
   case key
   of "help", "h": showUsage(nil)
   of "n":
     n = parseInt(val)
   of "version", "v":

parseopt isn't fully GNU compatible: in particular, you have to say ./head -n=3 head.nim rather than -n3 or -n 3. The code style is a bit low-level for my taste; compare with lapp; a well-behaved command-line tool must always provide its usage, so why not reuse that text to describe flags, arguments and their types? This style works particularly well with dynamic languages, but it can be done with Nimrod. Here is head, revised:

 # head.nim
 import lapp
 let args = parse"""
 head [flags] filename
   -n: (default 10) number of lines
   -v,--version: version
   <files> (default stdin...)
     n = args["n"].asInt
     files = args["files"].asSeq
 proc head(f: TFile, n: int) =
     var i = 0
     for line in f.lines:
         i += 1
         if i == n: break
 if len(files) == 1:
     for f in files:
         echo("----- ",f.fileName)

Associative arrays are the key here, plus a variant value type. lapp ensures that numerical flags are correctly converted, files are opened (and afterwards closes them on a exit hook set with addQuitProc). There are some conventions to be followed:

  • flags may have a short alias; the long name is always used to access the value
  • flags are bool values that default to false
  • parameters are enclosed in <...> and are string values with no default
  • you can specify the type explicitly: bool,int,float,string,infile',outfile, or set the default and have the type infered from that:stdinandstdout` have their usual meanings

One of the really cool things about type inference is that so many of the implementation details are hidden from users of a library. This is obviously good for the user, who has less to remember, but also for the library implementer, who has freedom to change the internal details of the implementation. It leads to a style which looks and feels like dynamic code, but is strictly typed with meaningful compile-time errors.

Here the type of args is irrelevant; it is an associative array between flag/argument names and some unspecified value type, which has known fields. (In fact, this version of lapp only exports parse, the fields, and a suitable definition of [] from 'tables')

'class' is not a transferable idea

People tend to reason from simularity, so the naive nature watcher constructs a false homomorphism between sharks and orcas. I fell into this trap, assuming 'inheritance' means 'polymorphism using virtual method tables'. Nimrod's optimization attitude is showing here: "Nimrod does not produce a virtual table, but generates dispatch trees. This avoids the expensive indirect branch for method calls and enables inlining". That's right, procedures are always statically dispatched. If you want methods, you need a different construct, multi-methods:

 # class.nim
     TPerson = object of TObject
         name: string
         age: int
 proc setPerson(p: ref TPerson, name: string, age:int) = = name
     p.age = age
 proc newPerson(name: string, age:int): ref TPerson =
 method greeting(p: ref TPerson):string = "Hello " & & ", age " & $p.age
     TGerman = object of TPerson
 proc newGerman(name: string, age:int): ref TGerman =
 method greeting(p: ref TGerman):string = "Hallo " & & ", " & $p.age & " Jahre alt"
 var bob = newPerson("Bob",32)
 var hans = newGerman("Hans",30)
 proc sayit(p: ref TPerson) = echo p.greeting
 # --->
 Hello Bob, age 32
 Hallo Hans, 30 Jahre alt

Here we are making objects which are references (by default they are value types, like tuples, unlike java), initialized with the standard procedure new. Note the Pascal-like special variable result in procedures!

As expected, you may pass Germans to sayit, because a German is a person, but greeting has to be declared as a method for this to work; if it were a proc, we would get a warning about the second greeting being unused, and Germans are then addressed in English.
The cool thing about multi-methods is that they restore symmetry; a traditional polymorphic call is only polymorphic in a. This makes sense in a language where dot method notation is just sugar for procedure calls where the first argument matches the type.

Generics Everywhere

Consider this, where no type is given for the argument of sqr:

 proc sqr (x): auto = x*x
 echo sqr(10)
 echo sqr(1.2)
 # -->
sqr is implicitly generic, and is constructed twice, first for int and then for float. Comparing a similar thing in Boo reveals a key difference:
 def sqr (x):
     return x*x

Here the type of x is duck, where Boo switches to late binding.

Both archieve the same result; sqr can be passed anything that knows how to multiply with itself, but Nimrod wants to generate the best possible code, at the cost of more code generation. The more general way of declaring generic functions goes like:

 proc sqr[T] (x: T): T = x*x

Another example of Nimrod being conservative about your memory needs would be declaring a very large array of strings. In languages where string is a value type like C++ and Go, this would contain valid strings, but in Nimrod the entries are nil until explicitly initialized. So string values can be nil (like Java) which can be a fertile source of run-time errors, but the decision on how much heap to throw at the data structure is left to you, which is a very C-like design decision. Strings in Nimrod (however) are mutable and do copy by value.

Generics make it easy to write operations over containers. Here is map with an anonymous procedure:

   a = [1, 2, 3, 4]
   b = map(a, proc(x: int): int = 2*x)
 for x in b: echo x
 # --->

Anonymous procedures are a little verbose (as they are in Go), but there is a trick. We use a template which is a higher-order generic that rewrites expressions, much like a preprocessor macro in C/C++:

 template F(T: typedesc, f:expr):expr =
     (proc(x:T):T = f)
 b = map(a, F(int, 2*x))

Nimrod achieves the power of the C preprocessor in an elegant fashion, integrated into the language itself. The when statement works with compile-time constants and only generates code for the correct condition, much like a #if chain.

 when sizeof(int) == 2:
   echo("running on a 16 bit system!")
 elif sizeof(int) == 4:
   echo("running on a 32 bit system!")
 elif sizeof(int) == 8:
   echo("running on a 64 bit system!")
   echo("cannot happen!")

LIke if, it can be used in an expression context:

 const dirsep = when hostOS == "windows": '\\' else: '/'

A clever use is to conditionally add testing code to a module when it's compiled and run as a program. These tests can be as detailed as you like, because they will not bloat the compiled library.

 # sqr.nim
 proc sqr *[T](x: T): T = x*x
 when isMainModule:  # predefined constant
     assert(sqr(10) == 100)

As you might expect by now, Nimrod does not provide run-time reflection like Java or Go because it would burden code that does not need it - again, this is C++'s "Don't Pay for what you Don't use". But there is compile-time reflection, implemented by the typeinfo module, which acts as a static equivalent of Go's reflect package.

Second Impressions

There's no doubt that finding errors as early as possible using a compiler (or some other static code analysis tool) is better than finding them later as run-time errors. In dynamic languages we are always at the mercy of a spelling mistake. But static compilation has a cost in time (build times do matter) and in complexity.

Having done about a thousand lines of working Nimrod code, I feel I can express an opinion on the language. Most code is straightforward and free of explicit type annotations, and the compiler quickly gives good error messages. Run-time errors come with a useful stack trace, and mostly come from nil references. It's commonly thought that nillable values are a big mistake (C.A.R Hoare once called it his "billion dollar mistake") but a nil string value is much better at trashing a program. And this is good - fail hard and early!

However, you do need to understand some things to interpret the error messages correctly:

 let a = [1,2,3]
 echo a
 nimex/errors.nim(2, 6) Error: type mismatch: got (Array constructor[0..2, int])
 but expected one of:
 system.$(x: TEnum): string
 system.$(x: int64): string
 system.$(x: string): string
 system.$(x: uint64): string
 system.$(x: T): string
 system.$(x: int): string
 system.$(x: char): string
 system.$(x: T): string
 system.$(x: bool): string
 system.$(x: cstring): string
 system.$(x: float): string

You have to know that echo uses the 'stringify' operator $ on its arguments - then we can interpret this error as being "I don't know how to make a string from an array". The compiler then helpfully presents all the overloaded versions of $ active in this program. Of course, this is scary to people from a dynamic background who were beguiled by Nimrod's surface 'Python-like' syntax. Coming from a C++ background, I'm prepared for this way of doing things, and know that the solution looks like this (quote operators in backticks to define them):

 proc `$`[T](a: openarray[T]): string =
     var s = ""
     for e in a:
         s.add(' ')
     return s

(Mutable strings take some getting used to). This solution will work for any arrays or sequences with elements that understand $, and is very efficient, because Nimrod iterators over sequences are zero-overhead - effectively plain loops over elements.

There is a non-trivial learning curve; a motivated polyglot can learn enough in a week to be productive, but polyglots aren't so common. A new language comes with a shortage of tutorial material/books and a small community. This means that Google is not your friend, and last I checked there were two questions on Stackoverflow, one of which concerned a Brainfuck interpreter. There does however seem to be some action on Github.

A language thrives (like any life form) when it finds a niche in which it is competitive. For Lua, that has been providing a lightweight, powerful yet accessible embeddable scripting language. It has been adopted by many game developers as a way of not writing everything in C++, which is productive in two important ways: small changes to game logic do not need expensive rebuilds and don't require restarting the game; plus lesser mortals can contribute. Professional game programmers tend not to do things simply because they are cool, and so there is a market for Lua skills.

Nimrod is a good fit where C and C++ are often used. We've seen that 'userland' utilities like head can be efficiently implemented in Nimrod, and the resulting executables are typically less than a 100kb and usually have no external dependencies. This makes it a good fit for CGI since they will load as fast as C. With Go, people found statically-linked executables a good way to manage the problem of managing dependencies on deployed machines. Nimrod provides this without Go's approach of reimplementing the whole C runtime.

But the server niche requires well-tested frameworks and libraries, which can only happen with wider adoption. Thus there is a vicious circle that any new language must face; use comes from maturity, and maturity comes from use.
It's well suited to data processing and numerical tasks; operator overloading makes standard mathematical notation possible, and generics make algorithms efficient. Here again having some choice of existing libraries would make all the difference. However, it is relatively easy to bind to C libraries (since the compiler output is C) and there is a c2nim tool for processing C headers.

A particularly cool application is for embedded systems. Here the realities are merciless; embedded processors are everywhere and need to be cheap, and you can't get much memory for pennies. As a consequence, C dominates this field, and it's nasty. I can honestly say that this is my least favourite kind of programming; the preprocessor hacks alone would make Dijkstra lie down and cry. Here Andreas describes how Nimrod can be compiled with a stripped-down library with no OS support, and compiled on a 16bit AVR processor. Nimrod is probably the only new language which has the minimal attitude and metaprogramming capability to be an effective contender in this space, which is traditionally the last bastion of C.

Garbage collection is something that's often used to separate system and application languages. It's hard to add it to an existing language, and hard to remove it from a language, since it is so damn convenient.
A kernel has to manage every byte so that the userland can afford to waste memory; game programmers hate compulsory 'stop the world' GC which tends to happen when you're doing something more important. And embedded controllers often don't even have malloc. See how Nimrod's Garbage Collector works; it is low-overhead, uses reference counting and can be switched off temporarily (unlike with the Dalvik VM on Android)

In summary, Nimrod is a very rich and powerful statically-typed language which relentlessly uses compile-time metaprogramming to achieve its goals of delivering compact and efficient programs. Whether it finds its niche is an open question, but it deserves to be given a chance, and is well worth exploring further.

Update: The title is definitely a mistake, because Pascal represented a simplification of existing practice and was intended as a teaching language.  If I had said Object Pascal then it wouldn't be so bad, since that grew into a genuinely useful language for building large systems.  But Nimrod is influenced by many other languages, so any 'Nimrod is like X'  will always be a simplfication; it is what it is.

It's been pointed out that Javascript's problem is not lack of a concatenation operator, but implicit conversions: C++ lacks the separation also but would never confuse concatenating strings with adding numbers.  There is a similar implicit conversion in Lua (one of the few warts in its design) but the operators are separate, as they should be.