Why learn Another Language?
The first answer is: because it's fun. Just as a botanist is excited to find a new plant, programming language nerds like trying out new languages. Secondly, any new language uses new strategies for dealing with the basic problems of communicating algorithms to computers and intents to other programmers. So it is the most sincere form of criticism: a working implementation to contrast with the approaches taken by other languages. There's far too much armchairing and bikeshedding involved in discussions about languages, and you have to admire someone like Nimrod's author, Andreas Rumpf, who has spent a sizeable chunk of his life trying something new.
If you're not a language nerd, a new language might provide a solution to an actual computing problem you are facing. (Who would have guessed?)
For this exercise, I'm assuming a Unix-like system, but pre-compiled installers for Nimrod on Windows are available.
First, download and build Nimrod from here. It only takes a few minutes, and after making the suggested symlink nimrod will be on your path. In that directory, you will find a most useful examples folder, and the documentation is
- doc/manual.html for the manual,
- doc/tut1.html for the tutorial,
- doc/lib.html for the standard library.
Here is a slightly non-trivial Hello-world application, just to test the compiler:
    # hello.nim: Hello, World!
    var name = "World"
    echo("Hello " & name & '!')
Compiling involves the simple incantation nimrod c hello.nim, which will generate a very chatty record of the compilation, and an executable hello. This has no external dependencies apart from libc and comes in at about 130Kb; with nimrod c -d:release hello.nim the compiler aggressively removes unneeded code and we are down to 39Kb.
This is the first take-home about Nimrod: it compiles into native code using the available C compiler and so can take advantage of all the optimization work that's gone into beasts like GCC and MSVC. There is no special runtime, so these executables can be shared with your colleagues without fuss. In the library documentation (doc/lib.html), 'pure' libraries will not introduce extra dependencies, whereas (for instance) the re regular expression library currently implies an external dependency on PCRE.
The verbosity is interesting the first few times, and thereafter becomes tedious. I've defined these bash aliases to get cleaner output:
    $ alias nc='nimrod c --verbosity:0'
    $ alias ncr='nimrod c -d:release --verbosity:0'
Training a programmer's editor to present Nimrod code nicely is not difficult; using Python highlighting works well since the languages share many keywords. The main thing to remember is that Nimrod does not like tabs (thank the Gods!). Here are some SciTE property definitions which you can put into your user configuration file (Options|Open User Options File); now F5 means 'compile if needed and run' and F7 just means 'compile'.
After a few invocations to get all the tools in memory, this compilation takes less than 200ms on this rather elderly machine. So the second take-home is that the compiler is fast (although not as fast as Go) and definitely faster than C++ or Scala. In particular, syntax errors will be detected very quickly.
A First Look
This code looks very much like a typical 'scripting' language, with hash-comments, explicitly-declared variables and string operations like concatenation (using & avoids + meaning two very different things.)
However, this is not dynamic typing:
    # types.nim
    var s = "hello"
    var i = 10
    s = i

    $ nc types
    examples/types.nim(4, 5) Error: type mismatch: got (int) but expected 'string'
s is statically-typed as 'string', i is typed as 'int', and no sane conversion should ever make an integer into a string implicitly. Nimrod does local type inference, which examines the expression on the right-hand side and uses that type for the declared variable, just as the same statement would in Go. This is another good thing, since a variable cannot change type underneath you and you really want as many errors as possible to happen at compile-time. The resulting code is also much more efficient than dynamically-typed code.
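When you do want a string from an integer (or the reverse), the conversion has to be spelled out. Here is a minimal sketch of that, assuming a current compiler (the language has since been renamed Nim, and the command is nim rather than nimrod); the file name is only illustrative.

```nim
# convert.nim: conversions between int and string must be explicit
import strutils        # provides parseInt

var s = "hello"
var i = 10

s = $i                 # `$` is the stringify operator: int -> string
echo(s & "!")

i = parseInt("42")     # string -> int; raises an exception on bad input
echo(i + 1)
```

The static types never change here: s stays a string and i stays an int, only the values are converted.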
The next program looks very much like Python:
    # args.nim
    import os

    for i in 0..ParamCount():
      echo(ParamStr(i))

    $ nc args
    Hint: operation successful (14123 lines compiled; 0.374 sec total; 12.122MB) [SuccessX]
    $ ./args one two three
    ./args
    one
    two
    three
But beware of surface resemblances; sharks and orcas look much the same, but are very different animals. The language that Nimrod reminds me of here is Rodrigo 'Bamboo' de Oliveira's Boo, the second-greatest programming language to come from Brazil. His comment is "We also love the Monty Python TV show! - but Boo is not Python". So Pythonistas should not assume that they can automatically skip the first semester with Nimrod. The first difference to note is that
import brings all functions from the module into the current scope.
Apart from basic syntax, built-in functions like
repr work mostly as you would expect from Python. Slicing is supported, but note the different syntax:
    var S = "hello dolly"
    var A = [10,20,30,40]
    var B = A[1..2]
    echo(len(A)," ",len(B)," ",len(S))
    echo(repr(A))
    for x in B: echo(x)
    # --->
    4 2 11
    [10, 20, 30, 40]
Type inference is fine and dandy, but is not letting us have the full picture. The 'lists' in square brackets are arrays, and they are fixed size.
The Return of Pascal
To a first approximation, an orca is a wolf in shark's clothing. Similarly, the language that Nimrod most matches in nature is Pascal:
    # pascal.nim
    type
      TChars = range['A'..'C']
      TCharArray = array[TChars,int]

    var ch: TCharArray

    for c in 'A'..'C':
      ch[c] = ord(c)

    for c in low(ch)..high(ch):
      echo("char ",c,' ',ch[c])
    # --->
    char A 65
    char B 66
    char C 67
Paws have become flippers (= instead of :=, no semicolons or begin..end blocks) but this is classic Pascal typing, with subranges and array types declared over arbitrary ordinal types. So accessing ch['Z'] is a compile error 'index out of bounds'. Also, 'Z' is of type char and "Z" is of type string, quite distinct, as they are in C as well. Like Pascal, arrays are always bounds checked, but this can be disabled by pragmas. The T convention for naming types should be familiar to anyone who was a Borland fan.
Please note that variables are case-insensitive! Underscores are ignored as well. (This may well change.)
Another indication that Nimrod comes from the Niklaus Wirth school is that functions are called procedures, whether they return something or not.
    proc sqr(x: float): float = x*x

    echo(sqr(10))
    # -->
    1.0000000000000000e+02
You should not assume that float means 32 bits; the manual says "the compiler chooses the processor's fastest floating point type" and this usually is float64; there is also float32 if you wish to be explicit, just as in Go. (The usual conversions between integers and floats are allowed, since they are widening.) In a similar way, int always has the size of a pointer on the system (which is not true for C), and there is intXX where XX is 8, 16, 32 or 64.
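A quick way to see these sizes on your own machine is to ask the compiler. This is only a sketch, assuming a current Nim compiler; the printed values depend on the platform, so I have not listed them.

```nim
# sizes.nim: print the sizes of the basic numeric types
echo(sizeof(int) == sizeof(pointer))     # int is pointer-sized
echo("int:     ", sizeof(int))
echo("float:   ", sizeof(float))
echo("int16:   ", sizeof(int16))
echo("int64:   ", sizeof(int64))
echo("float32: ", sizeof(float32))
```

The explicitly sized types are the ones to reach for when a file format or a C interface dictates the layout.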
Also as with Pascal, arguments may be passed by reference:
    # var.nim
    proc modifies (i: var int) =
      i += 1

    var i = 10
    for k in 1..3:
      modifies(i)
      echo(i)
    # --->
    11
    12
    13
This is a procedure that returns nothing. Every language draws a line in the sand somewhere and says "I don't think you should do that, Dave". One of Nimrod's rules is that you cannot just discard the results of a function that returns a value, unless you use the keyword
discard before it like
discard fun(), rather as we say
(void)fun(); in C.
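Here is a small sketch of that rule in action, assuming a current Nim compiler. The procedure bump and the variable names are made up for the illustration.

```nim
# discard.nim: ignoring a return value has to be explicit
proc bump(counter: var int): int =
  ## Increments counter in place and also returns the new value.
  counter += 1
  result = counter

var hits = 0
discard bump(hits)        # result deliberately thrown away
# bump(hits)              # would not compile: the int result is neither used nor discarded
let latest = bump(hits)   # using the result is of course fine
echo(latest)
```

The commented-out line is the interesting one: uncomment it and the compiler refuses, which catches the classic mistake of forgetting to check a returned status.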
There is fairly standard exception handling. A cool novelty is that finally (like except) can be used as a standalone statement:
    proc throws(msg: string) =
      raise newException(E_base, msg)

    proc blows() =
      finally: echo "got it!"
      echo "pre"
      throws("blew up!")
      echo "post"

    blows()
    # --->
    pre
    got it!
    Traceback (most recent call last)
    finally.nim(10)          finally
    finally.nim(7)           blows
    finally.nim(2)           throws
    Error: unhandled exception: blew up! [E_Base]
This is very similar in effect to Go's
defer mechanism, and allows for deterministic cleanup.
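For comparison, the same guarantee can be had from the conventional try/finally form. This sketch assumes a current Nim compiler, where the standard exception types have names like ValueError; the risky procedure and the events list are invented for the example.

```nim
# cleanup.nim: cleanup code runs whether or not an exception escapes
var events: seq[string] = @[]

proc risky(fail: bool) =
  try:
    events.add("pre")
    if fail:
      raise newException(ValueError, "blew up!")
    events.add("post")
  finally:
    events.add("cleanup")      # always runs, much like a deferred call in Go

risky(false)                   # normal path: pre, post, cleanup
try:
  risky(true)                  # failing path: pre, cleanup, then the handler
except ValueError:
  events.add("caught: " & getCurrentExceptionMsg())

echo(events)
```

Recording the steps in a list rather than echoing them makes the order easy to check: "cleanup" is recorded on both paths, and "post" only when nothing was raised.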
Tuples are Structs
It's often better to take the Python strategy and return multiple results using a tuple
    # tuple.nim
    type
      TPerson = tuple [name: string, age:int]

    proc result(): TPerson = ("Bob",42)

    var r = result()
    echo(r)              # default stringification
    echo (r.name)        # access by field name
    var (name,age) = r   # tuple unpacking
    echo (name,"|",age)
    # --->
    (name: Bob, age: 42)
    Bob
    Bob|42
Different tuple-types are equivalent if they have the same fields and types in order ('structural equivalence'). Nimrod tuples are mutable, and you should think of them more as akin to C's struct.
Functions defined over a type have a most curious and interesting property. Continuing with tuple.nim we write a silly accessor function:
    proc name_of(t: TPerson): string = t.name

    echo(name_of(r))
    echo(r.name_of())
    # --->
    Bob
    Bob
That last line is something to think about: we've got something like 'methods' by just using the function after the dot, as if it were a field; in fact you typically leave off the
() in this case and have something very much like a read-only property.
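The dot notation works for procedures with extra arguments too, and calls chain naturally. A minimal sketch, assuming a current Nim compiler; Person, nameOf and olderBy are hypothetical names chosen for the example.

```nim
# dots.nim: any proc can be called with dot notation on its first argument
type
  Person = tuple[name: string, age: int]

proc nameOf(p: Person): string = p.name

proc olderBy(p: Person, years: int): Person =
  (p.name, p.age + years)      # returns a new tuple; p itself is not modified

let bob: Person = ("Bob", 42)
echo(nameOf(bob))              # ordinary call
echo(bob.nameOf)               # same call, reads like a read-only property
echo(bob.olderBy(3).age)       # calls chain left to right
```

Nothing about Person had to be declared up front to allow this; any procedure whose first parameter accepts the value can be written after the dot.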
'List' was a Bad Name Anyway...
I mentioned that [10,20] is a fixed-size array, which is the most efficient representation. Sequences can be extended, like C++'s vector or Python's list:
    # Using seq constructor and append elements
    var ss = @["one","two"]
    ss.add("three")

    # using newSeq, allocate up front
    var strings : seq[string]
    newSeq(strings, 3)
    strings[0] = "The fourth"
    strings[1] = "assignment"
    strings[2] = "would crash"
    #strings[3] = "out of bounds"
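The two styles mix freely: you can allocate some slots up front, fill them by index, and still append afterwards. A small sketch, assuming a current Nim compiler (which also offers the newSeq[T](n) form used here); the variable names are illustrative.

```nim
# seqs.nim: growing a sequence versus filling preallocated slots
var ss = @["one", "two"]
ss.add("three")                 # append, like Python's list.append
echo(len(ss))

var squares = newSeq[int](3)    # three zero-initialised slots
for i in 0..high(squares):
  squares[i] = i * i            # indexing only works within the current length
squares.add(9)                  # adding extends the sequence by one
echo(squares)
```

Indexing past high(squares) is a run-time error rather than an implicit resize, so add is the only way these sequences grow.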
Using sequences of strings and the
parseopt module, here is a simple implementation of the BSD
head utility. The release executable is 58Kb, which is an order of magnitude smaller than the equivalent Go stripped executable. It's only 54 lines, but a little big to be an inline example. The
case statement is very Pascal-like:
    case kind
    of cmdArgument:
      files.add(key)
    of cmdLongOption, cmdShortOption:
      case key
      of "help", "h": showUsage(nil)
      of "n": n = parseInt(val)
      of "version", "v":
        echo("1.0")
        return
parseopt isn't fully GNU compatible: in particular, you have to say
./head -n=3 head.nim rather than
-n 3. The code style is a bit low-level for my taste; compare with lapp;
a well-behaved command-line tool must always provide its usage, so why not reuse that text to describe flags, arguments and their types? This style works particularly well with dynamic languages, but it can be done with Nimrod.
    # head.nim
    import lapp

    let args = parse"""
    head [flags] filename
      -n: (default 10) number of lines
      -v,--version: version
      <files> (default stdin...)
    """

    let
      n = args["n"].asInt
      files = args["files"].asSeq

    proc head(f: TFile, n: int) =
      var i = 0
      for line in f.lines:
        echo(line)
        i += 1
        if i == n: break

    if len(files) == 1:
      head(files[0].asFile,n)
    else:
      for f in files:
        echo("----- ",f.fileName)
        head(f.asFile,n)
Associative arrays are the key here, plus a variant value type. lapp ensures that numerical flags are correctly converted and that files are opened (and afterwards closed, by an exit hook set with addQuitProc). There are some conventions to be followed:
- flags may have a short alias; the long name is always used to access the value
- flags are bool values that default to false
- parameters are enclosed in angle brackets, as with <files> above, and are string values with no default
- you can specify the type explicitly, or set the default and have the type inferred from that, as with (default 10) above
- stdin and stdout have their usual meanings
One of the really cool things about type inference is that so many of the implementation details are hidden from users of a library. This is obviously good for the user, who has less to remember, but also for the library implementer, who has freedom to change the internal details of the implementation. It leads to a style which looks and feels like dynamic code, but is strictly typed with meaningful compile-time errors.
Here the type of args is irrelevant; it is an associative array between flag/argument names and some unspecified value type, which has known fields. (In fact, this version of lapp only exports parse, the fields, and a suitable definition of [] from 'tables'.)
'class' is not a transferable idea
People tend to reason from similarity, so the naive nature watcher constructs a false homomorphism between sharks and orcas. I fell into this trap, assuming 'inheritance' means 'polymorphism using virtual method tables'. Nimrod's optimization attitude is showing here: "Nimrod does not produce a virtual table, but generates dispatch trees. This avoids the expensive indirect branch for method calls and enables inlining". That's right, procedures are always statically dispatched. If you want methods, you need a different construct, multi-methods:
    # class.nim
    type
      TPerson = object of TObject
        name: string
        age: int

    proc setPerson(p: ref TPerson, name: string, age:int) =
      p.name = name
      p.age = age

    proc newPerson(name: string, age:int): ref TPerson =
      new(result)
      result.setPerson(name,age)

    method greeting(p: ref TPerson):string = "Hello " & p.name & ", age " & $p.age

    type
      TGerman = object of TPerson

    proc newGerman(name: string, age:int): ref TGerman =
      new(result)
      result.setPerson(name,age)

    method greeting(p: ref TGerman):string = "Hallo " & p.name & ", " & $p.age & " Jahre alt"

    var bob = newPerson("Bob",32)
    var hans = newGerman("Hans",30)

    proc sayit(p: ref TPerson) = echo p.greeting

    sayit(bob)
    sayit(hans)
    # --->
    Hello Bob, age 32
    Hallo Hans, 30 Jahre alt
Here we are making objects which are references (by default they are value types, like tuples, and unlike Java), initialized with the standard procedure new. Note the Pascal-like special variable result in procedures!
As expected, you may pass Germans to
sayit, because a German is a person, but
greeting has to be declared as a method for this to work; if it were a
proc, we would get a warning about the second
greeting being unused, and Germans are then addressed in English.
The cool thing about multi-methods is that they restore symmetry; a traditional polymorphic call
a.foo(b) is only polymorphic in
a. This makes sense in a language where dot method notation is just sugar for procedure calls where the first argument matches the type.
Consider this, where no type is given for the argument of sqr:

    proc sqr (x): auto = x*x

    echo sqr(10)
    echo sqr(1.2)
    # -->
    100
    1.4399999999999999e+00
sqr is implicitly generic, and is constructed twice, first for int and then for float. Comparing a similar thing in Boo reveals a key difference:
    def sqr (x):
        return x*x

Here the type of x is duck, where Boo switches to late binding.
Both achieve the same result; sqr can be passed anything that knows how to multiply with itself, but Nimrod wants to generate the best possible code, at the cost of more code generation. The more general way of declaring generic functions goes like:
proc sqr[T] (x: T): T = x*x
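Explicit type parameters pay off once a generic procedure calls another one. A short sketch, assuming a current Nim compiler; sumSquares is a hypothetical helper, not something from the standard library.

```nim
# generic.nim: one definition, instantiated separately for each element type
proc sqr[T](x: T): T = x*x

proc sumSquares[T](xs: openArray[T]): T =
  ## Works for any element type that supports `*` and `+=`.
  for x in xs:
    result += sqr(x)           # result starts out as the zero value of T

echo(sumSquares([1, 2, 3]))    # instantiated for int, from an array
echo(sumSquares(@[0.5, 1.5]))  # instantiated for float, from a sequence
```

The openArray parameter accepts both arrays and sequences, and each instantiation is compiled as if it had been written by hand for that type.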
Another example of Nimrod being conservative about your memory needs would be declaring a very large array of strings. In languages where
string is a value type like C++ and Go, this would contain valid strings, but in Nimrod the entries are
nil until explicitly initialized. So string values can be
nil (like Java) which can be a fertile source of run-time errors, but the decision on how much heap to throw at the data structure is left to you, which is a very C-like design decision. Strings in Nimrod (however) are mutable and do copy by value.
Generics make it easy to write operations over containers. Here is
map with an anonymous procedure:
    var
      a = [1, 2, 3, 4]
      b = map(a, proc(x: int): int = 2*x)

    for x in b: echo x
    # --->
    2
    4
    6
    8
Anonymous procedures are a little verbose (as they are in Go), but there is a trick. We use a template which is a higher-order generic that rewrites expressions, much like a preprocessor macro in C/C++:
    template F(T: typedesc, f:expr):expr =
      (proc(x:T):T = f)

    b = map(a, F(int, 2*x))
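In current versions of the language the standard library ships a template along the same lines, so you rarely need to write your own. This sketch assumes a current Nim compiler and its sequtils module; mapIt plays the role of the F template above, with the element exposed under the name it.

```nim
# mapit.nim: anonymous proc versus a template that rewrites the expression
import sequtils

let a = [1, 2, 3, 4]

# Explicit anonymous procedure, as in the article
let doubled = map(a, proc(x: int): int = 2*x)

# mapIt is a template: the expression is substituted for each element `it`
let tripled = a.mapIt(3 * it)

echo(doubled)
echo(tripled)
```

Because the template is expanded at compile time, the second form has no closure to call; it is just a loop over the array.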
Nimrod achieves the power of the C preprocessor in an elegant fashion, integrated into the language itself. The when statement works with compile-time constants and only generates code for the correct condition, much like #if in C:

    when sizeof(int) == 2:
      echo("running on a 16 bit system!")
    elif sizeof(int) == 4:
      echo("running on a 32 bit system!")
    elif sizeof(int) == 8:
      echo("running on a 64 bit system!")
    else:
      echo("cannot happen!")

Like if, it can be used in an expression context:
const dirsep = when hostOS == "windows": '\\' else: '/'
A clever use is to conditionally add testing code to a module when it's compiled and run as a program. These tests can be as detailed as you like, because they will not bloat the compiled library.
    # sqr.nim
    proc sqr *[T](x: T): T = x*x

    when isMainModule:   # predefined constant
      assert(sqr(10) == 100)
As you might expect by now, Nimrod does not provide run-time reflection like Java or Go because it would burden code that does not need it - again, this is C++'s "Don't Pay for what you Don't use". But there is compile-time reflection, implemented by the typeinfo module, which acts as a static equivalent of Go's reflect package.
There's no doubt that finding errors as early as possible using a compiler (or some other static code analysis tool) is better than finding them later as run-time errors. In dynamic languages we are always at the mercy of a spelling mistake. But static compilation has a cost in time (build times do matter) and in complexity.
Having done about a thousand lines of working Nimrod code, I feel I can express an opinion on the language. Most code is straightforward and free of explicit type annotations, and the compiler quickly gives good error messages. Run-time errors come with a useful stack trace, and mostly come from nil references. It's commonly thought that nillable values are a big mistake (C.A.R Hoare once called it his "billion dollar mistake") but a
nil string value is much better at trashing a program. And this is good - fail hard and early!
However, you do need to understand some things to interpret the error messages correctly:
    let a = [1,2,3]
    echo a
    # ERROR
    nimex/errors.nim(2, 6) Error: type mismatch: got (Array constructor[0..2, int])
    but expected one of:
    system.$(x: TEnum): string
    system.$(x: int64): string
    system.$(x: string): string
    system.$(x: uint64): string
    system.$(x: T): string
    system.$(x: int): string
    system.$(x: char): string
    system.$(x: T): string
    system.$(x: bool): string
    system.$(x: cstring): string
    system.$(x: float): string
You have to know that
echo uses the 'stringify' operator
$ on its arguments - then we can interpret this error as being "I don't know how to make a string from an array". The compiler then helpfully presents all the overloaded versions of
$ active in this program. Of course, this is scary to people from a dynamic background who were beguiled by Nimrod's surface 'Python-like' syntax. Coming from a C++ background, I'm prepared for this way of doing things, and know that the solution looks like this (quote operators in backticks to define them):
    proc `$`[T](a: openarray[T]): string =
      var s = ""
      s.add('[')
      for e in a:
        s.add($e)
        s.add(' ')
      s.add(']')
      return s
(Mutable strings take some getting used to). This solution will work for any arrays or sequences with elements that understand
$, and is very efficient, because Nimrod iterators over sequences are zero-overhead - effectively plain loops over elements.
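The same backtick trick teaches echo about your own types. A minimal sketch, assuming a current Nim compiler; the Point type is invented for the example.

```nim
# dollar.nim: defining `$` for a user type so that echo can print it
type
  Point = object
    x, y: int

proc `$`(p: Point): string =
  ## Builds the text from the fields, reusing `$` for int.
  "(" & $p.x & ", " & $p.y & ")"

let origin = Point(x: 0, y: 0)
echo(origin)                   # echo calls `$` on each argument
echo("moved to ", Point(x: 3, y: 4))
```

Once `$` exists for the element type, a generic stringifier over containers picks it up automatically, since it only asks that the elements understand `$`.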
There is a non-trivial learning curve; a motivated polyglot can learn enough in a week to be productive, but polyglots aren't so common. A new language comes with a shortage of tutorial material/books and a small community. This means that Google is not your friend, and last I checked there were two questions on Stack Overflow, one of which concerned a Brainfuck interpreter. There does however seem to be some action on GitHub.
A language thrives (like any life form) when it finds a niche in which it is competitive. For Lua, that has been providing a lightweight, powerful yet accessible embeddable scripting language. It has been adopted by many game developers as a way of not writing everything in C++, which is productive in two important ways: small changes to game logic do not need expensive rebuilds and don't require restarting the game; plus lesser mortals can contribute. Professional game programmers tend not to do things simply because they are cool, and so there is a market for Lua skills.
Nimrod is a good fit where C and C++ are often used. We've seen that 'userland' utilities like head can be efficiently implemented in Nimrod, and the resulting executables are typically less than 100Kb and usually have no external dependencies. This makes it a good fit for CGI, since such programs will load as fast as C. With Go, people found statically-linked executables a good way to deal with the problem of managing dependencies on deployed machines. Nimrod provides this without Go's approach of reimplementing the whole C runtime.
But the server niche requires well-tested frameworks and libraries, which can only happen with wider adoption. Thus there is a vicious circle that any new language must face; use comes from maturity, and maturity comes from use.
It's well suited to data processing and numerical tasks; operator overloading makes standard mathematical notation possible, and generics make algorithms efficient. Here again having some choice of existing libraries would make all the difference. However, it is relatively easy to bind to C libraries (since the compiler output is C) and there is a
c2nim tool for processing C headers.
A particularly cool application is for embedded systems. Here the realities are merciless; embedded processors are everywhere and need to be cheap, and you can't get much memory for pennies. As a consequence, C dominates this field, and it's nasty. I can honestly say that this is my least favourite kind of programming; the preprocessor hacks alone would make Dijkstra lie down and cry. Here Andreas describes how Nimrod can be compiled with a stripped-down library with no OS support, and compiled on a 16bit AVR processor. Nimrod is probably the only new language which has the minimal attitude and metaprogramming capability to be an effective contender in this space, which is traditionally the last bastion of C.
Garbage collection is something that's often used to separate system and application languages. It's hard to add it to an existing language, and hard to remove it from a language, since it is so damn convenient.
A kernel has to manage every byte so that the userland can afford to waste memory; game programmers hate compulsory 'stop the world' GC which tends to happen when you're doing something more important. And embedded controllers often don't even have malloc. See how Nimrod's Garbage Collector works; it is low-overhead, uses reference counting and can be switched off temporarily (unlike with the Dalvik VM on Android).
In summary, Nimrod is a very rich and powerful statically-typed language which relentlessly uses compile-time metaprogramming to achieve its goals of delivering compact and efficient programs. Whether it finds its niche is an open question, but it deserves to be given a chance, and is well worth exploring further.