Friday 9 December 2011

Ten Programming Languages

This is a personal list of programming languages that influenced me. It isn't a representative survey, nor is it complete (being something of a programming language slut I've tried everything, once). There are several common threads in this discussion. One is that I've always enjoyed interactive prompts; there is nothing better for exploring a language or a new library. (And any language can be made interactive.) The other is that I'm allergic to long build times, which probably indicates borderline ADD or at least a lack of rigour.

Another theme is over-specialization, like FORTRAN, or paradigm-driven design, like Java. Such languages find it hard to move with the times and can end up in evolutionary cul-de-sacs.

One language-nut thing I never did was design a new language. No doubt very entertaining, but it feels like adding a new brick to the Tower of Babel. The hardest part of writing a program is to make it clear to other humans, since computers aren't fussy once you feed them correct syntax. Every good programmer needs to be a polyglot and pick the right language for the job; there is no one programming language that stretches comfortably across all domains.

FORTRAN

This is the granddaddy, and I can even give you a pop culture reference.

This was the language we were taught for first year CS at Wits University, Johannesburg. In fact, it was its last year before Pascal became the teaching language, but FORTRAN remained in my life as a scientific programmer for many years. It's widely considered a relic of the computing Stone Age, a dinosaur, although dinosaurs were sophisticated animals that dominated the world for many millions of years, and are still represented in their modern form as birds. So the clunky FORTRAN IV of my youth became Fortran 90. The old reptile is still used for high-performance computing.

This is the style of code which got people onto the Moon. The fixed format comes straight from the limitations of punch cards, with code having to start in column seven and end at column 72.

       F = FLOW
     1 IF (F .GE. FHIGH) GOTO 2
       C = 5.0 * (F - 32) / 9.0
       WRITE(6,'(2F8.3)') F,C
       F = F + FSTEP
       GOTO 1
     2 CONTINUE

It has evolved into a modern-looking language (programmers have since learnt not to shout).

 real f,low,high
 low = 0
 high = 100
 step = 10
 f = low
 do while ( f <= high )
    c = 5.0 * (f - 32) / 9.0
    write(*,fmt="(2F8.3)") f,c
    f = f + step
 end do

Lots of old dinosaur code is still found in the wild, however! Trying to discover what it does in order to reimplement it is a challenging exercise, which is why keeping algorithms only in code is a bad idea, particularly if the code is written in Olde Fortran by scientists and engineers.

It remains popular, because many scientists still learn it at their mother's knee and it's a simpler language than C (despite the best efforts of the Fortran 90 committee). This simplicity, plus built-in array operations, makes it easier for the compiler to optimize common numerical operations. And there's always the inertia of the numerical community and the availability of so many libraries in Fortran.

Pascal

Judy Bishop's second-year course in Pascal helped me to recover from Fortran and taught me structured programming.

Niklaus Wirth's book was probably my first intellectual experience in the field. Pascal was always intended to be a teaching language, and Modula was supposed to be the version for grown-ups. Classic Pascal was a (deliberately) restricted language, and Brian Kernighan famously wrote Why Pascal is Not My Favorite Programming Language. But then you must remember that compared to FORTRAN IV it was like awakening from the primal ooze.

I remember reading somewhere (probably from the LOGO experiments) that when kids are exposed to a programming environment, they do not spontaneously structure their programs. They readily grasp the idea of giving an action a name, and then repeating it, but do not refactor complicated actions into a series of simpler actions. Such things have to be taught, and this was the single biggest thing I learnt about programming at university. After all, students have to be taught to write, since organizing language in the form needed to communicate effectively does not emerge from their native spoken language skills.

Subsequent paradigm shifts in programming education have happened, like object-orientation and the move to Java, but I can't help feeling that basic structuring skills are neglected when students' minds are immediately poured into classes. OOP downplays functions, but they are important building blocks and people need to know how to work with them. Perhaps the mental development of programming students should follow historical development more closely?

In the late Eighties, Borland sprang onto the scene with Turbo Pascal, which was a marvelous, fast and practical implementation. This was my workhorse until I discovered C, but later I used Delphi for Windows GUI development and still consider it to be one of the best options available.

One of Borland Pascal's distinctive features was a very smart linker. The equivalent of object files had all the necessary type information, so that compiling and linking programs was fast even on modest hardware.

LISP

The only interactive language on the Wits mainframe was LISP, which started a lifelong love of conversational programming languages and introduced me to the notion that programs could manipulate symbols. (And by extension, manipulate programs.)

I found an old copy of John McCarthy's LISP 1.5 manual in the library, and my head was seriously opened. Recursion is a beautiful thing, and I first saw it in LISP.

Unlike others, I didn't pick up LISP later, but it planted seeds that flowered when I used Lua much later.

C

My first C experience resulted in a crashed machine, which happens when you are spoilt by Pascal's run-time array bounds checking. C is the ultimate adult programming language, and does not get in your way if you insist on shooting yourself in the foot. Again, Turbo C was the entry drug and I've been hooked ever since; I could replace my mixture of Pascal and x86 Assembly with one language.

C is not a particularly good language for GUI development. I've used it for both Windows and GTK+ development, and it gets tedious (and remains unforgiving.)

A characteristic feature of C is that the language itself is very small; everything is done with libraries. With earlier languages, doing I/O was built in; Fortran's write and Pascal's writeln are part of the language and are statements, whereas printf is a library function. So C is the language of choice for the embedded world, where programmers still worry about getting a program into 4K.

The joy of C is knowing pretty much exactly what each bit of code does, without worrying about hidden magic or operator overloading.

C++

As a meta-observation, none of the languages in this list are considered particularly cool or state-of-the-art, except LISP which has continuously evolved since 1962. This is because they are pragmatic tools for getting things done. In this respect, C++ gets a lot of respect, because it raises C to a higher level, while keeping the abstraction penalty as low as possible. Learning the language was challenging (using Stanley Lippman's book) but very rewarding. Building GUIs is a good fit to object-oriented design and I went through the usual rite of passage, which was creating a minimal Windows API binding.

A fair amount of my work was maintaining C++ MFC systems. Microsoft's MFC framework is widely considered one of the worst class libraries in the known universe, and for many years the word 'Visual' in 'Visual Studio' was a cruel joke. One of its many sins was encouraging storing data as persistent object binary archives; this breaks any attempt at easy continuous evolution of a program's structure by tying it directly to the structure of the persistent store.

However, bad programs and frameworks can happen to good languages, and I developed an increasingly obsessive relationship with C++. The core idea of the obsession was that people learn best by conversational interaction with a programming language, and it led to the development of UnderC and a book, C++ by Example, organized around this principle. This gave me the interactive prompt that I had been missing, and that I hypothesized students were missing too.

The other idea is that a good deal of the awkwardness of big C++ development comes from an antiquated build model inherited from C. C++ compilers aren't actually slow, but they have to make a heroic effort to cope with enormous header files. With an interpreter you can keep a program alive, and only recompile the bits that need to change, reducing the need for frequent rebuild-the-world steps and the necessity to get back to the program state you need to test. Furthermore, with an interpreter a language like C++ can serve many of the roles occupied by 'scripting languages', so that there is no need for Ousterhout's Dichotomy. (There are big assumptions running around wild in this paragraph, and they should be the subject of a more specialized article.)

Getting the interpreter to parse the full STL was my Waterloo, and I subsequently developed a dislike for the language and its sprawling complexity. In truth it was too big a job for one primary developer, and some UnderC design deficiencies proved too great an obstacle to progress. In particular, I was not storing an internal AST representation of templates, but rather keeping their bodies as text. Blame that on my youthful decision to do physics rather than CS, I suppose.

UnderC remains bound to x86 by some assembler shenanigans, and a logical next step would be to use libffi to get more platform independence, at least for linking to C. The grand vision that it could consume any C++ shared library was briefly viable, but the continuous changes in the C++ ABI and name-mangling schemes make that an endless Red Queen race.

I became increasingly out of step with the direction that modern C++ was taking, with growing use of templates leading to a language that was increasingly slow to compile, and often generating some of the most obscure error messages ever presented to unfortunate users.

C++ remains the hardcore choice for many applications (although I suspect that often it's premature optimization) and seems to have a long life ahead of it. I may even forgive it enough to give it another chance one day.

C#

C# is an interesting language, because it is the direct result of a commercial dispute with religious undertones. Microsoft had brought over Anders Hejlsberg (the chief architect of Delphi) from Borland, to work on their Visual J++ product. This dialect implemented delegates, which are method pointers similar to Delphi's events. This was not only heretical, but against the Java licensing conditions, and for this and more egregious sins Sun won a lawsuit against Microsoft.

By this time, Microsoft had the Java bug, but to keep moving in the direction they wanted to go they needed to fork the language, and C# was born.

C# felt like a natural fit to GUI programming for me since I knew Delphi well, and anybody who has used both the Delphi VCL and System.Windows.Forms will know the family resemblance.

Unlike its estranged cousin Java, C# has kept evolving and has never been particularly afraid of breaking binary compatibility. Its model of genericity is efficient for native types (unlike collections in Java which can only contain boxed primitives), and it has type inference and anonymous functions.

A consequence of this constant evolution is that .NET people are not seeking 'a better C#' in the same way that programmers in the JVM ecosystem are.

The flip side is that the .NET world has always been hooked into Microsoft's own ADD and need to sell new tools, with old acronyms dying like flies and the constant need to chase novelties.

All in all, a great girl. But a pity about the family, as they say.

Boo

Boo isn't a well-known language, but has some interesting features. Syntactically rather like Python, except it is a statically-typed compiled .NET language with aggressive use of type inference. It also has support for compile-time metaprogramming and so can do 'smart macros'.

I did a Scintilla-based editor in Boo, and learnt its strengths and weaknesses well. (Writing editors is a common hobby among programmers, since we have such a personal relationship with our tools.)

Ultimately the disillusioning realization was that the extra expressiveness of the language did not bring much advantage over using C# directly. To do Sciboo in C# would have required maybe five times as much code, but that extra code would have compiled a good deal faster than the equivalent Boo code and would have come with much better tool support.

Writing a program in a new programming language is taking a bet on an outsider, and Sciboo would probably have gained more traction if it had been done in C#.

Still, Rodrigo "Bamboo" de Oliveira's Boo remains the second-greatest programming language to come out of Brazil.

Java

The Java moment for programmers is when they feel the advantages of a simplified C++ with an extensive library and GUI toolkit, without the need to worry about memory management and with rich run-time type information available. I had already had this epiphany with C#, but increasingly appreciated the cross-platform JVM and the excellent tooling provided by Eclipse. Ultimately, it's the ecosystem that matters: the language, libraries and tools.

Java is not a fashionable language, for various reasons. It was deliberately designed as a restricted language built around object-oriented principles, a purified C++, much as Pascal was a simplification of earlier big languages like Algol 68. A programming language built around a particular programming paradigm is like an overspecialized species in a particular ecological niche. The big dinosaurs flourished in a different climate, hotter, wetter and with higher oxygen levels. The little rats running around their feet adapted to the changed conditions and eventually conquered the world. Together with its dogmatic design (the fundamental unit was always the class) came an attitude that the designers knew best, and that is fundamentally off-putting to hackers. Modern (or at least more recently mainstream) ideas like functional programming are inherently hard to implement on such a platform.

Some of Scala's woes come from the fact that a simple little anonymous function has to be implemented as a separate class, and classes don't come cheap on the JVM; even the tiniest classes take up about half a kilobyte, mostly metadata. (The slowness of scalac means that I personally would be unlikely to enjoy Scala, despite its attractive features, as I concluded with Boo.)

If one works within the limitations, exploits its dynamic features, and ignores the endless moralizing and pontification, Java can still be a useful and efficient solution to many problems. 'Expressiveness' is not a virtue that can be used in isolation.

Lua

Lua and LISP are the only dynamic languages in this list. Of the dynamic languages, Lua has the smallest implementation and the fastest performance. Lua was designed to be easily embeddable in a host C/C++ program, and is often used in big commercial games. The language implementation is about 160K, and is about two times faster than Python. LuaJIT is a heavily optimized JIT implementation that can often give compiled C a good run, which is worth bringing up when people repeat the old assertion that statically typed languages have an inherent performance advantage.

I'm not a natural candidate for learning a 'gaming' scripting language (since computers are too interesting without needing games) but I learnt Lua scripting for the SciTE editor, and naturally moved onto the desktop.

In many ways, Lua is the C of dynamic languages: lean, fast and without batteries. The team at PUC-Rio do the core language, and the community provides the batteries, much as Linus Torvalds does the Linux kernel and GNU etc. provide the userland. The analogy is not exact, since core Lua provides math, io and a very capable string library. Lua is based on the abstract platform defined by C89, and so if you need things like directory operations or sockets you need extensions outside the core. Lua distributions are available that provide a similar experience to the 'big' scripting languages, but they are not 'canonical' or 'blessed'.

One reason that Lua is used as an extension language is that the syntax is conventional and unthreatening. I often come across engineers or scientists who just cannot read curly-brace C-like languages, and they naturally start counting at one.

A common criticism from people used to Python or Ruby is that there is no actual concept of 'class' in the language. Lua provides more general metaprogramming mechanisms that make it easy to implement a 'class' system in a few lines of code. There are in fact a number of distinct (and incompatible) ways to do OOP, which does lead to interoperability issues.

But trying to shoehorn everything into an existing concept like 'classes' is a sign of not 'getting' a language. In Lua functions are first-class citizens and are proper closures, making a more functional style possible. Lua's single data structure is the table, which is an associative array; like JavaScript, m["key"]==m.key so that you can use tables like structs.

LuaJIT is concrete proof that serious performance and dynamic typing are not mutually exclusive. However, the main challenge for dynamic languages is what used to be called 'programming-in-the-large': how to compose large systems from subsystems and reason about them. It's easy to produce mad unmaintainable messes in dynamic languages.

It's obvious that dynamic function signatures contain less information than their static equivalent: sqr(x) is less self-documenting than sqr(double x). So for dynamic languages, there is a need to document the actual types so that users of a module can use it effectively. The compiler never reads comments, of course, and no compiler in the world can make useful comments mandatory. It's a matter of discipline and having structured ways to express types as documentation.

Having explicit types also allows tools like IDEs to provide intelligent browsing and code completion. A surprising amount of information can be extracted by static analysis (for instance, David Manura's LuaInspect) but you still need, in addition, a machine-readable way to provide explicit type annotations.

A very interesting generalization of Lua is Fabien Fleutot's Metalua, an implementation in which language extensions can be written in Metalua itself. For instance, there is a Metalua module which allows the following code:

 --- normalized length of foo string.
 function len(s :: string) :: number
 ...
 end

This inserts type assertions into the code (although this can be disabled). This is reminiscent of the direction taken by Google Dart.

My own modest approach with LDoc involves extended JavaDoc-style tags:

 --- normalized length of foo string.
 -- @tparam string s the foo string
 -- @treturn number length
 function len(s)
 ...
 end

Go

Go is a modern, bold attempt to build a better C. It is garbage-collected, but compiles to native standalone executables, and has built-in strings, maps and CSP-style concurrency organized around 'goroutines' and channels. Go functions may return multiple values, like Lua. Type inference means that code outside function and struct declarations rarely needs explicit types.
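A minimal sketch of these features in one small program. The names divmod and sumSquares are made up for illustration and are not from any library:

```go
package main

import "fmt"

// divmod returns two results at once, much as a Lua function can.
func divmod(a, b int) (int, int) {
	return a / b, a % b
}

// sumSquares squares each input in its own goroutine and collects
// the results over a channel.
func sumSquares(nums []int) int {
	results := make(chan int) // the type of results is inferred
	for _, n := range nums {
		go func(n int) { results <- n * n }(n)
	}
	total := 0
	for range nums {
		total += <-results // one receive per goroutine started
	}
	return total
}

func main() {
	q, r := divmod(17, 5) // no explicit types needed here
	fmt.Println(q, r)
	fmt.Println(sumSquares([]int{1, 2, 3}))
}
```

Only the function signatures mention types; everything declared with := picks its type up from the right-hand side.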

Such boldness is unlikely to be universally popular, of course. For diehard C fans, 'a better C' is a nonsensical idea. For Java/C++ fans, the lack of classic OOP is a problem. The advocates of Go consider this lack to be a strength; polymorphism is supported by interfaces (as in Java) but there's no inheritance in the usual sense; you can 'borrow' implementation from a type using embedding. Inheritance has its own issues, after all, such as the Fragile Base Class problem, where a class is at the mercy of its subclasses (or vice versa, depending on the interpretation.)
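To make the difference concrete, here is a small sketch of interfaces plus embedding. Speaker, Animal and Dog are hypothetical types invented for this example:

```go
package main

import "fmt"

// Speaker is satisfied by any type that has a Speak method.
type Speaker interface {
	Speak() string
}

// Animal is a hypothetical type with some reusable behaviour.
type Animal struct {
	Name string
}

func (a Animal) Speak() string    { return a.Name + " makes a sound" }
func (a Animal) Describe() string { return "an animal called " + a.Name }
func (a Animal) Intro() string    { return "Hello: " + a.Speak() }

// Dog embeds Animal: it borrows Describe and Intro unchanged,
// and supplies its own Speak.
type Dog struct {
	Animal
}

func (d Dog) Speak() string { return d.Name + " barks" }

func main() {
	d := Dog{Animal: Animal{Name: "Rex"}}
	var s Speaker = d // polymorphism comes from the interface
	fmt.Println(s.Speak())
	fmt.Println(d.Describe())
	// Intro is Animal's method, so it calls Animal's Speak, not Dog's.
	fmt.Println(d.Intro())
}
```

The last call is the interesting one: the borrowed Intro method knows nothing about Dog, so there is no virtual dispatch back into the outer type. That is the sense in which embedding is not inheritance.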

Likewise, classic exception handling is replaced by handle-errors-at-source and panicking (with optional recover) when that is not feasible.
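A rough sketch of both styles follows. The functions checkPositive and safeIndex are invented for the example; errors.New, fmt.Errorf, defer and recover are standard Go:

```go
package main

import (
	"errors"
	"fmt"
)

// checkPositive reports failure through an ordinary return value,
// which the caller is expected to inspect straight away.
func checkPositive(n int) (int, error) {
	if n < 0 {
		return 0, errors.New("negative value")
	}
	return n, nil
}

// safeIndex lets an out-of-range index panic, then uses recover
// in a deferred function to turn the panic into an error.
func safeIndex(xs []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return xs[i], nil
}

func main() {
	if _, err := checkPositive(-1); err != nil {
		fmt.Println("error:", err)
	}
	if _, err := safeIndex([]int{1, 2, 3}, 5); err != nil {
		fmt.Println("error:", err)
	}
}
```

The first form is the everyday one; recover tends to be reserved for boundaries where a panic must not take the whole program down.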

It's natural to miss the comforts of home, but learning a new language often involves unlearning the mental habits of an earlier language.

Go feels like a dynamic language, not only in the time getting from Save to Run, but in the lack of explicit typing. Interfaces work as a user of a dynamic language would expect: if a function expects an interface that only contains a Read() []string method, then any object that provides such a method is acceptable. It is rather like compile-time duck-typing, and eases the interdependency problem that plagues languages like Java. But these things are all rigorously checked (Go does not do implicit typecasts); typos tend not to blow up programs at run-time.
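As a sketch of that Read() []string case: LineReader, Text, Fixed and countLines below are hypothetical names chosen for the example, and neither concrete type ever mentions the interface.

```go
package main

import (
	"fmt"
	"strings"
)

// LineReader is a hypothetical interface with the single method
// described above.
type LineReader interface {
	Read() []string
}

// Text satisfies LineReader simply by having the right method.
type Text string

func (t Text) Read() []string { return strings.Split(string(t), "\n") }

// Fixed is an unrelated type that also happens to fit.
type Fixed []string

func (f Fixed) Read() []string { return f }

// countLines accepts anything that provides Read() []string.
func countLines(r LineReader) int { return len(r.Read()) }

func main() {
	fmt.Println(countLines(Text("one\ntwo\nthree")))
	fmt.Println(countLines(Fixed{"a", "b"}))
}
```

Misspelling the method as Raed on either type would be caught by the compiler at the call to countLines, not at run-time.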

The language fits easily in the head. Goroutines+channels are easier and less problematic than traditional threading models. Go is clearly designed to do serious work on the server, and the standard library provides good support for networking and protocols, so that's where most of the adoption will take place.

A criticism from the C++ perspective is that too much functionality is built-in, and the language lacks the abstraction mechanism to further extend itself. For instance, Go's maps cannot be defined using Go itself. But making maps a primitive type remains a good pragmatic decision for Go, because std::map and friends in C++ impact end users through the large amount of template headers that need to be parsed and the resulting unhelpful errors if anything goes wrong.

I've personally used it for processing binary data files where I would otherwise use C, and it feels productive in this role. I miss the standard C promotion rules: an int will not be promoted to a float64 in the right context, which means a lot of explicit type conversions. Also, 'variable not used' errors are irritating when prototyping. But mostly I'm enjoying myself, and learning a new language. Which ultimately is what this article is all about.
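The promotion point is easy to show in a few lines. This is only an illustrative sketch, and mean is a made-up helper:

```go
package main

import "fmt"

// mean averages integer samples. Go will not mix int and float64
// implicitly, so each conversion has to be spelled out.
func mean(samples []int) float64 {
	sum := 0
	for _, s := range samples {
		sum += s
	}
	// sum / len(samples) would be integer division, and
	// float64(sum) / len(samples) would not compile.
	return float64(sum) / float64(len(samples))
}

func main() {
	fmt.Println(mean([]int{1, 2, 3, 4}))
}
```

In C the equivalent expression would need only one cast, with the other operand promoted automatically.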
