Thursday, 28 August 2025

Two Kinds of Shells

Two Different Command-line Shells

The Unix shell

The first Unix shell was the Thompson Shell from 1973, which already looks very familiar (here's an example). The if command would jump to labels, as in assembly language, so certainly not a great programming language at this point. But an excellent shell:

Redirection - sending the output of a command into a file:

$ prog > myfile.txt

Piping - sending output through filters:

$ prog | sort
$ cat big.txt | head -n 10

If a command exits with a return code of 0, it was successful. Commands can write to standard output and to standard error; by default, redirection and piping work with standard output.
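For example, the exit status and the two output streams can be seen directly (a small sketch using standard shell; the file names are made up):

```shell
# $? holds the exit status of the previous command; 0 means success
false || echo "false failed with status $?"   # prints 'false failed with status 1'

# '>' redirects standard output; '2>' redirects standard error
ls /no-such-directory > out.txt 2> err.txt || true
cat err.txt    # the error message went to err.txt; out.txt is empty
```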

Note that big.txt may indeed be ginormous in the cat big.txt | head example, but only enough of it will be read to show the first 10 lines.
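You can see this streaming behaviour with yes, a standard utility that produces output forever; the pipeline still finishes instantly because head closes the pipe after reading what it needs:

```shell
# 'yes' would run forever on its own, but head stops it after 3 lines
yes hello | head -n 3
# hello
# hello
# hello
```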

Originally Unix was developed on electromechanical teletypes, which are noisy and slow, encouraging short names (like ls and cd etc). C was first written on teletypes with a line editor; vi only appears in the late 70s. So the terminal was first very physical, then running on a monochrome monitor over a serial link, and finally the multicoloured glory of modern terminal 'emulators' (Nice overview).

The Bourne shell first appears in 1979. By this time people needed scripting, and the new shell was much better for this. Bear in mind that C, with all its beauty and sharp edges, is not a very approachable language for once-offs and little utilities, as it is low-level with a very basic standard library.

for i in `seq 1 10`
do
  if test $i -gt 5
  then
     echo "larger $i"
  fi
done

The weirdness of fi and esac is because Stephen Bourne was an Algol 68 fan, although he decided against using od to terminate do loops (or maybe there was a 'Really, Stephen?' conversation at Bell Labs).

Nearly everything is done with external commands - seq, test and echo. All the shell is doing here is expanding the variable i. String interpolation also happens in double-quoted strings.
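The difference between double and single quotes is easy to demonstrate:

```shell
i=5
echo "larger $i"    # double quotes interpolate: prints 'larger 5'
echo 'larger $i'    # single quotes do not: prints 'larger $i'
```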

The Unix principle involves composition of specialized commands, each doing their one job very well.

To port the Unix shell necessarily means porting the 'little languages' (the domain-specific languages) that made the shell so powerful: grep, sed, awk and so on. (Perl consolidated these tools into an allegedly coherent whole, but that happens almost a decade later.)

sh has been re-implemented many times (like GNU Bash) and for other operating systems. It is not an entirely good fit for Windows (although until PowerShell arrived there was nothing better), mostly because spawning a command process is more expensive there than on Unix; Windows prefers its native threading model (threads arrive pretty late in Unix/Linux history). The Unix tradition of 'everything is a file' also does not cover Windows functionality like the Registry, etc.

Typing in a Shell, Writing a Script

What makes a language both a good shell and good for scripting? Even if the language has a good interactive prompt, it is not usually convenient as a shell because there are too many key presses involved (particularly of the shift key):

# Python >>> exec("prog", "-f", os.environ['HOME']).out("temp.txt")
# Shell  $ prog -f $HOME > temp.txt

Even with library support the extra parentheses and commas are going to slow the shell user down; there are more keystrokes, and many of them are punctuation.

sh is an excellent shell, but is it a good scripting language?

Brian Kernighan wrote a famous paper entitled "Why Pascal is Not My Favorite Programming Language", and it would not be difficult to write a companion piece for the standard POSIX shell. In that paper the main criticism is that the type system is too rigid; in the case of the Bourne shell there is only one type, text. A string might contain a number, which then has to be compared in a different way. Lists are done in an ad-hoc way with space-separated words. It is easy to mess things up, and even easier to be judged - any attempt to write a shell script will bring forth rock-dwelling critics.
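A small sketch of the single-type pitfall: the same values compare differently as strings and as numbers:

```shell
x=10
y=9
# numeric comparison needs test's -gt operator
if [ "$x" -gt "$y" ]; then echo "numerically, $x > $y"; fi
# but sorted as text, "10" comes before "9" (since '1' < '9')
printf '%s\n' "$x" "$y" | sort | head -n 1    # prints 10
```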

When dealing with anything beyond one-liners, error handling is crucial. Bash has a scary default mode where it just keeps going whether errors happen or not. So you have to code very defensively, as in Go, always checking return codes and explicitly deciding what to do.
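A common defensive prologue looks like this (Bash-specific; pipefail is not in POSIX sh, and the directory here is just for illustration):

```shell
#!/bin/bash
# exit on error, flag unset variables, fail pipelines whose stages fail
set -euo pipefail

workdir=$(mktemp -d)
# or check explicitly, Go-style, when you want to control the outcome:
if ! cd "$workdir"; then
    echo "could not enter $workdir" >&2
    exit 1
fi
echo "working in $workdir"
```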

A lot of non-trivial shell is converting one ad-hoc text format into another. The classic text mangling tools are good, but they have a learning curve and in fact most of the skills needed to be competent at shell are outside the shell itself; it is mostly an 'empty shell'.

There has been a move for newer commands to optionally produce JSON output, to be parsed in a standard way by other commands; there remains a nice presentation for human users, but machine users don't have to bother parsing it. The jc project aims to convert the output of popular command-line tools into JSON, and jq provides a powerful DSL for processing JSON.
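For instance, jq can pull fields out of JSON on the command line (assuming jq is installed; the JSON here is made up for illustration):

```shell
echo '[{"name":"build","size":33},{"name":"readme.md","size":39915}]' |
  jq -r '.[] | select(.size > 1000) | .name'
# readme.md
```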

Nushell

So an idea emerged in Microsoft early this century: what if data passed through shell pipelines, not as a particular serialized text format like JSON, but as raw .NET objects? The data could be operated on with the methods and properties of these objects, and at the end of the pipeline the objects would be converted into a default presentation for human users. Microsoft PowerShell was first released in 2006, becoming part of Windows with version 2.0 in 2009.

It was a hit, because frankly the situation with Windows admin was a mess. Grown adults reduced to clicking on buttons, or forced to work with some of the most clunky command-line tools known to humanity, accessed with a uniquely brain-dead command shell.

I'm not really a fan, since administering Windows is not where I like to be, and I still think it's a revolutionary idea held back by a second-class implementation. It is the slowest shell to start, easily 500 ms on a decent machine, since all those .NET assemblies have to be pulled in at startup.

The idea of a cross-platform shell organized around the data-pipe principle remained powerful, and Nushell started happening in 2019.

All values in Nushell have a type; the main types are:

- numbers (int and float are distinct)
- strings
- lists
- records (corresponding to JavaScript objects or Python dicts)
- tables - lists of records with the same keys

By default, you get a pretty view of tables (this is themeable, if you find the default a bit heavy); you can instead convert the data to YAML etc. In Nushell, ls creates a table:

/work/dev/llib> ls
╭───┬─────────────────┬──────┬─────────┬──────────────╮
│ # │      name       │ type │  size   │   modified   │
├───┼─────────────────┼──────┼─────────┼──────────────┤
│ 0 │ LICENSE.txt     │ file │  1.4 kB │ 3 years ago  │
│ 1 │ build           │ file │    33 B │ 3 years ago  │
│ 2 │ build-mingw.bat │ file │    60 B │ 3 years ago  │
│ 3 │ examples        │ dir  │  4.0 kB │ 6 months ago │
│ 4 │ llib            │ dir  │  4.0 kB │ 3 months ago │
│ 5 │ llib-p          │ dir  │  4.0 kB │ 3 years ago  │
│ 6 │ readme.md       │ file │ 39.9 kB │ 3 years ago  │
│ 7 │ tests           │ dir  │  4.0 kB │ 3 months ago │
╰───┴─────────────────┴──────┴─────────┴──────────────╯
/work/dev/llib> # render the table in YAML
/work/dev/llib> ls | to yaml
- name: LICENSE.txt
  type: file
  size: 1486
  modified: 2022-02-04 16:49:34.466300249 +00:00
- name: build
  type: file
  size: 33
  modified: 2022-02-04 16:49:34.466300249 +00:00
- name: build-mingw.bat
  type: file
  size: 60
  modified: 2022-02-04 16:49:34.466300249 +00:00
- name: examples
  type: dir
  size: 4096
  modified: 2025-02-08 19:07:39.755974376 +00:00
- name: llib
  type: dir
  size: 4096
  modified: 2025-05-04 15:44:33.971085666 +00:00
- name: llib-p
  type: dir
  size: 4096
  modified: 2022-02-04 16:49:34.470300295 +00:00
- name: readme.md
  type: file
  size: 39915
  modified: 2022-02-04 16:49:34.470300295 +00:00
- name: tests
  type: dir
  size: 4096
  modified: 2025-05-08 17:10:16.706859509 +00:00

Piping the table into the describe command gives you the actual type of the data created by ls (the PowerShell equivalent is Get-ChildItem | Get-Member -MemberType Property):

/work/dev/llib> ls | describe
table<name: string, type: string, size: filesize, modified: datetime> (stream)

get extracts a column as a list:

/work/dev/llib> ls | get name
╭───┬─────────────────╮
│ 0 │ LICENSE.txt     │
│ 1 │ build           │
│ 2 │ build-mingw.bat │
│ 3 │ examples        │
│ 4 │ llib            │
│ 5 │ llib-p          │
│ 6 │ readme.md       │
│ 7 │ tests           │
╰───┴─────────────────╯

help <cmd> gives help with examples, and help commands gives the whole lot - 618 on my system! And these are builtins and plugins, not executables. You can of course call external commands, but this shell is very full-featured out of the box, which explains why it's 22 MB on my system. It has built-in SQLite support, http is a built-in command, and with the polars plugin (part of the standard distribution) it can do dataframe manipulation, read Parquet files, etc.

/work/dev/llib> cat LICENSE.txt | lines | take 5
╭───┬──────────────────────────────────────────────────────────────────────╮
│ 0 │ -------------------------------------------------------------------- │
│ 1 │ Copyright (c) 2013 Steve Donovan                                     │
│ 2 │ All rights reserved.                                                 │
│ 3 │                                                                      │
│ 4 │ Redistribution and use in source and binary forms, with or without   │
╰───┴──────────────────────────────────────────────────────────────────────╯

The Nushell language (called Nu) was developed in Rust by Rust fans, so it looks very much like a scripting variant of Rust; this is the equivalent of the shell example earlier:

for i in 1..10 {
    if $i > 5 {
        print $"larger ($i)"
    }
}

Normal comparison operators are available, and the range iterator is built in. The strangest thing is the string interpolation syntax $"....", with expressions in parentheses.

Nushell is not available on any random machine you might ssh into, so using it as your shell requires justification. There is always an investment of time and energy needed.

First, it makes simple queries on data easy, and commands return data. There is an actual filesize type, which can be written with the usual suffixes:

/work/dev/llib> ls | where size > 10kb
╭───┬───────────┬──────┬─────────┬─────────────╮
│ # │   name    │ type │  size   │  modified   │
├───┼───────────┼──────┼─────────┼─────────────┤
│ 0 │ readme.md │ file │ 39.9 kB │ 3 years ago │
╰───┴───────────┴──────┴─────────┴─────────────╯

There is a Unix find command for going over a directory tree, which I can never remember how to use. But this ls command can take a glob pattern meaning 'everything under this directory':

/work/dev/llib> ls **/* | where size > 50kb
╭───┬──────────────────────────────┬──────┬──────────┬──────────────╮
│ # │             name             │ type │   size   │   modified   │
├───┼──────────────────────────────┼──────┼──────────┼──────────────┤
│ 0 │ examples/example.db          │ file │  11.1 MB │ 7 months ago │
│ 1 │ examples/json.db             │ file │  11.3 MB │ 7 months ago │
│ 2 │ examples/pkgconfig/pkgconfig │ file │  51.7 kB │ 3 years ago  │
│ 3 │ examples/web/simple          │ file │  62.0 kB │ 3 years ago  │
│ 4 │ examples/web/use-select      │ file │  71.8 kB │ 3 years ago  │
│ 5 │ llib/libllib.a               │ file │ 327.7 kB │ 3 months ago │
│ 6 │ tests/test-json              │ file │  59.8 kB │ 5 months ago │
│ 7 │ tests/test-pool              │ file │  56.6 kB │ 5 months ago │
│ 8 │ tests/test-template          │ file │  64.0 kB │ 5 months ago │
╰───┴──────────────────────────────┴──────┴──────────┴──────────────╯

It is then easy to apply a command to each one of these files:

/work/dev/llib> ls **/* | where size > 50kb | get name |  each { path parse  }
╭───┬────────────────────┬───────────────┬───────────╮
│ # │       parent       │     stem      │ extension │
├───┼────────────────────┼───────────────┼───────────┤
│ 0 │ examples           │ example       │ db        │
│ 1 │ examples           │ json          │ db        │
│ 2 │ examples/pkgconfig │ pkgconfig     │           │
│ 3 │ examples/web       │ simple        │           │
│ 4 │ examples/web       │ use-select    │           │
│ 5 │ llib               │ libllib       │ a         │
│ 6 │ tests              │ test-json     │           │
│ 7 │ tests              │ test-pool     │           │
│ 8 │ tests              │ test-template │           │
╰───┴────────────────────┴───────────────┴───────────╯

Second, the pipeline model makes function application go from left to right; the usual f(g(h(x))) is right to left from the argument. It is easier to successively refine the result by applying extra operations: easier to read and to edit in an interactive shell.
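The same left-to-right refinement is the appeal of traditional pipes, too; each stage narrows the previous result and can be appended interactively:

```shell
# generate, filter, trim - read in the order it executes
seq 1 30 | grep 7 | head -n 2
# 7
# 17
```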

Why should you consider using it for shell scripting? Apart from the straightforward syntax and sensible try..catch error handling, for me it's how elegant it is to write self-documenting custom commands:

# Greet guests along with a VIP
#
# Use for birthdays, graduation parties,
# retirements, and any other event which
# celebrates an event # for a particular
# person.
def vip-greet [
  vip: string        # The special guest
  ...names: string   # The other guests
] {
  for name in $names {
    print $"Hello, ($name)!"
  }

  print $"And a special welcome to our VIP today, ($vip)!"
}

And help vip-greet will work as expected.

That's pretty classy.