Saturday, 9 April 2016

stdio or iostreams? - A Modest Alternative

An Awkward Choice

The standard C++ iostreams library is generally easier to use for output than stdio, if you don't particularly care about exact output formatting and have no resource constraints. No one can say that stdio is pretty. Portable use of printf requires ugly PRIu64-style macros from <inttypes.h> and noisy calls to c_str(). It's fundamentally weakly typed and can only work with primitive types.

However, using iostreams gets ugly fast, with a host of manipulators and formatting specifiers, all tied together with <<. There are also some serious misfeatures: beginners learn to obsessively spell 'end of line' as std::endl but never learn that it also flushes the stream, until their program proves to be an utter dog at I/O. Standard C++ streams are generally slower and allocate far too much to be taken seriously for embedded work. In the immortal words of the Bugs section of the readline(3) man page: "It's too big and too slow".

Embedded systems which can benefit from the abstraction capabilities of C++ don't necessarily have space for the large sprawling monster that the standard iostreams has become. In this realm, printf (or a stripped-down variant) still rules.

It is true that we no longer depend on simple text I/O as much as the old-timers did, since many structured text output standards have emerged. We still lean on stdio/iostreams to generate strings, since C++ string manipulation is still relatively poor compared to that of other languages. Some classes of problem rely heavily on debug writes. And all non-trivial applications need logging support.

The library I'm proposing - outstreams - is for people who want or need an alternative. It presents another style of interface, where overloading operator() leads to a fluent and efficient style for organizing output. It is still possible to use printf flags when more exact formatting is required. Building it on top of stdio gives us a solid, well-tested base.

It is not, I must emphasize, a criticism of the standard C++ library. That would be the technical equivalent of complaining about the weather.[0]

Chaining with operator() for Fun and Profit

By default, the standard outstreams for stdout and stderr define space as the field separator which saves a lot of typing when making readable output with little fuss.

double x = 1.2;
string msg = "hello";
int i = 42;

outs (x) (msg) (i) ('\n');
// --> 1.2 hello 42
outs("my name is")(msg)('\n');
// --> my name is hello

Chaining calls like this has advantages - most code editors are aware of paired parentheses, and some actually fill in the closing bracket when you type '('. It's easy to add an extra argument to override the default formatting, in a much more constant-friendly way than with printf:

const char *myfmt = "%6.2f";
double y = 0;
outs(x,myfmt)(y,myfmt)
    (msg,quote_s)('\n');
// -->   1.20  0.00 'hello'
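To make the chaining mechanism concrete, here is a minimal sketch of how a writer with a sticky field separator and optional printf formats might be put together. MiniWriter is a hypothetical name and this is not the outstreams source; it writes into a std::string so the result is easy to inspect, and the rule that a lone char is written verbatim (with '\n' starting a fresh line) is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical minimal chaining writer (not the outstreams source): each
// operator() appends one value to a string buffer and returns *this.
class MiniWriter {
    std::string buf_;
    char sepr_;              // field separator; 0 means "none"
    bool need_sep_ = false;  // true once an item has been written on this line

    // Emit the separator between items, but not before the first on a line.
    void sep() {
        if (need_sep_ && sepr_ != 0) buf_ += sepr_;
        need_sep_ = true;
    }

    // Format one primitive value with a printf-style spec.
    template <typename T>
    MiniWriter &put(const char *spec, T v) {
        char tmp[64];
        std::snprintf(tmp, sizeof tmp, spec, v);
        sep();
        buf_ += tmp;
        return *this;
    }

public:
    explicit MiniWriter(char sepr = ' ') : sepr_(sepr) {}

    MiniWriter &operator()(int v, const char *spec = "%d") { return put(spec, v); }
    MiniWriter &operator()(double v, const char *spec = "%g") { return put(spec, v); }
    MiniWriter &operator()(const std::string &s) { sep(); buf_ += s; return *this; }
    MiniWriter &operator()(const char *s) { return (*this)(std::string(s)); }

    // Assumption: a single char is written verbatim with no separator,
    // and '\n' starts a fresh line (so the next item gets no separator).
    MiniWriter &operator()(char c) {
        buf_ += c;
        if (c == '\n') need_sep_ = false;
        return *this;
    }

    const std::string &str() const { return buf_; }
};
```

Overload resolution does the type dispatch here: an int, a double, a char and a string literal each find an exact match, so no format string is needed unless you want one. Passing the format as a second argument is what lets a named constant like myfmt be reused across calls.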

Another useful property of operator() is that standard algorithms understand callables:

vector<int> vi {10,20,30};
for_each(vi.begin(), vi.end(), outs);
outs('\n');
// -> 10 20 30
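One detail worth knowing when handing a stateful callable to an algorithm: std::for_each takes its function object by value, so it works on a copy. Whether that matters for outs depends on how Writer copies, which isn't shown here. The sketch below uses a hypothetical Joiner type to show the effect, and std::ref as the standard way to make for_each call the original object.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callable sink: appends each int, separated by spaces.
struct Joiner {
    std::string out;
    void operator()(int v) {
        if (!out.empty()) out += ' ';
        out += std::to_string(v);
    }
};

std::string join_with_for_each(const std::vector<int> &vi) {
    Joiner j;
    // std::for_each takes its function object by value; std::ref makes it
    // call our Joiner instead of a private copy.
    std::for_each(vi.begin(), vi.end(), std::ref(j));
    return j.out;
}
```

Without std::ref, the accumulated state ends up in the copy, which std::for_each returns when it finishes.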

Containers

The for_each idiom is cute, but outstreams provides something more robust, where you can provide a custom format and/or a custom separator:

// signatures
//   Writer& operator() (It start, It finis,
//      const char *fmt=nullptr, char sepr=' ');
//  Writer& operator() (It start, It finis, 
//     char sepr);

outs(vi.begin(),vi.end(),',')('\n');
// --> 10,20,30

string s = "\xFE\xEE\xAA";
outs(s.begin(),s.end(),hex_u)('\n');
// --> FEEEAA
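A rough sketch of what an iterator-range overload has to do, written as free functions with hypothetical names (range_to_string, hex_bytes). The second one shows a subtlety behind the hex example: plain char may be signed, so a byte like '\xFE' has to be widened through unsigned char, or %X would print FFFFFFFE. How outstreams itself handles this isn't shown in the text; this is only an illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: write [start, finis) into a string, each element
// formatted with a printf-style spec and separated by sepr (0 = none).
// Only suitable for primitive element types, since they go through varargs.
template <typename It>
std::string range_to_string(It start, It finis, const char *fmt, char sepr) {
    std::string out;
    char tmp[64];
    for (It p = start; p != finis; ++p) {
        if (p != start && sepr != 0) out += sepr;
        std::snprintf(tmp, sizeof tmp, fmt, *p);
        out += tmp;
    }
    return out;
}

// char may be signed: widen through unsigned char so '\xFE' prints as FE.
std::string hex_bytes(const std::string &s) {
    std::string out;
    char tmp[4];
    for (unsigned char c : s) {
        std::snprintf(tmp, sizeof tmp, "%02X", static_cast<unsigned>(c));
        out += tmp;
    }
    return out;
}
```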

In the C++11 standard there is a marvelous class called std::initializer_list, which is implicitly used in brace initialization of containers. We overload operator() for it directly to support brace lists of objects of the same type; there is also a convenient overload for std::pair:

// signature: 
// Writer& operator() (
//    const std::initializer_list<T>& arr,
//    const char *fmt=nullptr, char sepr=' ');
outs({10,20,30})('\n');
// --> 10 20 30

// little burst of JSON
outs('{')({
    make_pair("hello",42),
    make_pair("dolly",99),
    make_pair("frodo",111)
},quote_d,',')('}')();
// --> { "hello":42,"dolly":99,"frodo":111 }

// which will also work when iterating
// over `std::map`

This works because the quote format only applies to strings - anything else ignores it.
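Here is a small sketch of both ideas: a template taking std::initializer_list<T>, which deduces T from a brace list whose elements all share one type, and a rendering rule where quoting only affects strings. The names join_list and item_text are hypothetical, and for brevity this version always quotes strings instead of taking a quote_d-style flag.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <utility>

// Hypothetical rendering rules: strings get double quotes, numbers don't,
// so a "quote" format simply has no effect on non-string values.
inline std::string item_text(int v) { return std::to_string(v); }
inline std::string item_text(const char *s) { return "\"" + std::string(s) + "\""; }

template <typename K, typename V>
std::string item_text(const std::pair<K, V> &p) {
    return item_text(p.first) + ":" + item_text(p.second);
}

// A brace list whose elements all share one type deduces T directly.
template <typename T>
std::string join_list(std::initializer_list<T> items, char sepr = ' ') {
    std::string out;
    bool first = true;
    for (const T &item : items) {
        if (!first) out += sepr;
        out += item_text(item);
        first = false;
    }
    return out;
}
```

Note that make_pair("hello",42) decays the string literal, so every element is a std::pair<const char*, int> and the brace list is homogeneous, as deduction requires.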

Writing to Files and Strings: Handling Errors

The Writer class can open and manage a file stream:

Writer("tmp.txt")(msg)('\n');
// -> tmp.txt: hello\n

The file is closed automatically when the Writer is destroyed, as with the iostreams equivalent.

Of course, there's no general guarantee that 'tmp.txt' can be opened for writing, so outstreams borrows a trick from iostreams: a Writer converts to bool.

Writer wr("tmp.txt");
if (wr) {
   wr(msg)('\n');
} else {
   errs(wr.error())('\n');
}
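The pattern combines RAII closing with a bool conversion. A minimal sketch of it, using a hypothetical FileSink class rather than the real Writer, might look like this. The explicit operator bool still works in if conditions, and the error text built from strerror is an assumption about what a useful error() could return.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical RAII file wrapper: closes in the destructor and converts to
// bool so callers can test whether fopen succeeded.
class FileSink {
    std::FILE *fp_;
    std::string err_;
public:
    explicit FileSink(const char *path) : fp_(std::fopen(path, "w")) {
        if (!fp_) err_ = std::string(path) + ": " + std::strerror(errno);
    }
    ~FileSink() { if (fp_) std::fclose(fp_); }
    FileSink(const FileSink &) = delete;             // one owner per FILE*
    FileSink &operator=(const FileSink &) = delete;

    explicit operator bool() const { return fp_ != nullptr; }
    const std::string &error() const { return err_; }

    FileSink &operator()(const char *s) {
        if (fp_) std::fputs(s, fp_);                 // silently ignore writes if not open
        return *this;
    }
};
```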

It's straightforward to build up strings using this style of output. (Mucking around with sprintf can be awkward and error-prone. Clue: always spell it snprintf.) Here is the ever-useful generalized string concatenation pattern:

StrWriter sw;
int arr[] {10,20,30};
sw(arr,arr+3,',');
string vis = sw.str();
// -> vis == "10,20,30"
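For comparison, this is roughly what safe printf-style string building looks like with plain stdio. strprintf is a hypothetical helper, not part of outstreams. It relies on the standard behaviour of vsnprintf: it never writes past the given size, and it returns the length the full result needs, so a first pass can size the buffer exactly.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Sketch (not part of outstreams): printf-style formatting into a
// std::string that can never overflow its buffer.
std::string strprintf(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list probe;
    va_copy(probe, args);   // the sizing pass consumes its own copy
    // With a null buffer and size 0, vsnprintf writes nothing and returns
    // the number of characters the full result needs.
    int n = std::vsnprintf(nullptr, 0, fmt, probe);
    va_end(probe);
    std::string out;
    if (n > 0) {
        out.resize(static_cast<std::size_t>(n) + 1);  // room for the '\0'
        std::vsnprintf(&out[0], out.size(), fmt, args);
        out.resize(static_cast<std::size_t>(n));      // drop the '\0'
    }
    va_end(args);
    return out;
}
```

It works, but it is still weakly typed: a mismatched format and argument compiles and misbehaves at run time.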

The Problem with "Hello World"

It is perfectly possible to construct a generic 'print' function in modern C++. With variadic templates it can be done elegantly in a type-safe way. The definition is recursive: printing out n items is defined as printing the first value and then printing the remaining n-1 values; printing one value uses outs.

template <typename T>
Writer& print(T v) {
   return outs(v);
}

template <typename T, typename... Args>
Writer& print(T first, Args... args) {
   print(first);
   return print(args...);
}

...
int answer = 42;
string who = "world";

print("hello",who,"answer",answer,'\n');
// -> hello world answer 42

This is cute, but although the implementation shows the flexibility of modern C++, the result shows the limitations of print as a utility. It's fine for a quick display of values, but what if you need an integer in hex, or a floating-point number to a particular precision, and so forth? Languages that provide print (Python 3, Java and Lua have this style) all run into this formatting problem. One approach is to define little helper functions and generally lean on string handling; for instance, it's easy to define to_str with StrWriter:

template <typename T>
string to_str(T v, 
    const char *fmt=nullptr)
{
   StrWriter sw;
   sw(v,fmt);
   return sw.str();
}
...
print
    ("answer in hex is 0x" + to_str(answer,"X"))
    ('\n');
// --> answer in hex is 0x2A

You see this kind of code quite frequently in Java, and it sucks for high-performance logging because of the cost of creating and disposing of all the temporary strings. Java has since acquired an actual printf equivalent (probably provoked by its competitive younger sister C#), and both Python and Lua programmers use some kind of format to make nice strings. That's not to say that to_str isn't useful - it's more flexible than std::to_string - but it has a cost that you might not always want to pay.

Another approach is to create little wrappers, like a Hex class and so forth, giving code like print("answer in hex",Hex(answer))();. The namespace becomes cluttered with these classes, much as std is full of manipulators like hex, dec and so forth. A compromise is to add just one more function which wraps a value and a format. This isn't bad for performance, since the wrapper type just passes along references to the values; you can see for yourself in print.cpp.
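The print.cpp code isn't reproduced here, so the following is only a sketch of the idea under stated assumptions: a hypothetical with_fmt function returns a tiny Formatted wrapper holding a reference and a format pointer, and a variadic join (same recursion as print, but building a string) renders wrapped values through snprintf and everything else with a default.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical wrapper pairing a value with a printf spec. It holds only a
// reference and a pointer, so creating one is essentially free; it must not
// outlive the statement that creates it.
template <typename T>
struct Formatted {
    const T &value;
    const char *spec;
};

template <typename T>
Formatted<T> with_fmt(const T &value, const char *spec) {
    return Formatted<T>{value, spec};
}

// Plain values get a default rendering...
inline std::string render(int v) { return std::to_string(v); }
inline std::string render(const char *s) { return s; }

// ...while wrapped (primitive) values go through snprintf with their spec.
template <typename T>
std::string render(const Formatted<T> &f) {
    char buf[64];
    std::snprintf(buf, sizeof buf, f.spec, f.value);
    return buf;
}

// Same recursion as print: render the first item, then the remaining n-1.
template <typename T>
std::string join(const T &v) { return render(v); }

template <typename T, typename... Args>
std::string join(const T &first, const Args &... rest) {
    return render(first) + " " + join(rest...);
}
```

Temporaries such as the 42 in with_fmt(42, "0x%X") live until the end of the full expression, which is why the reference is safe as long as the wrapper is consumed in the same statement.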

A third approach is the one taken by iostreams: define some special values which control the formatting of the next value, and so forth[1]. It can be done, but it's messy and makes the concept much less appealing. It's a nice example of the Hello World Fallacy [2], where the easy stuff is attractively easy and the hard stuff is unnecessarily hard. And I maintain that print and iostreams fall into exactly that space.

This implementation of print does have the nice property that it's easy to overload for new types, which is not the case for outstreams.

Quick Output And Logging

Use of the preprocessor in modern C++ is generally considered a Bad Thing, for good reasons. Macros stomp on everything, without respect for scope, and C++ provides alternative mechanisms (inlining, constant definition, etc.) for nearly everything that C relies on the preprocessor to provide. But it isn't an absolute evil, provided macros are always written in UPPER_CASE and so are clearly distinct from scoped variables.

Here's a case where developer convenience outweighs ideological purity: dumping out variables. Sometimes symbolic debuggers are too intrusive or simply not available. It's a nice example of old macro magic combined with new operator overloading.

#define VA(var) (#var)(var,quote_d)
#define VX(var) (#var)(var,hex_u)
...
string full_name;
uint64_t  id_number;
char ch = ' ';
...
outs VA(full_name) VA(id_number)
    VX(ch) ('\n');
// --> full_name "bonzo the dog" id_number 666 ch 20
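The trick rests on the preprocessor's stringizing operator: #var turns the macro argument into a string literal, so each use emits the variable's name followed by its value. Here is a self-contained sketch with a hypothetical Dump type standing in for outs, and a plain bool standing in for quote_d.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical minimal chainable dumper, just enough to show the macro trick.
struct Dump {
    std::string text;

    Dump &operator()(const char *s) { return add(s); }
    Dump &operator()(const std::string &s, bool quote) {
        return add(quote ? "\"" + s + "\"" : s);
    }
    // Numbers ignore the quote request, as described for quote_d.
    Dump &operator()(std::uint64_t v, bool /*quote*/) { return add(std::to_string(v)); }

private:
    Dump &add(const std::string &s) {
        if (!text.empty()) text += ' ';
        text += s;
        return *this;
    }
};

// #var stringizes the macro argument: d VA(x) becomes d ("x")(x, true).
#define VA(var) (#var)(var, true)
```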

Here is a trick which allows you to completely switch off tracing, with little overhead. (Many loggers will suppress output if the level is too low, but any expressions will still be evaluated.)

// if the FILE* is NULL, then a Writer 
// instance converts to false
// can say logs.set(nullptr) to
// switch off tracing
#define TRACE if (logs) logs ("TRACE")
...
TRACE VA(obj.expensive_method()) ('\n');
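The key property is that everything after TRACE sits inside the if, so the arguments are never evaluated while tracing is off. Below is a testable sketch with a hypothetical Tracer standing in for logs (a null destination plays the role of the null FILE*). It uses a slight variant of the macro, if (!logs) {} else ..., which behaves the same but also stops an else in the calling code from pairing with the macro's hidden if.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the logs writer: tracing is "off" when it has
// nowhere to write.
struct Tracer {
    std::string *dest = nullptr;
    explicit operator bool() const { return dest != nullptr; }
    Tracer &operator()(const std::string &s) {
        if (!dest->empty()) *dest += ' ';
        *dest += s;
        return *this;
    }
};

Tracer logs;

// Variant of TRACE: the chained arguments live in the else branch, so they
// are not evaluated at all while tracing is off, and a caller's own "else"
// cannot bind to the macro's if.
#define TRACE if (!logs) {} else logs("TRACE")
```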

I mentioned that logging was something that all serious programs need. It's tempting to write your own logger, but it's tricky to get right and this wheel has been invented before.

We use log4cpp where I work, but only its mother would consider it to be elegant and well-documented:

plogger->log(log4cpp::Priority::DEBUG,
    "answer is %d",42);

It does have an iostreams-like alternative interface but it's a bit clumsy and half-baked. However, it is very configurable and handles all the details of creating log files, rolling them over, writing them to a remote syslog, and so forth.

It is easy to wrap this in a Writer-derived class. In fact, it's easier to derive from StrWriter and override put_eoln, which is used by the 'end of line' empty operator() overload. Normally it just uses write_char to put out '\n', but here we use it to actually construct the call to log4cpp:

using PriorityLevel
    = log4cpp::Priority::PriorityLevel;
class LogWriter: public StrWriter {
    PriorityLevel level;
public:
    LogWriter(PriorityLevel level)
        : StrWriter(' '),level(level) {}

    virtual void put_eoln() {
        plogger->log(level,
            "%s",str().c_str());
        clear();
    }
};
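The same structure can be tried without log4cpp. In this sketch the hypothetical LineWriter plays the role of StrWriter, buffering items and calling a virtual put_eoln hook on (). The hypothetical CallbackLogWriter forwards each completed line to any callback, where the real LogWriter would call plogger->log.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for StrWriter: buffers items separated by spaces
// and calls the virtual put_eoln hook when the line is ended with ().
class LineWriter {
    std::string buf_;
public:
    virtual ~LineWriter() = default;
    LineWriter &operator()(const std::string &s) {
        if (!buf_.empty() && buf_.back() != '\n') buf_ += ' ';
        buf_ += s;
        return *this;
    }
    LineWriter &operator()() { put_eoln(); return *this; }  // end of line
    const std::string &str() const { return buf_; }
    void clear() { buf_.clear(); }
protected:
    virtual void put_eoln() { buf_ += '\n'; }  // default: just end the line
};

// Like LogWriter, but the backend is any callback: each completed line
// becomes exactly one call, and the buffer is reset for the next line.
class CallbackLogWriter : public LineWriter {
    std::function<void(const std::string &)> sink_;
public:
    explicit CallbackLogWriter(std::function<void(const std::string &)> sink)
        : sink_(std::move(sink)) {}
protected:
    void put_eoln() override {
        sink_(str());
        clear();
    }
};
```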

By exposing only Writer references named error, warn etc., the rest of your program has no dependency on the log4cpp headers, just in case you do want to drop in your own replacement that uses syslog directly. Look at testlog.cpp and how it uses logger.cpp to encapsulate the details of logging.

Thereafter, use as before:

error("hello")(42)('\n');
warn("this is a warning")('\n');
debug("won't appear with defaults")('\n');
// -->
// 2016-04-10 09:32:11,552 [ERROR] hello 42
// 2016-04-10 09:32:11,553 [WARN] this is a warning

Costs and Limitations

In a simple test (writing a million records with five comma-separated fields to a file), outstreams seems to be about 10% more expensive than plain stdio, as we might expect since it needs more calls to stdio. (The equivalent test for iostreams shows it seriously lagging behind.) The library itself is small, so if your system has vfprintf (or equivalent) then it's an easy dependency. If the macro OLD_STD_CPP is defined, it compiles fine for C++03, without support for initializer lists.

There is a fundamental problem with operator() here: unlike operator<<, it can only be defined as a member function of a type. So adding your own types requires overriding a special function and using a template version of operator() with an additional first const char* argument, to avoid the inevitable Template Overloading Blues.

As a more traditional alternative, if a type implements the Writeable interface, then things work cleanly. (In any case, how a class wants to write itself out is its own private business.) Writeable provides a handy to_string using the implementation of the overridden write_to method.
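The real signature of write_to isn't shown here, so this sketch assumes a hypothetical version that appends to a std::string. It shows the shape of the interface: a class overrides one virtual method, gets to_string for free, and any writer can accept it through a single const Writeable& overload instead of a template.

```cpp
#include <cassert>
#include <string>

// Hypothetical Writeable-style interface (write_to's parameter is assumed).
class Writeable {
public:
    virtual ~Writeable() = default;
    virtual void write_to(std::string &out) const = 0;
    // to_string comes for free from the one override.
    std::string to_string() const {
        std::string s;
        write_to(s);
        return s;
    }
};

struct Point : Writeable {
    int x, y;
    Point(int x, int y) : x(x), y(y) {}
    void write_to(std::string &out) const override {
        out += "(" + std::to_string(x) + "," + std::to_string(y) + ")";
    }
};

// A writer needs just one non-template overload to accept every Writeable.
struct TextWriter {
    std::string text;
    TextWriter &operator()(const Writeable &w) { w.write_to(text); return *this; }
};
```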

Some may find this notation too compressed; << is nice and spacy by comparison. It is certainly less noisy than stdio, since format strings are optional. Sticky separators can be annoying (controlling them properly was probably the trickiest part of the implementation), but for most applications they seem appropriate, and they can always be turned off.

UPDATE

Some did find the notation too compressed; in particular, the empty () tends to vanish on the page. So now ('\n') is completely equivalent to ().

The cppformat library also looks like a good alternative. It resolves the specification problem with variadic print by allowing a format string, either traditional printf-style or Python-style.

Some commenters felt that operator-chaining was completely old-hat and that variadic templates were obviously superior, which is a matter of taste.

Internationalization presents a problem, since word order may change in translation; Python-style formats do solve this issue, because arguments can be referred to by position.

[0] Although that doesn't mean we have to like all of it. And using it is not compulsory.

[1] But don't make them sticky like with iostreams - there are Stackoverflow questions about how to print out in hex and then how to stop printing out in hex.

[2] This fallacy is an independent rediscovery of the exact phrase from an earlier article, but the feeling on Reddit was that the first guy's website was ugly and hence inherently inferior.

2 comments:

  1. Interesting stuff. Just one correction: the cppformat library (https://github.com/cppformat/cppformat) doesn't build on IOStreams. It implements its own formatting for most types and falls back to printf for floating-point. This allows it to achieve higher performance and smaller compiled code size compared to IOStreams.
