An Awkward Choice
The standard C++ iostreams library is generally easier to use for output than stdio, if you don't particularly care about exact output formatting and have no resource constraints. No one can say that stdio is pretty. Portable use of printf requires ugly PRIu64-style macros from <inttypes.h> and calls to c_str() on strings. It's fundamentally weakly typed and can only work with primitive types.
However, using iostreams gets ugly fast, with a host of manipulators and formatting specifiers, all tied together with <<. There are also some serious misfeatures: beginners learn to obsessively spell 'end of line' as std::endl but never learn that it also flushes the stream, until their program proves to be an utter dog at I/O. Generally, standard C++ streams are slower and allocate far too much to be considered seriously for embedded work. In the immortal words of the readline(3) man page, under the Bugs section: "It's too big and too slow".
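The flushing cost is easy to make visible. Here is a small sketch; CountingBuf and count_flushes are names I made up for the illustration, not part of any library. The buffer simply counts how often the stream asks it to synchronize.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical helper (not from any library): a string buffer that
// counts how many times the stream asks it to synchronize.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` short lines and returns the number of flush requests.
int count_flushes(int lines, bool use_endl) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl)
            out << i << std::endl;  // newline, then flush
        else
            out << i << '\n';       // newline only
    }
    return buf.flushes;
}
```

Calling count_flushes(1000, true) reports one flush per line, while the '\n' version reports none. With a real file behind the stream, each of those flushes can turn into a system call.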
Embedded systems which can benefit from the abstraction capabilities of C++ don't necessarily have space for the large sprawling monster that the standard iostreams has become. In this realm, printf (or a stripped-down variant) still rules.
It is true that we no longer depend on simple text I/O as much as the old-timers did, since many structured text output standards have emerged. We still lean on stdio/iostreams to generate strings, since string manipulation in C++ is still relatively poor compared to other languages. With some classes of problems, debug writes are heavily used. And all non-trivial applications need logging support.
The library I'm proposing - outstreams - is for people who want or need an alternative. It presents another style of interface, where overloading operator() leads to a fluent and efficient style for organizing output. It is still possible to use printf flags when more exact formatting is required. Building it on top of stdio gives us a solid well-tested base.
It is not, I must emphasize, a criticism of the standard C++ library. That would be the technical equivalent of complaining about the weather.[0]
Chaining with operator() for Fun and Profit
By default, the standard outstreams for stdout and stderr define space as the field separator which saves a lot of typing when making readable output with little fuss.
double x = 1.2;
string msg = "hello";
int i = 42;
outs (x) (msg) (i) ('\n');
// --> 1.2 hello 42
outs("my name is")(msg)('\n');
// --> my name is hello
Chaining calls like this has advantages - most code editors are aware of paired parentheses, and some actually fill in the closing bracket when you type '('. It's easy to add an extra argument to override the default formatting, in a much more constant-friendly way than with printf:
const char *myfmt = "%6.2f";
double y = 0;
outs(x,myfmt)(y,myfmt) (msg,quote_s)('\n');
// --> 1.20 0.00 'hello'
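To make the mechanics concrete, here is a minimal sketch of how this kind of chaining could be built. MiniWriter is a hypothetical miniature I wrote for illustration, not the outstreams source: it collects text in a std::string rather than writing to a FILE*, and the way it treats single characters is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical miniature of the chaining idea. This is NOT the
// outstreams source; it collects text in a string so it is easy to test.
class MiniWriter {
    std::string buf_;
    bool need_sep_ = false;   // true once a field has been written

    void sep() {
        if (need_sep_) buf_ += ' ';
        need_sep_ = true;
    }

    template <typename T>
    void put(const char *fmt, T v) {
        char tmp[64];
        int n = std::snprintf(tmp, sizeof tmp, fmt, v);
        assert(n >= 0 && n < static_cast<int>(sizeof tmp));
        (void)n;
        sep();
        buf_ += tmp;
    }

public:
    // Each overload returns *this, which is what makes chaining work.
    MiniWriter& operator()(int v, const char *fmt = "%d") {
        put(fmt, v);
        return *this;
    }
    MiniWriter& operator()(double v, const char *fmt = "%g") {
        put(fmt, v);
        return *this;
    }
    MiniWriter& operator()(const char *s) { sep(); buf_ += s; return *this; }
    MiniWriter& operator()(const std::string& s) { sep(); buf_ += s; return *this; }

    // Assumption of this sketch: a single char is written raw and
    // cancels the pending separator, so '\n' starts a fresh line.
    MiniWriter& operator()(char c) {
        buf_ += c;
        need_sep_ = false;
        return *this;
    }

    const std::string& str() const { return buf_; }
};

// Usage: two lines in the style of the examples above.
std::string mini_writer_demo() {
    MiniWriter w;
    std::string msg = "hello";
    w(1.2)(msg)(42)('\n');
    w(1.2, "%6.2f")(0.0, "%6.2f")('\n');
    return w.str();
}
```

The default format argument is what keeps the common case short, and the optional second argument is where a printf-style format slots in when you need one.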
Another useful property of operator() is that standard algorithms understand callables:
vector<int> vi {10,20,30};
for_each(vi.begin(), vi.end(), outs);
outs('\n');
// -> 10 20 30
Containers
The for_each idiom is cute, but outstreams provides something more robust where you can provide a custom format and/or a custom separator:
// signatures
// Writer& operator() (It start, It finis,
//     const char *fmt=nullptr, char sepr=' ');
// Writer& operator() (It start, It finis,
//     char sepr);
outs(vi.begin(),vi.end(),',')('\n');
// --> 10,20,30
string s = "\xFE\xEE\xAA";
outs(s.begin(),s.end(),hex_u)('\n');
// --> FEEEAA
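The range overloads boil down to a loop that puts a separator between elements, never after the last one. A rough sketch as a free function follows; format_range is a hypothetical name, and it assumes that the elements convert to int and that fmt is a printf conversion for int.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, loosely modelled on the range overloads above.
// Assumes each element converts to int and `fmt` formats one int.
template <typename It>
std::string format_range(It first, It last,
                         const char *fmt = "%d", char sep = ' ') {
    std::string out;
    char tmp[32];
    for (It it = first; it != last; ++it) {
        if (it != first) out += sep;   // separator between items only
        int n = std::snprintf(tmp, sizeof tmp, fmt, static_cast<int>(*it));
        assert(n >= 0 && n < static_cast<int>(sizeof tmp));
        (void)n;
        out += tmp;
    }
    return out;
}
```

With vector<int> vi {10,20,30}, the call format_range(vi.begin(), vi.end(), "%d", ',') gives "10,20,30", and passing "%04d" with the default separator gives "0010 0020 0030".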
In the C++11 standard there is a marvelous class called std::initializer_list which is implicitly used in brace initialization of containers. We overload on it directly to support brace lists of objects of the same type; there is also a convenient overload for std::pair:
// signature:
// Writer& operator() (
//     const std::initializer_list<T>& arr,
//     const char *fmt=nullptr, char sepr=' ');
outs({10,20,30})('\n');
// --> 10 20 30

// little burst of JSON
outs('{')({
    make_pair("hello",42),
    make_pair("dolly",99),
    make_pair("frodo",111)
},quote_d,',')('}')();
// --> { "hello":42,"dolly":99,"frodo":111 }
// which will also work when iterating
// over `std::map`
This works because the quote format only applies to strings - anything else ignores it.
Writing to Files and Strings: Handling Errors
The Writer class can open and manage a file stream:
Writer("tmp.txt")(msg)('\n');
// -> tmp.txt: hello\n
C++ manages the closing of the file automatically, as with the iostreams equivalent.
Of course, there's no general guarantee that 'tmp.txt' can be opened for writing: outstreams borrows a trick from iostreams, and a Writer converts to a bool:
Writer wr("tmp.txt");
if (wr) {
    wr(msg)('\n');
} else {
    errs(wr.error())('\n');
}
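For the curious, the bool trick and the automatic close need very little machinery. This is a sketch under my own assumptions, not the outstreams class: FileWriter and write_line are hypothetical names, and the error text simply comes from strerror(errno).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical sketch of a file-owning writer; NOT the outstreams class.
class FileWriter {
    std::FILE *f_;
    std::string error_;
public:
    explicit FileWriter(const std::string& path)
        : f_(std::fopen(path.c_str(), "w")) {
        if (!f_) error_ = std::strerror(errno);
    }
    // The destructor closes the file, so callers never have to.
    ~FileWriter() { if (f_) std::fclose(f_); }

    // Copying would close the same FILE* twice, so forbid it.
    FileWriter(const FileWriter&) = delete;
    FileWriter& operator=(const FileWriter&) = delete;

    // `explicit` allows `if (wr)` but blocks accidental arithmetic.
    explicit operator bool() const { return f_ != nullptr; }
    const std::string& error() const { return error_; }

    FileWriter& operator()(const char *s) {
        assert(s != nullptr);
        if (f_) std::fputs(s, f_);
        return *this;
    }
};

// Usage: returns an empty string on success, else the error text.
std::string write_line(const std::string& path, const char *text) {
    FileWriter wr(path);
    if (!wr) return wr.error();
    wr(text)("\n");
    return "";
}
```

Making the conversion explicit is a C++11 nicety: the object still works in an if, but it will not silently turn into an integer somewhere else.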
It's straightforward to build up strings using this style of output. (Mucking around with sprintf can be awkward and error-prone. Clue: always spell it snprintf.) Here is the ever useful generalized string concatenation pattern:
StrWriter sw;
int arr[] {10,20,30};
sw(arr,arr+3,',');
string vis = sw.str();
// -> vis == "10,20,30"
The Problem with "Hello World"
It is perfectly possible to construct a generic 'print' function in modern C++. With variadic templates it can be done elegantly in a type-safe way. The definition is recursive; printing out n items is defined as printing the first value, then printing the last n-1 values; printing 1 value uses outs.
template <typename T>
Writer& print(T v) {
    return outs(v);
}

template <typename T, typename... Args>
Writer& print(T first, Args... args) {
    print(first);
    return print(args...);
}
...
int answer = 42;
string who = "world";
print("hello",who,"answer",answer,'\n');
// -> hello world answer 42
This is cute, but although the implementation shows the flexibility of modern C++, the result shows the limitations of print as a utility. Fine for a quick display of values, but what if you need an integer in hex, or a floating-point number to a particular precision, and so forth? Where print exists in languages - Python 3, Java and Lua have this style - there is this problem of formatting. One approach is to define little helper functions and generally lean on string handling; for instance it's easy with StrWriter to define to_str:
template <typename T>
string to_str(T v, const char *fmt=nullptr) {
    StrWriter sw;
    sw(v,fmt);
    return sw.str();
}
...
print ("answer in hex is 0x" + to_str(answer,"X")) ('\n');
// --> answer in hex is 0x2A
You see this kind of code quite frequently with Java, and it sucks for high performance logging because of the cost of creating and disposing all the temporary strings. Java has since acquired an actual printf equivalent (probably provoked by its competitive younger sister C#) and both Python and Lua programmers use some kind of format to make nice strings. Not to say that to_str isn't a useful function - it's more flexible than std::to_string - but it will have a cost that you might not always want to pay.
Another approach is to create little wrappers, like a Hex class and so forth. So you get code like print("answer in hex",Hex(answer))(). The namespace becomes cluttered with these classes, like how std is full of things like hex, dec and so forth. A compromise is to add just one more function which wraps a value and a format. This isn't bad for performance, since the wrapper type just transfers references to the values; you can see for yourself in print.cpp.
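I haven't reproduced print.cpp here, but the wrap-a-value-and-a-format compromise can be sketched in a few lines. All names below (Formatted, with_format, append_item, print_to) are hypothetical, and the sketch appends to a string instead of calling outs so that it stands alone.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical names throughout; only the idea comes from the text.
template <typename T>
struct Formatted {
    const T& value;    // reference only, nothing is copied
    const char *fmt;   // printf-style conversion suitable for T
};

// The single extra function: pairs a value with a format.
template <typename T>
Formatted<T> with_format(const T& value, const char *fmt) {
    return Formatted<T>{value, fmt};
}

inline void append_item(std::string& out, const std::string& s) { out += s; }
inline void append_item(std::string& out, const char *s) { out += s; }
inline void append_item(std::string& out, int v) { out += std::to_string(v); }

template <typename T>
void append_item(std::string& out, const Formatted<T>& f) {
    char tmp[64];
    int n = std::snprintf(tmp, sizeof tmp, f.fmt, f.value);
    assert(n >= 0 && n < static_cast<int>(sizeof tmp));
    (void)n;
    out += tmp;
}

// Recursive variadic print, space-separated, writing into `out`.
inline void print_to(std::string&) {}

template <typename T, typename... Args>
void print_to(std::string& out, const T& first, const Args&... rest) {
    if (!out.empty()) out += ' ';
    append_item(out, first);
    print_to(out, rest...);
}
```

So print_to(out, "answer in hex", with_format(answer, "0x%X")) with an unsigned answer of 42 leaves "answer in hex 0x2A" in out. The wrapper lives only for the duration of the call, which is why holding a reference is safe in this usage.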
The other approach is the one taken by iostreams - define some special values which control the formatting of the next value, and so forth [1]. It can be done, but it's messy and makes the concept rather less appealing. It's a nice example of the Hello World Fallacy [2], where the easy stuff is attractively easy and the hard stuff is unnecessarily hard. And I maintain that print and iostreams fall into exactly that space.
This implementation of print does have the nice property that it's easy to overload for new types, which is not the case for outstreams.
Quick Output And Logging
Use of the preprocessor in modern C++ is considered generally a Bad Thing, for good reasons. Macros stomp on everything, without respect for scope, and C++ provides alternative mechanisms for nearly everything (inlining, constant definition, etc) that C relies on the preprocessor to provide. But it isn't an absolute evil, if macros are always written as UPPER_CASE and so clearly distinct from scoped variables.
Here's a case where developer convenience outweighs ideological purity; dumping out variables. Sometimes symbolic debuggers are too intrusive or simply not available. It's a nice example of old macro magic combined with new operator overloading.
#define VA(var) (#var)(var,quote_d)
#define VX(var) (#var)(var,hex_u)
...
string full_name;
uint64_t id_number;
char ch = ' ';
...
outs VA(full_name) VA(id_number) VX(ch) ('\n');
// --> full_name "bonzo the dog" id_number 666 ch 20
Here is a trick which allows you to completely switch off tracing, with little overhead. (Many loggers will suppress output if the level is too low, but any expressions will still be evaluated.)
// if the FILE* is NULL, then a Writer
// instance converts to false
// can say logs.set(nullptr) to
// switch off tracing
#define TRACE if (logs) logs ("TRACE")
...
TRACE VA(obj.expensive_method()) ('\n');
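The reason the arguments cost nothing when tracing is off is that the whole chain is the body of the if. A small sketch shows this; TraceSink, SKETCH_TRACE and the call counter are stand-ins I invented, with a plain bool in place of the NULL FILE* test.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Writer that can be switched off.
class TraceSink {
    bool enabled_;
    std::string buf_;
public:
    explicit TraceSink(bool enabled) : enabled_(enabled) {}
    void set(bool enabled) { enabled_ = enabled; }
    explicit operator bool() const { return enabled_; }

    TraceSink& operator()(const char *s) { buf_ += s; return *this; }
    TraceSink& operator()(int v) {
        buf_ += ' ';
        buf_ += std::to_string(v);
        return *this;
    }
    const std::string& str() const { return buf_; }
};

static TraceSink trace_sink(true);
static int expensive_calls = 0;

// Pretend this is costly; the counter records whether it ran.
int expensive() {
    ++expensive_calls;
    return 42;
}

// Same shape as the TRACE macro above: the chained calls form the
// body of the `if`, so they are skipped entirely when the sink is off.
#define SKETCH_TRACE if (trace_sink) trace_sink("TRACE")

// Returns how many times expensive() ran for one trace statement.
int calls_while(bool enabled) {
    expensive_calls = 0;
    trace_sink.set(enabled);
    SKETCH_TRACE (expensive());
    return expensive_calls;
}
```

One caveat with a macro of this shape: written directly before an else, it will capture that else, so it is safest to use it as a standalone statement.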
I mentioned that logging was something that all serious programs need. It's tempting to write your own logger, but it's tricky to get right and this wheel has been invented before.
We use log4cpp where I work, but only its mother would consider it to be elegant and well-documented:
plogger->log(log4cpp::Priority::DEBUG, "answer is %d",42);
It does have an iostreams-like alternative interface but it's a bit clumsy and half-baked. However, it is very configurable and handles all the details of creating log files, rolling them over, writing them to a remote syslog, and so forth.
It is easy to wrap this in a Writer-derived class. In fact, it's easier to derive from StrWriter and override put_eoln, which is used by the 'end of line' empty operator() overload. Normally it just uses write_char to put out '\n', but here we use it to actually construct the call to log4cpp:
using PriorityLevel = log4cpp::Priority::PriorityLevel;

class LogWriter: public StrWriter {
    PriorityLevel level;
public:
    LogWriter(PriorityLevel level)
        : StrWriter(' '),level(level)
    {}

    virtual void put_eoln() {
        plogger->log(level,"%s",str().c_str());
        clear();
    }
};
By just exposing error, warn etc. as references to Writer, the rest of your program does not have any dependencies on the log4cpp headers - just in case you do want to drop in your own replacement that directly uses syslog. Look at testlog.cpp and how it uses logger.cpp to encapsulate the details of logging.
Thereafter, use as before:
error("hello")(42)('\n');
warn("this is a warning")('\n');
debug("won't appear with defaults")('\n');
// -->
// 2016-04-10 09:32:11,552 [ERROR] hello 42
// 2016-04-10 09:32:11,553 [WARN] this is a warning
Costs and Limitations
In a simple test (writing a million records to a file with five comma-separated fields) outstreams seems to be about 10% more expensive than using stdio directly, as we can expect from needing more calls to stdio. (The equivalent test for iostreams shows it seriously lagging behind.) The library itself is small, so if your system has vfprintf (or equivalent) then it's an easy dependency. If the macro OLD_STD_CPP is defined it compiles fine for C++03, without support for initializer lists.
There is a fundamental problem with operator() here - it can only be defined as a method of a type, unlike operator<<. So adding your own types requires overriding a special function and using a template version of operator() with an additional first const char* argument to avoid the inevitable Template Overloading Blues.
As a more traditional alternative, if a type implements the Writeable interface, then things work cleanly. (In any case, how a class wants to write itself out is its own private business.) Writeable provides a handy to_string using the implementation of the overridden write_to method.
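The shape of such an interface is roughly as follows. The names echo the ones above, but the signatures are my assumptions for this sketch rather than the library's actual declarations: TextSink stands in for Writer, and WriteableSketch for Writeable.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Writer: appends fields to a string.
class TextSink {
    std::string buf_;
public:
    TextSink& operator()(const std::string& s) { buf_ += s; return *this; }
    TextSink& operator()(int v) { buf_ += std::to_string(v); return *this; }
    const std::string& str() const { return buf_; }
};

// Hypothetical version of the interface: a type says how to write
// itself, and to_string comes for free on top of that.
class WriteableSketch {
public:
    virtual ~WriteableSketch() = default;
    virtual void write_to(TextSink& out) const = 0;

    std::string to_string() const {
        TextSink sink;
        write_to(sink);
        return sink.str();
    }
};

struct Point : WriteableSketch {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}

    void write_to(TextSink& out) const override {
        out("(")(x)(",")(y)(")");
    }
};
```

Here Point(3,4).to_string() yields "(3,4)", and the class never needs to know whether it is being written to a file, a string or a logger.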
Some may find this notation too compressed - << is nice & spacy by comparison. It is certainly less noisy than stdio, since format strings are optional. Sticky separators can be annoying (controlling them properly was probably the most tricky bit of the implementation) but for most applications, they seem appropriate - they can always be turned off.
UPDATE
Some did find the notation too compressed - in particular () seems to vanish. So now ('\n') is completely equivalent to ().
The cppformat library also looks like a good alternative. It resolves the format specification problem with variadic print by allowing for a format, either traditional or Python-style.
Some commenters felt that operator-chaining was completely old-hat and that variadic templates were obviously superior, which is a matter of taste.
Internationalization represents a problem, since the word order may change in translation - Python-style does solve this issue.
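To see why numbered slots help, here is a toy formatter. It is entirely hypothetical and has nothing to do with how cppformat is implemented; it handles only single-digit {N} slots with string arguments, but that is enough to show a translation reordering the values without touching the calling code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy positional formatter (hypothetical): replaces {0}..{9} in
// `pattern` with the matching entry of `args`, copies the rest as is.
std::string format_positional(const std::string& pattern,
                              const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        bool is_slot = pattern[i] == '{' && i + 2 < pattern.size() &&
                       pattern[i + 1] >= '0' && pattern[i + 1] <= '9' &&
                       pattern[i + 2] == '}';
        if (is_slot) {
            std::size_t idx = static_cast<std::size_t>(pattern[i + 1] - '0');
            assert(idx < args.size());
            out += args[idx];
            i += 2;  // skip the digit and the closing brace
        } else {
            out += pattern[i];
        }
    }
    return out;
}
```

With the arguments {"cart", "3"}, the pattern "{0} has {1} items" gives "cart has 3 items", and a translated pattern such as "{1} items in {0}" gives "3 items in cart" from the very same call.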
[0] Although that doesn't mean we have to like all of it. And using it is not compulsory.
[1] But don't make them sticky like with iostreams - there are Stackoverflow questions about how to print out in hex and then how to stop printing out in hex.
[2] This fallacy is an independent rediscovery of the exact phrase from an earlier article, but the feeling on Reddit was that the first guy's website was ugly and hence inherently inferior.
Looks good. I will try.
Interesting stuff. Just one correction: the cppformat library (https://github.com/cppformat/cppformat) doesn't build on IOStreams. It implements its own formatting for most types and falls back to printf for floating-point. This allows it to achieve higher performance and smaller compiled code size compared to IOStreams.