The Ugly Brother
My favourite Irish joke (and I'm Irish enough to tell it) concerns a posh gent who is lost in Dublin. He asks a bystander, "Tell me, my good man, how can I get to the National Museum?" The bystander replies: "Well sir, I wouldn't go from here." This is the received opinion about scanf, the ugly brother of printf.
It tends to appear only in the kind of beginner tutorials where programs ask the user to provide two numbers and then add them up. Serious people avoid scanf because it's hard to use properly: easy to confuse, and tricky to get meaningful errors out of. Beginners read the tutorials, then ask questions about how to bullet-proof scanf, and get told by serious people not to use it.
scanf's glamorous cousin std::cin is easier to use, because it matches against strongly-typed non-const references - reading into a constant is a compile error. But error handling is not straightforward.
Besides, sometimes you have neither the resources nor the inclination to use iostreams, as I discussed in my last article.
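For comparison, here is a minimal sketch (plain standard iostreams, nothing from the library discussed below) of what recovering from a failed std::cin read typically involves:

#include <iostream>
#include <limits>

int main() {
    int n;
    if (! (std::cin >> n)) {
        // the stream is now in a fail state; clear it and throw away
        // the offending line before anything else can be read
        std::cin.clear();
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::cerr << "that was not a number\n";
    }
}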
Line by Line
A common strategy is to read a text file line by line. Here std::getline is superior to fgets; it allocates a larger buffer if needed and trims end-of-line characters.
The outstreams library provides a very similar interface:
string line;
stream::Reader in("myfile.txt");
while (in.getline(line)) {
    do_something(line);
}
if (! in) {
    stream::errs(in.error())('\n');
}
// if failed:
// --> No such file or directory
It's always a good idea to check for errors; if the error state is set, no further read operations will take place, so it's perfectly fine to check afterwards.
So far, this is very much like iostreams' istream interface, except that you get a sensible error without having to call perror yourself (which belongs to stdio - the irony).
From now on I'll assume that you have broken down and said 'using namespace stream'. Here's a one-liner which populates a container:
vector<string> lines;
Reader in("myfile.txt");
in.getlines(lines);
// errors!?
The container can be anything which understands push_back - a std::list would work just as well here. There is an optional second argument to getlines which gives the maximum number of lines to grab. (This is not how these things are typically organized; you would pretend that Reader was a container - or write an adapter - and use std::copy and std::back_inserter, as sketched below. That is more general but significantly uglifies the common case.)
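For the curious, here is roughly what that more general route looks like, as a standard-library-only sketch; the Line helper is hypothetical, written just for this illustration, and is needed because istream_iterator on its own reads whitespace-separated words rather than lines:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// hypothetical adapter: makes a whole line extractable with operator>>
struct Line {
    std::string text;
    friend std::istream& operator>>(std::istream& in, Line& l) {
        return std::getline(in, l.text);
    }
    operator std::string() const { return text; }
};

int main() {
    std::ifstream in("myfile.txt");
    std::vector<std::string> lines;
    std::copy(std::istream_iterator<Line>(in),
              std::istream_iterator<Line>(),
              std::back_inserter(lines));
}

More general, certainly, but you can see the point about uglifying the common case.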
Then there is a readall method which grabs the whole file and puts it into a std::string. The weakness of the standard string - that it knows nothing about character encoding - becomes a strength when you treat it as a sliceable, appendable bag of bytes.
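A sketch of readall in use (I'm assuming a return-by-value form here; check the header for the exact signature):

Reader in("myfile.txt");
string contents = in.readall();   // the whole file as a bag of bytes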
Having captured the lines, those serious people will typically then use serious conversion functions like std::stoul to actually extract values and detect errors.
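For example, a careful std::stoul conversion ends up looking something like this (a sketch; parse_count is just an illustrative name):

#include <stdexcept>
#include <string>

unsigned long parse_count(const std::string& field) {
    std::size_t used = 0;
    // throws std::invalid_argument or std::out_of_range on failure
    unsigned long value = std::stoul(field, &used);
    if (used != field.size()) {
        throw std::invalid_argument("trailing junk in '" + field + "'");
    }
    return value;
}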
A Comedy of Errors
Things become less than simple with both stdio and iostreams when reading items one by one. scanf returns the number of items successfully processed, or EOF if we run out of stream. If the number scanned is too big for the type, garbage results. (The man page does claim that it will set errno if a conversion resulted in a value out of range, but it appears to be lying.) So serious use involves lots of checking.
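What that checking looks like with raw scanf, as a sketch:

#include <cstdio>

int main() {
    int day;
    double price;
    char name[32];
    int n = std::scanf("%d %31s %lf", &day, name, &price);
    if (n == EOF) {
        // the stream ended before anything was converted
    } else if (n != 3) {
        // only the first n variables were assigned; the rest hold garbage
    }
    // and even n == 3 does not guarantee the values were in range
}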
I've tried to tame these errors by putting a facade around the actual scanf calls, rather as outstreams wrapped printf:
double x;
string s;
short b;
StrReader is ("4.2 boo 42");
is (x) (s) (b);
You should always be aware of errors: files may be mistyped, mangled by the network, or maliciously altered.
is.set("x4.2 boo 42"); is (x) (x) (b); errs(is.error())('\n'); // --> error reading double // at 'x4.2' is.set("x4.2 boo 344455555"); is (x) (x) (b); errs(is.error())('\n'); // --> error converting int16 // --> out of range // 344455555 is.set("x4.2 boo x10"); is (x) (x) (b); errs(is.error())('\n'); // --> error reading int64 at // 'x10'
('int64' for reading a short? Because I cannot assume that the value read is in range, so the full 64-bit value is read first and then range-checked.)
There's an even more thorough way to handle errors: an error struct which can be 'read' from a stream.
Reader::Error err;
if (! is (x) (x) (b) (err)) {
    // err.errcode  EOF or errno
    // err.msg      as returned by error()
    // err.pos      position in file
}

// safe one-liner: capture the error state
// before the reader object dies
if (Reader("tmp.txt").getline(line) (err)) {
    // cool
} else {
    // bummer! But at least you know _where_
}
So, in summary so far, Reader (and its string-oriented cousin StrReader) provides a safer way to use stdio for input, with better error handling, without the baroque contortions of iostream errors.
Why no exceptions? It is partly a matter of taste; exceptions are often forbidden in embedded coding, and it can be argued that paying attention to the error where it happens leads to better code. You are of course completely free to throw your own exception after checking for an error, but then it won't be some generic 'file not found' exception which makes no sense several stack frames down.
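If you do want exceptions, a thin wrapper is easy enough - a sketch, assuming the Reader interface shown above and that error() yields something that can be appended to a std::string:

#include <stdexcept>
#include <string>

std::string first_line_or_throw(const char* path) {
    stream::Reader in(path);
    std::string line;
    if (! in.getline(line)) {
        // the thrown message now carries real context, not a generic complaint
        throw std::runtime_error(std::string(path) + ": " + in.error());
    }
    return line;
}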
Some useful tricks
It's possible to read a file of numbers directly:
outstreams$ cat numbers.txt
10 20 30 40
50 60 70 80
...

int i;
Reader in("numbers.txt");
while (in (i)) {
    outs(i);
}
outs(eol);
// -> 10 20 30 40 50 60 70 80
Of course, the read could be replaced by in >> i with an istream and it would work in the same way, except that any errors would not give as much information.
What if we only wanted the first three numbers on each line?
int j, k;
in (i) (j) (k) ();
outs(i)(j)(k)(eol);
// -> 10 20 30
in (i) (j) (k) ();
outs(i)(j)(k)(eol);
// -> 50 60 70
The no-argument overload of the call operator is equivalent to the skip method, which in its general form takes the number of lines to skip.
Binary files
The read method comes in two flavours. The first is given a buffer with its size, and returns the number of bytes actually read.
Reader self ("rx-read"); self.setpos(0,'$'); auto endp = self.getpos(); outs("length was") (endp)(eol); // --> length was 42752 self.setpos(0,'^'); auto buff = new uint8_t [endp]; auto res = self.read(buff,endp); outs("read")(res) ("bytes")(eol); // --> read 42752 bytes
setpos is a little eccentric, but will make sense to people who use regular expressions: '^' means 'start of file' and '$' means 'end of file'; anything else means the current position.
(The std::istream method of the same name and signature returns the stream, which is consistent, but awkward if you're interested in the bytes actually read - you have to ask gcount() afterwards.)
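For comparison, the equivalent standard iostream dance, as a sketch:

#include <fstream>
#include <vector>

int main() {
    std::ifstream f("rx-read", std::ios::binary);
    std::vector<char> buff(4096);
    f.read(buff.data(), buff.size());
    std::streamsize got = f.gcount();   // the byte count has to be fetched separately
    (void)got;
}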
The other form of read is a template method that takes one argument and returns the stream. It's useful for objects that have a fixed size at compile time, like structs and arrays:
MyHeader st;
MyInfo info;
inf.read(st).read(info);
// error state will be EOF
// if we didn't read all
Commands
Executing and capturing the output of external programs is a pain point in C++ for me. So naturally I wanted to capture the output of that excellent old dog popen and make it easier to use.
CmdReader ls("ls *.cpp"); vector<string> cpp_files; ls.getlines(cpp_files); string uname = CmdReader("uname").line(); if (uname == "Linux") { outs("that's a relief")(eol); }
CmdReader derives from Reader - all it needs to do is override close_handle so that pclose is called instead of fclose. It has an extra convenience method line for grabbing the first line of output. stdout and stderr are merged.
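Under the hood this is just the venerable POSIX pair popen/pclose - roughly the following, as a sketch:

#include <stdio.h>   // POSIX popen/pclose live here alongside the standard C I/O

int main() {
    FILE* p = popen("uname", "r");
    if (p) {
        char buff[128];
        if (fgets(buff, sizeof buff, p)) {
            // first line of output, newline still attached
        }
        pclose(p);   // unlike fclose, also reports the command's exit status
    }
}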
If you aren't particularly interested in the output, just in success or failure, there are a few convenient patterns:
// quick cool/uncool check
string res = CmdReader("true", cmd_ok).line();
if (res != "OK") {
    outs("very weird shell")(eol);
}

// actual return code
int retcode;
CmdReader("false", cmd_retcode) (retcode);
Wrapping Up
We have lost some scanf functionality by breaking up the format into little bits.
However, if you already have a means to split the input stream into parts, then the Reader machinery can be re-used to do the actual conversion - see templ-read.cpp:
int n;
double x;
string s;
auto parts = {"42","5.2","hello"};
auto rdr = make_parts_reader(
    parts.begin(), parts.end()
);
rdr (n) (x) (s);
This pattern can be made to work with any source of strings, like regular expression matches, database queries, and so forth.
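For instance, regular expression matches can be fed straight through the same conversion machinery - a sketch, reusing make_parts_reader from templ-read.cpp together with standard <regex> iteration:

#include <regex>
#include <string>
#include <vector>

int main() {
    std::string text = "width=42 scale=5.2";
    std::regex num("[0-9.]+");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(text.begin(), text.end(), num), end; it != end; ++it) {
        parts.push_back(it->str());   // "42", then "5.2"
    }

    int n; double x;
    auto rdr = make_parts_reader(parts.begin(), parts.end());
    rdr (n) (x);
}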
I have tried to show that scanf can be tamed, and that it then becomes more pleasant and reliable to use. This library is not a heavy dependency (instream and outstream do not depend on each other, and each is a single source file) and provides a middle option when choosing between stdio and iostreams.
Highly structured input formats are best read with the right tools - we do have parsers for JSON, CSV, config files and so on. I suspect the problem here is that there is no browsable, discoverable repository of useful C++ libraries, as Cargo and crates.io provide for Rust.