My (old) new command line processing software
(Updated 2015 05 22 for C++, hence the "old new"; see “The C++ Way,” below, for the latest.)
I got bored of writing the same getopt / switch / case shit over and over again.
So I made a little mkmain script that would populate a blank directory with a Makefile, a main.c (with the getopt magic), and some blank header and C files.
This worked for decades.
Recently, I was introduced to some nasty Perl code that worked as a front end, and it "knew" about certain option interactions. Of course, I hate Perl, so I wanted to write this in C. Here's a comparison, illustrating most of the features.
Original Code
The original stock code:
while ((opt = getopt (argc, argv, "h:H:l:n:p:v")) != -1) {
    switch (opt) {
    case 'h':
        // -h and -H are mutually exclusive
        if (optH) {
            fprintf (stderr, "%s: -h is incompatible with -H, and -H already specified\n", progname);
            exit (EXIT_FAILURE);
        }
        opth = optarg;          // but are a plain ordinary assignment
        break;
    case 'H':
        if (opth) {
            fprintf (stderr, "%s: -H is incompatible with -h, and -h already specified\n", progname);
            exit (EXIT_FAILURE);
        }
        optH = atoi (optarg);   // a plain ordinary number
        break;
    case 'l':
        optl = atoi (optarg);
        if (optl > 255) {
            fprintf (stderr, "%s: -l value too big, maximum is 255\n", progname);
            exit (EXIT_FAILURE);
        }
        break;
    case 'n':
        fatal (optn != NULL, "the -n option can only be specified once");
        optn = optarg;
        break;
    case 'p':
        optp = atoi (optarg);
        break;
    case 'v':
        optv++;
        if (optv > 1) {
            printf ("Verbosity is now %d\n", optv);
        }
        break;
    default:
        usageError ();
        break;
    }
}
Of course, this is entangled with my other private library functions, like fatal(), which tests the predicate and, if it's true, writes the string to stderr (and optionally to syslog) and exits with a failure status.
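The post doesn't show fatal() itself. As a rough sketch of the behavior just described, assuming it takes a plain message string (no printf-style formatting) and using a hypothetical global, fatal_use_syslog, to switch on the optional syslog copy:

```cpp
#include <cstdio>
#include <cstdlib>
#include <syslog.h>   // POSIX; only needed for the optional syslog copy

// Hypothetical switch: when true, fatal() also logs the message via syslog.
bool fatal_use_syslog = false;

// If predicate is true, report message on stderr (and optionally syslog),
// then terminate with a failure status. Otherwise return normally.
void fatal (bool predicate, const char *message)
{
    if (!predicate) {
        return;
    }
    std::fprintf (stderr, "%s\n", message);
    if (fatal_use_syslog) {
        syslog (LOG_ERR, "%s", message);
    }
    std::exit (EXIT_FAILURE);
}
```

Because the caller evaluates the predicate, a line like fatal (optn != NULL, "...") reads as an assertion about the command line.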
New and Improved Code
The new code is much shorter. If it's shorter and the readability is better (or at least the same), it's a win.
fatal (options_init (argc, argv, "h:H:l:n:p:v") != OPTIONS_OK, usage_message);
options_mutex ("hH");
options_once ("n");
options_integer ("Hp", &optH, &optp);
options_integer_max ("l", 255, &optl);
options_string ("hn", &opth, &optn);
options_count ("v", &optv);
10 lines instead of 41 lines. Sold!
Notice how, using varargs, I'm able to handle several options in one call (look at the usage of options_integer, in the example above).
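How a call like options_integer ("Hp", &optH, &optp) can walk its letters is worth a sketch. The version below is hypothetical: the parsed array stands in for whatever options_init() actually records, and it takes const char * because C++ string literals are const. Each letter consumes the next int * from the va_list, in order:

```cpp
#include <cstdarg>
#include <cstdlib>

// Hypothetical stand-in for what options_init() would record: the option
// letters seen on the command line and their argument strings, in order.
struct parsed_option { char letter; const char *value; };
static parsed_option parsed [64];
static int nparsed = 0;

// For each letter in `ints`, take the next int * from the variable argument
// list and store atoi() of that option's argument (last occurrence wins).
// Options absent from the command line leave their target untouched.
void options_integer (const char *ints, ...)
{
    va_list ap;
    va_start (ap, ints);
    for (const char *c = ints; *c; c++) {
        int *target = va_arg (ap, int *);   // one pointer per letter, in order
        for (int i = 0; i < nparsed; i++) {
            if (parsed [i].letter == *c) {
                *target = std::atoi (parsed [i].value);
            }
        }
    }
    va_end (ap);
}
```

The catch, as with printf, is that va_arg can't check anything: pass fewer pointers than letters, or a pointer of the wrong type, and the behavior is undefined. The parameter-pack approach in the C++ section checks the types at compile time instead.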
API Description
// returns OPTIONS_OK if command line has only the listed "options"
extern int options_init (int argc, char **argv, char *options);

// takes int *, sets to true/false if flag present/not present
extern void options_boolean (char *bools, ...);

// takes int *, sets to the number of times each flag appears
extern void options_count (char *bools, ...);

// takes void (*f) (char *), calls out to "f" with each option in "list"
extern void options_callout (char *list, ...);

// takes int *, sets to atoi value of option
extern void options_integer (char *ints, ...);

// takes int *, as above, with maximum check
extern void options_integer_max (char *ints, int maximum, ...);

// takes int *, as above, but with minimum check
extern void options_integer_min (char *ints, int minimum, ...);

// takes int *, as above, but with range check
extern void options_integer_range (char *ints, int minimum, int maximum, ...);

// dies if not all options specified
extern void options_mandatory (char *list);

// dies if more than one of the listed options is specified
extern void options_mutex (char *mutex);

// dies if any specified more than once
extern void options_once (char *list);

// returns index into argv [] where parameters start
extern int options_parmstart (void);

// takes char **, sets to pointer to argv argument
extern void options_string (char *strings, ...);
When I get around to it, I'll describe these, but you look pretty smart and should be able to figure them out. The really cool thing is the varargs stuff: it makes everything much more compact.
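The checking functions (options_mutex, options_once, options_mandatory) only need to know which letters were seen and how often. A minimal sketch, assuming a hypothetical seen string filled in by options_init(), with the checks split out from the dying wrappers so they're easy to exercise:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical record of the option letters seen on the command line, in
// order, as options_init() might keep it: "-v -v -n foo" is stored as "vvn".
static std::string seen;

// How many times `letter` appeared on the command line.
static int occurrences (char letter)
{
    int n = 0;
    for (char c : seen) {
        if (c == letter) {
            n++;
        }
    }
    return n;
}

// True if at most one of the letters in `mutex` was given.
bool mutex_ok (const char *mutex)
{
    int present = 0;
    for (const char *c = mutex; *c; c++) {
        if (occurrences (*c) > 0) {
            present++;
        }
    }
    return present <= 1;
}

// True if no letter in `list` was given more than once.
bool once_ok (const char *list)
{
    for (const char *c = list; *c; c++) {
        if (occurrences (*c) > 1) {
            return false;
        }
    }
    return true;
}

// True if every letter in `list` was given.
bool mandatory_ok (const char *list)
{
    for (const char *c = list; *c; c++) {
        if (occurrences (*c) == 0) {
            return false;
        }
    }
    return true;
}

// Report and exit, as the "dies if ..." entries in the API describe.
static void die_unless (bool ok, const char *what, const char *letters)
{
    if (!ok) {
        std::fprintf (stderr, "options \"%s\": %s\n", letters, what);
        std::exit (EXIT_FAILURE);
    }
}

void options_mutex (const char *mutex)    { die_unless (mutex_ok (mutex), "only one may be given", mutex); }
void options_once (const char *list)      { die_unless (once_ok (list), "each may be given only once", list); }
void options_mandatory (const char *list) { die_unless (mandatory_ok (list), "all must be given", list); }
```

In this sketch, options_mutex only cares how many distinct letters from the set appear; repeating the same letter is options_once's job.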
Long Options
No, I don't do long options. I also don't smoke crack. Coincidence?
I guess I can't be mayor of Toronto. Oh well.
But seriously; my take on long options is this — GENERALLY SPEAKING, if you have that many options, you're doing it wrong.
Future
It would be nice to perhaps have a "validator", like:
extern void options_string_validator (char *strings, int (*validator) (char *arg), ...);
extern void options_integer_validator (char *ints, int (*validator) (char *arg), ...);
In this manner, the validator gets called with the argument of each detected option (the options being the ones listed in the first parameter). It returns true or false to indicate acceptance or rejection of that argument. Again, varargs rule here!
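Nothing here exists yet, so the following is only a sketch of how options_string_validator might go. Assumptions: a hypothetical parsed array as an options_init() stand-in, const char * instead of char * (string literals are const in C++), a sample validator all_digits, and dying on the first rejected argument. It follows the vprintf pattern, where the variadic front end hands its va_list to a worker:

```cpp
#include <cctype>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Hypothetical parsed command line: option letters and their arguments.
struct parsed_option { char letter; const char *value; };
static parsed_option parsed [64];
static int nparsed = 0;

// Example validator: accepts non-empty, all-digit arguments.
int all_digits (const char *arg)
{
    if (*arg == '\0') {
        return 0;
    }
    for (; *arg; arg++) {
        if (!std::isdigit (static_cast <unsigned char> (*arg))) {
            return 0;
        }
    }
    return 1;
}

// va_list worker: for each letter, run the validator on every occurrence and
// store accepted values through the next const char **. Returns false as
// soon as any argument is rejected.
bool voptions_string_validator (const char *strings, int (*validator) (const char *), va_list ap)
{
    for (const char *c = strings; *c; c++) {
        const char **target = va_arg (ap, const char **);
        for (int i = 0; i < nparsed; i++) {
            if (parsed [i].letter != *c) {
                continue;
            }
            if (!validator (parsed [i].value)) {
                std::fprintf (stderr, "invalid argument for -%c: %s\n", *c, parsed [i].value);
                return false;
            }
            *target = parsed [i].value;
        }
    }
    return true;
}

// Variadic front end; dies on rejection, like options_mutex() and friends.
void options_string_validator (const char *strings, int (*validator) (const char *), ...)
{
    va_list ap;
    va_start (ap, validator);
    bool ok = voptions_string_validator (strings, validator, ap);
    va_end (ap);
    if (!ok) {
        std::exit (EXIT_FAILURE);
    }
}
```

An integer variant would differ only in the target type (int *) and an atoi after validation.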
The C++ Way
Thinking about C++ and option processing, I really want to handle the command line:
-a val1 -v -b val2 -b val3 -v -b val4 -a val5 -x val6 -i 7 -i 8 -i 9 -vv
and somehow magically end up with:
- a vector of a's with the string values val1 and val5,
- a list of b's with the string values val2, val3 and val4,
- a vector of i's with the int values 7, 8, and 9,
- a string x with the value val6, and
- an int v with the value 4.
I wanted the source code for callers to the options handler to look sharp (elegant) too:
vector <string> opta;   // a vector of -a string values
list <string>   optb;   // a list of -b string values
vector <int>    opti;   // a vector of -i integer values
string          optx;   // a plain ordinary string of -x
int             optv;   // a plain ordinary integer representing the count of -v options

int main (int argc, char *argv [])
{
    Options opt = {argc, (const char **) argv, "a:b:i:x:v"};
    opt.fetch ("abixv", opta, optb, opti, optx, optv);
}
“LOL,” you might be thinking, but you'd be wrong.
There are several major challenges to tackle here. One is that I need to be able to iterate across a list of arguments which have various types (this is the printf problem on steroids, because some of my elements could be containers, not just scalars).
Secondly, I want to handle all “containers” in one lump; I don't want to hand-code something for vector, something else for list, and so on — these guys all have a push_back function, and that's all I should need.
Thirdly, I wanted flags to automatically increment the integer variable rather than set the value (for example, optv from the sample code above).
A “flag” is a dash followed by a letter, but with no value, for example “-v” to increase verbosity; and an “option” is a dash followed by a letter, followed by an optional space, followed by a value, for example “-p 77” to set the port number to the value 77.
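To make the flag/option distinction concrete, here is a sketch (not the author's code; parse and count_flag are hypothetical names) that tokenizes the motivating command line against "a:b:i:x:v", getopt-style, and counts -v the way a flag should be counted:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One parsed item: the option letter and its value ("" for a flag).
using parsed_item = std::pair <char, std::string>;

// getopt-style parsing against an optstring such as "a:b:i:x:v": a letter
// followed by ':' takes a value, attached ("-p77") or separate ("-p 77");
// any other letter is a flag, and flags may be clustered ("-vv").
// Parsing stops at the first word that doesn't start with '-'.
// Unknown letters and missing values are not diagnosed in this sketch.
std::vector <parsed_item> parse (const std::vector <std::string> &words, const std::string &optstring)
{
    std::vector <parsed_item> result;
    for (std::size_t w = 0; w < words.size (); w++) {
        const std::string &word = words [w];
        if (word.size () < 2 || word [0] != '-') {
            break;                                   // first non-option word
        }
        for (std::size_t k = 1; k < word.size (); k++) {
            char letter = word [k];
            std::size_t pos = optstring.find (letter);
            bool takes_value = pos != std::string::npos
                && pos + 1 < optstring.size () && optstring [pos + 1] == ':';
            if (!takes_value) {
                result.push_back ({letter, ""});     // a flag
                continue;
            }
            std::string value = word.substr (k + 1); // attached: "-p77"
            if (value.empty () && w + 1 < words.size ()) {
                value = words [++w];                 // separate: "-p 77"
            }
            result.push_back ({letter, value});
            break;                                   // rest of word consumed
        }
    }
    return result;
}

// Flags count rather than assign: the result is the number of occurrences.
int count_flag (const std::vector <parsed_item> &items, char letter)
{
    int n = 0;
    for (const auto &item : items) {
        if (item.first == letter) {
            n++;
        }
    }
    return n;
}
```

Run on the command line above, this yields four v entries (-v, -v, and the two in -vv), matching the value 4 expected for optv.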
So, tackling the first part, I started with the following prototype code:
#include <iostream>
#include <string>
#include <vector>
#include <list>

using namespace std;

static int global = 0;
list <string> l;
vector <string> v;
string s;

void f (const string s);
template <typename... Args> void f (const string s, string &value, Args&... args);
template <typename T, typename... Args> void f (const string s, T& value, Args&... args);

void
f (const string s)
{
    cout << "[" << __FILE__ << ":" << __LINE__ << "]\n";
}

template <typename T, typename... Args>
void
f (const string s, T& value, Args&... args)
{
    for (auto p : s) {
        cout << "now doing [" << p << "]\n";
        value.push_back ("<container> of string" + to_string (global++));
        return f (s.substr(1), args...);
    }
}

template <typename... Args>
void
f (const string s, string &value, Args&... args)
{
    for (auto p : s) {
        cout << "now doing [" << p << "]\n";
        value = "plain ordinary string" + to_string (global++);
        return f (s.substr(1), args...);
    }
}

int
main ()
{
    f ("lvvslv", l, v, v, s, l, v);
    cout << "l is ";
    for (auto p : l) cout << p << " ";
    cout << "\n";
    cout << "v is ";
    for (auto p : v) cout << p << " ";
    cout << "\n";
    cout << "s is " << s << "\n";
}
The code above doesn't meet the general requirements; as of 20150522 it's a proof of concept to see what's possible. Basically, I've created three versions of f: one that takes just a plain ordinary string (this is the base case that ends the recursion once the letters run out, and it's what prints the "[x.cpp:20]" line in the output), one that takes two strings and a parameter pack, and one that takes a string, a reference to an arbitrary type T, and a parameter pack.
The output of the above program is:
now doing [l]
now doing [v]
now doing [v]
now doing [s]
now doing [l]
now doing [v]
[x.cpp:20]
l is <container> of string0 <container> of string4
v is <container> of string1 <container> of string2 <container> of string5
s is plain ordinary string3
Bwuhahahaha.
Stay tuned. I think I can optimize this further using traits: if it's a container, use push_back; otherwise, just assign. Traits should also let me handle the case of option -x being an integer. Perhaps even a list of integers.
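Here is one way the traits idea could look, as a sketch rather than the template-template approach this post settles on. It assumes C++17 (std::void_t, if constexpr); has_push_back, convert, and store are hypothetical names. The trait detects a push_back member; containers get an appended element, scalars get an assignment, and the element or scalar type picks the string/int/double conversion:

```cpp
#include <list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Detection trait: true when T has a push_back accepting a T::value_type.
template <typename T, typename = void>
struct has_push_back : std::false_type { };

template <typename T>
struct has_push_back <T, std::void_t <decltype (
    std::declval <T &> ().push_back (std::declval <typename T::value_type> ()))>>
    : std::true_type { };

// Convert an option's text to the requested element or scalar type.
inline void convert (const std::string &text, std::string &out) { out = text; }
inline void convert (const std::string &text, int &out)         { out = std::stoi (text); }
inline void convert (const std::string &text, double &out)      { out = std::stod (text); }

// Store one option value: append for containers, assign for scalars.
// std::string is deliberately treated as a scalar even though it has push_back.
template <typename T>
void store (T &target, const std::string &text)
{
    if constexpr (has_push_back <T>::value && !std::is_same_v <T, std::string>) {
        typename T::value_type element {};
        convert (text, element);
        target.push_back (element);
    } else {
        convert (text, target);
    }
}
```

Because std::string has a push_back of its own, it has to be excluded explicitly; that's the same wrinkle that makes a separate string overload necessary when dispatching by template matching.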
I just have to shake my head at the genius of C++ — parameter packs, recursive expansion of compile-time computable function calls, etc. It's a glorious age we live in.
2015 05 23
All that aside, I now have working code for the argument processing:
class Options {
public:
    /* constructor */ Options (int argc, const char *argv[], string optstring);
    /* constructor */ Options (void) : argstart {0} { };

    // yes, this needs to be templatized, but my brain just isn't big enough yet...
    // see: http://www.generic-programming.org/languages/cpp/techniques.php
    // I'd like it to be { scalar, container } X { string, int, double }
    void fetch (const string s);
    template <typename... Args> void fetch (const string s, string &value, Args&... args);
    template <typename... Args> void fetch (const string s, int &value, Args&... args);
    template <typename... Args> void fetch (const string s, double &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <string> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <string> &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <int> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <int> &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <double> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <double> &value, Args&... args);

private:
    int argstart;
    vector <tuple <char, string, string>> options;
    string flags;
    string opts;
    string progname;
};
As you can see in the above, I have the constructor, which takes argc, argv, and the getopt-style options string optstring, and initializes some of the private members with pre-parsed versions. Mainly, options is a vector of tuples, each consisting of a single-letter option (the char), provision for a long option name (the first string), and the value (if any, in the second string).
Each fetch overload is picked by template matching on the type of the argument it's given. It stores the tuple's second string (the value) into the value argument (by reference), then tail-recurses on the remaining option letters and the rest of the parameter pack. Here's an example of one of the functions:
template <typename... Args>
void Options::fetch (const string s, vector <int>& value, Args&... args)
{
    char letter = s [0];
    for (auto p : options) {
        if (letter == get<0>(p)) {
            value.push_back (stoi (get<2>(p)));
        }
    }
    return fetch (s.substr(1), args...);
}
Fairly simple code.
Where I ran into complications was in trying to further specialize the handler functions. Ideally, I only want to write 6 functions: 2 (scalar versus container) times 3 (string, int, and double). Currently, I have to write one function for each combination of container and target type, plus three functions for the scalar target types. The whole point of the STL, though, is that all the container types appear “similar,” and can be picked by the end-user as appropriate. If I were to handle the complete set in this manner, I'd need to write (array + deque + forward_list + list + ...) times the three fundamental data types. Way too much typing!
2015 05 25
Success!
After some help from the C++ group on LinkedIn, and some more searching, I now have this magical code:
class Options {
public:
    // ... constructor as shown earlier ...

    // the final (bogus) one: ends the recursion once the letters run out
    void fetch (const string s) { }

    // the generic scalar version (int, double)
    template <typename Containee, typename... Args>
    void fetch (const string s, Containee &value, Args&... args)
    {
        for (auto p : options) {
            if (s [0] == get<0>(p)) {
                value = fetch_value (get<2>(p), value);
                break;
            }
        }
        return fetch (s.substr (1), args...);
    }

    // the scalar version specialized for strings
    template <typename... Args>
    void fetch (const string s, string &value, Args&... args)
    {
        for (auto p : options) {
            if (s [0] == get<0>(p)) {
                value = get<2>(p);
                break;
            }
        }
        return fetch (s.substr (1), args...);
    }

    // the container version
    template <typename Containee, template <typename, typename...> class Container, typename... Args>
    void fetch (const string s, Container<Containee> &value, Args&... args)
    {
        for (auto p : options) {
            if (s [0] == get<0>(p)) {
                Containee v;
                v = fetch_value (get<2>(p), v);
                value.push_back (v);
            }
        }
        return fetch (s.substr (1), args...);
    }

private:
    // letter, long name, value (as in the earlier declaration)
    vector <tuple <char, string, string>> options;

    int fetch_value (string value, int &result) { return (stoi (value)); }
    double fetch_value (string value, double &result) { return (stod (value)); }
    string fetch_value (string value, string &result) { return (value); }
};
There are several inter-related concepts at play here, which is why it took a while to figure out:
- variadic functions (parameter packs)
- variadic templates
- template templates
- specialization
- tuples
Variadic
The fun thing about my Options class's fetch function is that it takes a variable number of arguments. Recall from above that I wanted to be able to support code like the following:
vector <string> opta;   // a vector of -a string values
list <string>   optb;   // a list of -b string values
vector <int>    opti;   // a vector of -i integer values
string          optx;   // a plain ordinary string of -x
int             optv;   // a plain ordinary integer representing the count of -v options

int main (int argc, char *argv [])
{
    Options opt = {argc, (const char **) argv, "a:b:i:x:v"};
    opt.fetch ("abixv", opta, optb, opti, optx, optv);
}
Obviously, the number and types of the arguments are determined by the programmer at compile time.
The first “trick,” therefore, is to figure out how to declare a function that takes a variable number of arguments of various types. In C, the poster child for this is of course printf.
There are three overloaded declarations in the class that deal with this:
// the generic scalar version
template <typename Containee, typename... Args>
void fetch (const string s, Containee &value, Args&... args);

// the scalar version specialized for strings
template <typename... Args>
void fetch (const string s, string &value, Args&... args);

// the container version
template <typename Containee, template <typename, typename...> class Container, typename... Args>
void fetch (const string s, Container<Containee> &value, Args&... args);
As you can see, all three versions have an initial parameter of const string s — this is the option specifier string (the “abixv” part from the code sample above).
The second parameter is either a templatized argument, a string argument, or a container argument.
The simplest is the string version:
// the scalar version specialized for strings
template <typename... Args>
void fetch (const string s, string &value, Args&... args);
This specialized templated function matches anything that calls it with a const string and a string reference. The third argument is called a “parameter pack.” It represents a pack (set, list, however you want to think of it) of parameters. There can be zero or more parameters in the parameter pack.
Calling:
opt.fetch ("x", optx);
Results in the parameter pack being empty, and the two strings (“x” and optx) being passed to the function. Alternatively, calling:
opt.fetch ("xvi", optx, optv, opti);
Results in the parameter pack containing the rest of the arguments (that is, “optv, opti”).
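To watch the pack shrink, here is a stripped-down sketch with the same recursive shape as fetch (peel and pack_sizes are hypothetical names). Each call records sizeof...(args), the number of arguments still in the pack after the current one is peeled off:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Records sizeof...(args) at each step, so the peeling is visible.
std::vector <std::size_t> pack_sizes;

// Base case: nothing left to peel off.
inline void peel (const std::string &) { }

// Each call consumes the first letter and the first value, then recurses
// with the remaining letters and the remaining parameter pack.
template <typename T, typename... Args>
void peel (const std::string &s, T &value, Args &... args)
{
    (void) value;                        // a real fetch would store into value here
    pack_sizes.push_back (sizeof... (args));
    peel (s.substr (1), args...);
}
```

For peel ("xvi", optx, optv, opti) the recorded sizes are 2, 1, 0, after which the non-template peel (const std::string &) ends the recursion, just as the "final (bogus)" fetch does.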
Finalization
So now, my “prototypical main” looks like this:
int optv = 0;       // global verbosity (default off)

int main (int argc, char **argv)
{
    try {
        Options opt = {argc, (const char **) argv, "v"};
        opt.fetch ("v", optv);
    } catch (exception &e) {
        cerr << "Error detected during command line processing: " << e.what() << "\n\n";
        cerr << usage_message;
        exit (EXIT_FAILURE);
    }
    exit (EXIT_SUCCESS);
}
And I just add to it as I need more command line variables.
Multiple Parameters
The latest trick is a uniform “comma” splitter. What I mean by that is, sometimes you have a command line option that you'd like to repeat:
utility -g group1 -g group2 -g group3 ...
Which is sometimes more conveniently expressed as:
utility -g group1,group2,group3 ...
But we don't preclude mixing the two modes:
utility -g group1 -g group2,group3 ...
The container version of the parameter fetch will easily fetch option “g”:
vector <string> optg;
...
opt.fetch ("g", optg);
But this will result in optg being (in the last case) a vector of two strings, namely “group1” and “group2,group3.”
The function normalize fixes up optg for us:
normalize (optg);
It's templated, and implemented as follows:
#include <sstream>      // stringstream, used for tokenizing

template <template <typename, typename...> class Container>
void normalize (Container<string> &value, char delim = ',')
{
    Container <string> output;
    for (auto i : value) {
        // use stdlib to tokenize the string
        stringstream ss (i);
        string item;
        while (getline (ss, item, delim)) {
            output.push_back (item);
        }
    }
    value = output;
}
It takes a default parameter delim which allows you to split on things other than commas if you like.
It's a little heavy-handed in terms of efficiency, but I think it's ok because it only gets used during command line processing — which should be a tiny component of the actual run time of the program.
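If you'd rather not lean on deducing a template template parameter, a plain type parameter does the same job for any container of strings with push_back. The sketch below (normalize_any is a hypothetical name) also moves the result back instead of copying it, which trims a little of that heavy-handedness:

```cpp
#include <list>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Variant of normalize() with a plain type parameter instead of a template
// template parameter; works for any container of std::string with push_back.
template <typename Container>
void normalize_any (Container &value, char delim = ',')
{
    Container output;
    for (const std::string &entry : value) {
        std::stringstream ss (entry);
        std::string item;
        while (std::getline (ss, item, delim)) {
            output.push_back (item);     // "a,,b" yields an empty middle item
        }
    }
    value = std::move (output);          // move instead of copying the result
}
```

Two getline behaviors worth knowing: an empty field in the middle ("a,,b") yields an empty string, while a single trailing delimiter ("a,b,") does not add an empty item at the end.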