My (old) new command line processing software

(Updated 2015 05 22 for C++, hence the "old new"; see “The C++ Way,” below, for the latest.)

I got bored of writing the same getopt / switch/case shit over and over again. So I made a little mkmain script that would populate a blank directory with a Makefile, a main.c (with the getopt magic), and some blank header and C files.

This worked for decades.

Recently, I was introduced to some nasty Perl code that worked as a front end, and it "knew" about certain option interactions. Of course, I hate Perl, so I wanted to write this in C. Here's a comparison, illustrating most of the features.

Original Code

The original stock code:

while ((opt = getopt (argc, argv, "h:H:l:n:p:v")) != -1) {
    switch (opt) {
    case    'h':            // -h and -H are mutually exclusive
        if (optH) {
            fprintf (stderr, "%s:  -h is incompatible with -H, and -H already specified\n", progname);
            exit (EXIT_FAILURE);
        }
        opth = optarg;      // but are a plain ordinary assignment
        break;
    case    'H':
        if (opth) {
            fprintf (stderr, "%s:  -H is incompatible with -h, and -h already specified\n", progname);
            exit (EXIT_FAILURE);
        }
        optH = atoi (optarg);   // a plain ordinary number
        break;
    case    'l':
        optl = atoi (optarg);
        if (optl > 255) {
            fprintf (stderr, "%s:  -l value too big, maximum is 255\n", progname);
            exit (EXIT_FAILURE);
        }
        break;
    case    'n':
        fatal (optn != NULL, "the -n option can only be specified once");
        optn = optarg;
        break;
    case    'p':
        optp = atoi (optarg);
        break;
    case    'v':
        optv++;
        if (optv > 1) {
            printf ("Verbosity is now %d\n", optv);
        }
        break;
    default:
        usageError ();
        break;
    }
}

Of course, this is entangled with my other private library functions, like fatal(), which tests the predicate and, if it's true, writes the string to stderr (and optionally syslog) and exits with a failure status.
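Since fatal() itself isn't shown, here's a minimal sketch of what such a helper might look like. The global progname and the fatal_to_syslog switch are hypothetical; the post doesn't say how the real library sets them.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <syslog.h>   // POSIX syslog

// Hypothetical globals; the real library's configuration isn't shown.
static const char *progname = "prog";
static bool fatal_to_syslog = false;

// If predicate is true: report message on stderr (and optionally syslog), then exit.
void fatal (bool predicate, const char *message)
{
    if (!predicate) {
        return;                         // nothing wrong, carry on
    }
    fprintf (stderr, "%s:  %s\n", progname, message);
    if (fatal_to_syslog) {
        syslog (LOG_ERR, "%s", message);
    }
    exit (EXIT_FAILURE);
}
```

Called as fatal (optn != NULL, "the -n option can only be specified once"), it does nothing unless the predicate is true.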

New and Improved Code

The new code is much shorter. If it's shorter and the readability is better (or at least the same), it's a win.

fatal (options_init (argc, argv, "h:H:l:n:p:v") != OPTIONS_OK, usage_message);
options_mutex ("hH");
options_once ("n");
options_integer ("Hp", &optH, &optp);
options_integer_max ("l", 255, &optl);
options_string ("hn", &opth, &optn);
options_count ("v", &optv);

7 lines instead of 41 lines. Sold!

Notice how, using varargs, I'm able to handle several options in one call (look at the options_integer call in the example above, which fills in both optH and optp).
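For the curious, the varargs trick can be sketched like this. It assumes options_init records the last value seen for each option letter in a hypothetical option_value table; the library's real internals aren't shown here, and OPTIONS_ERROR is a made-up name. It compiles as C++, but the va_list handling is exactly what it would be in C:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdlib>
#include <unistd.h>   // POSIX getopt, optarg, opterr

// Hypothetical storage: the last argument seen for each option letter.
static const char *option_value [256];

enum { OPTIONS_OK = 0, OPTIONS_ERROR = 1 };

int options_init (int argc, char **argv, const char *options)
{
    int c;
    opterr = 0;                         // let the caller print the usage message
    while ((c = getopt (argc, argv, options)) != -1) {
        if (c == '?') {
            return OPTIONS_ERROR;       // unknown option or missing value
        }
        option_value [(unsigned char) c] = optarg ? optarg : "";
    }
    return OPTIONS_OK;
}

// Walk the letters in "ints"; for each one, pull the next int * off the
// variable argument list and store atoi() of that option's value in it.
void options_integer (const char *ints, ...)
{
    va_list ap;
    va_start (ap, ints);
    for (const char *p = ints; *p; p++) {
        int *target = va_arg (ap, int *);
        const char *value = option_value [(unsigned char) *p];
        if (value != nullptr) {
            *target = atoi (value);     // options not given are left untouched
        }
    }
    va_end (ap);
}
```

Each letter consumes exactly one int * from the argument list, so letters and pointers have to line up; as with printf, nothing checks that at compile time.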

API Description

// returns OPTIONS_OK if command line has only the listed "options"
extern int    options_init (int argc, char **argv, char *options);

// takes int *, sets to true/false if flag present/not present
extern void   options_boolean (char *bools, ...);

// takes int *, sets to the number of times each flag appears
extern void   options_count (char *bools, ...);

// takes void (*f) (char *), calls out to "f" with each option in "list"
extern void   options_callout (char *list, ...);

// takes int *, sets to atoi value of option
extern void   options_integer (char *ints, ...);

// takes int *, as above, with maximum check
extern void   options_integer_max (char *ints, int maximum, ...);

// takes int *, as above, but with minimum check
extern void   options_integer_min (char *ints, int minimum, ...);

// takes int *, as above, but with range check.
extern void   options_integer_range (char *ints, int minimum, int maximum, ...);

// dies if not all options specified
extern void   options_mandatory (char *list);

// dies if more than one specified
extern void   options_mutex (char *mutex);

// dies if any specified more than once
extern void   options_once (char *list);

// returns index into argv [] where parameters start
extern int    options_parmstart (void);

// takes char **, sets to pointer to argv argument
extern void   options_string (char *strings, ...);

When I get around to it, I'll describe these, but you look pretty smart and should be able to figure them out. The really cool thing is the varargs stuff — makes everything much more compact.
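The rule-checking calls only need to know how many times each letter appeared. Here's a minimal sketch of options_mutex and options_once under that assumption; the option_count table and the mutex_ok / once_ok helpers are hypothetical, split out so the checks can be tested without exiting:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>   // POSIX getopt, opterr

// Hypothetical bookkeeping: how many times each option letter appeared.
static int option_count [256];

int options_init (int argc, char **argv, const char *options)
{
    int c;
    opterr = 0;
    while ((c = getopt (argc, argv, options)) != -1) {
        if (c == '?') {
            return 1;                   // not OPTIONS_OK
        }
        option_count [(unsigned char) c]++;
    }
    return 0;                           // OPTIONS_OK
}

// true if at most one of the letters in "mutex" was given
static bool mutex_ok (const char *mutex)
{
    int seen = 0;
    for (const char *p = mutex; *p; p++) {
        if (option_count [(unsigned char) *p] > 0) {
            seen++;
        }
    }
    return seen <= 1;
}

// true if none of the letters in "list" was given more than once
static bool once_ok (const char *list)
{
    for (const char *p = list; *p; p++) {
        if (option_count [(unsigned char) *p] > 1) {
            return false;
        }
    }
    return true;
}

void options_mutex (const char *mutex)
{
    if (!mutex_ok (mutex)) {
        fprintf (stderr, "options -%s are mutually exclusive\n", mutex);
        exit (EXIT_FAILURE);
    }
}

void options_once (const char *list)
{
    if (!once_ok (list)) {
        fprintf (stderr, "options -%s may each be given only once\n", list);
        exit (EXIT_FAILURE);
    }
}
```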

Long Options

No, I don't do long options. I also don't smoke crack. Coincidence?

I guess I can't be mayor of Toronto. Oh well.

But seriously: my take on long options is this. GENERALLY SPEAKING, if you have that many options, you're doing it wrong.

Future

It would be nice to perhaps have a "validator", like:

extern void options_string_validator (char *strings, int (*validator) (char *arg), ...);
extern void options_integer_validator (char *ints, int (*validator) (char *arg), ...);

In this manner, the validator gets called with the argument of each detected option listed in the first parameter, and returns true to accept that argument or false to reject it. Again, varargs rule here!
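A sketch of how options_string_validator might work, assuming the same kind of per-letter table as the other calls (option_value, hypothetical) and assuming that a rejected argument is fatal, in the same spirit as options_mutex. The is_lowercase_word validator is only an example:

```cpp
#include <cassert>
#include <cctype>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>   // POSIX getopt, optarg, opterr

static char *option_value [256];    // hypothetical: last argument per letter

int options_init (int argc, char **argv, const char *options)
{
    int c;
    opterr = 0;
    while ((c = getopt (argc, argv, options)) != -1) {
        if (c == '?') {
            return 1;
        }
        option_value [(unsigned char) c] = optarg;
    }
    return 0;
}

// For each letter in "strings", take the next char ** from the varargs,
// ask the validator about the option's argument, and store it if accepted.
void options_string_validator (const char *strings, int (*validator) (char *arg), ...)
{
    va_list ap;
    va_start (ap, validator);
    for (const char *p = strings; *p; p++) {
        char **target = va_arg (ap, char **);
        char *arg = option_value [(unsigned char) *p];
        if (arg == nullptr) {
            continue;                       // option not given; nothing to check
        }
        if (!validator (arg)) {
            fprintf (stderr, "invalid argument \"%s\" for -%c\n", arg, *p);
            exit (EXIT_FAILURE);            // assumption: rejection is fatal
        }
        *target = arg;
    }
    va_end (ap);
}

// Example validator: accept only non-empty, all-lowercase words.
int is_lowercase_word (char *arg)
{
    if (*arg == '\0') {
        return 0;
    }
    for (; *arg; arg++) {
        if (!islower ((unsigned char) *arg)) {
            return 0;
        }
    }
    return 1;
}
```

With this sketch, options_string_validator ("hn", is_lowercase_word, &opth, &optn) accepts -h myhost but dies on -h MyHost.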

The C++ Way

Thinking about C++ and option processing, I really want to handle the command line:

-a val1 -v -b val2 -b val3 -v -b val4 -a val5 -x val6 -i 7 -i 8 -i 9 -vv

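and somehow magically end up with opta holding val1 and val5, optb holding val2, val3, and val4, opti holding 7, 8, and 9, optx holding val6, and optv set to 4 (one for each v in -v, -v, and -vv).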

I wanted the source code for callers to the options handler to look sharp (elegant) too:

vector <string> opta;     // a vector of -a string values
list <string>   optb;     // a list of -b string values
vector <int>    opti;     // a vector of -i integer values
string          optx;     // a plain ordinary string of -x
int             optv;     // a plain ordinary integer representing the count of -v options

int
main (int argc, char *argv [])
{
    Options opt = {argc, argv, "a:b:i:x:v"};

    opt.fetch ("abixv", opta, optb, opti, optx, optv);
}

“LOL,” you might be thinking, but you'd be wrong.

There are several major challenges to tackle here. One is that I need to be able to iterate across a list of arguments which have various types (this is the printf problem on steroids, because some of my elements could be containers, not just scalars).

Secondly, I want to handle all “containers” in one lump; I don't want to hand-code something for vector, something else for list, and so on — these guys all have a push_back function, and that's all I should need.

Thirdly, I wanted flags to automatically increment the integer variable rather than set the value (for example, optv from the sample code above).

A “flag” is a dash followed by a letter, but with no value, for example “-v” to increase verbosity; and an “option” is a dash followed by a letter, followed by an optional space, followed by a value, for example “-p 77” to set the port number to the value 77.

So, tackling the first part, I started with the following prototype code:

#include <iostream>
#include <string>
#include <vector>
#include <list>

using namespace std;

static int global = 0;

list <string> l;
vector <string> v;
string s;

void f (const string s);
template <typename... Args> void f (const string s, string &value, Args&... args);
template <typename T, typename... Args> void f (const string s, T& value, Args&... args);

void f (const string s)
{
    cout << "[" << __FILE__ << ":" << __LINE__ << "]\n";
}


template <typename T, typename... Args>
void f (const string s, T& value, Args&... args)
{
    for (auto p : s) {
        cout << "now doing [" << p << "]\n";
        value.push_back ("<container> of string" + to_string (global++));
        return f (s.substr(1), args...);
    }
}

template <typename... Args>
void f (const string s, string &value, Args&... args)
{
    for (auto p : s) {
        cout << "now doing [" << p << "]\n";
        value = "plain ordinary string" + to_string (global++);
        return f (s.substr(1), args...);
    }
}

int
main ()
{
    f ("lvvslv", l, v, v, s, l, v);
    cout << "l is "; for (auto p : l) cout << p << " "; cout << "\n";
    cout << "v is "; for (auto p : v) cout << p << " "; cout << "\n";
    cout << "s is " << s << "\n";
}

The code above doesn't meet the general requirements; as of 20150522 it's a proof of concept to see what's possible. Basically, I've created three versions of f: one that takes just a plain ordinary string (do I even need that one? Yes: once the parameter pack is empty, the recursive call f (s.substr(1)) has nothing else to match, so this is the base case that ends the recursion; it's what prints the [x.cpp:20] line below), one that takes a string, a string reference, and a parameter pack, and one that takes a string, a reference to any type T, and a parameter pack.

The output of the above program is:

now doing [l]
now doing [v]
now doing [v]
now doing [s]
now doing [l]
now doing [v]
[x.cpp:20]
l is <container> of string0 <container> of string4
v is <container> of string1 <container> of string2 <container> of string5
s is plain ordinary string3

Bwuhahahaha.

Stay tuned. I think I can streamline this further using traits: if the target is a container, use push_back; otherwise, just assign. Traits should also let me handle option -x being an integer, or perhaps even a list of integers.
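Here's roughly what the traits idea could look like, sketched in C++11 style with tag dispatch. The names has_push_back, is_container, store, and convert are hypothetical. One catch worth noting: std::string has a push_back too, so it has to be excluded explicitly, or every string option would be treated as a container of char:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

// has_push_back<T>: true when T has a push_back member accepting a T::value_type.
template <typename T, typename = void>
struct has_push_back : false_type { };

template <typename T>
struct has_push_back<T, decltype (declval<T &> ().push_back (declval<typename T::value_type> ()), void ())>
    : true_type { };

// std::string also has push_back, but for options it's a scalar, so exclude it.
template <typename T>
struct is_container
    : integral_constant<bool, has_push_back<T>::value && !is_same<T, string>::value> { };

// Turn the raw option text into the target element type.
inline void convert (const string &text, string &out) { out = text; }
inline void convert (const string &text, int &out)    { out = stoi (text); }
inline void convert (const string &text, double &out) { out = stod (text); }

// Container target: convert one element and append it.
template <typename T>
void store (T &target, const string &text, true_type)
{
    typename T::value_type element {};
    convert (text, element);
    target.push_back (element);
}

// Scalar target: convert and overwrite.
template <typename T>
void store (T &target, const string &text, false_type)
{
    convert (text, target);
}

// Entry point: the trait picks one of the two overloads above at compile time.
template <typename T>
void store (T &target, const string &text)
{
    store (target, text, typename is_container<T>::type {});
}
```

That's two store overloads plus one convert per element type, instead of one function per container and element combination.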

I just have to shake my head at the genius of C++ — parameter packs, recursive expansion of compile-time computable function calls, etc. It's a glorious age we live in.

2015 05 23

All that aside, I now have working code for the argument processing:

class Options
{
public:
    /* constructor */   Options (int argc, const char *argv[], string optstring);
    /* constructor */   Options (void) : argstart {0} { };

    // yes, this needs to be templatized, but my brain just isn't big enough yet...
    // see:  http://www.generic-programming.org/languages/cpp/techniques.php
    // I'd like it to be { scalar, container } X { string, int, double }
    void fetch (const string s);
    template <typename... Args> void fetch (const string s, string &value, Args&... args);
    template <typename... Args> void fetch (const string s, int &value, Args&... args);
    template <typename... Args> void fetch (const string s, double &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <string> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <string> &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <int> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <int> &value, Args&... args);
    template <typename... Args> void fetch (const string s, vector <double> &value, Args&... args);
    template <typename... Args> void fetch (const string s, list <double> &value, Args&... args);

private:
    int                 argstart;
    vector <tuple <char, string, string>> options;
    string              flags;
    string              opts;
    string              progname;
};

As you can see above, the constructor takes argc, argv, and the getopt-style option string optstring, and initializes some of the private members with pre-parsed versions. Mainly, options is a vector of tuples, each consisting of a single-letter option (the char), provision for a long option name (the first string), and the value, if any (the second string).
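The constructor body isn't shown, so here's a minimal sketch of how it could fill those members using POSIX getopt. It assumes parse errors are reported by throwing (which fits the try/catch in the main program later on), and the parsed() and parmstart() accessors are hypothetical additions so the result can be inspected:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>
#include <unistd.h>   // POSIX getopt, optarg, optind, opterr, optopt

using namespace std;

class Options
{
public:
    Options (int argc, const char *argv[], string optstring)
        : argstart {0}, progname {argv [0]}
    {
        // A letter followed by ':' is an option (takes a value); otherwise a flag.
        for (size_t i = 0; i < optstring.size (); i++) {
            if (optstring [i] == ':') {
                continue;
            }
            if (i + 1 < optstring.size () && optstring [i + 1] == ':') {
                opts += optstring [i];
            } else {
                flags += optstring [i];
            }
        }

        opterr = 0;   // report problems by throwing instead of printing
        int c;
        while ((c = getopt (argc, const_cast<char * const *> (argv), optstring.c_str ())) != -1) {
            if (c == '?') {
                throw invalid_argument ("unknown option or missing value: -" + string (1, char (optopt)));
            }
            // Flags carry no value; the long-name slot stays empty (no long options).
            string value = opts.find (char (c)) != string::npos ? optarg : "";
            options.emplace_back (char (c), string (), value);
        }
        argstart = optind;   // index of the first non-option argument
    }

    // Hypothetical accessors, only so the sketch can be inspected.
    const vector <tuple <char, string, string>> &parsed () const { return options; }
    int parmstart () const { return argstart; }

private:
    int                 argstart;
    vector <tuple <char, string, string>> options;
    string              flags;
    string              opts;
    string              progname;
};
```

Note that getopt keeps its state in globals (optind and friends), so this sketch is meant to be constructed once per run, which is the normal case for command line processing.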

The compiler picks whichever fetch overload matches the type of the value argument. That overload stores the value from the tuple's second string into the value argument (passed by reference), converting it as needed, and then tail-recurses on the remaining letters and the rest of the parameter pack. Here's an example of one of the functions:

template <typename... Args> void
Options::fetch (const string s, vector <int>& value, Args&... args)
{
    char letter = s [0];
    for (auto p : options) {
        if (letter == get<0>(p)) {
            value.push_back (stoi (get<2>(p)));
        }
    }
    return fetch (s.substr(1), args...);
}

Fairly simple code.

Where I ran into complications was in trying to further specialize the handler functions. Ideally, I only want to write 6 functions — 2 (scalar versus container) times 3 (string, int, and double). Currently, I have to write one function for each combination of container and target type and three functions for the scalar target types. The whole point of the STL, though, is that all the container types appear “similar,” and can be picked by the end-user as appropriate. If I was to handle the complete set in this manner, I'd need to write (array + deque + forward_list + list + ...) times the three fundamental data types. Way too much typing!

2015 05 25

Success!

After some help from the C++ group on LinkedIn, and some more searching, I now have this magical code:

class Options
{
public:
    // the final (bogus) one
    void fetch (const string s) { }

    // the generic scalar version
    template <typename Containee, typename... Args>
        void
        fetch (const string s, Containee &value, Args&... args)
        {
            for (auto p : options) {
                if (s [0] == get<0>(p)) {
                    value = fetch_value (get<2>(p), value);
                    break;
                }
            }
            return fetch (s.substr (1), args...);
        }

    // the scalar version, specialized for strings
    template <typename... Args>
        void
        fetch (const string s, string &value, Args&... args)
        {
            for (auto p : options) {
                if (s [0] == get<0>(p)) {
                    value = get<2>(p);
                    break;
                }
            }
            return fetch (s.substr (1), args...);
        }

    // the container version
    template <typename Containee, template <typename, typename...> class Container, typename... Args>
        void
        fetch (const string s, Container<Containee> &value, Args&... args)
        {
            for (auto p : options) {
                if (s [0] == get<0>(p)) {
                    Containee v;
                    v = fetch_value (get<2>(p), v);
                    value.push_back(v);
                }
            }
            return fetch (s.substr (1), args...);
        }

private:
    int fetch_value (string value, int &result) { return (stoi (value)); }
    double fetch_value (string value, double &result) { return (stod (value)); }
    string fetch_value (string value, string &result) { return (value); }
};

There are several inter-related concepts at play here, which is why it took a while to figure out:

  1. variadic functions (parameter packs)
  2. variadic templates
  3. template templates
  4. specialization
  5. tuples

Variadic

The fun thing about my Options class's fetch function is that it takes a variable number of arguments. Recall from above that I wanted to be able to support code like the following:

vector <string> opta;     // a vector of -a string values
list <string>   optb;     // a list of -b string values
vector <int>    opti;     // a vector of -i integer values
string          optx;     // a plain ordinary string of -x
int             optv;     // a plain ordinary integer representing the count of -v options

int
main (int argc, char *argv [])
{
    Options opt = {argc, argv, "a:b:i:x:v"};

    opt.fetch ("abixv", opta, optb, opti, optx, optv);
}

Obviously, the number and types of the arguments are determined by the programmer at compile time.

The first “trick,” therefore, is to figure out how to declare a function that takes a variable number of arguments of various types. In C, the poster child for this is of course printf.

There are three overloaded declarations in the class that deal with this:

// the generic scalar version
template <typename Containee, typename... Args>
    void fetch (const string s, Containee &value, Args&... args);

// the scalar version specialized for strings
template <typename... Args>
    void fetch (const string s, string &value, Args&... args);

// the container version
template <typename Containee, template <typename, typename...> class Container, typename... Args>
    void fetch (const string s, Container<Containee> &value, Args&... args);

As you can see, all three versions have an initial parameter of const string s — this is the option specifier string (the “abixv” part from the code sample above).

The second parameter is either a templatized argument, a string argument, or a container argument.

The simplest is the string version:

// the scalar version specialized for strings
template <typename... Args>
    void fetch (const string s, string &value, Args&... args);

This overload matches any call whose first two arguments are a string and a string variable. The generic scalar version matches such a call too, but overload resolution prefers the more specialized string version. The third parameter (Args&... args) is called a “parameter pack.” It represents a pack (set, list, however you want to think of it) of parameters. There can be zero or more parameters in the parameter pack.

Calling:

opt.fetch ("x", optx);

Results in the parameter pack being empty, and the two strings (“x” and optx) being passed to the function. Alternatively, calling:

opt.fetch ("xvi", optx, optv, opti);

Results in the parameter pack containing the rest of the arguments (that is, “optv, opti”).
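To make the overload selection and the pack peeling visible, here's a toy version of fetch that records which overload ran and how many arguments were left in the pack; the trace string is a hypothetical addition for illustration. Note the forward declarations at the top: without them, the recursive call inside the generic overload wouldn't see the string overload declared after it (argument-dependent lookup searches namespace std for a string, not the global namespace), so later string arguments would silently go to the generic version. Inside a class like Options this doesn't come up, because every member function is visible from every member's body.

```cpp
#include <cassert>
#include <string>

using namespace std;

string trace;   // records which overload ran, so the recursion is visible

// Forward declarations, so each overload can see the others when it recurses.
void fetch (const string &s);
template <typename T, typename... Args> void fetch (const string &s, T &value, Args &... args);
template <typename... Args> void fetch (const string &s, string &value, Args &... args);

// Base case: the pack is empty, so the recursion stops here.
void fetch (const string &)
{
    trace += "end";
}

// Generic version: matches a reference to any type T.
template <typename T, typename... Args>
void fetch (const string &s, T &, Args &... args)
{
    trace += string (1, s [0]) + ":generic(" + to_string (sizeof... (args)) + " left) ";
    fetch (s.substr (1), args...);
}

// String version: more specialized, so it wins whenever value is a string.
template <typename... Args>
void fetch (const string &s, string &, Args &... args)
{
    trace += string (1, s [0]) + ":string(" + to_string (sizeof... (args)) + " left) ";
    fetch (s.substr (1), args...);
}
```

With string optx, int optv, and double optd, fetch ("x", optx) leaves trace as "x:string(0 left) end", and fetch ("xvd", optx, optv, optd) gives "x:string(2 left) v:generic(1 left) d:generic(0 left) end".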

Finalization

So now my “prototypical main” looks like this:

int optv = 0;   // global verbosity (default off)

int
main (int argc, char **argv)
{
    try {
        Options opt = {argc, (const char **) argv, "v"};
        opt.fetch ("v", optv);
    }
    catch (exception &e) {
        cerr << "Error detected during command line processing:  " << e.what() << "\n\n";
        cerr << usage_message;
        exit (EXIT_FAILURE);
    }
    exit (EXIT_SUCCESS);
}

And I just add to it as I need more command line variables.
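The class excerpt above doesn't show where a flag such as -v turns into a count in optv; the generic scalar fetch converts and assigns the first matching value. One way to get the counting behavior described earlier, sketched here as an assumption rather than as my actual class code, is to check whether the letter is a flag and count the matching entries. fetch_int and the Parsed alias are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

using namespace std;

// Same layout as the Options::options member: letter, long name, value.
using Parsed = vector <tuple <char, string, string>>;

// Hypothetical helper: "flags" holds the getopt letters that take no value.
// A flag is counted; an option's first value is converted with stoi.
void fetch_int (const Parsed &options, const string &flags, char letter, int &value)
{
    bool is_flag = flags.find (letter) != string::npos;
    for (const auto &p : options) {
        if (get<0> (p) != letter) {
            continue;
        }
        if (is_flag) {
            value++;                    // -v -v -vv arrives as four entries, so counts 4
        } else {
            value = stoi (get<2> (p));
            return;                     // first occurrence wins, like the break in fetch
        }
    }
}
```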

Multiple Parameters

The latest trick is a uniform “comma” splitter. What I mean by that is, sometimes you have a command line option that you'd like to repeat:

utility -g group1 -g group2 -g group3 ...

Which is sometimes more conveniently expressed as:

utility -g group1,group2,group3 ...

But we don't preclude mixing the two modes:

utility -g group1 -g group2,group3 ...

The container version of the parameter fetch will easily fetch option “g”:

vector <string> optg;
...
opt.fetch ("g", optg);

But this will result in optg being (in the last case) a vector of two strings, namely “group1” and “group2,group3.”

The function normalize fixes up optg for us:

normalize (optg);

It's templated, and implemented as follows:

template <template <typename, typename...> class Container>
void
normalize (Container<string> &value, char delim = ',')
{
    Container <string> output;

    for (auto i : value) {
        // use stdlib to tokenize the string
        stringstream ss (i);
        string item;
        while (getline (ss, item, delim)) {
            output.push_back (item);
        }
    }
    value = output;
}

It takes a default parameter delim which allows you to split on things other than commas if you like.
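Here's a self-contained variant for trying it out. It's a swap from the version above: instead of a template template parameter, it takes the container type directly (anything holding strings with push_back and swap), and it swaps the result into place rather than copying it back:

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Variant of normalize: Container is any sequence of strings with push_back.
template <typename Container>
void normalize (Container &value, char delim = ',')
{
    Container output;
    for (const auto &item : value) {
        // use stdlib to tokenize each string on delim
        stringstream ss (item);
        string piece;
        while (getline (ss, piece, delim)) {
            output.push_back (piece);
        }
    }
    value.swap (output);   // avoid copying the result back
}
```

For example, a list holding "/usr/bin:/bin" becomes "/usr/bin" and "/bin" after normalize (paths, ':').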

It's a little heavy-handed in terms of efficiency, but I think it's ok because it only gets used during command line processing — which should be a tiny component of the actual run time of the program.