Immediate Mode Option Parser: Small, Simple, Elegant

01 Jun 2025

I was recently reading Tony Finch's "getopt() but smaller" article, where he turns getopt into an "immediate/imperative" mode API and cuts the code size in the process. It made me realize that I've been using a similar style of option parser for quite some time now. I haven't written about it because it's rather small and straightforward to use (more so than getopt or getopt_long); nothing mind-blowing or novel. But that's precisely why it deserves some spotlight: despite being so small and easy, it's barely used.

getopt duplication problem

If you aren't familiar with getopt, here's roughly what using it looks like for a program that accepts an -a flag and a -b flag, where -b also requires an argument:

for (int opt; (opt = getopt(argc, argv, "ab:")) != -1;) {
  switch (opt) {
    case 'a': aflag = 1; break;
    case 'b': bflag_arg = optarg; break;
    case '?': // unknown flag or missing argument
      exit(1);
  }
}

Because getopt doesn't know which flags the program accepts, you need to tell it via the optstring argument "ab:"; the : after b indicates that -b takes an argument. But because the interface is half-declarative and half-imperative, you get the worst of both worlds: you still have to repeat every flag in the switch statement. With getopt_long the duplication gets even worse, as each flag has to be duplicated yet again in the longopts array.
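To make the duplication concrete, here is a sketch of the same -a/-b program using getopt_long (a GNU/BSD extension declared in <getopt.h>, not part of POSIX). The long names --all and --bee and the helper name are made up for illustration. Note that each flag is spelled out three times: in the optstring, in the longopts table, and in the switch.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: parses -a/--all and -b/--bee ARG.
 * Returns the index of the first non-option argument, or -1 on error. */
static int parse_with_getopt_long(int argc, char *argv[], int *aflag, char **barg)
{
    static const struct option longopts[] = {
        { "all", no_argument,       NULL, 'a' },  /* 'a', second time */
        { "bee", required_argument, NULL, 'b' },  /* 'b', second time */
        { NULL,  0,                 NULL, 0   },  /* table terminator */
    };
    int opt;
    /* "ab:" lists both flags for the first time */
    while ((opt = getopt_long(argc, argv, "ab:", longopts, NULL)) != -1) {
        switch (opt) {                /* ...and the third time here */
        case 'a': *aflag = 1; break;
        case 'b': *barg = optarg; break;
        default:  return -1;          /* '?': unknown flag or missing argument */
        }
    }
    return optind;
}
```

Adding a flag means touching all three places and keeping them in sync by hand.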

Immediate mode

The idea behind an immediate mode API here is to avoid doing too much inside the function and instead simply parse the next flag (and its argument) and hand it to the user, who can then accept or reject it. The following is a small demo of how you'd use such an API:

extern int
main(int argc, char *argv[])
{
    CLOpt o[1] = { clinit(argv) };
    // ...
}

We start by initializing the CLOpt structure. I'm passing argv but not argc, which means the parser relies on the sentinel NULL pointer at the end of argv (the C standard guarantees that argv[argc] is a null pointer). It wouldn't be too difficult to adjust it to take argc instead if you wish to do so.

I could've also done the more conventional CLOpt o = clinit(argv); but then I'd have to use &o when calling the other functions, since they expect a pointer. Using a single-element array lets me just use o instead. This trick might seem a bit silly at first, but I've found it quite useful when refactoring code out into a function: you no longer need to change every a.member into a->member.
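A minimal sketch of the single-element array trick in isolation, using a hypothetical Counter type as a stand-in for CLOpt: the array decays to a pointer when passed to a function, and c->count means c[0].count, so the calling code reads exactly like code that was handed a pointer.

```c
/* Hypothetical type standing in for CLOpt. */
typedef struct { int count; } Counter;

/* Takes a pointer, like clnext()/clopt() take a CLOpt *. */
static void counter_bump(Counter *c) { c->count++; }

static int counter_demo(void)
{
    Counter c[1] = { { 0 } };  /* one-element array instead of `Counter c = {0};` */
    counter_bump(c);           /* `c` decays to &c[0]: no `&` needed */
    counter_bump(c);
    return c->count;           /* same `->` syntax as inside counter_bump() */
}
```

Moving the body of counter_demo() into a function that takes `Counter *c` would require no edits to the member accesses.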

    while (clnext(o)) {
        // ...
    }

Then we keep calling clnext in a loop. It will parse and prepare the next argument for us. We can then accept the argument in a traditional imperative if-else chain inside the loop:

        if (clopt(o, 'y', NULL)) {
            printf("-y: enabled!\n");
        }

This is a short option -y (with no equivalent long option) which accepts no argument.

        else if (clopt(o, 'n', "name") && clarg(o)) {
            printf("name: %s\n", o->arg);
        }

Here's option -n (with --name as its long option), which also requires an argument. But since this is an immediate mode API, you need to call clarg() to let the parser know that you're expecting an argument. If the call returns true, the argument is available via o->arg.

        else if (clopt(o, 0, "optional")) {
            if (cloptarg(o)) {
                printf("optional: %s\n", o->arg);
            } else {
                printf("optional: <default>\n");
            }
        }

Finally, we have the option --optional, which accepts an optional argument and has no equivalent short option. Since the argument is optional, we call cloptarg() to retrieve it. If the optional argument was provided, it is available via o->arg, just like a mandatory argument. Otherwise you can apply whatever fallback you need; the example just prints <default>.

    while (clnext(o)) {
        // ...
    }
    if (o->err) {
        fprintf(stderr, "ERROR: -%.*s: %s\n", o->len, o->flag, CL_ERR[o->err]);
        return 1;
    }

The loop exits when we've parsed all options or encountered an error. In case of an error, we just print a message to stderr and exit. Note that o->flag is not necessarily nul-terminated where the flag ends, so we use %.*s to print only the first o->len characters. The CL_ERR array turns the error code into a human-readable message.
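Here is a small sketch of what %.*s buys us, using a hypothetical helper that formats the demo's error line into a buffer. It assumes, as the demo's "-%.*s" format suggests, that for a long option o->flag points at the second dash (so --name=alice is seen as "-name=alice" with a length of 5), and that for a chained short flag it points into the middle of the cluster.

```c
#include <stdio.h>

/* Hypothetical helper: formats "ERROR: -<flag>: <msg>" into buf.
 * `flag` may continue past the flag itself ("-name=alice", or "bc" when
 * -b is being parsed inside -abc), so only `len` characters are printed. */
static int format_flag_error(char *buf, size_t size, const char *flag,
                             int len, const char *msg)
{
    return snprintf(buf, size, "ERROR: -%.*s: %s", len, flag, msg);
}
```

With flag "-name=alice" and len 5 this produces "ERROR: --name: unknown flag"; with flag "bc" and len 1 it produces "ERROR: -b: unknown flag", leaving out the trailing "c".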

    printf("Remaining args: { ");
    for (; o->argv[0]; ++o->argv)
        printf("%s%s", o->argv[0], o->argv[1] ? ", " : " ");
    printf("}\n");

Otherwise, if there was no error, we can access the rest of the arguments via o->argv.

And that's all there is to it. Pretty simple, right? Here's the whole demo in one snippet:

extern int
main(int argc, char *argv[])
{
    CLOpt o[1] = { clinit(argv) };
    while (clnext(o)) {
        /****/ if (clopt(o, 'y', NULL)) {
            printf("-y: enabled!\n");
        } else if (clopt(o, 0x0, "long")) {
            printf("--long: enabled\n");
        } else if (clopt(o, 'n', "name") && clarg(o)) {
            printf("name: %s\n", o->arg);
        } else if (clopt(o, 0, "optional")) {
            if (cloptarg(o)) {
                printf("optional: %s\n", o->arg);
            } else {
                printf("optional: <default>\n");
            }
        }
    }
    if (o->err) {
        fprintf(stderr, "ERROR: -%.*s: %s\n", o->len, o->flag, CL_ERR[o->err]);
        return 1;
    }

    printf("Remaining args: { ");
    for (; o->argv[0]; ++o->argv)
        printf("%s%s", o->argv[0], o->argv[1] ? ", " : " ");
    printf("}\n");
}

Implementation details

For the implementation, here are a couple of high-level goals that I'd want to fulfill:

The source code can be found here. It's about 60 lines of C and achieves all of the goals outlined above. I usually avoid nul-terminated strings, as they are the source of many bugs and headaches. But since this demo is meant for a wider audience, my implementation expects nul-terminated strings, the kind you get straight out of the standard C argv. The code is also dedicated to the public domain, so you can use it for whatever purpose you want, or modify it to use non-nul-terminated strings, etc.

Given how small the entire implementation is, I could simply walk through it line by line. Instead, I'm going to outline a few key implementation questions and how I've dealt with them. This should give you a better understanding of how to write one yourself, perhaps in a different language.
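For readers who want something concrete to hold against the questions below, here is a minimal sketch of one way the five functions from the demo could be written in roughly the same size. It is not the original source. The ClErrMissing code, the CL_ERR strings, the field layout, and the choice to stop at the first non-option argument (POSIX style, no permutation) are all assumptions. It also assumes that for a long option o->flag points at the second dash, so "-%.*s" prints "--name" and o->len is always above 1.

```c
#include <string.h>

/* Sketch only. ClErrMissing and the message texts are assumptions. */
enum { ClErrNone, ClErrUnknown, ClErrTooMany, ClErrMissing };
static const char *const CL_ERR[] = {
    [ClErrNone]    = "no error",
    [ClErrUnknown] = "unknown flag",
    [ClErrTooMany] = "flag does not take an argument",
    [ClErrMissing] = "missing argument",
};

typedef struct {
    char **argv;  /* element being parsed; the remaining operands afterwards */
    char  *arg;   /* argument taken via clarg()/cloptarg(), else NULL */
    char  *flag;  /* "y..." for a short flag, "-name..." for --name */
    int    len;   /* 1 for a short flag, strlen("-name") for --name */
    int    err;   /* ClErr* code */
} CLOpt;

static CLOpt clinit(char **argv)
{
    CLOpt o = { .argv = argv[0] ? argv + 1 : argv };  /* skip the program name */
    return o;
}

static int clnext(CLOpt *o)
{
    if (o->err) return 0;  /* previous flag was not accepted, or clarg() failed */
    if (o->flag) {
        /* --key=value where the caller never consumed the value */
        if (o->len > 1 && o->flag[o->len] == '=' &&
            o->arg != o->flag + o->len + 1) return !(o->err = ClErrTooMany);
        if (o->len > 1 || o->flag[1] == '\0' || o->arg == o->flag + 1) {
            ++o->argv;     /* longopt, end of cluster, or rest was the argument */
            o->flag = NULL;
        } else {
            ++o->flag;     /* chain to the next short flag */
        }
    }
    o->arg = NULL;
    if (!o->flag) {
        char *a = o->argv[0];
        if (!a || a[0] != '-' || a[1] == '\0') return 0;           /* end, or first operand */
        if (a[1] == '-' && a[2] == '\0') { ++o->argv; return 0; }  /* "--" */
        o->flag = a + 1;
        o->len = a[1] == '-' ? (int)strcspn(o->flag, "=") : 1;
    }
    o->err = ClErrUnknown;  /* cleared by clopt() if the caller accepts the flag */
    return 1;
}

static int clopt(CLOpt *o, int s, const char *l)
{
    if (o->err != ClErrUnknown) return 0;  /* already accepted, or failed */
    int hit = o->len > 1
        ? l && strlen(l) == (size_t)(o->len - 1) && !memcmp(o->flag + 1, l, o->len - 1)
        : s && o->flag[0] == s;
    if (hit) o->err = ClErrNone;
    return hit;
}

static int cloptarg(CLOpt *o)  /* attached argument only: --key=value or -kvalue */
{
    if (o->len > 1 && o->flag[o->len] == '=') o->arg = o->flag + o->len + 1;
    else if (o->len == 1 && o->flag[1])      o->arg = o->flag + 1;
    return o->arg != NULL;
}

static int clarg(CLOpt *o)  /* mandatory: attached, or else the next argv element */
{
    if (cloptarg(o)) return 1;
    if (o->argv[1]) { o->arg = *++o->argv; return 1; }
    o->err = ClErrMissing;
    return 0;
}
```

With these definitions and <stdio.h> included, the demo main() above should compile unchanged. The design choice worth noting is that clarg() advances o->argv when it takes the next element, so clnext() only ever has to step forward by one.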

Was the flag accepted?

Since we don't keep a table of options, we need some way to determine whether the flag was accepted. The simplest way is to let the user track it themselves by adding an else branch at the end of their if-else clopt chain. Another approach is to track it ourselves. Since that was easy enough to do and provides slightly better UX, I've opted for the latter. Here's how it's done:

Inside clnext(), once we've set up the flag but before returning, I set the err field to ClErrUnknown:

    o->err = ClErrUnknown;
    return 1;

Then, inside clopt(), if we find a match, the err field gets cleared to zero. On the next call to clnext(), if o->err has not been cleared, we know that the flag was not accepted and we can stop parsing options:

int
clnext(CLOpt *o)
{
    if (o->err) return 0;
    // ...
}

Are we shortopt chaining?

If the user gives us "-abc", how should it be parsed? The answer depends entirely on whether the flag -a accepts an argument. If it does, "bc" is treated as the argument, as if the user had written -a bc. If it doesn't, the next character is treated as a flag, as if the user had written -a -bc. The same applies recursively to -bc: it is either flag -b with argument "c", or flags -b and -c, depending on whether -b accepts an argument.

This shortopt "chaining" makes things a bit difficult. With longopts there's no chaining; we always move on to the next argv element. But if we're in the middle of processing shortopts, we might need to treat the following characters as flags before we can move on to the next element. This is roughly how I've dealt with it:

if (...) {
    // move to the next argv
} else {
    // otherwise we're chaining shortopts, move to the next character
    ++o->flag;
}

flag is a character pointer that points to the current flag, so for chaining we can simply increment the pointer. The if condition I've left blank is supposed to detect the cases where we should move to the next argv element, and there are three of them. The first is when we're processing a longopt, detected by o->len > 1: o->len tracks the length of the flag, which is always above 1 for a longopt, so in that case we can move on to the next element. The second is when o->flag[1] is nul, i.e. we've reached the end of the current shortopt string. The third is when the rest of the string was already consumed as an argument: in our example, if -a accepted "bc" as its argument, there's nothing left to chain, so we should also move to the next argv element.
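The "-abc" rule from this section can be checked with a toy model that walks a single argv element. This is a sketch for illustration only: the function name and the takes_arg string are hypothetical, standing in for the caller's decision to call clarg(), and an argument that would come from the next argv element is deliberately not modeled.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of shortopt chaining for one element such as "-abc".
 * `takes_arg` lists the flags that want an argument. The parse is rendered
 * into `out` (size must be > 0), e.g. "-a -b -c" or "-a[bc]". */
static void walk_cluster(const char *elem, const char *takes_arg,
                         char *out, size_t size)
{
    size_t used = 0;
    out[0] = '\0';
    for (const char *flag = elem + 1; *flag && used < size; ++flag) {
        const char *sep = used ? " " : "";
        if (strchr(takes_arg, *flag) && flag[1]) {
            /* rest of the element is the argument: nothing left to chain */
            snprintf(out + used, size - used, "%s-%c[%s]", sep, *flag, flag + 1);
            return;
        }
        /* no argument taken: chain to the next character */
        used += (size_t)snprintf(out + used, size - used, "%s-%c", sep, *flag);
    }
}
```

With no flag taking an argument, "-abc" becomes "-a -b -c"; if -a takes one it becomes "-a[bc]"; if only -b does, "-a -b[c]".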

Was an argument provided but not accepted?

This only concerns longopts, where an explicit argument can be provided with =, e.g. --key=value. If the flag does not accept an argument, we need to detect that and set the appropriate error. I deal with it as follows inside clnext(), right after the check for whether the flag itself was accepted:

if (o->len > 1 && o->flag[o->len] == '=' &&
    o->arg != o->flag + o->len + 1) return !(o->err = ClErrTooMany);

It's a bit terse, but it reads as follows: if the previous flag was a longopt and an argument was provided via the = syntax but not consumed by the calling code, then we have an excess argument.


Recall that when printing errors earlier we didn't simply print o->flag with %s but limited it to o->len. The o->flag[o->len] == '=' check above, along with how shortopt chaining works, should hint at why that was necessary. If the user used the = syntax, o->flag contains both the flag and the argument. When printing errors we only want to show the erroneous flag, not the trailing argument, hence we limit it to o->len. For similar reasons we don't want to print the whole remaining string when shortopt chaining either. You could also hide these details from the user behind an error-printing function for slightly better UX.

These were the key design questions; the rest of the implementation is fairly straightforward.

Argument permutation

So far so good. But one convenient feature that's currently missing is GNU-style argument permutation. Since argument permutation violates POSIX, not every libc supports it: musl's getopt, for example, doesn't do it, and even glibc won't do it if the environment variable POSIXLY_CORRECT is set. Nevertheless, it's a very convenient feature that allows you to append a flag to a previous command line without having to navigate backwards.

One way to implement it would be to have clnext() return an enum indicating whether it found a flag or a non-option argument. In the former case everything happens as before; in the latter, the user can collect that argument into an array or list and continue to the next iteration. Alternatively, we could do the collection ourselves inside clnext(), though that'd be a bit more complicated and would require either allocation or a sufficiently large buffer provided by the user. Even more complicated would be to permute argv itself, which would also change the interface so that it no longer leaves argv untouched. I'll leave this one as an exercise for the reader.

Closing thoughts

I've been using this immediate mode option parsing API for a lot of my small programs (e.g. sxcs, selx, nix-compress) and have generally been pretty happy with it. It's not super feature-packed (it doesn't auto-generate help strings, for example), but for small projects it's a simple, bloat-free solution that's easy to code up from scratch if needed. Having a dedicated option parser also means your option parsing works consistently across platforms (even on Windows, assuming you have a "platform layer" that converts the UTF-16 arguments to UTF-8 and uses those internally).

Tags: [ c ]


