I was recently reading Tony Finch's "getopt() but smaller" article, where he
turns getopt into an "immediate/imperative" mode API and cuts the code size in
the process.
This made me realize that I've been using a similar style of option parser for
quite some time now.
I haven't written about it since it's rather small in code size and
straightforward to use (more so than getopt or getopt_long) - nothing
mind-blowing or novel.
But that's precisely the reason why it deserves some spotlight; despite being so
small and easy, it's barely used.
If you aren't familiar with getopt, here's roughly what the usage looks like
for a program that accepts an -a flag and a -b flag, where -b also requires an
argument:
for (int opt; (opt = getopt(argc, argv, "ab:")) != -1;) {
    switch (opt) {
    case 'a': aflag = 1; break;
    case 'b': bflag_arg = optarg; break;
    case '?': // unknown flag
        exit(1);
    }
}
Because getopt doesn't know which flags the program accepts, you need to tell it
via the optstring argument "ab:".
The : after b indicates it accepts an argument.
But because the interface is half-declarative and half-imperative, you get the
worst of both worlds: you declare the flags in optstring and then need to
duplicate the checks inside the switch statement.
With getopt_long the duplication issue gets even worse, as the flags need to be
duplicated yet again in the longopts array.
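To make the duplication concrete, here is a small sketch using getopt_long. The struct opts type, the parse_opts function and the long names "all" and "bytes" are made up for illustration; only getopt_long, struct option, optarg and the argument constants come from <getopt.h>.

```c
#include <getopt.h>
#include <stddef.h>

// Hypothetical result type for this sketch.
struct opts { int aflag; const char *b_arg; };

// Returns 0 on success, -1 on an unknown flag or a missing argument.
static int parse_opts(int argc, char *argv[], struct opts *out)
{
    // Declaration 1: the long names, and whether each takes an argument.
    static const struct option longopts[] = {
        { "all",   no_argument,       NULL, 'a' },
        { "bytes", required_argument, NULL, 'b' },
        { NULL,    0,                 NULL, 0   }
    };
    // Declaration 2: the same flags again, in optstring ("ab:").
    for (int opt; (opt = getopt_long(argc, argv, "ab:", longopts, NULL)) != -1;) {
        switch (opt) { // Declaration 3: once more, as case labels.
        case 'a': out->aflag = 1; break;
        case 'b': out->b_arg = optarg; break;
        default:  return -1;
        }
    }
    return 0;
}
```

Adding a new flag to this program means touching all three places, and nothing checks that they agree.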
The idea with an immediate-mode API here is to avoid trying to do too much inside the function and instead simply parse & provide the flag (and argument) to the user, who can then either accept or reject it. The following is a small demo of how you'd use such an API:
extern int
main(int argc, char *argv[])
{
    CLOpt o[1] = { clinit(argv) };
    // ...
}
We start by initializing the CLOpt structure.
I'm passing argv but not argc, which means that the parser counts on the
sentinel NULL pointer at the end of argv.
It wouldn't be too difficult to adjust it to take in argc rather than relying
on the sentinel NULL if you wish to do so.
I could've also done the more conventional CLOpt o = clinit(argv); but then
I'd have to use &o when calling the other functions since they're expecting a
pointer.
Using a single-element array allows me to just use o instead.
This trick, while it might seem a bit silly at first, I've found to be quite
useful when refactoring code out into a function: you no longer need to do the
whole a.member to a->member change.
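As a tiny illustration of the single-element array trick (the Counter type and the function names are hypothetical, not part of the parser):

```c
// Hypothetical type, only here to show the two calling styles.
typedef struct { int count; } Counter;

static void bump(Counter *c) { c->count++; }

static int conventional(void)
{
    Counter c = { 0 };
    bump(&c);          // needs & at every call site
    return c.count;    // and . for member access
}

static int single_element_array(void)
{
    Counter c[1] = { { 0 } };
    bump(c);           // the array decays to a pointer to its only element
    return c->count;   // same -> syntax that bump() itself uses
}
```

The body of single_element_array could be pasted into a function that takes a Counter *c parameter without touching any of the member accesses.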
while (clnext(o)) {
    // ...
}
Then we keep calling clnext in a loop.
It will parse and prepare the next argument for us.
We can then accept the argument in a traditional imperative if-else chain inside
the loop:
if (clopt(o, 'y', NULL)) {
    printf("-y: enabled!\n");
}
This is a short option -y (with no equivalent long option) which accepts no
argument.
else if (clopt(o, 'n', "name") && clarg(o)) {
    printf("name: %s\n", o->arg);
}
Here's option -n (with --name as the long option), which also accepts an
argument.
But since this is an immediate-mode API, you need to call clarg() to let the
parser know that you're expecting an argument.
If the call returns true, then the argument will be available via o->arg.
else if (clopt(o, 0, "optional")) {
    if (cloptarg(o)) {
        printf("optional: %s\n", o->arg);
    } else {
        printf("optional: <default>\n");
    }
}
Finally, we have the option --optional, which accepts an optional argument and
has no equivalent short option.
Since the argument is optional, we call cloptarg() to retrieve it.
If the optional argument was provided, it will be available via o->arg, just
like a mandatory argument.
Otherwise you can do whatever fallback you have; in the example it just prints
<default> if the optional argument was not given.
while (clnext(o)) {
    // ...
}
if (o->err) {
    fprintf(stderr, "ERROR: -%.*s: %s\n", o->len, o->flag, CL_ERR[o->err]);
    return 1;
}
The loop will exit when we've parsed all options or encountered an error.
In case of an error, we'll just print a message to stderr and exit.
Note that o->flag is not necessarily a nul-terminated string, and so it's
necessary to use %.*s and print up to o->len only.
The CL_ERR array turns the error code into a human-readable error message.
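A table like CL_ERR could be laid out roughly as follows. This is only a sketch: ClErrUnknown and ClErrTooMany are the names used in this post, while ClErrNone, ClErrMissing, ClErrCount_ and all of the message strings are assumptions made up for the example.

```c
enum {
    ClErrNone,     // assumed to be 0, so `if (o->err)` doubles as the error check
    ClErrUnknown,  // the flag was not accepted by any clopt() call
    ClErrTooMany,  // an argument was given to a flag that takes none
    ClErrMissing,  // hypothetical: a required argument was not supplied
    ClErrCount_    // hypothetical: number of error codes
};

// Designated initializers keep each message next to its code,
// so reordering the enum cannot silently mismatch the two.
static const char *const CL_ERR[ClErrCount_] = {
    [ClErrNone]    = "no error",
    [ClErrUnknown] = "unknown flag",
    [ClErrTooMany] = "flag does not accept an argument",
    [ClErrMissing] = "flag requires an argument",
};
```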
printf("Remaining args: { ");
for (; o->argv[0]; ++o->argv)
    printf("%s%s", o->argv[0], o->argv[1] ? ", " : " ");
printf("}\n");
Otherwise, if there was no error, we can access the rest of the arguments
via o->argv.
And that's all there is to it. Pretty simple, right? Here's the whole demo in one snippet:
extern int
main(int argc, char *argv[])
{
    CLOpt o[1] = { clinit(argv) };
    while (clnext(o)) {
        /****/ if (clopt(o, 'y', NULL)) {
            printf("-y: enabled!\n");
        } else if (clopt(o, 0x0, "long")) {
            printf("--long: enabled\n");
        } else if (clopt(o, 'n', "name") && clarg(o)) {
            printf("name: %s\n", o->arg);
        } else if (clopt(o, 0, "optional")) {
            if (cloptarg(o)) {
                printf("optional: %s\n", o->arg);
            } else {
                printf("optional: <default>\n");
            }
        }
    }
    if (o->err) {
        fprintf(stderr, "ERROR: -%.*s: %s\n", o->len, o->flag, CL_ERR[o->err]);
        return 1;
    }
    printf("Remaining args: { ");
    for (; o->argv[0]; ++o->argv)
        printf("%s%s", o->argv[0], o->argv[1] ? ", " : " ");
    printf("}\n");
}
For the implementation, here are a couple of high-level goals that I'd want to fulfill:
argv
).
The source code can be found here.
It's about 60 lines of C and achieves all of the goals outlined above.
I usually avoid nul-terminated strings as they are the source
of many bugs and headaches.
But since this demo is meant for a wider audience, my implementation expects
nul-terminated strings, the kind you'd get right out of standard C argv.
The code is also dedicated to the public domain, so you can use it for whatever
purpose you want, or modify it to use non nul-terminated strings etc.
Given how small the entire implementation is, I could simply walk through it line by line. However, I'm instead going to outline a couple of key implementation questions and how I've dealt with them. This should give you a better understanding of how to write one yourself, perhaps in a different language.
Was the flag accepted?
Since we don't keep a table of options, we need to somehow determine if the flag
was accepted or not.
The simplest way to do it is to let the user track it themselves by adding an
else branch at the end of their if-else clopt chain.
Another approach is to track this ourselves.
Since it was easy enough to do and provides slightly better UX, I've opted to do
the latter.
Here's how it's done:
Inside of clnext(), once we've set up the flags but before returning, I set the
err field to ClErrUnknown:
o->err = ClErrUnknown;
return 1;
And then inside of clopt(), if we find a match, the err field gets cleared to
zero.
Now on the next call to clnext(), if we find that o->err has not been cleared
to zero, then we know that the flag was not accepted and we can stop the option
parsing:
clnext(CLOpt *o)
{
    if (o->err) return 0;
    // ...
}
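To see the "assume unknown until claimed" idea in isolation, here is a stripped-down toy that only understands single-character "-x" flags. The Toy type and the toy_next, toy_opt and count_known functions are hypothetical stand-ins, not the actual implementation.

```c
#include <stddef.h>

enum { ErrNone, ErrUnknown };  // toy error codes for this sketch

typedef struct {
    char **argv;  // next element to look at
    char   flag;  // the current short flag
    int    err;
} Toy;

// Prepares the next "-x" flag; returns 0 at the end of the flags or on error.
static int toy_next(Toy *t)
{
    if (t->err) return 0;    // the previous flag was never claimed
    char *s = *t->argv;
    if (!s || s[0] != '-' || !s[1]) return 0;
    t->flag = s[1];
    t->argv++;
    t->err = ErrUnknown;     // pessimistically assume nobody wants this flag
    return 1;
}

// Claims the current flag if it matches, which clears the pending error.
static int toy_opt(Toy *t, char c)
{
    if (t->flag != c) return 0;
    t->err = ErrNone;
    return 1;
}

// Returns how many known flags were seen, or -1 if an unknown one showed up.
static int count_known(char **argv)
{
    Toy t = { argv, 0, ErrNone };
    int n = 0;
    while (toy_next(&t)) {
        if (toy_opt(&t, 'a') || toy_opt(&t, 'b')) n++;
    }
    return t.err ? -1 : n;
}
```

Note that the calling code never needs a trailing else branch: an unclaimed flag leaves err set, and the next toy_next() call ends the loop.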
Are we shortopt chaining?
If the user provides us with "-abc", how should it be parsed?
The answer depends entirely on whether the flag -a accepts an argument or not.
If yes, then "bc" will be treated as the argument, as if the user provided
-a bc.
If no, then the next character will be treated as a flag, as if the user
provided -a -bc.
This also applies recursively to -bc: it can either be flag -b with argument
"c" or flags -b and -c, depending on whether -b accepts an argument or not.
This shortopt "chaining" makes things a bit difficult.
With longopt there's no chaining; we always move on to the next argv element.
But if we're in the middle of shortopt processing then we might need to treat
the next characters as flags before we can move to the next element.
This is roughly how I've dealt with it:
if (...) {
    // move to the next argv
} else {
    // otherwise we're chaining shortopts, move to the next character
    ++o->flag;
}
flag is a character pointer which points to the current flag; for chaining we
can simply increment the pointer.
As for the if condition that I've left blank, it is supposed to detect cases
where we should move to the next argv.
There are three cases for that. The first is when we're processing a longopt;
this is detected by o->len > 1.
o->len tracks the length of the flag, which is always above 1 in case of a
longopt, so in those cases we can move to the next element.
The second is when o->flag[1] is nul, i.e. we've reached the end of the string
on the current shortopt.
For the third and final case, we need to consider if the rest of the string was
already consumed as an argument.
In our example, if -a accepted "bc" as the argument, then there's nothing
left to chain and so we should also move to the next argv.
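The three cases can be put together as a single predicate. This is a sketch under assumptions: the Cursor type is a hypothetical stand-in holding only the fields the check needs, and detecting the third case by comparing the consumed argument pointer against flag + 1 is my guess at one way to do it, not necessarily how the linked source does it.

```c
#include <stddef.h>

// Hypothetical stand-in with only the fields this check needs.
typedef struct {
    const char *flag;  // points at the current flag character
    int         len;   // flag length; above 1 means longopt
    const char *arg;   // argument the calling code consumed, or NULL
} Cursor;

// Returns 1 if parsing should move on to the next argv element,
// 0 if it should stay in this element and chain to the next character.
static int should_advance_argv(const Cursor *c)
{
    return c->len > 1                // case 1: longopt, never chains
        || c->flag[1] == '\0'        // case 2: last character of the element
        || c->arg == c->flag + 1;    // case 3: the rest was taken as the argument
}
```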
Was an argument provided but not accepted?
This only concerns longopts where an explicit argument was provided via =, e.g.
--key=value.
If the flag does not accept an argument, then we need to detect that and set
the appropriate error.
I deal with it as follows inside of clnext(), right after the check to see if
the flag itself was accepted or not:
if (o->len > 1 && o->flag[o->len] == '=' &&
    o->arg != o->flag + o->len + 1) return !(o->err = ClErrTooMany);
It's a bit terse, but it reads as follows: if the previous flag was a longopt
and an argument was provided via the '=' syntax but the argument was not
consumed by the usage code, then we have excess arguments.
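The same condition, spelled out with named intermediate steps, might look like the sketch below. The function name and its parameters are hypothetical; the logic is meant to mirror the one-liner above, with flag, len and arg standing in for o->flag, o->len and o->arg.

```c
#include <stddef.h>

// Returns 1 if a longopt came with an "=value" part that nobody consumed.
static int has_unconsumed_value(const char *flag, int len, const char *arg)
{
    int is_long      = len > 1;
    int has_equals   = is_long && flag[len] == '=';
    // A consumed value points just past the '=' sign.
    int was_consumed = has_equals && arg == flag + len + 1;
    return has_equals && !was_consumed;
}
```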
If you recall the error printing earlier and remember that we didn't simply
print o->flag with %s but limited it to o->len, the o->flag[o->len] == '='
check above along with how shortopt chaining works should give a hint as to why
that was necessary.
In case the user used the = syntax, o->flag will contain both the flag and the
argument.
When printing errors we only want to show the erroneous flag, not the trailing
argument as well, hence we limit it to o->len.
For similar reasons we don't want to output the whole string when shortopt
chaining either.
You could also hide these details away from the user behind some error printing
function for slightly better UX.
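Such an error printing helper could be as small as the sketch below. FlagRef and format_flag_error are hypothetical names, and the example assumes that for a longopt the flag pointer starts at the second dash, so that the "-%.*s" format from the demo prints both dashes.

```c
#include <stdio.h>

// Hypothetical stand-in for the two fields the error message needs.
typedef struct {
    const char *flag;  // start of the offending flag; text may continue past len
    int         len;   // how many bytes of `flag` belong to the flag itself
} FlagRef;

// Writes "ERROR: -<flag>: <msg>" into buf and returns what snprintf returns.
// Only the first f.len bytes of f.flag are printed, so a trailing "=value"
// or the rest of a shortopt chain never leaks into the message.
static int format_flag_error(char *buf, size_t n, FlagRef f, const char *msg)
{
    return snprintf(buf, n, "ERROR: -%.*s: %s", f.len, f.flag, msg);
}
```

The calling code then no longer has to remember the %.*s detail at every place it reports an error.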
These were some of the key design questions; the rest of the implementation is fairly straightforward.
So far so good.
But one convenient feature that's currently missing is GNU-style argument
permutation.
Since argument permutation violates POSIX, not every libc supports it; for
example, musl's getopt doesn't do it, and even glibc won't do it if the
environment variable POSIXLY_CORRECT is defined.
But nevertheless, it's a very convenient feature that allows you to append a
flag to a previous command line without having to navigate backwards.
One way to implement it would be to return an enum indicating whether we found a
flag or an argument.
If the former, everything will happen as before; if the latter, then the user
can collect that argument onto an array or list and continue to the next
iteration.
Or we could do the collection ourselves inside clnext(), though that'd be a bit
more complicated and either require allocation or the user to provide a buffer
large enough.
Even more complicated would be to simply permute the argv instead.
This would also change the interface such that it no longer leaves the argv
untouched.
I'll leave this one as an exercise for the readers.
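For the first approach (an enum that tells a flag apart from a plain argument), a rough standalone sketch might look like this. Kind, classify and collect_positionals are hypothetical names, and the sketch deliberately ignores flags that take a separate argument (as in -n name), which a real integration with clnext() would have to handle.

```c
#include <stddef.h>
#include <string.h>

typedef enum { KIND_FLAG, KIND_ARG, KIND_END } Kind;  // hypothetical

static Kind classify(const char *s)
{
    if (strcmp(s, "--") == 0) return KIND_END;          // explicit end of options
    if (s[0] == '-' && s[1] != '\0') return KIND_FLAG;
    return KIND_ARG;                                     // includes a lone "-"
}

// Collects non-flag arguments into `out`, which must have room for every
// element of argv. Returns how many were collected; *nflags gets the flag count.
static int collect_positionals(char **argv, char **out, int *nflags)
{
    int n = 0;
    *nflags = 0;
    for (; *argv; ++argv) {
        Kind k = classify(*argv);
        if (k == KIND_END) {
            // Everything after "--" is an argument, even if it starts with '-'.
            for (++argv; *argv; ++argv) out[n++] = *argv;
            break;
        }
        if (k == KIND_FLAG) ++*nflags;
        else out[n++] = *argv;
    }
    return n;
}
```

Because the collected pointers go into a caller-provided buffer, argv itself is left untouched.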
I've been using this immediate-mode option parsing API for a lot of my small programs (e.g. sxcs, selx, nix-compress) and have generally been pretty happy with it. It's not super feature-packed (e.g. no auto-generated help string) but for small projects it's a simple, bloat-free solution that's easy to code up from scratch if needed. Having a dedicated option parser also gives you the benefit that your option parsing will work consistently across platforms (even on Windows, assuming you have a "platform layer" where you convert the UTF-16 to UTF-8 and use that internally).