isdigit: simple function, multiple implementations

09 Jul 2022

isdigit is a very simple function which tells you whether a character is a digit or not. But the simplicity is deceiving, at least in the C standard library.

You see, pretty much all the functions in <ctype.h> are either useless or rather difficult to use correctly, due to the following reasons:

  1. They're locale dependent. This means you will get different results on the same input depending on the system locale.
  2. The is* variants don't actually return a boolean (i.e. 0 or 1), instead they return 0 on failure but any non-zero value on success.
    What the... I mean, to be fair, this doesn't matter in most cases, such as if (isdigit(ch)), however code like this: int v[2]; v[isdigit(ch)] = 0; is broken and has to be converted to v[!!isdigit(ch)].
  3. If the argument to these functions isn't representable as an unsigned char (or EOF) then the behavior is undefined.
    What the fuck (?!) This means innocent looking code like char ch = getchar(); if (isdigit(ch)) is actually UB, since the user might input some negative character, and you must cast the argument, if (isdigit((unsigned char)ch)), to avoid triggering it (see the sketch just below this list).
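
Here's a tiny sketch of that pitfall and its fix (an illustration of mine, not code taken from this article):

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char ch = getchar(); /* as in the example above; note this also mangles EOF */
    if (isdigit((unsigned char)ch)) /* the cast avoids UB when ch is negative */
        puts("digit");
    return 0;
}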

Due to these braindeaths, it's very common for people to roll their own ctype macros, such as #define ISDIGIT(X) ((X) >= '0' && (X) <= '9'). While this is the obvious implementation, is it actually the best one?

Today I want to take a look at a couple of different ways to implement the function isdigit and analyze some of their performance characteristics. Such nano-optimizations are unlikely to matter in most software, but this was done mostly to scratch my own curiosity.

Some assertions

Before diving in, I'd like to assert a couple of different things about the functions we're going to implement which differ from the standard C ones: they take an unsigned char as argument (so no EOF handling and no UB on negative values), they're locale independent and only care about ASCII, and they return a proper boolean, i.e. 0 or 1.

With these out of the way, let's begin.


EOF has its own set of braindeaths which I don't want to get into in this article.


The simple version

The simple version simply checks whether ch falls between '0' and '9':

int
isdigit_simple(unsigned char ch)
{
    return ch >= '0' && ch <= '9';
}

This has a couple nice properties:

Pretty much the only downside of this approach is that the code is "branch-y", which can cause performance penalties if the processor mispredicts the branch.


Assuming we change the function to take a uint32_t argument instead of an unsigned char. And of course, when I say it would "work" on Unicode codepoints, I mean that it's going to pick out the ASCII digits, not digits of other languages.
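
For illustration, such a codepoint-taking variant might look like the following (the function name here is mine, not something from the article's source):

#include <stdint.h>

/* same range check, just on a decoded codepoint; it only matches the
 * ASCII digits U+0030..U+0039, not digits from other scripts */
int
isdigit_simple_u32(uint32_t cp)
{
    return cp >= '0' && cp <= '9';
}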


The lookup table version

To overcome the branching, a rather popular approach is to use a look-up table (aka LUT) so that no branching needs to occur:

int
isdigit_lut256(unsigned char ch)
{
    static const unsigned char lut[256] = {
        ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
        ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
    };
    return lut[ch];
}

This solves the branching problem; however, it introduces a whole bunch of other problems:

That's a lot of downsides just to avoid a branch.

The compact lookup table version

This version compresses the LUT by assigning each character its own bit rather than a whole byte. Unlike the above two, this version is not very commonly found in practice (and for good reasons).

int
isdigit_lut32(unsigned char ch)
{
    enum { UBITS = sizeof(unsigned int) * 8 }; /* assumes CHAR_BIT == 8 */
    static const unsigned int lut[256 / UBITS] = {
        ['0' / UBITS] = 1023U << ('0' % UBITS),
    };
    return (lut[ch / UBITS] >> (ch % UBITS)) & 1; /* shift and mask so the result is a strict 0 or 1 */
}

The code may seem a bit scary if you're unfamiliar with bit-twiddling, but it's basically converting the byte table above into a bit table instead.
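
To make the initializer a little less magical, here's a small throwaway snippet (my own illustration, assuming a 32-bit unsigned int) that prints which word and which bit each digit lands in, and shows that the accumulated mask is exactly 1023 shifted left by 16:

#include <stdio.h>

int main(void)
{
    /* '0' is 48 and '9' is 57, so with 32-bit words every digit falls
     * into word 48/32 == 1, occupying bits 16 through 25 */
    unsigned int word = 0;
    for (int ch = '0'; ch <= '9'; ch++) {
        printf("'%c' -> word %d, bit %d\n", ch, ch / 32, ch % 32);
        word |= 1U << (ch % 32);
    }
    printf("accumulated mask: %#x, 1023U << 16: %#x\n", word, 1023U << 16);
    return 0;
}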

This version basically inherits all of the problems of the original LUT approach:

Overall, I suspect this version to be the worst of them all.

The bithack version

This version uses some basic bithacking to achieve branchlessness (is this a word?) and thus avoids all the memory access related problems found in the LUT variants. Lo and behold:

int
isdigit_bithacks(unsigned char ch)
{
    unsigned int ge0 = (unsigned int)ch + ((0xFFU - '0') + 1);
    unsigned int gt9 = (unsigned int)ch +  (0xFFU - '9');
    return (ge0 ^ gt9) >> 8;
}

Are you beholding? Have you beheld? In case you're done behelding, I'd like to explain how this variant works. While the trick used here is rather basic, I have never seen this variant used anywhere, and thus think it's worth an explanation.

The very first thing to realize is that the logic used here is kinda retarded. Instead of checking whether ch is between '0' and '9', it instead checks whether ch is greater than or equal to '0' (the ge0 variable), checks whether ch is greater than '9' (the gt9 variable), and then XORs the two results together.

I say this is retarded because I don't think anyone naturally thinks like this. But it works out because the two checks only disagree when ch is a digit: ge0 is true and gt9 is false for '0' through '9', while everything below '0' fails both checks and everything above '9' passes both. XORing the results therefore isolates exactly the digit case.

Okay, so now that the logic is out of the way, it's time to explain the magic numbers.

Since we're taking an unsigned char as argument, it means that when ch gets promoted to an unsigned int, the 9th bit will be clear. We use this bit as a "trap bit".

Now to prepare our "magic number". We need an integer with all of the least significant 8 bits set to 1, i.e. 0xFF. Now if we want to check whether ch is greater than '9' (which is 57 in decimal) we simply need to subtract it from 0xFF.

      0xFF: 00000000 11111111
       '9': 00000000 00111001
0xFF - '9': 00000000 11000110  /* magic */

This gives us our "magic number", which is set up in such a way that if you add anything greater than '9' to it, the carry will propagate onto the 9th bit.

Let's see this in action. First, let's add '4' (52 in decimal) to our magic number.

  magic: 00000000 11000110
+   '4': 00000000 00110100
----------------------------
         00000000 11111010

As expected, the 9th bit remains 0 since '4' is not greater than '9'. Let's try it with a value which is greater than '9'. I've picked 'A' for this purpose, which is 65 in decimal:

  magic: 00000000 11000110
+   'A': 00000000 01000001
---------------------------
         00000001 00000111

Tada! The 9th bit is set! This should explain how the gt9 check works. The ge0 check works the same way, but since we want to check greater or equal we add 1 to our magic number instead.
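
For completeness, here's the same exercise with the ge0 magic number, which is (0xFF - '0') + 1, i.e. 208 in decimal. Adding '0' (48) is just enough to push the carry into the 9th bit, while adding '/' (47, the character right below '0') is not:

  magic: 00000000 11010000
+   '0': 00000000 00110000
----------------------------
         00000001 00000000

  magic: 00000000 11010000
+   '/': 00000000 00101111
----------------------------
         00000000 11111111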

And finally, the >> 8 simply shifts out the garbage bits and, as a consequence, gives us a nice boolean return value. Now with that out of the way, we can get back to analyzing the performance implications.
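
As a quick sanity check (my own scaffolding, not part of the article's benchmark code), one can verify that all the variants above agree with the plain range check for every possible unsigned char value:

#include <stdio.h>

/* assumes the isdigit_* functions above are defined in the same file */
int main(void)
{
    int ok = 1;
    for (int c = 0; c < 256; c++) {
        unsigned char ch = (unsigned char)c;
        int expect = (ch >= '0' && ch <= '9');
        ok &= (isdigit_simple(ch)   == expect);
        ok &= (!!isdigit_lut256(ch) == expect);
        ok &= (!!isdigit_lut32(ch)  == expect);
        ok &= (isdigit_bithacks(ch) == expect);
    }
    puts(ok ? "all variants agree" : "mismatch found");
    return !ok;
}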


This pretty much has all the good qualities we're looking for:

So what's the catch? Well, it's branchless...

While being branchless is typically a good thing, since we avoid paying misprediction penalties, it also means that we cannot get the benefit of good predictions either. So the branch predictor cannot hurt us, but neither can it help us. This may end up costing us some throughput on predictable inputs.

Benchmarks

Enough hypothesizing, time for some benchmarks!

Now, keep in mind that in order to benchmark properly you need to benchmark from multiple different angles (both internal and external), do enough iterations and then do proper statistical analysis. But because I'm not writing a research paper, I've opted not to do that, so take these results with a grain of salt.

System info: 3700X processor, 3200MHz 16-18-18-32 8x2 memory, GNU/Linux OS.


First I want to see how these functions perform when called just once. This is to try and see whether the hypothesis of unreliable performance of the LUT variants due to cache misses was correct or not. The results:

Variant     Avg         2% lows
simple      0.000042    0.000054
lut256      0.000379    0.000491
lut32       0.000393    0.000749
bithacks    0.000041    0.000051

The results for the "simple" and "bithacks" versions are quite noisy and vary a good amount between runs even with 1024 iterations. However, a couple of things are consistent:


Now let's see how they perform in a loop over a 64KiB buffer filled with random bytes. Note that this is highly impractical since very rarely, if ever, will you need to run a loop with only isdigit in it.
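
For reference, a rough sketch of what such a loop might look like (this is just an illustration; the actual harness, including the timing code, is linked at the end of the article):

#include <stdlib.h>

enum { BUFSIZE = 64 * 1024 };

/* count the digits in a 64KiB buffer of random bytes; summing the results
 * keeps the compiler from throwing the calls away */
static unsigned
count_digits(const unsigned char *buf, size_t len)
{
    unsigned n = 0;
    for (size_t i = 0; i < len; i++)
        n += isdigit_bithacks(buf[i]); /* swap in the variant being measured */
    return n;
}

int main(void)
{
    static unsigned char buf[BUFSIZE];
    for (size_t i = 0; i < BUFSIZE; i++)
        buf[i] = (unsigned char)rand();
    /* time the call below with your timer of choice */
    return (int)(count_digits(buf, BUFSIZE) & 1u);
}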

Variant     Avg
simple      0.038810
lut256      0.017070
lut32       0.037540
bithacks    0.036480

These results are much more consistent. Surprising perhaps no one, the "lut256" variant crushes it, while the "simple" and "lut32" variants perform the worst.

The "bithacks" variant seems to beat the "simple" and "lut32" variant, but keep in mind that the random input makes things harder to predict and thus makes this benchmark biased against the "simple" variant.

Conclusion

Source code

All the source code, as well as the benchmarking code used in this article, can be found here.

Tags: [ c, performance ]


