I'm not a fan of strlcpy(3)

03 Jun 2022

strlcpy(3) is an OpenBSD function which is often hailed as a safer replacement for strcpy(3) and/or strncpy(3). One of the obvious issues with strlcpy is that it's not safe if src isn't null-terminated. However, that's not the reason I don't like strlcpy. In fact despite knowing this, I used to think it was an okay function and better than the (thought of) alternative strncpy (which doesn't null-terminate in case of truncation).

The reason why I'm not a fan of it is because I've recently revisited this old thread where Ulrich Drepper rejects the proposal to add strlcpy to glibc. His reasoning was:

So I decided to spent a single second thinking about the problem and realized that yup, there's no reason to be using strlcpy.

What exactly is the problem?

In order to understand why a solution isn't good or optimal, one first needs to properly understand what the problem actually is. The problem we have in our hands is that we're trying to copy a "c-string" or a "null-terminated string" .

This problem can be divided into two main categories:

  1. Cases where truncation matters.
  2. Cases where truncation doesn't matter.

Strictly speaking, the C standard defines a string as:

A string is a contiguous sequence of characters terminated by and including the first null character.

So the term "null-terminated string" is redundant since if it isn't null-terminated, it's not a string according to the C standard.


Truncation doesn't matter

There are many cases where truncation doesn't matter. For example, if we're trying to display the result on some statusbar which can only hold, let's say 64 bytes, then not only is the src string getting truncated is not an issue, it's actually wanted since it makes no sense to copy more than that.

So assuming buf is a fixed length array, char buf[64]; an strlcpy call would look like the following:

strlcpy(buf, src, sizeof buf);

If you spend a single second thinking about what you've just done, you'd realize that this is a horribly inefficient solution to the problem at hand. Since strlcpy returns strlen(src), and src might very well be let's say a thousands byte long. By using strlcpy, you are going though all of those bytes in a O(n) strlen call for absolutely no reason when all you really cared about is the first 64 bytes of src.

Similarly, strncpy(3) is also not a good solution in this case because:

  1. It doesn't null-terminate in case of truncation.
  2. It unnecessarily pads the rest of dst with 0 in case src length happens to be less than dst length.

A much better solution here is using memccpy(3) instead, which will only scan the first 64 bytes of src at most and nothing past that:

if (memccpy(buf, src, '\0', sizeof buf) == NULL)
    buf[sizeof buf - 1] = '\0'; /* truncation occured, null-terminate manually. */

And in case you're worried about this being two-liners as opposed to a one-liner strlcpy call (which is a valid criticism, as that can open up room for making mistake), then you can (and should) simply turn this two liner into a wrapper function. Optionally giving it a _trunc prefix to make it clear that truncation is desired here.

mystrcpy_trunc(buf, src, sizeof buf);

Not only is this more efficient, it makes it clear that truncation is desired here just by looking at the function name. Whereas in the case of an unchecked strlcpy call it's not immediately obvious weather truncation is desired or the programmer forgot to check for it.

Truncation matters

In cases where truncation matters, for example file-path, url etc, you simply SHOULD NOT be using a fixed size buffer. You have to be able to do dynamic allocation and grow the buffer as needed.

The most simplest way of achieving this is just using strdup(3), which will do the allocation for us:

char *s = strdup(src);
if (s == NULL)
    return ERROR;
/* do stuff with s */
free(s); /* don't forget to free */

And in case you already have an allocated buffer and you'd like to reuse it, the following is what an strlcpy usage would looks like; assuming char *dbuf is a buffer that's been allocated size_t dbuflen bytes via malloc(3):

size_t srclen = strlcpy(dbuf, src, dbuflen) + 1;
if (srclen > dbuflen) {
    char *tmp = realloc(dbuf, srclen);
    if (tmp == NULL) {
        free(dbuf);
        return ERROR;
    }
    dbuf = tmp;
    dbuflen = srclen;
    strlcpy(dbuf, src, dbuflen);
}

Let's ignore just how inefficient this is for the moment, instead I'd like to point out just how schizo this entire logic is:

A much better solution here is to just use strlen(3) and memcpy(3):

size_t srclen = strlen(src) + 1;
if (srclen > dbuflen) {
    void *tmp = realloc(dbuf, srclen);
    if (tmp == NULL) {
        free(dbuf);
        return ERROR;
    }
    dbuf = tmp;
    dbuflen = srclen;
}
memcpy(dbuf, src, srclen);

This version is far more efficient than the strlcpy version. But more importantly, the logic here is much more sane and natural:

Truncation matters, but I'm stuck with a fixed size buffer

Let's not talk about how you got yourself into this situation, but instead let's talk about what you can actually do here. And pretty much the only thing you can do here, is to error out if src happens to be bigger than your buffer.

The following would be the strlcpy implementation:

if (strlcpy(buf, src, sizeof buf) >= sizeof buf)
    return ERROR; /* or call exit/abort if appropriate */

And once again, this is horribly inefficient because we don't want to know the exact length of src, we only want to know if it's bigger than our buffer size or not.

A much better way to do that is, once again, using memccpy. Which is not only much faster than strlcpy, it actually makes much more logical sense for what we're trying to do, as it won't scan src for more than sizeof buf:

if (memccpy(buf, src, '\0', sizeof buf) == NULL)
    return ERROR; /* truncation */

B-but using mem* functions on string is... bad!

The mem and str prefix is often a source of confusion for amateur programmers who get tricked into thinking that you MUST use only str* functions on strings.

In case you've forgotten what a (c-)string is, it's nothing more than an array of bytes terminated by a null character. Any attempt of trying to think of them as anything else is going to result in error sooner or later because the programmer has a fundamental misunderstanding of what a string even is.

Moreover, where exactly are these mem* functions declared? Let's take a look at the manpage:

$ man 3 memccpy
SYNOPSIS
       #include <string.h>

Now let's take a look at what the C standard has to say about <string.h>:

The header string.h declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.

In other words, there's nothing "improper", "bad practice", "code-smell" or whatever with using the mem* family of functions for operating on strings, because strings are just an array of (null-terminated) characters afterall.

One other criticism of this article might be that memccpy(3) and strdup(3), while defined in POSIX, is not defined in the ISO C standard . But this argument makes no sense because strlcpy(3) is defined in neither, and in practice is less portable as well.


Both memccpy and strdup have been accepted into the C2x draft.


Conclusion

In the end, I'm unable to find a situation where strlcpy makes sense. In pretty much all the cases, I'd much rather use memccpy, memcpy or strdup instead. So needless to say, I'm not a fan of strlcpy.

About a week after writing this article, I came across the following related articles, all of which are good reads:

Tags: [ c ]



RSS Feed