strlcpy(3)
is an OpenBSD function which is often hailed as a safer
replacement for strcpy(3)
and/or strncpy(3)
. One of the obvious issues with
strlcpy is that it's not safe if src
isn't null-terminated. However, that's
not the reason I don't like strlcpy. In fact despite knowing this, I used to
think it was an okay function and better than the (thought of) alternative
strncpy
(which doesn't null-terminate in case of truncation).
The reason why I'm not a fan of it is because I've recently revisited this old thread where Ulrich Drepper rejects the proposal to add strlcpy to glibc. His reasoning was:
So I decided to spent a single second thinking about the problem and realized that yup, there's no reason to be using strlcpy.
In order to understand why a solution isn't good or optimal, one first needs to properly understand what the problem actually is. The problem we have in our hands is that we're trying to copy a "c-string" or a "null-terminated string" .
This problem can be divided into two main categories:
Strictly speaking, the C standard defines a string as:
A string is a contiguous sequence of characters terminated by and including the first null character.
So the term "null-terminated string" is redundant since if it isn't null-terminated, it's not a string according to the C standard.
There are many cases where truncation doesn't matter. For example, if we're
trying to display the result on some statusbar which can only hold, let's say 64
bytes, then not only is the src
string getting truncated is not an issue,
it's actually wanted since it makes no sense to copy more than that.
So assuming buf
is a fixed length array, char buf[64]
; an strlcpy call would
look like the following:
strlcpy(buf, src, sizeof buf);
If you spend a single second thinking about what you've just done, you'd realize
that this is a horribly inefficient solution to the problem at hand. Since
strlcpy returns strlen(src)
, and src
might very well be let's say a
thousands byte long. By using strlcpy, you are going though all of those bytes
in a O(n) strlen call for absolutely no reason when all you really cared about
is the first 64 bytes of src
.
Similarly, strncpy(3)
is also not a good solution in this case because:
dst
with 0 in case src
length happens
to be less than dst
length.A much better solution here is using memccpy(3)
instead, which will only scan
the first 64 bytes of src
at most and nothing past that:
if (memccpy(buf, src, '\0', sizeof buf) == NULL)
buf[sizeof buf - 1] = '\0'; /* truncation occured, null-terminate manually. */
And in case you're worried about this being two-liners as opposed to a
one-liner strlcpy call (which is a valid criticism, as that can open up room
for making mistake), then you can (and should) simply turn this two liner into
a wrapper function. Optionally giving it a _trunc
prefix to make it clear
that truncation is desired here.
mystrcpy_trunc(buf, src, sizeof buf);
Not only is this more efficient, it makes it clear that truncation is desired
here just by looking at the function name. Whereas in the case of an unchecked
strlcpy
call it's not immediately obvious weather truncation is desired or
the programmer forgot to check for it.
In cases where truncation matters, for example file-path, url etc, you simply SHOULD NOT be using a fixed size buffer. You have to be able to do dynamic allocation and grow the buffer as needed.
The most simplest way of achieving this is just using strdup(3)
, which will do
the allocation for us:
char *s = strdup(src);
if (s == NULL)
return ERROR;
/* do stuff with s */
free(s); /* don't forget to free */
And in case you already have an allocated buffer and you'd like to reuse it, the
following is what an strlcpy usage would looks like; assuming char *dbuf
is a
buffer that's been allocated size_t dbuflen
bytes via malloc(3)
:
size_t srclen = strlcpy(dbuf, src, dbuflen) + 1;
if (srclen > dbuflen) {
char *tmp = realloc(dbuf, srclen);
if (tmp == NULL) {
free(dbuf);
return ERROR;
}
dbuf = tmp;
dbuflen = srclen;
strlcpy(dbuf, src, dbuflen);
}
Let's ignore just how inefficient this is for the moment, instead I'd like to point out just how schizo this entire logic is:
A much better solution here is to just use strlen(3)
and memcpy(3)
:
size_t srclen = strlen(src) + 1;
if (srclen > dbuflen) {
void *tmp = realloc(dbuf, srclen);
if (tmp == NULL) {
free(dbuf);
return ERROR;
}
dbuf = tmp;
dbuflen = srclen;
}
memcpy(dbuf, src, srclen);
This version is far more efficient than the strlcpy version. But more importantly, the logic here is much more sane and natural:
src
is.Let's not talk about how you got yourself into this situation, but instead let's
talk about what you can actually do here. And pretty much the only thing you
can do here, is to error out if src
happens to be bigger than your buffer.
The following would be the strlcpy implementation:
if (strlcpy(buf, src, sizeof buf) >= sizeof buf)
return ERROR; /* or call exit/abort if appropriate */
And once again, this is horribly inefficient because we don't want to know the
exact length of src
, we only want to know if it's bigger than our buffer size
or not.
A much better way to do that is, once again, using memccpy. Which is not only
much faster than strlcpy, it actually makes much more logical sense for what
we're trying to do, as it won't scan src
for more than sizeof buf
:
if (memccpy(buf, src, '\0', sizeof buf) == NULL)
return ERROR; /* truncation */
The mem
and str
prefix is often a source of confusion for amateur
programmers who get tricked into thinking that you MUST use only str*
functions on strings.
In case you've forgotten what a (c-)string is, it's nothing more than an array of bytes terminated by a null character. Any attempt of trying to think of them as anything else is going to result in error sooner or later because the programmer has a fundamental misunderstanding of what a string even is.
Moreover, where exactly are these mem*
functions declared? Let's take a look
at the manpage:
$ man 3 memccpy
SYNOPSIS
#include <string.h>
Now let's take a look at what the C standard
has to say about <string.h>
:
The header string.h declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.
In other words, there's nothing "improper", "bad practice", "code-smell" or
whatever with using the mem*
family of functions for operating on strings,
because strings are just an array of (null-terminated) characters afterall.
One other criticism of this article might be that memccpy(3)
and strdup(3)
,
while defined in POSIX, is not defined in the ISO C standard . But this
argument makes no sense because strlcpy(3)
is defined in neither, and in
practice is less portable as well.
Both memccpy
and strdup
have been accepted into the
C2x draft.
In the end, I'm unable to find a situation where strlcpy
makes sense. In pretty
much all the cases, I'd much rather use memccpy
, memcpy
or strdup
instead.
So needless to say, I'm not a fan of strlcpy
.
About a week after writing this article, I came across the following related articles, all of which are good reads:
Tags: [ c ]