Developer experiences from the trenches

Low Overhead Structured Logging in C

Mon 27 December 2021 by Michael Labbe
tags code

Sure, we call it printf debugging, but as projects get more complex, we aren’t literally calling printf. As our software is adopted we progress to troubleshooting user’s logs. If we’re granted enough success we are faced with statistically quantifying occurrences from many users logs via telemetry.

This is best served by structured logging. There are many formal definitions of structured logs, but for our purposes a simple example will do. Say you have this log line:

Player disconnected.  Player count: 12

A structured log breaks fields out to make further processing and filtering trivial. This example uses JSON but it can be anything consistent that a machine can process:

{
    "file": "src/network.c",
    "line": 1000,
    "timestamp": "1640625081",
    "msg":  "Player disconnected.",
    "player_count": 12
}

The structured version looks like a message that can be processed and graphed by a telemetry system, or easily searched and filtered in a logging tool.

Coming from backend development where structured logging is expected and common, I wanted this function in my next game client. Because client performance is finite, I sought a way to assemble these messages using the preprocessor.

In practice, a fully structured message can be logged in one single function call without unnecessary heap allocation, unnecessary format specifier processing or runtime string concatenation. Much of the overhead in traditional loggers can be entirely avoided by building the logic using the C preprocessor.

The rest of this post illustrates how you might write your own structured logger using the C preprocessor to avoid runtime overhead. Each step takes the concept further than the one before it, building on the previous implementation and concepts.

Rather than supply a one-size-fits-all library, I suggest you roll your own logger. Stop when the functionality is adequate for your situation. Change the functionality where it does not fit your needs.

Step Zero: printf()

The most basic logger just calls printf(). For modest projects, you’re done! Just remember to clean up your logs before you ship.

Step One: Wrap Printf()

It is useful to wrap printf() in a variadic macro, which can be toggled off and on to reduce spam and gain performance on a per-compilation unit basis.

As of C99, we have variadic macros. The following compiles on Visual C++ 2019, clang and gcc with --std=gnu99.

#define LOGF(fmt, ...) fprintf(stdout, fmt, __VA_ARGS__);
#define LOG(comment) puts(comment);

Why do we need a special case LOG() that is separate from LOGF()? Unfortunately, having zero variadic arguments is not C99 compliant and, in practice, generates warnings. For conciseness, the rest of this post only provides examples for LOGF(). However, a full implementation would handle both the zero argument and the non-zero argument cases wholly.

Step Two: Route Logs

Having logs print to stdout, also append to disk or submit to a telemetry ingestor are all useful scenarios. Changing where logs are emitted can be reduced to a series of preprocessor config flags.

// config
#define LOG_USE_STDIO 1
#define LOG_USE_FILE 1

// implementation
#if LOG_USE_STDIO
#  define LOG__STDIO_FPRINTF(stream, fmt, ...) \
      fprintf(stream, fmt, __VA_ARGS__);
#else
#  define LOG__STDIO_FPRINTF(stream, fmt, ...)
#endif

#if LOG_USE_FILE
#  define LOG__FILE(stream, fmt, ...)          \
      fprintf(stream, fmt, __VA_ARGS__);       \
      fflush(stream);
#else
#  define LOG__FILE(stream, fmt, ...)
#endif

Note the call to fflush() ensures the stream is correctly written to disk in the result of an unscheduled process termination. This can be essential for troubleshooting the source of the crash!

Now we have two macros LOG__STDIO_FPRINTF() and LOG__FILE() which either log to the respective targets or do nothing, depending on configuration. The double underscore indicates the macro is internal, and not part of the logging API. Replace the previous LOGF() definition with the following:

#define LOGF(fmt, ...)                             \
{                                                  \
    LOG__STDIO_FPRINTF(stdout, fmt, __VA_ARGS__);  \
    LOG__FILE(handle, fmt, __VA_ARGS__);           \
}

Opening handle prior to calling LOGF() is an exercise left up to the implementer.

Step Three: Log Levels

Limiting logging to higher severity messages is useful for reducing spam. Traditional logging packages call into a variadic function, expending resources before determining the log level does not permit the log line to be written.

We can do better — our logging system simply generates no code at all if the log level is too low.

#define LOG_LEVEL_TRACE 0
#define LOG_LEVEL_WARN  1
#define LOG_LEVEL_ERROR 2

#if !defined(LOG_LEVEL)
#    define LOG_LEVEL LOG_LEVEL_TRACE
#endif

The test for LOG_LEVEL being pre-defined allows a user to control the log level on a per-compilation unit basis before including the logging header file. This is extremely useful for getting detailed logs for only the section of code you are currently working on.

With this in mind, delete the LOGF definition, and replace it with the following:

#define LOG__DECL_LOGLEVELF(T, fmt, ...)           \
{                                                  \
    LOG__STDIO_FPRINTF(stdout, T ": " fmt, __VA_ARGS__);  \
    LOG__FILE(handle, T ": " fmt, __VA_ARGS__);           \
}

#if LOG_LEVEL == LOG_LEVEL_TRACE
#  define LOG_TRACEF(fmt, ...) LOG__DECL_LOGLEVELF("TRC", fmt, __VA_ARGS__);
#else
#  define LOG_TRACEF(fmt, ...)
#endif

#if LOG_LEVEL <= LOG_LEVEL_WARN
#  define LOG_WARNF(fmt, ...) LOG__DECL_LOGLEVELF("WRN", fmt, __VA_ARGS__);
#else
#  define LOG_WARNF(fmt, ...)
#endif

#if LOG_LEVEL <= LOG_LEVEL_ERROR
#  define LOG_ERRORF(fmt, ...) LOG__DECL_LOGLEVELF("ERR", fmt, __VA_ARGS__);
#else
#  define LOG_ERRORF(fmt, ...)
#endif

Adding or changing log levels is an exercise left up to the user. Do what works for you.

Note the obscure T ": " fmt syntax. Rather than pay the overhead of invoking snprintf(), or allocating heap memory to prefix the log line with “TRC”, “WRN” or “ERR”, we use preprocessor concatenation to generate a single format string at compile time.

At this point, you should be able to write the following code:

#define LOG_LEVEL 0
#include "log.h"

LOG_TRACEF("Hello, %d worlds!", 42);

and see the log line

TRC: Hello, 42 worlds!

on stdout, and in your log file.

Step Four: Structuring the Logs

C gives us macros for the file and line each log line was generated on. Before we go on to support specific structured fields, we can already generate a structured log with four fields:

The log level
The file the log message originated from
The line of code the log message originated from
The formatted message itself.

At the beginning of this post a JSON-structured log was used as an example of a structured log. However, there is a runtime cost to escaping logged strings in order to produce valid JSON in all cases that we would like to avoid. Consider:

LOG_TRACE("I said \"don't do it\"");

Unless processed, this would produce the ill-formatted JSON string

"I said "don't do it""

Structured Log Message Specification

Rather than generate structured JSON, another logging format is used, which can be postprocessed into JSON with a small script if necessary.

A logging message starts with the 3-byte logging level and a newline. Eg: “TRC” or “WRN”, followed by a newline.
Each subsequent field’s key is a tab following by alphanumeric characters, a colon and a single space character.
- /^\t[a-zA-Z0-9]+:\x20/
Immediately following, the structured field’s value is every byte until the three byte end-of-line sequence 0x20 0x1A 0x0A is encountered.
- /.+\x20\x1A\n/
Further fields follow until the final field, which is terminated with the two-byte sequence 0x1A 0x1A.
- /\x1A\x1A$/.
Note that the final field does not also contain the three byte end-of-line sequence.

This results in a structured log message which is machine-parsable, but is also human readable on popular Linux UTF-8 terminals, as well as DOS codepage 437 (the DOS Latin US codepage). Unless your log messages contain the highly obscure byte sequences described above (unlikely!), we do not need to do any runtime escaping, even for newlines.

Structured Log Implementation

A structured log is one long string that is concatenated in the preprocessor. Here we provide a mechanism to let the user choose between flat and structured with a preprocessor flag.

// config
#define LOG_USE_STRUCTURED 1

// implementation
#define LOG__EOL "\x20\x1A\n"
#define LOG__EOM "\x1A\x1A\n"

#define LOG__FLAT(file, line, level, msg) \
    level " " file ":" line " |" msg "\n"

#define LOG__STRUCTURED(file, line, level, msg) \
    level           LOG__EOL \
    "\tfile: " file LOG__EOL \
    "\tline: " line LOG__EOL \
    "\tmsg: "  msg  LOG__EOM "\n"

#if LOG_USE_STRUCTURED
#  define LOG__LINE(file, line, level, msg) \
      LOG__STRUCTURED(file, line, level, msg)
#else
#  define LOG__LINE(file, line, level, msg) \
      LOG__FLAT(file, line, level, msg)
#endif

Now a call to LOG__LINE() generates either a flat or structured string which can be passed into a printf-style format specifier parameter. We redefine LOG__DECL_LOGLEVELF() as follows:

#define LOG__XSTR(x) #x
#define LOG__STR(x) LOG__XSTR(x)

#define LOG__DECL_LOGLEVELF(T, fmt, ...)           \
{                                                  \
    LOG__STDIO_FPRINTF(stdout,                     \
        LOG__LINE(__FILE__, LOG__STR(__LINE__), #T, fmt), \
            __VA_ARGS__);                          \
                                                   \
    LOG__FILE(handle,                              \
        LOG__LINE(__FILE__, LOG__STR(__LINE__), #T, fmt), \
            __VA_ARGS__);                          \
}

Note the use of LOG__STR(__LINE__) is necessary to stringify __LINE__, which is an integral constant. Because we concatenate it at compile time, it must be stringified.

Step Five: Structured Logging With Custom Fields

Structured logging of custom fields pushes the boundaries of what can be done by C preprocessing. Internally, we run a precompile step that:

Defines C-style structs in a custom language
Generates a C header file with those structs
Generates a C macro that logs each field of the struct in a structured manner
Defines visualizations for it in our telemetry visualization software

Building such a script and hooking it up as a precompile step is out of scope for this post. However, it is possible to perform this work incurring a single snprintf() at runtime.

Putting It All Together

Despite all of these changes and improvements, a logging call to print to stdout will still result in a single function call. To test that, we view the preprocessed output of a LOG_TRACEF() call generated with clang -E:

// logging call in source
LOG_TRACEF("Hello %d", 42);

// output of clang's c preprocessor (newlines added)
{ 
  fprintf(stdout, "\"TRC\"" "\x20\x1A\n" 
          "\tfile: " "main.c" "\x20\x1A\n" 
          "\tline: " "10" "\x20\x1A\n" 
          "\tmsg: " "Hello %d\n" 
          "\x1A\x1A\n" "\n", 42);; ; 
};;

Success — we can see only one call being made and it’s to the library function fprintf to log the message. On Windows console it looks something like this:

example of a structured log

The little arrows are the end-of-line and end-of-message markers, essential for further machine processing.

Processing Structured Logs

As explained, further processing of logs into JSON may be a necessary step before telemetry ingestion. This Python script accepts structured logs streamed from stdin. It outputs properly formatted JSON.

Sample Code

Here is sample code assembled from the snippets in this post.