Constant Time Memory Allocation From Object Pools

Sat 27 May 2023
Michael Labbe
#code

Having linear pools of homogenous objects is good for performance, but eventually a piece of code will need to operate on or reserve a single object from the pool. This is a practical reality in gameplay code where things exist for a short time and interact in one-off situations.

What does freeing and allocating a single object from a preallocated pool mean? Let’s clearly establish the scenario:

The entire pool is heap allocated at startup, and commonly initialized to nil values. The entire pool lives at a fixed size for the session.
Allocating an object from the pool means receiving a handle to the object. The handle is valid until free is called, at which point the object in the pool can be reallocated in the future. This effectively reuses the object memory without returning it to the system.
Once all of the objects in the pool have been allocated, further allocation fails. The pool is contiguous in memory and its handles do not invalidate because of a memory relocation on realloc.

It is clear to see that allocating and freeing objects from a pool needs to be quick. It happens hundreds if not thousands of times per frame. Linearly searching the list for a free pool entry is not going to yield acceptable performance.

Further, a non-intrusive solution is needed, as the entire array is commonly iterated over during processing to avoid branching and data about freed objects cannot reliably be stored in-array.

The rest of this article explains a constant-time approach to allocating and freeing from the pool.

The Swap Delete Technique

The Swap Delete technique was first introduced to me in a GDC Canada 2010 talk titled A Dynamic Component Architecture for High Performance Gameplay. It is a talk by Insomniac’s Lead Systems Engineer, Terence Cohen. Outlined in just a few slides in a much larger talk about dynamic component systems, Swap Delete is a brilliant technique that deserves to be plucked out of his larger talk and explained in depth. Note that this technique can be used with any homogenous object pooling system, with or without a component system sitting atop.

First, let’s establish the terminology we need:

Object Pool: An array of homogenous objects.
- Handle: an array index into an object pool. Where n is a handle, pool[n] returns the nth object from the object pool. For the purpose of this article, there is no difference between a handle and an index into an array of objects.
Roster Table: A table of handles. No handle is repeated, and every handle from the pool is always in the roster table. Handles are unique indexes into the object pool.
- Roster Index: An index that looks up a handle stored in the roster table.
- Partition: A roster index that points at the roster table’s first free handle.

Consider this figure: The roster table‘s six values are handles, from 5 to 0. Here, roster[1] returns the handle 4, which is the fourth object in the pool.

Here, partition == 5, which points to the roster table field containing the value 0. Zero is the handle of an unallocated object from the pool.

The partition always points at the next object to allocate. It always separates the handles in the roster table from the allocated ones.

Constant-Time Allocation

On allocation a handle for the pool object at the partition’s address is returned. The partition is then incremented by 1.

// alloc pseudocodeassert(partition < pool.length)new_handle = roster_table[partition]new_obj = pool[new_handle]partition++

That takes care of constant time allocation, but what about deletion? Clearly, the partition needs to always point at the next free object in the roster table for this allocation scheme to work.

Constant-Time Deletion

To delete a handle, swap its roster table entry for the entry just before the partition.

In this figure, pool handle 4 from roster index [1] is being deleted. It is swapped into roster index [4], because the partition is pointing at [5].

Now, simply decrement the partition, so that it points at the newly freed roster index entry.

As you can see in the final figure, the partition points at [4], which contains pool handle 4. This is the freshly deleted pool object.

Consider that the next call to alloc will correctly return pool handle 4.

    // free pseudocode    roster_index_for_deletion = handle_to_roster_index(handle)    swap(roster_table[roster_index_for_deletion],          roster_table[partition - 1])    partition--

Implementation Details

There is a necessary implementation detail that is not mentioned in Terrence Cohen’s presentation and is also not pictured in the graphics in this article.

All calling code will pass objects in by handle, not roster index. The roster table is an implementation detail of the pool.

When an object is freed, it will be passed in by handle, but the roster index is needed to perform the swap. In order to resolve a handle into a roster index, a lookup table from handle to roster index must be created. Its fields must be swapped whenever the roster table’s values are swapped. It is a reverse lookup that must be kept in sync any time the roster table is updated.

That’s it. In total, there are three arrays that must be maintained:

The object pool
A roster table, containing handles into the pool of objects
A handle-to-roster table, mapping handles to roster indexes

The roster and handle-to-roster tables are implementation details of the pool’s allocation and free functionality; the caller never interacts with them directly. The caller solely deals with handles.

A Better Handle

It is possible that a stale handle could point to an object that used to exist but has been freed and re-allocated. This is possible because handles are reused between alloc/free events.

If heap allocated objects were used instead of a pool, stale pointers would likely crash, which is arguably preferable to stale data making its way into a running program which would subtly display strange behaviour.

Andre Weissflog wrote a thought-provoking article on using handles instead of pointers. In an update at the end of his blog post, he proposes adding a generational counter to handles. An object pooling system based on handles would do well to consider generational checks when mapping a handle into a pool, especially while debugging.

Ninja Build for Asset Baking

Sun 22 January 2023
Michael Labbe
#code

Ninja Build is typically used as a low-level source builder for CMake, Meson or GENie, but it is incredibly good at building game asset files, too.

Frogtoss Tech 3, the current-gen Frogtoss engine, uses Ninja to bake all assets (lighting, texture compression, etc). This has proven to be an extremely reliable and high performing choice. In terms of effort-for-results, it was a clear win for me.

Ninja tracks which files need updating on re-run, builds a DAG and executes all of the commands using your available cores. It supports incremental builds with great reliability and can be integrated to bake a source asset tree in a matter of hours from the start of experimentation.

The rest of this article makes the case for exploring using Ninja for baking assets.

Many developers have used Ninja to build source through CMake, or otherwise. It’s billed by the official website as supporting the low-level parts of building:

Where other build systems are high-level languages Ninja aims to be an assembler.

Many high level tools with incredible flexibility such as Meson generate Ninja files. Once Meson, GENie or CMake does this initial project scan, the Ninja file is generated and the project can be efficiently rebuilt.

Today, I am inviting you to look at Ninja quite differently than how it is advertised. In my view, Ninja is good for two things that aren’t frequently mentioned:

Ninja files are straightforward to write by hand. Sure, you wouldn’t want to do it all the time, but you can learn it quite quickly.
Ninja’s executable is easily distributable to your co-developers. It’s a much smaller dependency than Make/Bash to inflict on your Windows developers — just a single 600kB executable per platform.

For these reasons, I use Ninja even when its performance benefits are not necessary.

The official manual on writing your own ninja files is fairly short, but if you’re building assets and not source, you can cherry pick three things to get started:

How to write rules
How build statements work
(optional) How variables work

Hand-writing your first experimental Ninja build file that can build a few of your assets is an exercise that should take less than half an hour.

From there, it is pretty clear to see how one could write a script to walk their asset tree and create a ninja build file. Many source asset trees are built from convention: a certain extension in a certain directory tells you everything you need to know to produce the output asset for a specific platform.

If you write your own Ninja Build generator that consists of rules and statements, I would encourage you to not be constrained by languages that include libraries that help you build Ninja files. In my case, this was achieved in about 200 lines of Rust with no third party crates.

Clangd With Unity Builds

Sat 14 January 2023
Michael Labbe
#code

So, you’re programming in C/C++ and your editor is live-checking your code, throwing up errors as you type them. Depending on your editor you’re either getting red underlines or some sort of marker in the margins that compels you to change it. But how does this source code processing engine work, and what can you do when things go off the rails when you attempt something like a unity build?

You may have configured your editor to use a language server called clangd, or it may have been configured for you.

Clangd also offers other benefits, such as go-to-definition and code completion. It runs as a subprocess of your editor, and that means it’s doing its indexing work off the main thread as you would hope.

Clangd can be configured to work with VSCode and Emacs amongst others.

Problem 1: Clangd doesn’t understand how you build

You’re coding along and need to include a bunch of files from a library, so you add a new include path to your build with -Isome_lib/. Unfortunately, clangd doesn’t nkow that you build with this, so a bunch of ugly, incorrect compile errors pop up in your editor.

If you employ your Google-fu, you land on a page that says that Clangd lets you specify your compile flags in a yaml file called .clangd. But, wait! There is a better way. You really want to automate this, or you’ll have to maintain two sets of all your compile flags, which sounds like one of the least enjoyable things you could possibly fritter your time away on.

The good news is that Clangd, alternatively, reads in a compile database, which is just a JSON-structured list of the commands needed to compile each file in a given way. There are plenty of tools that can generate a compile database automatically. For example, if you can output Ninja Build files, you are one step away:

ninja -t compdb > compile_database.json

Clangd recursively searches up the tree to find this file, and uses it to try and compile your files. If a source file exists in your codebase but isn’t in the compile database, it will infer the flags needed to compile it from other commands in adjacent files in this directory. This is most likely what you want, and it’s a great starting point.

You should run this command as a post-build step, or after updating your source repo in order to automatically update your compile database so you never have to worry about keeping your compile flags in sync again.

Problem 2: Clangd adds annoying warnings you don’t want

Sometimes your on-the-fly error checking is too quick on the draw. It tells you about warnings you’d rather only see when you’re compiling. Warning about unused functions is one such example — you declare a function as static and intend to call it. But, before you have a chance to call it, it throws a distracting error. Let me finish typing, Clangd!

One option is to use a compile database post processor that ingests the Clang database that is spat out from Ninja, letting you make the tweaks you need. Now you have automation and customization.

I wrote a compile database post processor that you pipe your output through. Now your command looks like:

    ninja -t compdb | \        cleanup-compdb > compile_commands.json

Let’s say you want to disable the warning for unused functions. cleanup-compdb can append a flag for that to each compile command:

    ninja -t compdb |                           \       cleanup-compdb                            \       --append-arguments="-Wno-unused-function" \       > compile_commands.json

Problem 3: Clangd doesn’t understand unity builds

Let’s start by defining the problem.

You have a c “root” file that #includes one or more “sub” files which produce a single translation unit. Clangd now deftly handles the root file and locates the includes. However, Clangd has no concept of c source that is not a stand alone translation unit, so the sub files generate more benign errors than you can count.

Clangd doesn’t have support for unity builds. However, we can instruct it to resolve the necessary symbols as if the sub files were each their own translation unit for error checking purposes.

To illustrate this technique let’s define the root translation unit as root.c, and the sub units as sub0.c, sub1.c, and so on.

An additional file root.unity.h will be created that has all of the common symbols for the translation unit.

The steps to factoring your Unity build for Clangd

Create a root.unity.h file that starts with #pragma once and includes all common symbols in the translation unit, including system headers and forward declarations.
Create root.c which includes root.unity.h before anything else. Doing anything else in this file is entirely optional.
Create sub0.c and start it off with #pragma once and then include root.unity.h.
Include sub0.c at the bottom of root.unity.h.
Repeat steps 3 and 4 for any other sub-files in the unity build.

This has the following results:

For real compilations, root.c compiles and includes sub0.c and ignores the second include of root.unity.h that comes from sub0.c because that file starts with #pragma once.
For Clangd processing of root.c, the previously described compilation works similarly and everything just works.
For Clangd processing of sub0.c as the faux-main translation unit, it includes root.unity.h and all symbols resolve, thus fixing the litany of errors.

One hiccup remains. Clangd now emits this warning:

warning: #pragma once in main file

This is but a hiccup for us if we are using a compilation database postprocessor as described in problem 2. Simply append -Wno-pragma-once-outside-header to the list of warnings we want to ignore.

Conclusion

Jumping to symbols in an IDE and getting accurate compilation previews is a problem that has been imperfectly solved for decades, even with commercial plugins in Visual Studio. This blog post proposes a few tweaks to the language server Clangd which, in exchange for paying an up-front effort cost, automates the maintenance of a compilation database that can be tweaked to suit your specific needs.

In taking the steps described in this blog post, you will gain the knowhow to adapt the existing tools to a wide range of C/C++ codebases.

I propose a new technique for unity builds in Clangd that operates without the inclusion of any additional compiler flags. If you are able to factor your unity builds to match this format, it is worth an attempt.

◀ Page 4 of 13 ▶