Developer experiences from the trenches

Developer experiences from the trenches

Start of a new post

How To Crash With Kubernetes and Go

Sat 23 February 2019 by Michael Labbe
tags code 

Kubernetes is so good at maintaining a user-facing veneer of a stable service that you might not even know that you are periodically crashing until you set up log aggregation and do a keyword search for panic. You can miss crash cues because pods spin up so transparently.

Okay, so your application can crash. You are using Go. What can you do about it? In practice, here are the steps we have found useful:

  1. Log your panic record into a single log line so it can be tracked.
  2. If a panic occurred while serving a RESTful request, return 500 to prevent client timeout while continuing to serve others.
  3. Handle panic-inducing signals such as SIGSEGV gracefully.
  4. Handle Kubernetes pod pre-shutdown SIGTERM messages.

Maui

Panic-inducing Signals

If you write a C program and do not explicitly handle SIGSEGV with signal(2), the receipt of SIGSEGV terminates the offending thread.

Go is different from C. Go’s runtime has a default panic handler that catches these signals and turns them into a panic. Defer, Panic and Recover on the official blog covers the basic mechanism.

SIGSEGV (“segmentation violation”) is the most common one. Go will happily compile this SIGSEGV-generating code:

var diebad *int
*diebad++       // oh, no

The full list of panic reasons is described in the official panic.go source.

Non-Panic Inducing Signals

Not every signal produces a Go panic — not by a long shot. Linux has over 50 signals. Version 7 had 15 different signals; SVR4 and 4.4BSD both have 31 different signals. Signals are a kernel interface exposed in userspace, and a primary means for processes to contend with their role in the larger operating system.

Let’s go over the non-panic inducing signals and discuss what they mean to our Kubernetes-driven Go program:

Handling Panics in Go

We’ve established that crashing signals in Go are received by its runtime panic handler, and that we want to override this behaviour to provide our own logging, stack tracing, and http response to a calling client.

In some environments you can globally trap exceptions. For instance, on Windows in a c++ environment you can use Structured Exception Handling to unwind the stack and perform diagnostics.

Not so in Go. We have one technique: defer. We can set up a defer function near the top of our goroutine stack that is executed if a panic occurs. When there, we can detect if a panic is currently in progress. There are a number of gotchas with this technique:

We can use the latter trait to our advantage in our web service, providing a generic panic handler that logs, and a second panic handler inside the goroutine that responds to a web request that returns 500 error to the user.

Global Panic Handler

The global panic handler is your opportunity to employ your logger to use your logger to provide all relevant crash diagnostics that occur outside of responding to an HTTP request:

//
// Sample code to catch panics in the main goroutine
//
func main() {
    defer func() {
        r := recover()
        if r == nil {
            return // no panic underway
        }

        fmt.Printf("PanicHandler invoked because %v\n", r)

        // print debug stack
        debug.PrintStack()

        os.Exit(1)
    }()
}

In-Request Panic Handler

Most (if not all) Go RESTful packages use a per-request Goroutine to respond to incoming requests so they can perform in parallel. The top of this stack is under package control, and so it is up to the RESTful package maintainer to provide a panic handler.

go-restful defaults to doing nothing but offers an API to trap a panic, calling your designated callback. From there, it is up to you to log diagnostics and respond to the user. Check with your RESTful package for similar handlers.

go-restful’s default panic handler (implemented in logStackOnRecover) logs the stack trace back to the caller. Don’t use it. Write your own panic handler that leverages your logging solution and does not expose internals at a crash site to a client.

Terminating Gracefully on Request

Okay, at this point we are logging crash diagnostics, but what about amicable pod termination? Kubernetes is sending SIGTERM and because we are not yet trapping it, it is causing our process to silently exit.

Consider the case of a DB connection over TCP. If our process has open TCP connections, a TCP connection sits idle until one side sends a packet. Killing the process without closing a TCP socket results in a half-open connection. Half-open connections are handled deep in your database driver and explicit disconnection is not necessary, but it is nice.

It avoids the need for application-level keepalive round trips to discover a half-open connection. Correctly closing all TCP connections ensures your database-side connection count telemetry is accurate. Further, if a starting pod initializes a large enough database connection pool in the timeout window, it may temporarily exceed your max db connections because the half-closed ones have not timed out yet!

//
// Sample code to trap SIGTERM
//
func main() 
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM)

    go func() {
        // before you trapped SIGTERM your process would
        // have exited, so we are now on borrowed time.
        //
        // Kubernetes sends SIGTERM 30 seconds before 
        // shutting down the pod.

        sig := <-sigs

        // Log the received signal
        fmt.Printf("LOG: Caught sig ")
        fmt.Println(sig)

        // ... close TCP connections here.

        // Gracefully exit.
        // (Use runtime.GoExit() if you need to call defers)
        os.Exit(0)
    }()
}

You may also want to trap SIGINT which usually occurs when the user types Control-C. These don’t happen in production, but if you see one in a log, you can quickly recognize you aren’t looking at production logs!

No Exit Left Behind

At this point we have deeply limited the number of ways your application can silently fail in production. The resiliency of Kubernetes and the default behaviours of the Go runtime can sweep issues under the rug.

With just a few small code snippets, we are back in control of our exit conditions.

Crashing is about leaving a meaningful corpse for others to find.

Start of a new post

Three Traits of Effective Programmers

Thu 05 July 2018 by Michael Labbé
tags code 

Intellect, the ability to focus in on a problem and sheer time committed to the craft of programming are critical and pretty obvious elements that make a programmer good. Having these things on your side is partly luck and partly an expensive time commitment. However, I believe there are further traits that can be developed through habit-forming practice that make a programmer excellent.

Some programmers transcend being merely good; they are highly effective. This often becomes apparent when you see them becoming the team’s de facto problem solver, or when they reliably design and implement excellent-fit solutions, topping their previous attempts.

In the teams I’ve participated in and built I have found three traits that recur in highly effective programmers. When I find even one of them they often go on to live up to great promise. Any one of them is a strong tell, and more is a sign of a programmer with serious potential to be impactful.

The first trait is intellectual curiosity. When you find someone who tinkers because they are curious about new results you are engaging someone who has internalized the impetus for pioneering solutions. Internalization of curiosity is key because it is the surest driver of tangential exploration. A programmer who has exercised solutions to problems they dreamt up themselves out of pure interest in discovery has strengthened their abilities in excess of the rigours of standard professional performance. Professional programming makes you strong enough to stand tall in full gravity. Intellectual curiosity exceeds that; it is like training with a weight belt on.

The second trait is tenacity. Tenacity is the sworn enemy of “Cool, it works! We’re done!”. Those who internalize this trait never spitball their way to a final solution. If multiplying by negative one solves the problem but they don’t know why, they remove it and figure out why the sign inversion makes everything seemingly work. Inherent to this behaviour is the inclination to traverse underneath abstractions. Making it work is no longer the quest; the search is for a deeper understanding, one that makes the answer readily apparent. Illuminate the problem with a hard-earned understanding of the facts and the rest is small muscle movements.

An example of tenacity is spending three weeks tracking down a memory leak in ostensibly mature system libraries. Working through source, compiling it yourself, pouring over machine code, examining the compiler, and then reading your processor instruction manual. Rewriting portions of libc to verify results. Thermal imaging in your data center. Whatever it takes.

The final trait is a willingness to self criticize. Most programmers eventually have the experience of looking at code from a few years back and cringing. While syntax choices evolve, the cringe truly comes from a looking-in view of a naive problem solver doing their best and missing their mark. When a priori derived solutions are mismatched with the present understanding of a problem, personal growth is felt at a gut level.

An unprompted individual who consistently criticizes their own solutions is going to blossom quickly. Any valuable solution space is enormous, and the ability to criticize from a positive vantage point is the natural habitat of an always improving programmer.

Those are the three traits I’ve seen that suggest a programmer is going to be promising and impactful. Next time I am going to ponder the question that affects your effectiveness more than anything else: How do you decide what to work on?

Start of a new post

Bootstrapping Your Linux and Mac Shells

Mon 07 August 2017 by Michael Labbe
tags code 

Between the cloud, VMs, Docker and cheap laptops, I run into more unconfigured shell environments than I ever did before. In the simple old days you used to get a computer and configure it. You reaped the productivity of that configuration for years. Nowadays environments are ridiculously disposable. The tyranny of the default has become incredibly powerful.

I decided to take the power back and create a self-installing bootstrap script which I could use to configure any new system with a Bash shell. This ended up being a one-day hack that has made my life a lot more sane. My requirements were:

  1. It must self-install and self-configure without needing to type any commands.

  2. It must be accessible everywhere from an easy-to-remember URL so I don’t need to copy/paste.

  3. It must optionally let me choose how each system is configured.

  4. It must run anywhere — inside stripped down Docker containers, etc.

To build this, I decided on using Bash scripting. Perl, Python and Ruby are not always available. Bash, while not quite as ubitquious as /bin/sh and Busybox, is close enough.

Makeself creates self extracting archives. You can download a shell script that unarchives to a tempdir, running a setup script with access to a hierarchy of files. This is exactly what we needed.

In order to centrally host the bootstrap script, I used Amazon S3. S3 buckets have notoriously long names, but Amazon gives you the ability to use a CNAME for a subdomain that you own. This means I could use a subdomain like https://bootstrap.frogtoss.com that is backed by S3, guaranteeing the bootstrap is accessible virtually anywhere in the free world.

What remained is a long day of enjoyable hacking that produced a set of very personal dotfiles, emacs tweaks and sed manipulations that converted a basic install into something as usable as my most tweaked workstation.

Now I have a chained command that is similar to the following which highly configures any Linux instance:

rm -f bootstrap.sh;
wget http://bootstrap.frogtoss.com/bootstrap.sh;
chmod +x bootstrap.sh ; sudo ./bootstrap.sh
Start of a new post

Premake for Package Maintainers

Tue 16 August 2016 by Michael Labbé
tags code 

If you are distributing a library, you have a responsibility to make it as trouble-free as possible to build and use. To do otherwise litters the Internet with a project that frustrates and consumes the time of potentially hundreds of programmers. (And when have you met a programmer with extra time?).

This presents an interesting building challenge: how do you create the most compatible build system while simultaneously burdening the user as little as possible?

The Alternatives

Evidently, a lot of people turn to GNU Make. GNU Make, however, calls out to bash on Windows and uses Unix path slashes. Good luck getting that to work seamlessly with Visual Studio. The standalone GNU Make executable that the FSF hosts is over ten years old and is riddled with bugs. It hangs on fork() in Windows 10. Do not tell your users to download this. Do not tell your users to download msys2 or Cygwin so they can build your code if you respect their time.

Another option is CMake. Sadly, its sole installer download is 55MB, bundles QT and a full documentation set, and forces your users to parse syntax that they likely don’t understand. CMake generates projects with hardcoded paths which means the generated files can’t be reused.

SCons requires Python 2 to be present. We live in a Python 3 world; asking users to install a second Python implementation side by side is just rude.

Even Better Than Running Premake

In light of these options, Premake is a breath of fresh air. It’s not perfect, but you can download a one 1MB self-contained binary which consists of a small C-based engine and bundled Lua scripts. Premake generates build scripts with relative paths and gets you up and running in most popular environments very quickly.

But, wait! You can do even better. You can run Premake for your users, checking the generated projects into the repository. Nothing generated by Premake is specific to your computer’s setup. This removes two steps from building for your users: downloading Premake and running it. Awesome!

Automating Premake

Premake has a nifty action feature, making it straightforward to script new actions, such as running itself multiple times to generate projects.

Doing so has a couple of subleties. Firstly, Premake is limited to generating project files in the directory that the premake5.lua script resides in. One directory with multiple builds in it is a mess. One option is to create a subdirectory per build type, then copy the premake5.lua script in there, run it, and then delete it. This sounds hacky, but it actually works without giving you any trouble.

Another thing to keep in mind when generating all of your Makefiles at once is that the Makefiles are operating system-specific. A Windows-based makefile will link different libraries, for instance. You need to generate build scripts-per platform.

In order to build from a subdirectory, you need to specify your paths relative from the generated subdirectory. This can be resolved succintly at the top of your premake5.lua script:

-- assumes the premake5.lua is in a subdirectory called "build" under
-- the project root.
local root_dir = path.join(path.getdirectory(_SCRIPT),"../../")
local build_dir = path.join(root_dir,"build/")

Example Code

My library, Native File Dialog, has a Premake 5 Script action called “dist” which does exactly this. If I need to update the build scripts, I type premake5 dist and they are all refreshed. My library users need only run the generated builds. Feel free to copy the code for yourself.

Zero dependency building for users and a one meg standalone dependency for the package maintainer! This is the smallest price to pay for cross platform building I’ve ever come across.

Page 1 / 6 »

rss
We built Frogtoss Labs for creative developers and gamers. We give back to the community by sharing designs, code and tools, while telling the story about ongoing independent game development at Frogtoss.