
How To Crash With Kubernetes and Go
Sat 23 February 2019
Michael Labbe
#code
Kubernetes is so good at maintaining a user-facing veneer of a stable service that you might not even know that you are periodically crashing until you set up log aggregation and do a keyword search for panic. You can miss crash cues because pods spin up so transparently.
Okay, so your application can crash. You are using Go. What can you do about it? In practice, here are the steps we have found useful:
- Log your panic record into a single log line so it can be tracked.
- If a panic occurred while serving a RESTful request, return 500 to prevent client timeout while continuing to serve others.
- Handle panic-inducing signals such as SIGSEGV gracefully.
- Handle Kubernetes pod pre-shutdown SIGTERM messages.
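As a rough illustration of the first step, a panic value and its stack trace can be collapsed into one line by encoding them as JSON, which escapes the embedded newlines. This is a minimal sketch: the helper names panicRecord and singleLine and the field names "msg" and "stack" are made up for this example, and a real service would hand the string to its own logger instead of printing it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime/debug"
)

// panicRecord flattens a recovered panic value and the current stack
// into one JSON object. json.Marshal escapes the newlines in the stack
// trace, so the whole record fits on a single log line.
// The field names "msg" and "stack" are illustrative, not a standard.
func panicRecord(r interface{}) string {
	rec := map[string]string{
		"msg":   fmt.Sprint(r),
		"stack": string(debug.Stack()),
	}
	b, err := json.Marshal(rec)
	if err != nil {
		// Should not happen for a map of strings, but never lose
		// the panic message over a logging problem.
		return fmt.Sprintf("panic: %q", fmt.Sprint(r))
	}
	return string(b)
}

// singleLine reports whether s contains no line breaks.
func singleLine(s string) bool {
	for _, c := range s {
		if c == '\n' || c == '\r' {
			return false
		}
	}
	return true
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			// One panic, one log line.
			fmt.Println(panicRecord(r))
		}
	}()
	panic("example failure")
}
```

Keeping the record on one line means a keyword search for the panic message in your log aggregator returns the stack trace along with it.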
Panic-inducing Signals
If you write a C program and do not explicitly handle SIGSEGV with signal(2), the receipt of SIGSEGV terminates the whole process, not just the offending thread.
Go is different from C. Go’s runtime has a default panic handler that catches these signals and turns them into a panic. Defer, Panic and Recover on the official blog covers the basic mechanism.
SIGSEGV (“segmentation violation”) is the most common one. Go will happily compile this SIGSEGV-generating code:
    var diebad *int
    *diebad++ // oh, no
The full list of panic reasons is described in the official panic.go source.
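To see the signal-to-panic conversion in action, the nil pointer write above can be wrapped in a function that recovers and returns the panic as an error. This is only a sketch; derefNil is a name invented for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// derefNil performs the same nil-pointer write as the snippet above and
// converts the resulting runtime panic into an ordinary error value.
func derefNil() (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Panics raised by the Go runtime implement runtime.Error.
			if re, ok := r.(runtime.Error); ok {
				err = re
				return
			}
			panic(r) // not a runtime error: re-raise it
		}
	}()

	var diebad *int
	*diebad++ // faults; the runtime turns the fault into a panic
	return nil
}

func main() {
	fmt.Println("recovered:", derefNil())
}
```

The recovered error reads "runtime error: invalid memory address or nil pointer dereference", which is the string to search for in aggregated logs.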
Non-Panic Inducing Signals
Not every signal produces a Go panic — not by a long shot. Linux has over 50 signals. Version 7 had 15 different signals; SVR4 and 4.4BSD both have 31 different signals. Signals are a kernel interface exposed in userspace, and a primary means for processes to contend with their role in the larger operating system.
Let’s go over the non-panic inducing signals and discuss what they mean to our Kubernetes-driven Go program:
- Unignorable signals: SIGKILL and SIGSTOP can’t be ignored. They are provided by the kernel as a surefire way of killing a process. If received, the process terminates without warning and we have to rely on logging coming from external sources. It is not recommended to use unignorable signals in automating your process restarts.
- Flow-related signals: Many signals can be classified as supporting thread execution. These include SIGCONT and SIGPIPE. They do not interact with Kubernetes and we can safely ignore them or reserve them for any process-specific needs that come up.
- Kubernetes-Generated Signals. Kubernetes sends SIGTERM to PID 1 in your container thirty seconds before shutting down a pod. If you weren’t trapping this previously (and also not using a preStop hook), you are missing an opportunity to gracefully shut down your pod. By default, SIGTERM terminates the process in a Go program. The more aggressive SIGKILL is sent to your pod if it is still running after the grace period.
Handling Panics in Go
We’ve established that crashing signals in Go are received by its runtime panic handler, and that we want to override this behaviour to provide our own logging, stack tracing, and HTTP response to a calling client.
In some environments you can globally trap exceptions. For instance, on Windows in a C++ environment you can use Structured Exception Handling to unwind the stack and perform diagnostics.
Not so in Go. We have one technique: defer. We can set up a defer function near the top of our goroutine stack that is executed if a panic occurs. When there, we can detect if a panic is currently in progress. There are a number of gotchas with this technique:
- defer does not run if os.Exit() is called. Make sure all error paths out of your process call panic or use runtime.Goexit().
- defer (and recover) operate on goroutines, not processes. If you set a defer to run in main and then spawn a goroutine which panics, the defer will not be called.
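The per-goroutine scope of recover can be worked around by giving every spawned goroutine its own deferred handler. The following is a minimal sketch of that idea; safeGo is a hypothetical helper, not part of the standard library.

```go
package main

import "fmt"

// safeGo is a hypothetical helper: it runs fn in a new goroutine with
// its own deferred recover, and reports the recovered value (nil if fn
// returned normally) on the returned channel.
func safeGo(fn func()) <-chan interface{} {
	done := make(chan interface{}, 1)
	go func() {
		defer func() {
			// recover only sees panics raised on this goroutine.
			done <- recover()
		}()
		fn()
	}()
	return done
}

func main() {
	// A defer/recover placed here in main would never see the
	// panic raised inside the worker goroutine below.
	r := <-safeGo(func() { panic("worker failed") })
	fmt.Println("worker panicked with:", r)
}
```

Without the deferred recover inside the goroutine, the panic would take down the whole process regardless of any handler installed in main.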
We can use the latter trait to our advantage in our web service, providing a generic panic handler that logs, and a second panic handler inside the goroutine that responds to a web request, which returns a 500 error to the user.
Global Panic Handler
The global panic handler is your opportunity to use your logger to provide all relevant crash diagnostics for panics that occur outside of responding to an HTTP request:
    package main

    import (
        "fmt"
        "os"
        "runtime/debug"
    )

    //
    // Sample code to catch panics in the main goroutine
    //
    func main() {
        defer func() {
            r := recover()
            if r == nil {
                return // no panic underway
            }

            fmt.Printf("PanicHandler invoked because %v\n", r)

            // print debug stack
            debug.PrintStack()

            os.Exit(1)
        }()
    }
In-Request Panic Handler
Most (if not all) Go RESTful packages use a per-request goroutine to respond to incoming requests so they can run in parallel. The top of this stack is under package control, and so it is up to the RESTful package maintainer to provide a panic handler.
go-restful defaults to doing nothing but offers an API to trap a panic, calling your designated callback. From there, it is up to you to log diagnostics and respond to the user. Check with your RESTful package for similar handlers.
go-restful’s default panic handler (implemented in logStackOnRecover) logs the stack trace back to the caller. Don’t use it. Write your own panic handler that leverages your logging solution and does not expose internals at a crash site to a client.
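For comparison, here is what such a handler might look like when written against the standard library's net/http rather than go-restful. It is a sketch under that assumption: recoverTo500, statusFor and alwaysPanics are names invented for this example, and the log call stands in for your own logging solution.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"runtime/debug"
)

// recoverTo500 wraps an http.Handler so that a panic in one request is
// logged server-side and answered with a bare 500, without leaking the
// stack trace to the client.
func recoverTo500(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		defer func() {
			if r := recover(); r != nil {
				// %q keeps the multi-line stack on one log line.
				log.Printf("panic serving %s: %v stack=%q", req.URL.Path, r, debug.Stack())
				http.Error(w, http.StatusText(http.StatusInternalServerError),
					http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, req)
	})
}

// alwaysPanics is a deliberately broken handler used for the demo.
var alwaysPanics = http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
	panic("handler bug")
})

// statusFor runs one in-memory request through h and returns the
// status code the client would see.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/demo", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(recoverTo500(alwaysPanics))) // prints 500
}
```

One limitation of this approach: if the handler has already written its response headers before panicking, the status code can no longer be changed to 500.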
Terminating Gracefully on Request
Okay, at this point we are logging crash diagnostics, but what about amicable pod termination? Kubernetes is sending SIGTERM and because we are not yet trapping it, it is causing our process to silently exit.
Consider the case of a DB connection over TCP. If our process has open TCP connections, a TCP connection sits idle until one side sends a packet. Killing the process without closing a TCP socket results in a half-open connection. Half-open connections are handled deep in your database driver and explicit disconnection is not necessary, but it is nice.
It avoids the need for application-level keepalive round trips to discover a half-open connection. Correctly closing all TCP connections ensures your database-side connection count telemetry is accurate. Further, if a starting pod initializes a large enough database connection pool in the timeout window, it may temporarily exceed your max db connections because the half-open ones have not timed out yet!
    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    //
    // Sample code to trap SIGTERM
    //
    func main() {
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGTERM)

        go func() {
            // before you trapped SIGTERM your process would
            // have exited, so we are now on borrowed time.
            //
            // Kubernetes sends SIGTERM 30 seconds before
            // shutting down the pod.
            sig := <-sigs

            // Log the received signal
            fmt.Printf("LOG: Caught sig ")
            fmt.Println(sig)

            // ... close TCP connections here.

            // Gracefully exit.
            // (Use runtime.Goexit() if you need to call defers)
            os.Exit(0)
        }()

        // ... run your service here. select {} stands in for it so
        // that main stays alive until the signal arrives.
        select {}
    }
You may also want to trap SIGINT, which usually occurs when the user types Control-C. These don’t happen in production, but if you see one in a log, you can quickly recognize you aren’t looking at production logs!
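An alternative shape for the same idea, assuming Go 1.16 or newer, is to let signal.NotifyContext cancel a context on SIGTERM or SIGINT and have main return normally, so that deferred cleanup runs instead of being skipped by os.Exit. The sketch below is illustrative only: run and demo are invented names, the cleanup callback stands in for closing database connections, and the demo signals its own process purely to simulate Kubernetes.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// run blocks until ctx is cancelled, then returns normally so that the
// deferred cleanup executes. The cleanup callback stands in for closing
// database connections and flushing logs.
func run(ctx context.Context, cleanup func()) {
	defer cleanup()
	<-ctx.Done() // stand-in for the service's real work loop
}

// demo simulates a pod shutdown by sending SIGTERM to this process and
// reports whether the deferred cleanup ran. A real service would not
// signal itself; Kubernetes does that.
func demo() bool {
	// ctx is cancelled when SIGTERM or SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	p, err := os.FindProcess(os.Getpid())
	if err != nil {
		return false
	}
	if err := p.Signal(syscall.SIGTERM); err != nil {
		return false
	}

	cleaned := false
	run(ctx, func() { cleaned = true })
	return cleaned
}

func main() {
	fmt.Println("cleanup ran:", demo())
}
```

Because nothing calls os.Exit, every defer on the way out of main still fires, which keeps the shutdown path and the panic path consistent.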
No Exit Left Behind
At this point we have deeply limited the number of ways your application can silently fail in production. The resiliency of Kubernetes and the default behaviours of the Go runtime can sweep issues under the rug.
With just a few small code snippets, we are back in control of our exit conditions.
Crashing gracefully is about leaving a meaningful corpse for others to find.

Three Traits of Effective Programmers
Thu 05 July 2018
Michael Labbé
#code
Intellect, the ability to focus in on a problem and sheer time committed to the craft of programming are critical and pretty obvious elements that make a programmer good. Having these things on your side is partly luck and partly an expensive time commitment. However, I believe there are further traits that can be developed through habit-forming practice that make a programmer excellent.
Some programmers transcend being merely good; they are highly effective. This often becomes apparent when you see them becoming the team’s de facto problem solver, or when they reliably design and implement excellent-fit solutions, topping their previous attempts.
In the teams I’ve participated in and built, I have found three traits that recur in highly effective programmers. When I find even one of them they often go on to live up to great promise. Any one of them is a strong tell, and more is a sign of a programmer with serious potential to be impactful.
The first trait is intellectual curiosity. When you find someone who tinkers because they are curious about new results you are engaging someone who has internalized the impetus for pioneering solutions. Internalization of curiosity is key because it is the surest driver of tangential exploration. A programmer who has exercised solutions to problems they dreamt up themselves out of pure interest in discovery has strengthened their abilities in excess of the rigours of standard professional performance. Professional programming makes you strong enough to stand tall in full gravity. Intellectual curiosity exceeds that; it is like training with a weight belt on.
The second trait is tenacity. Tenacity is the sworn enemy of “Cool, it works! We’re done!”. Those who internalize this trait never spitball their way to a final solution. If multiplying by negative one solves the problem but they don’t know why, they remove it and figure out why the sign inversion makes everything seemingly work. Inherent to this behaviour is the inclination to traverse underneath abstractions. Making it work is no longer the quest; the search is for a deeper understanding, one that makes the answer readily apparent. Illuminate the problem with a hard-earned understanding of the facts and the rest is small muscle movements.
An example of tenacity is spending three weeks tracking down a memory leak in ostensibly mature system libraries. Working through source, compiling it yourself, poring over machine code, examining the compiler, and then reading your processor instruction manual. Rewriting portions of libc to verify results. Thermal imaging in your data center. Whatever it takes.
The final trait is a willingness to self-criticize. Most programmers eventually have the experience of looking at code from a few years back and cringing. While syntax choices evolve, the cringe truly comes from a looking-in view of a naive problem solver doing their best and missing their mark. When a priori derived solutions are mismatched with the present understanding of a problem, personal growth is felt at a gut level.
An unprompted individual who consistently criticizes their own solutions is going to blossom quickly. Any valuable solution space is enormous, and the ability to criticize from a positive vantage point is the natural habitat of an always improving programmer.
Those are the three traits I’ve seen that suggest a programmer is going to be promising and impactful. Next time I am going to ponder the question that affects your effectiveness more than anything else: How do you decide what to work on?

Bootstrapping Your Linux and Mac Shells
Mon 07 August 2017
Michael Labbe
#code
Between the cloud, VMs, Docker and cheap laptops, I run into more unconfigured shell environments than I ever did before. In the simple old days you used to get a computer and configure it. You reaped the productivity of that configuration for years. Nowadays environments are ridiculously disposable. The tyranny of the default has become incredibly powerful.
I decided to take the power back and create a self-installing bootstrap script which I could use to configure any new system with a Bash shell. This ended up being a one-day hack that has made my life a lot more sane. My requirements were:
- It must self-install and self-configure without needing to type any commands.
- It must be accessible everywhere from an easy-to-remember URL so I don’t need to copy/paste.
- It must optionally let me choose how each system is configured.
- It must run anywhere: inside stripped down Docker containers, etc.
To build this, I decided on using Bash scripting. Perl, Python and Ruby are not always available. Bash, while not quite as ubiquitous as /bin/sh and Busybox, is close enough.
Makeself creates self extracting archives. You can download a shell script that unarchives to a tempdir, running a setup script with access to a hierarchy of files. This is exactly what we needed.
In order to centrally host the bootstrap script, I used Amazon S3. S3 buckets have notoriously long names, but Amazon gives you the ability to use a CNAME for a subdomain that you own. This means I could use a subdomain like https://bootstrap.frogtoss.com that is backed by S3, guaranteeing the bootstrap is accessible virtually anywhere in the free world.
What remained was a long day of enjoyable hacking that produced a set of very personal dotfiles, emacs tweaks and sed manipulations that converted a basic install into something as usable as my most tweaked workstation.
Now I have a chained command that is similar to the following which highly configures any Linux instance:
rm -f bootstrap.sh;wget http://bootstrap.frogtoss.com/bootstrap.sh;chmod +x bootstrap.sh ; sudo ./bootstrap.sh