Recently I was working on a Go program that kept blowing up as I cranked up the test load, and I was struggling to work out why. After spending far too many hours trying to track down the issue, I found it was caused by a slight misunderstanding of how to implement function timeouts efficiently.

I was using a third-party Semaphore implementation to protect access to a constrained resource. The issue turned out to be its specific implementation of the Acquire function - it had a dark, hidden, evil secret that caused hours of headaches.

There are two basic ways to implement an Acquire function in Go. Here’s the preferable one, using time.AfterFunc:

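import "time"

// permits is assumed to be a package-level buffered channel
// (e.g. chan struct{}) holding the available permits.

// Acquire attempts to obtain a permit within the allotted time.
func Acquire(timeout time.Duration) bool {
	cancel := make(chan struct{})
	// Close cancel once the timeout elapses.
	t := time.AfterFunc(timeout, func() {
		close(cancel)
	})
	// Stop the timer on return, so a successful Acquire leaves
	// no pending timeout behind.
	defer t.Stop()
	select {
	case <-cancel:
		return false
	case <-permits:
		return true
	}
}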

…and there’s the one that will summon Cthulhu, because it uses time.Sleep in a Goroutine, and if you do this you are a horrible, mean person:

// Acquire attempts to obtain a permit within the allotted time.
func Acquire(timeout time.Duration) bool {
	cancel := make(chan struct{})
	// Don't do this:
	go func() {
		// This Goroutine sleeps for the full timeout, even if
		// Acquire has already returned true.
		time.Sleep(timeout)
		close(cancel)
	}()
	// Seriously.
	select {
	case <-cancel:
		return false
	case <-permits:
		return true
	}
	// I warned you!
}

Both functions use the same mechanism to obtain a permit within a time period: they create a cancel channel that is closed when the timeout expires, then select on both that channel and the permits channel. The value returned by Acquire is therefore determined by whichever channel becomes ready first: true on success, or false if the timeout expired. (A receive from a closed channel never blocks; it succeeds immediately with the element type's zero value, which is how both implementations signal the timeout.)

Both implementations behave the same, bar one difference: the latter will blow up in your face as soon as it’s called frequently.

The culprit is, of course, the Goroutine/time.Sleep combo. Even if the function returns true immediately, the Goroutine stays alive until time.Sleep returns and it closes the channel. Had you called this function a few thousand times (and they’d all immediately returned true), you’d now have thousands of sleeping Goroutines occupying valuable resources.
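A minimal sketch to reproduce the leak, assuming a hypothetical one-slot permits channel that is refilled before each call so every Acquire succeeds at once. runtime.NumGoroutine reports how many Goroutines are still alive afterwards; each sleeping helper hangs around for the full one-minute timeout.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// permits is a hypothetical one-slot semaphore used only for this demo.
var permits = make(chan struct{}, 1)

// acquireSleep is the time.Sleep version from above.
func acquireSleep(timeout time.Duration) bool {
	cancel := make(chan struct{})
	go func() {
		time.Sleep(timeout)
		close(cancel)
	}()
	select {
	case <-cancel:
		return false
	case <-permits:
		return true
	}
}

// leakedAfter makes n calls that all succeed immediately, then
// reports how many Goroutines are still alive.
func leakedAfter(n int) int {
	for i := 0; i < n; i++ {
		permits <- struct{}{} // refill so the call returns true at once
		acquireSleep(time.Minute)
	}
	return runtime.NumGoroutine()
}

func main() {
	// Every call leaves one Goroutine asleep for a minute.
	fmt.Println("goroutines alive:", leakedAfter(1000))
}
```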

In contrast, the time.AfterFunc implementation doesn’t suffer from this problem: as soon as true is returned, the deferred t.Stop() cancels the pending timer, so the timeout callback never runs.
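The same measurement with the time.AfterFunc version, under the same hypothetical one-slot permits setup, should stay close to the baseline Goroutine count, because every successful call stops its timer on the way out.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// permits is a hypothetical one-slot semaphore used only for this demo.
var permits = make(chan struct{}, 1)

// acquireAfterFunc is the time.AfterFunc version from above.
func acquireAfterFunc(timeout time.Duration) bool {
	cancel := make(chan struct{})
	t := time.AfterFunc(timeout, func() { close(cancel) })
	defer t.Stop() // cancels the pending timer on a successful return
	select {
	case <-cancel:
		return false
	case <-permits:
		return true
	}
}

// aliveAfter makes n calls that all succeed immediately, then
// reports how many Goroutines are still alive.
func aliveAfter(n int) int {
	for i := 0; i < n; i++ {
		permits <- struct{}{} // refill so the call returns true at once
		acquireAfterFunc(time.Minute)
	}
	return runtime.NumGoroutine()
}

func main() {
	fmt.Println("goroutines alive:", aliveAfter(1000))
}
```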

“Aha Dave” - you might say, pointing your index finger to the ceiling as if it were a loaded gun - “that’s ok, Goroutines are cheap!”, and you’d be right - Goroutines are incredibly cheap, but even a millionaire can bankrupt him or herself in a pound shop. I went from riches to rags with 70,000 Goroutines in less than a second.

But there’s another benefit to using time.AfterFunc: a pending timer is cheaper than a sleeping Goroutine. It is just an entry in the Go runtime’s internal timer heap, so no Goroutine exists while it waits; the runtime only starts one to run the callback if the timer actually fires.
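The claim that a pending timer costs no Goroutine can be checked directly. This sketch (the helper name pendingTimers is illustrative) schedules a thousand one-hour AfterFunc timers, counts the live Goroutines while they are all pending, then stops them; Stop returns true for each one because none has fired yet.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// pendingTimers schedules n one-hour AfterFunc timers, counts the live
// Goroutines while all of them are still pending, then stops them and
// reports how many Stop calls actually cancelled a timer.
func pendingTimers(n int) (goroutines, stopped int) {
	timers := make([]*time.Timer, 0, n)
	for i := 0; i < n; i++ {
		timers = append(timers, time.AfterFunc(time.Hour, func() {}))
	}
	// Pending timers live in the runtime's timer heap, not in Goroutines.
	goroutines = runtime.NumGoroutine()
	for _, t := range timers {
		if t.Stop() { // true: the timer had not fired yet
			stopped++
		}
	}
	return goroutines, stopped
}

func main() {
	g, s := pendingTimers(1000)
	fmt.Println("goroutines while pending:", g, "timers stopped:", s)
}
```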

Using time.AfterFunc is exactly what the Go developers suggest - unfortunately only after demonstrating the flawed time.Sleep version first.

TL;DR: When implementing function timeouts in Go, prefer time.AfterFunc to time.Sleep.