Recently I was working on a Go program that kept blowing up as I cranked up the test load, and I struggled to determine why. After spending far too many hours trying to track down the issue, I found it was caused by a slight misunderstanding of how to implement function timeouts efficiently.

I was using a third-party Semaphore implementation to protect access to a constrained resource. The issue turned out to be its specific implementation of the Acquire function - it had a dark, hidden, evil secret that caused hours of headaches.

There are two basic ways to implement an Acquire function in Go. Here’s the preferable one, using time.AfterFunc:

// Acquire attempts to obtain a permit within the allotted time.
// permits is assumed to be a package-level channel holding the available permits.
func Acquire(timeout time.Duration) bool {
    cancel := make(chan struct{}, 1)
    t := time.AfterFunc(timeout, func() {
        close(cancel)
    })
    defer t.Stop() // cancel the timer if a permit arrives first
    select {
    case <-cancel:
        return false
    case <-permits:
        return true
    }
}

…and there’s the one that will summon Cthulhu because it uses time.Sleep in a Goroutine and if you do this you are a horrible, mean person:

// Acquire attempts to obtain a permit within the allotted time.
func Acquire(timeout time.Duration) bool {
    cancel := make(chan struct{}, 1)
    // Don't do this:
    go func() {
        time.Sleep(timeout)
        close(cancel)
    }()
    // Seriously.
    select {
    case <-cancel:
        return false
    case <-permits:
        return true
    }
    // I warned you!
}

Both functions use the same mechanism to obtain a permit within a time period - they create a timeout channel, then select on both the timeout channel and the resource channel. The value returned by the Acquire function is therefore determined by whichever channel becomes ready first: true if a permit arrives on the resource channel, or false if the timeout channel is closed first. A receive on a closed channel returns immediately with the zero value of the element type (not nil), which is how both timeout implementations signal that time is up.

Both these implementations work exactly the same, bar one difference - the latter will blow up in your face as soon as it’s called frequently.

The problem is, of course, the Goroutine/time.Sleep combo. Even if the function returns true, the Goroutine keeps running until time.Sleep is done. If you called this function a few thousand times (and every call immediately returned true), you’d now have thousands of sleeping Goroutines occupying valuable resources.

In contrast, the time.AfterFunc implementation doesn’t suffer from this problem - as soon as true is returned, the deferred t.Stop() is called and the pending timer is cancelled before it ever fires.

“Aha Dave” - you might say, pointing your index finger to the ceiling as if it were a loaded gun - “that’s ok, Goroutines are cheap!”, and you’d be right - Goroutines are incredibly cheap, but even a millionaire can bankrupt him or herself in a pound shop. I went from riches to rags with 70,000 Goroutines in less than a second.

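But there’s another benefit to using time.AfterFunc - it’s cheaper than a Goroutine, because a pending timer lives in the Go runtime’s internal timer heap. No Goroutine exists while the timer is waiting; one is only started to run the function when the timer fires, and never if the timer is stopped first.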

Using time.AfterFunc is exactly what the Go developers suggest - unfortunately only after demonstrating the flawed time.Sleep version first.

TL;DR: When implementing function timeouts in Go, prefer time.AfterFunc to time.Sleep.