For months, everything at Yusuf's company was fine. Then, suddenly, he came into the office to learn that overnight the log had exploded with thousands of panic messages. No software changes had been pushed, no major configuration changes had been made- just a reboot. What had gone wrong?
This particular function was invoked as part of the application startup:
func (a *App) setupDocDBClient(ctx context.Context) error {
	docdbClient, err := docdb.NewClient(
		ctx,
		a.config.MongoConfig.URI,
		a.config.MongoConfig.Database,
		a.config.MongoConfig.EnableTLS,
	)
	if err != nil {
		return nil
	}
	a.DocDBClient = docdbClient
	return nil
}
This is Go, which passes errors as part of the return. You can see an example where docdb.NewClient returns a client and an err object. At one point in the history of this function, it did the same thing- if connecting to the database failed, it returned an error.
But a few months earlier, an engineer changed it to swallow the error- if an error occurred, it would return nil.
As an organization, they did code reviews. Multiple people looked at this and signed off- or, more likely, multiple people clicked a button to say they'd looked at it, but hadn't.
Most of the time, there weren't any connection issues. But sometimes there were. One reboot had a flaky moment with connecting, and the error was ignored. Later on in execution, downstream modules started failing, which eventually led to a log full of panic-level messages.
The change was part of a commit tagged merely: "Refactoring". Something got factored, good and hard, all right.
This post originally appeared on The Daily WTF.