12.1 Factor Apps: Logs
Hey everyone. Adam here for the next episode in the 12.1 factor app series. I’m writing more episodes addressing my amendments to the original 12 factors. After that, I’ll propose new factors.
Also, sorry about a mistake in the last episode. The podcast episode I mentioned is no longer available. The host told me he took the podcast completely offline. However, he did invite me onto his current podcast, Rails with Jason. I’ll go on his show in the coming weeks to discuss continuous delivery, deployment pipelines, preflight checks, smoke tests, and all that good stuff. Jason also said I can simulcast the episode on Small Batches, so that’s a bonus episode for ya.
OK, enough preamble for now. Time to talk logs.
The 12 factor app states that applications should not concern themselves with storing their log stream. Simply log to standard out or standard error. This works in development because developers can see logs in their terminal. It also works in production because tooling can redirect logs or capture process streams independently of the application.
My stance on the 12 factor app is that it’s a great starting point but requires amendments. Just logging to standard out or standard error is not enough to build robust continuous delivery pipelines. We need to layer logging practices on top of the original recommendations.
So, the 12.1 factor app does three things:
1. Supports a LOG_LEVEL configuration option.
2. Uses a machine-readable format, like JSON, in production.
3. Generates time-series telemetry from logs.
Let’s consider each point.
The first point relates to the config factor. More on that in the previous episode at https://smallbatches.fm/6. Applications must support log-level configuration instead of hard-coding it. Use a low log level like debug in development and info or higher in non-development environments.
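To make that concrete, here’s a minimal sketch in Python. It assumes the level arrives through a LOG_LEVEL environment variable and falls back to info when the variable is unset. The names are illustrative, not prescriptive.

```python
import logging
import os
import sys

# Read the log level from the environment instead of hard-coding it.
# Defaults to INFO so non-development environments stay quiet by default.
log_level = os.environ.get("LOG_LEVEL", "INFO").upper()

logging.basicConfig(
    stream=sys.stdout,  # log to standard out, per the original factor
    level=getattr(logging, log_level, logging.INFO),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

logger = logging.getLogger("app")
logger.debug("only visible when LOG_LEVEL=DEBUG")
logger.info("service starting")
```

In development you’d export LOG_LEVEL=DEBUG; everywhere else the default keeps the stream quiet.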
Second, logs must be produced in a machine-readable format such as JSON. Oh, and no multiline logs. Multiline logs are effectively syntax errors in a log stream, so just avoid them. Anyway, using a machine-readable format enables new use cases. Here are a few: Error logs may contain stack traces. Contextual information, such as user IDs, may be added to all log entries. Log entries can generate time-series data. Log entries may be parsed and routed to different storage systems. Warn and error logs may generate alerts. Fatal logs may page someone. You know, the list goes on and on.
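Here’s a rough sketch of what that can look like, again in Python and using only the standard library. The field names and the user_id example are my assumptions, not a prescribed schema.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Contextual fields (e.g. a user ID) passed via `extra=` land on the record.
        if hasattr(record, "user_id"):
            entry["user_id"] = record.user_id
        # Stack traces are folded into the JSON value, never emitted as raw multiline text.
        if record.exc_info:
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("app").info("checkout complete", extra={"user_id": "42"})
```

Note how the stack trace is escaped inside a single JSON value. That’s the “no multiline logs” rule in practice.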
The point regarding time-series telemetry warrants extra attention. Consider nginx or apache. Both output the well-known “access log” format. The format includes request latencies, response codes, and other information like the origin IP address. A single log line contains wonderfully useful telemetry! Parsing the log can generate a histogram of response latencies, an incoming request count, the percentage of satisfied requests, counts of internal server errors and backend errors, a leaderboard of response codes, and more. That’s enough to understand how the HTTP service is operating, without reaching for extra tools.
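As a sketch, here’s one way you might fold an access log into telemetry with Python. The log format and the regex below are assumptions on my part; match them to whatever your nginx or apache config actually emits.

```python
import re
from collections import Counter

# Hypothetical access log line: combined-style format with the request time
# in seconds appended at the end. Adjust the pattern to your real log format.
LINE = re.compile(r'.* "\w+ \S+ \S+" (?P<status>\d{3}) \d+ (?P<latency>[\d.]+)$')

status_counts = Counter()
latency_buckets = Counter()

def ingest(line):
    match = LINE.match(line)
    if not match:
        return
    status_counts[match.group("status")] += 1
    # Crude histogram: bucket latencies into 100ms bins.
    latency_buckets[round(float(match.group("latency")), 1)] += 1

sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /health HTTP/1.1" 200 512 0.042'
ingest(sample)

total = sum(status_counts.values())
errors = sum(v for k, v in status_counts.items() if k.startswith("5"))
print("requests:", total)
print("5xx rate:", errors / total if total else 0)
print("top status codes:", status_counts.most_common(3))
```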
The same approach applies to internal telemetry. Applications can output time-series data to standard out for consumption by downstream systems. This eliminates the need for third-party libraries and external metric collection services in favor of infrastructure-level log storage and metric generation. You can see this principle in action with products like DataDog and NewRelic. Both offer centralized log storage, searching, and metric generation. Once metrics are generated, you have access to the full suite of tools around them, such as graphing, monitoring, and alerting.
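Here’s a small sketch of the idea: the application prints one structured metric sample per line to standard out and leaves aggregation to whatever sits downstream. The emit_metric helper and its field names are hypothetical, not an API from any particular product.

```python
import json
import sys
import time

def emit_metric(name, value, **tags):
    """Write one metric sample to stdout as a structured log line.

    Downstream infrastructure (a log shipper, DataDog, NewRelic, etc.) is
    assumed to parse these lines and turn them into time series.
    """
    sample = {"metric": name, "value": value, "ts": time.time(), **tags}
    print(json.dumps(sample), file=sys.stdout, flush=True)

# Example: record a duration instead of calling a metrics client directly.
start = time.monotonic()
# ... do the work ...
emit_metric("checkout.duration_seconds", time.monotonic() - start, service="shop")
```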
Alright. Let’s recap the three points:
1. Support log level configuration.
2. Log in a machine-readable format such as JSON.
3. Treat log streams as a telemetry source.
Also, these practices are especially useful in growing distributed systems since they shift responsibility out of applications and into horizontal support layers.
Anyway, that’s all for this one. Go forth and log.