A service can produce multiple logs for a single request. Furthermore, in a microservices architecture, a request may be handled by multiple services. Typically, a request:

  • hits the API gateway or BFF,
  • then it’s authenticated by another service,
  • and finally it’s received by the target service(s),
  • which may also call other services (e.g. workers or authorization services).

When you find an error in your logs, it’s useful to find all the logs related to the request to get more context. That’s where log correlation comes in.

Structured logs

The first thing you need to think about when correlating your logs is having structured logs. Structured logs allow you to assign additional properties to your log entries, which can then be parsed by your log aggregation system. This allows you to group or filter logs based on these properties. This allows you to find logs, for example:

  • from a specific date range,
  • related to a specific endpoint,
  • related to a specific client,
  • or related to a specific query.

One of the most common approaches is to use JSON or XML to store logs in files, where each line is a log entry. Log entries typically have at least three properties:

  • message,
  • log level,
  • timestamp or date.

Assigning the traceId

When we talk about logs correlation we usually mean aggregation of logs produced during processing a specific request by a distributed system.

To correlate logs, you assign the same unique property to all logs related to processing request. You can do this by generating the unique ID during the incoming request (it may be e.g. UUID, random string or number) and then passing it to other services. Such property is usually named traceId or correlationId. I prefer using traceId so I’ll use this name later in this article. It’ll allow integrating it with distributed tracing which I’ll describe in another blog post.

I think it’s also worth adding traceId to errors returned to clients and even displaying them in the UI. It’s a handy tool when client will come to you with a problem in support ticket. Thanks to this, you can easily find all the context of the exact error they come with. It’s extremely helpful, as clients usually don’t have access to some data you have, or are just not substantively prepared to give you all technical details you need for debugging (what is completely understandable).

Passing the traceId between services

Depending on the protocol you are using to establish communication between services, you can use different approaches to pass the `traceId’. In HTTP, NATS and AMQP (RabbitMQ) it’s best to assign it to a custom header (don’t use CorrelationId in AMQP for this - it’s intended to correlate requests with responses). In SQS you can use [message attributes] (https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html#sqs-message-attributes). If your protocol doesn’t allow any form of metadata to be attached, it’s worth wrapping a message in an envelope to separate the message body from the metadata.

Another thing is to pass the traceId internally in the service. If you get the traceId from the header of the incoming request, you need to somehow pass it to the headers of the outgoing requests. You can do this in a number of ways. In the naive implementation, you can just pass the traceId explicitly between components:

class Controller {
	// ...

	async foo(body: Buffer, headers: Record<string, string>) {
		const traceId = headers['Trace-ID'] ?? generateUUID();

		await this.externalApiClient.bar(traceId);

		this._logger.log('Example log message', { traceId });
	}
}

class ExternalApiClient {
	async bar(traceId: string) {
		const headers = {
			'Trace-ID': traceId
		};

		this._httpClient.request(externalApiURL, '', headers);
	}
}

I see a couple of problems with this approach:

  • You’ll have to manually pass the traceId in all interactions with external services.
  • Is not resilient to changes, if you want to add other tracing metadata (e.g. timestamp, spanId etc) you’ll have to do a major refactor.
  • If you want to add the traceId to a class that doesn’t already use it, you’ll have to change its method.

In other words - we are breaking the open-closed principle here.

Using IoC containers

If your application uses IoC, it’s much easier. You can provide traceId as a request scoped dependency. Thanks to this, you can pass the traceId in the logger and in your HTTP client utility class without touching the controller. Also, you pass the traceId once in the code and don’t have to remember it later.

class Controller {
	async foo(body: Buffer, headers: Record<string, string>) {
		await this.externalApiClient.bar();

		this._logger.log('Example log message');
	}
}

class HttpClient {
	constructor(@Inject('TraceID') private _traceId) {}

	async request(
		url: string,
		body: string | Buffer,
		headers: Record<string, string | string []>
	) {
		const headers = {
			'Trace-ID': this._traceId
		};

		return await axios({
			method: 'get',
			url,
			body,
			headers
		});
	}
}

class Logger {
	constructor(@Inject('TraceID') private _traceId) {}

	log(message: string, data: Record<string, unknown>): void {
		const logEntry = {
			ts: Date.now(),
			this._traceId,
			message,
			...data
		}

		console.log(JSON.stringify(logEntry));
	}
}

Alternative solutions

If your architecture or language doesn’t support request scoped dependencies, you can alternatively create a “context” class. It’ll still be passed explicitly between clients, but by removing the primitive type, the code will be more resilient to changes.

class Controller {
	async foo(ctx: Context, body: Buffer, headers: Record<string, string>) {
		await this.externalApiClient.bar(ctx);

		this._logger.log(ctx, 'Example log message');
	}
}

class HttpClient {
	async request(
		ctx: Context,
		// ...
	) {
		const headers = {
			'Trace-ID': ctx.getTraceID(),
			// You can extend the context anytime with additional properties but you don't have to change anything in the controller in such case
			// 'Client-IP': ctx.getClientIP(),
			// 'URL': ctx.getURL(),
			// 'Host': ctx.getHost()
		};

		// ...
	}
}

In NodeJS, you can also use asynchronous context tracking. In C# - AsyncLocal. This has a similar benefit to the IoC solution - you can remove the context passing from the controller, and you’ll only pass the traceId once in the code:

class Controller {
	async foo(body: Buffer, headers: Record<string, string>) {
		await this.externalApiClient.bar();
	}
}

class HttpClient {
	async request(
		// ...
	) {
		const ctx = asyncLocalStorage.getStore();

		const headers = {
			'Trace-ID': ctx.getTraceID()
		};

		// ...
	}
}

In NodeJS, this feature is relatively new and not widely used. However, it’s already been released as stable, so it’s worth considering.