Improving Observability in a Software System
Russ Cam
27 September 2024
Applying a combination of techniques and best practices to improve observability
Ensuring the performance and reliability of systems is critical. The ability to observe and measure the internal state of a system based on the data it produces has become a fundamental practice in modern software engineering. At the heart of observability lies telemetry data: logs, metrics, and traces, often referred to as "the three pillars", each offering a different window into system behaviour.
While many organizations already collect telemetry data through platforms like the Elastic Stack (ELK), the challenge often lies in optimizing this data to make it more actionable. Simply collecting it isn't enough: how can you extract meaningful insights efficiently? This post explores techniques and practices to improve observability.
Specifically, we'll focus on:
Building a Single Pane of Glass for Logs, Metrics, and Tracing
Implementing Message Fingerprinting to aggregate similar logs
Using MessageTemplates for Extracting and Querying Variable Information
Standardizing telemetry data with a schema like Elastic Common Schema (ECS)
Let's dive into these concepts and how they can enhance the observability of your software system.
Single Pane of Glass: Unifying Logs, Metrics, and Tracing
When issues arise in a complex software system, isolating the root cause requires visibility into different forms of telemetry: logs, metrics, and traces. Unfortunately, these often live in separate systems, making the debugging process cumbersome and fragmented.
The concept of a Single Pane of Glass in observability refers to providing a unified interface that allows engineers to navigate seamlessly between these data types. While logs, metrics, and traces may not necessarily live in the same system (and there may be many reasons why this can be the case), you should aim for a tightly integrated experience that simplifies navigation between them.
Why It's Important
When something goes wrong in a distributed system, the symptoms manifest in different ways: through anomalous metrics, application logs, or traces of distributed requests. For instance, you might notice a spike in CPU usage (a metric) that correlates with a specific error in your application logs. Similarly, tracing can help pinpoint how a specific request flows through different services and where it experiences latency or errors.
Without a single pane of glass, you would have to:
Open your metrics dashboard (e.g., Prometheus, Grafana, Datadog) to identify when the CPU spike you were alerted on occurred.
Jump to your logging system (e.g. Elastic Stack, Splunk) to find related logs.
Navigate to your tracing tool (e.g., Jaeger, OpenTelemetry, Dynatrace) to correlate with sampled traces for that time period.
This siloed approach wastes valuable time and increases the risk of missing critical information.
Implementing a Single Pane of Glass
While tools like Elastic Stack, Datadog, and Honeycomb provide a unified solution for logs, metrics, and traces, not all observability stacks will be fully unified. However, you can still build an integrated workflow by linking these tools together where possible by:
Cross-linking between systems
Ensure that logs, metrics, and traces are connected by a common identifier, such as a request ID or a trace ID. This way, when viewing an error log, you can easily jump to the related trace or metric.
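As a minimal sketch of what cross-linking can look like in practice, the helper below builds deep links between tools from a shared trace ID. The base URLs and the `trace.id:` query syntax are hypothetical placeholders; substitute the link formats your own logging and tracing tools expose.

```java
/**
 * Builds deep links between observability tools from a shared trace ID,
 * so a log viewer can link to the related trace and a trace viewer can
 * link back to all logs for that trace. The base URLs and query syntax
 * below are hypothetical placeholders.
 */
public final class CrossLinks {

    private static final String TRACING_BASE = "https://tracing.example.com/trace/";
    private static final String LOGS_BASE = "https://logs.example.com/search?query=";

    /** Link from a log entry to the trace it belongs to. */
    public static String traceLink(String traceId) {
        return TRACING_BASE + traceId;
    }

    /** Link from a trace back to all log entries that share its trace ID. */
    public static String logsLink(String traceId) {
        return LOGS_BASE + "trace.id:" + traceId;
    }

    public static void main(String[] args) {
        String traceId = "0af7651916cd43dd8448eb211c80319c";
        System.out.println(traceLink(traceId));
        System.out.println(logsLink(traceId));
    }
}
```

The key design decision is less the URL format and more ensuring the same identifier is propagated to every data type in the first place.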
Dashboards
Create custom dashboards that show high-level RED (Rate, Errors, Duration) metrics alongside the most critical logs and traces. This allows you to quickly visualize both overall system health and detailed information.
It is also useful to annotate time series visualizations with system deployment and feature releases, as another set of event data points that can be correlated. It's surprising (perhaps not to the seasoned veterans) how often differences in system behaviour can be observed around these times.
Contextual Linking
When a log entry mentions a specific service or component, ensure there's a direct link to related traces or metrics, allowing engineers to follow the breadcrumbs without switching context or manually searching.
In short, while these different observability data types may be collected in separate systems, the user experience should make it feel like they're all part of a cohesive whole.
Message Fingerprinting: Aggregating Similar Log Messages
Logs are invaluable for understanding the state of your system, but when it comes to identifying patterns or diagnosing issues, it's easy to become overwhelmed by the sheer volume of logs produced by modern applications; large distributed applications can produce hundreds of millions of log messages daily. Which warn or error level log messages are happening most frequently? Which info level messages are produced in volumes large enough to cost a significant amount to store and process, but no longer provide value?
Message fingerprinting is an approach to help address this by grouping together logs with the same structure but different variable data. The idea is to generate a unique fingerprint or hash for each type of log message, allowing you to easily aggregate and analyze patterns across your logs.
Example
Consider the following log entries:
User 1234 encountered error: Database connection failed.
User 5678 encountered error: Database connection failed.
produced by the following SLF4J logger call in Java:
var userID = 1234;
logger.error("User {} encountered error: Database connection failed.", userID);
While these log messages differ by User ID, they are essentially reporting the same issue. With message fingerprinting, you can group these logs based on a fingerprint of the raw log message format, which excludes variable data like the user ID.
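To make the idea concrete, here is a minimal sketch of template fingerprinting. CRC32 is used only because it ships with the JDK and keeps the sketch dependency-free; a production implementation would more likely use a faster non-cryptographic hash such as MurmurHash.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Minimal fingerprint sketch: hash the raw message template (before argument
 * substitution) so that log events differing only in variable data share a
 * fingerprint. CRC32 is used here purely to stay dependency-free.
 */
public final class Fingerprint {

    public static String of(String messageTemplate) {
        CRC32 crc = new CRC32();
        crc.update(messageTemplate.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }

    public static void main(String[] args) {
        // Both events share the same template, so they share a fingerprint...
        String template = "User {} encountered error: Database connection failed.";
        System.out.println(Fingerprint.of(template));
        // ...whereas hashing the *formatted* messages would split them apart.
        System.out.println(Fingerprint.of("User 1234 encountered error: Database connection failed."));
        System.out.println(Fingerprint.of("User 5678 encountered error: Database connection failed."));
    }
}
```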
Implementing Message Fingerprinting
There are a few steps to implement message fingerprinting:
Define log message templates
Ensure that log messages follow a consistent format. Use placeholders for variable information (e.g., user ID, error message) that will be ignored when generating the fingerprint. There's a good chance you're already doing this with your logging libraries of choice.
Generate fingerprints
Use a hashing algorithm to create a unique identifier for each log message based on its
structure, ignoring the variable parts. For the example above, hash the message format
User {} encountered error: Database connection failed
. A fast non-cryptographic hash function such as
MurmurHash is a reasonable choice since the function will
be called on every log event.
If you're using Logback, one approach is to implement a Converter:
package com.searchpioneer;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.pattern.DynamicConverter;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
public class FingerprintConverter extends DynamicConverter<ILoggingEvent> {
@Override
public String convert(ILoggingEvent event) {
// hash the unformatted message
return hashMessage(event.getMessage());
}
private String hashMessage(String message) {
return Hashing.murmur3_32_fixed().hashString(message, StandardCharsets.UTF_8).toString();
}
}
and then register the converter so it can be used in an encoder. Here it's registered in code:
package com.searchpioneer;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.ConsoleAppender;
import ch.qos.logback.core.CoreConstants;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.Map;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
configureLogbackFingerprint();
logger.info("Application started");
try {
processBusinessLogic();
} catch (Exception e) {
logger.error("An error occurred during processing", e);
}
logger.info("Application finished successfully");
}
private static void processBusinessLogic() {
logger.debug("Starting business logic processing...");
for (int i = 0; i < 5; i++) {
logger.info("Processing item {}", i + 1);
}
if (Math.random() > 0.5) {
throw new RuntimeException("Simulated processing error");
}
logger.debug("Business logic processing completed");
}
private static void configureLogbackFingerprint() {
var context = (LoggerContext) LoggerFactory.getILoggerFactory();
context.reset();
@SuppressWarnings("unchecked")
Map<String, String> patternRuleRegistry = (Map<String, String>) context.getObject(CoreConstants.PATTERN_RULE_REGISTRY);
if (patternRuleRegistry == null) {
patternRuleRegistry = new HashMap<>();
}
context.putObject(CoreConstants.PATTERN_RULE_REGISTRY, patternRuleRegistry);
patternRuleRegistry.put("fingerprint", FingerprintConverter.class.getName());
PatternLayoutEncoder encoder = new PatternLayoutEncoder();
encoder.setContext(context);
encoder.setPattern("%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg - fingerprint:%fingerprint%n");
encoder.start();
var consoleAppender = new ConsoleAppender<ILoggingEvent>();
consoleAppender.setName("console");
consoleAppender.setEncoder(encoder);
consoleAppender.setContext(context);
consoleAppender.start();
var rootLogger = context.getLogger("ROOT");
rootLogger.addAppender(consoleAppender);
}
}
which outputs the following logs
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Application started - fingerprint:a99b5c06
2024-09-25 22:06:31 [main] DEBUG com.searchpioneer.Main - Starting business logic processing... - fingerprint:c8b3e191
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Processing item 1 - fingerprint:4e593e5c
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Processing item 2 - fingerprint:4e593e5c
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Processing item 3 - fingerprint:4e593e5c
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Processing item 4 - fingerprint:4e593e5c
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Processing item 5 - fingerprint:4e593e5c
2024-09-25 22:06:31 [main] ERROR com.searchpioneer.Main - An error occurred during processing - fingerprint:f87f72f2
java.lang.RuntimeException: Simulated processing error
at com.searchpioneer.Main.processBusinessLogic(Main.java:38)
at com.searchpioneer.Main.main(Main.java:23)
2024-09-25 22:06:31 [main] INFO com.searchpioneer.Main - Application finished successfully - fingerprint:cfe8ff06
Notice that the fingerprint generated for each log message with the format "Processing item {}" is the same.
Aggregate logs
Once logs are fingerprinted, you can group and count similar log entries, providing a more aggregated view of what's happening in your system.
This approach helps in identifying the most pertinent recurring issues and filtering out redundant information. It's a useful operational practice to incorporate an audit of the most frequent error messages into your regular ops review.
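As a sketch of what this aggregation looks like, the snippet below groups and counts log lines by the "fingerprint:" suffix from the earlier Logback output. In practice you'd run the equivalent terms aggregation in your log store rather than in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Groups log lines by their fingerprint suffix and counts occurrences.
 * Assumes the "fingerprint:<hash>" suffix format from the Logback example.
 */
public final class FingerprintAggregator {

    private static final Pattern FINGERPRINT = Pattern.compile("fingerprint:(\\w+)");

    public static Map<String, Long> countByFingerprint(List<String> logLines) {
        return logLines.stream()
                .map(FINGERPRINT::matcher)
                .filter(Matcher::find)
                .collect(Collectors.groupingBy(m -> m.group(1), Collectors.counting()));
    }

    public static void main(String[] args) {
        var lines = List.of(
                "... INFO  com.searchpioneer.Main - Processing item 1 - fingerprint:4e593e5c",
                "... INFO  com.searchpioneer.Main - Processing item 2 - fingerprint:4e593e5c",
                "... ERROR com.searchpioneer.Main - An error occurred - fingerprint:f87f72f2");
        // prints a count per fingerprint
        System.out.println(countByFingerprint(lines));
    }
}
```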
MessageTemplates: Persisting Variable Data for Querying
While fingerprinting helps group logs with the same message format, MessageTemplates go a step further by extracting and persisting variable data from log messages as separate field values. This makes it easier to query logs based on specific criteria, such as user IDs, statuses, query terms, etc.
MessageTemplates aren't a new idea (the website and repository were created in 2016, and the idea is older still), but in our experience they are often underutilized in production systems.
Example
Using the same log example from above:
User 1234 encountered error: Database connection failed.
With MessageTemplates, you could define the log format as:
User {UserID} encountered error: {ErrorMessage}.
Here, {UserID} and {ErrorMessage} are placeholders for variable data. By persisting this variable information as structured fields, you can easily search for logs where ErrorMessage is "Database connection failed" or where UserID is 1234.
Implementing MessageTemplates
Structured logging
Use a structured logging library that supports message templating to do the heavy lifting for you. Instead of logging free-form text, log structured objects with variables extracted as fields. The MessageTemplates site has a list of implementations in different languages.
In Java, the Logstash Logback Encoder provides StructuredArguments to achieve a similar outcome to MessageTemplates:
package com.searchpioneer;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.ConsoleAppender;
import net.logstash.logback.argument.StructuredArguments;
import net.logstash.logback.encoder.LogstashEncoder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
configureLogbackLogstash();
logger.info("Application started");
try {
processBusinessLogic();
} catch (Exception e) {
logger.error("An error occurred during processing", e);
}
logger.info("Application finished successfully");
}
private static void processBusinessLogic() {
logger.debug("Starting business logic processing...");
for (int i = 0; i < 5; i++) {
logger.info("Processing item {}", StructuredArguments.value("message_properties.item", i + 1));
}
if (Math.random() > 0.5) {
throw new RuntimeException("Simulated processing error");
}
logger.debug("Business logic processing completed");
}
private static void configureLogbackLogstash() {
var context = (LoggerContext) LoggerFactory.getILoggerFactory();
context.reset();
var logstashEncoder = new LogstashEncoder();
logstashEncoder.setContext(context);
logstashEncoder.start();
var consoleAppender = new ConsoleAppender<ILoggingEvent>();
consoleAppender.setName("console");
consoleAppender.setEncoder(logstashEncoder);
consoleAppender.setContext(context);
consoleAppender.start();
var rootLogger = context.getLogger("ROOT");
rootLogger.addAppender(consoleAppender);
}
}
which outputs
{"@timestamp":"2024-09-27T14:29:28.5142659+10:00","@version":"1","message":"Application started","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-09-27T14:29:28.5187664+10:00","@version":"1","message":"Starting business logic processing...","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"DEBUG","level_value":10000}
{"@timestamp":"2024-09-27T14:29:28.519767+10:00","@version":"1","message":"Processing item 1","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000,"message_properties.item":1}
{"@timestamp":"2024-09-27T14:29:28.5202663+10:00","@version":"1","message":"Processing item 2","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000,"message_properties.item":2}
{"@timestamp":"2024-09-27T14:29:28.5202663+10:00","@version":"1","message":"Processing item 3","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000,"message_properties.item":3}
{"@timestamp":"2024-09-27T14:29:28.5202663+10:00","@version":"1","message":"Processing item 4","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000,"message_properties.item":4}
{"@timestamp":"2024-09-27T14:29:28.5207658+10:00","@version":"1","message":"Processing item 5","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000,"message_properties.item":5}
{"@timestamp":"2024-09-27T14:29:28.5207658+10:00","@version":"1","message":"An error occurred during processing","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"ERROR","level_value":40000,"stack_trace":"java.lang.RuntimeException: Simulated processing error\r\n\tat com.searchpioneer.Main.processBusinessLogic(Main.java:36)\r\n\tat com.searchpioneer.Main.main(Main.java:21)\r\n"}
{"@timestamp":"2024-09-27T14:29:28.521766+10:00","@version":"1","message":"Application finished successfully","logger_name":"com.searchpioneer.Main","thread_name":"main","level":"INFO","level_value":20000}
Observe that the field "message_properties.item" is included in the JSON output with the structured argument value.
Log parsing
Ensure that your logging system is set up to recognize and parse message templates from logs, or receive structured log messages with parsed fields, and can store variable data as searchable fields.
Indexing
In systems like Elastic Stack, make sure to index these fields to allow querying, filtering, and aggregating. The majority of the time, this means indexing as a "keyword" data type with doc_values enabled.
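As an illustration, an Elasticsearch index mapping for the fields used in the earlier examples might look like the following; the field names are taken from those examples, and note that doc_values is enabled by default for keyword fields, so it's shown here only for emphasis:

```json
{
  "mappings": {
    "properties": {
      "message_fingerprint": { "type": "keyword" },
      "message_properties": {
        "properties": {
          "item": { "type": "keyword", "doc_values": true }
        }
      }
    }
  }
}
```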
In a large distributed application maintained by many teams, you often need to be judicious about how many different fields each team is allowed to log, and also avoid conflicting field names for additional data. The latter can be addressed by prefixing fields with a team identifier, and the former can often be constrained in the logging stack, e.g. by limiting the number of fields indexed under a given team's prefix.
By using MessageTemplates, you'll gain more control over how you query logs, leading to more precise troubleshooting and monitoring.
Using a Common Schema
As observability matures, the need for consistency in how logs, metrics, and traces are structured becomes critical. If each team in an organization logs data in its own format, it can be challenging to correlate logs across different services or to understand what's happening at a system level.
This is where adopting a common schema such as the Elastic Common Schema (ECS), or OpenTelemetry's Semantic Conventions can provide immense benefits. Not only can it bring consistency across an organization, it also helps to unify logs, metrics, and traces into a single data type, allowing them to be collectively indexed and queried.
Elastic Common Schema (ECS)
The Elastic Common Schema defines a consistent format for structuring log and event data. By using ECS, you ensure that logs from different systems and services share the same field names and structure. This makes it easier to search, visualize, and analyze data across your entire stack.
For example, ECS specifies field names like:
event.action: The action captured by the event (e.g., "login" or "file deleted").
user.id: The ID of the user involved in the event.
http.request.method: The HTTP method used in a web request.
ECS is an open source specification and is converging with OpenTelemetry Semantic Conventions over time, making it a reasonable choice. The power in having a convention is less about the rules of the convention and more about the standardization and consistency it enforces. Less time debating minutiae and more time extracting value.
Implementing ECS
Define logging guidelines
Establish ECS as the standard schema for all services and teams in your organization. Larger organizations often have dedicated observability teams to manage logs, metrics, and traces centrally which can help in establishing improved practices.
Configure loggers
Set up your logging libraries to automatically use ECS field names. Many log aggregation tools like Filebeat already support ECS out of the box.
For a Java application wishing to log events using ECS, there are ECS logging libraries to work with the most common logging libraries.
If you're using Logback, the following example integrates ECS with message fingerprinting and MessageTemplates, by deriving from the EcsEncoder provided by the ECS logging library:
package com.searchpioneer;
import ch.qos.logback.classic.spi.ILoggingEvent;
import co.elastic.logging.JsonUtils;
import co.elastic.logging.logback.EcsEncoder;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.google.common.hash.Hashing;
import net.logstash.logback.marker.ObjectAppendingMarker;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
public class EcsWithFingerprintEncoder extends EcsEncoder {
private static final JsonFactory factory = new JsonFactory();
@Override
protected void addCustomFields(ILoggingEvent event, StringBuilder builder) {
var fingerprint = hashMessage(event.getMessage());
builder.append("\"message_fingerprint\":\"");
JsonUtils.quoteAsString(fingerprint, builder);
builder.append("\",");
var messageArgs = event.getArgumentArray();
if (messageArgs != null && messageArgs.length > 0) {
builder.append("\"message_properties\":");
var writer = new StringBuilderWriter(builder);
try {
JsonGenerator generator = factory.createGenerator(writer);
generator.writeStartObject();
for (Object o : messageArgs) {
if (o instanceof ObjectAppendingMarker objectAppendingMarker) {
objectAppendingMarker.writeTo(generator);
}
}
generator.writeEndObject();
generator.flush();
} catch (IOException e) {
throw new RuntimeException(e);
}
builder.append(",");
}
}
private String hashMessage(String message) {
return Hashing.murmur3_32_fixed().hashString(message, StandardCharsets.UTF_8).toString();
}
private static class StringBuilderWriter extends Writer {
private final StringBuilder sb;
public StringBuilderWriter(StringBuilder sb) {
this.sb = sb;
}
@Override
public void write(char[] cbuf, int off, int len) {
sb.append(cbuf, off, len);
}
@Override
public void flush() {
}
@Override
public void close() {
}
}
}
With the encoder configured in code
package com.searchpioneer;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.ConsoleAppender;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
configureLogbackEcs();
logger.info("Application started");
try {
processBusinessLogic();
} catch (Exception e) {
logger.error("An error occurred during processing", e);
}
logger.info("Application finished successfully");
}
private static void processBusinessLogic() {
logger.debug("Starting business logic processing...");
for (int i = 0; i < 5; i++) {
logger.info("Processing item {}", i + 1);
}
if (Math.random() > 0.5) {
throw new RuntimeException("Simulated processing error");
}
logger.debug("Business logic processing completed");
}
private static void configureLogbackEcs() {
var context = (LoggerContext) LoggerFactory.getILoggerFactory();
context.reset();
var ecsEncoder = new EcsWithFingerprintEncoder();
ecsEncoder.setContext(context);
ecsEncoder.setServiceName("java-logging-example");
ecsEncoder.setServiceVersion("1.0.0");
ecsEncoder.setIncludeMarkers(true);
ecsEncoder.setIncludeOrigin(true);
ecsEncoder.start();
var consoleAppender = new ConsoleAppender<ILoggingEvent>();
consoleAppender.setName("console");
consoleAppender.setEncoder(ecsEncoder);
consoleAppender.setContext(context);
consoleAppender.start();
var rootLogger = context.getLogger("ROOT");
rootLogger.addAppender(consoleAppender);
}
}
the output looks as follows
{"@timestamp":"2024-09-27T04:50:15.967Z","log.level": "INFO","message":"Application started","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":24},"function":"main"}},"message_fingerprint":"a99b5c06"}
{"@timestamp":"2024-09-27T04:50:15.971Z","log.level":"DEBUG","message":"Starting business logic processing...","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":36},"function":"processBusinessLogic"}},"message_fingerprint":"c8b3e191"}
{"@timestamp":"2024-09-27T04:50:15.973Z","log.level": "INFO","message":"Processing item 1","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":38},"function":"processBusinessLogic"}},"message_fingerprint":"4e593e5c","message_properties":{"i":1}}
{"@timestamp":"2024-09-27T04:50:15.978Z","log.level": "INFO","message":"Processing item 2","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":38},"function":"processBusinessLogic"}},"message_fingerprint":"4e593e5c","message_properties":{"i":2}}
{"@timestamp":"2024-09-27T04:50:15.978Z","log.level": "INFO","message":"Processing item 3","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":38},"function":"processBusinessLogic"}},"message_fingerprint":"4e593e5c","message_properties":{"i":3}}
{"@timestamp":"2024-09-27T04:50:15.979Z","log.level": "INFO","message":"Processing item 4","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":38},"function":"processBusinessLogic"}},"message_fingerprint":"4e593e5c","message_properties":{"i":4}}
{"@timestamp":"2024-09-27T04:50:15.979Z","log.level": "INFO","message":"Processing item 5","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":38},"function":"processBusinessLogic"}},"message_fingerprint":"4e593e5c","message_properties":{"i":5}}
{"@timestamp":"2024-09-27T04:50:15.979Z","log.level":"ERROR","message":"An error occurred during processing","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":29},"function":"main"}},"message_fingerprint":"f87f72f2","error.type":"java.lang.RuntimeException","error.message":"Simulated processing error","error.stack_trace":"java.lang.RuntimeException: Simulated processing error\r\n\tat com.searchpioneer.Main.processBusinessLogic(Main.java:42)\r\n\tat com.searchpioneer.Main.main(Main.java:27)\r\n"}
{"@timestamp":"2024-09-27T04:50:15.980Z","log.level": "INFO","message":"Application finished successfully","ecs.version": "1.2.0","service.name":"java-logging-example","service.version":"1.0.0","event.dataset":"java-logging-example","process.thread.name":"main","log.logger":"com.searchpioneer.Main","log":{"origin":{"file":{"name":"Main.java","line":32},"function":"main"}},"message_fingerprint":"cfe8ff06"}
Observe that all log messages conform to the ECS schema and include a message fingerprint. For those log messages with StructuredArgument placeholders, the values have been indexed separately in "message_properties".
By adopting a common schema like ECS, you reduce the friction involved in analyzing logs from different sources and improve overall observability.
Conclusion
Improving observability is not just about collecting more logs; it's about making those logs, along with metrics and traces, more actionable. By unifying logs, metrics, and tracing into a single pane of glass, fingerprinting similar logs, extracting variable information with MessageTemplates, and adopting a common schema like ECS, you can significantly enhance your ability to monitor and troubleshoot your system.
These practices empower your team to identify issues faster, reduce noise in log data, and gain a comprehensive view of your software's behaviour. Whether you're working with Elastic Stack, Prometheus, Jaeger, Datadog, or other observability platforms, the techniques discussed here can help you achieve better insights and, ultimately, more reliable software systems.