Security Data Pipeline Observability — Metrics You Can’t Afford to Ignore

Tarsal Team

Introduction

The security data pipeline serves as the beating heart of your cybersecurity organization, silently but efficiently pumping essential information that empowers companies to identify threats, address vulnerabilities, and maintain compliance. Disruptions to this pipeline can have far-reaching consequences, affecting various aspects of your security protocols.

At Tarsal, we don’t simply view data pipeline observability as necessary; we consider it foundational. Monitoring the right metrics gives security teams control over, and confidence in, their data-driven operations. This guide covers the metrics that matter most for building and maintaining a resilient, streamlined security data pipeline.

Metrics and Alerting

Expected Log Throughput

The volume of logs ingested per minute is a fundamental metric that offers immediate insight into the health of both the data source and the receiving destination. If, for example, a specific source has sent no new logs in the past hour, that is a clear red flag that demands investigation.

We recommend monitoring throughput for each individual log source and in aggregate across both Production and Development environments, with alerts set to trigger whenever throughput deviates from the norm.
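As a rough illustration of the alerting idea above, the following Python sketch flags a source whose current logs-per-minute count is zero or sits far from its recent baseline. The z-score approach, the threshold of 3.0, the function name throughput_alert, and the sample sources and counts are all assumptions made for this example, not a description of how Tarsal computes alerts.

```python
from statistics import mean, stdev


def throughput_alert(history, current, z_threshold=3.0):
    """Return a reason string if `current` (logs/min) looks abnormal, else None.

    `history` is a list of recent logs-per-minute counts for one source.
    The z-score test and its threshold are illustrative assumptions.
    """
    if current == 0:
        return "no logs received"
    if len(history) < 2:
        return None  # not enough data to define a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return None if current == mu else "deviation from flat baseline"
    z = (current - mu) / sigma
    if abs(z) > z_threshold:
        return f"throughput z-score {z:.1f} exceeds {z_threshold}"
    return None


# Hypothetical baselines, keyed by (environment, source).
baselines = {
    ("production", "firewall"): [100, 102, 98, 101, 99],
    ("development", "firewall"): [10, 12, 9, 11, 10],
}
latest = {("production", "firewall"): 0, ("development", "firewall"): 11}

alerts = {}
for key, history in baselines.items():
    reason = throughput_alert(history, latest.get(key, 0))
    if reason:
        alerts[key] = reason
```

Keeping one baseline per (environment, source) pair mirrors the recommendation to watch individual sources as well as whole environments; an environment-wide check could reuse the same function on summed counts.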

Log Publish Latency

Log Publish Latency is a critical gauge of the time lapse between generating an event at its source and its subsequent arrival within your data pipeline. Consider the scenario where a security event occurs at 12:00 PM, but the pipeline receives the audit log detailing the event at 12:08 PM. The Log Publish Latency would be 8 minutes in this case.

Understanding Log Publish Latency (LPL) provides insight into the performance of the sources generating audit logs. Any deviation from established norms can serve as an early warning that the source system is encountering issues, allowing teams to resolve them before they escalate into more significant problems. The metric also matters to security operations and engineering functions such as threat detection and incident response, which rely on LPL when building scheduled queries and investigations; for example, a scheduled query needs a lookback window long enough to include events that arrive late.
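The arithmetic can be sketched in a few lines of Python. The snippet below reproduces the 12:00 PM and 12:08 PM example and then shows one possible way to size a scheduled query's lookback window from observed latencies. The function names, the calendar date, the sample latencies, and the "interval plus worst observed latency" rule are assumptions for illustration only.

```python
from datetime import datetime, timezone


def log_publish_latency(event_time, received_time):
    """Seconds between an event occurring and the pipeline receiving its log."""
    return (received_time - event_time).total_seconds()


def query_lookback(interval_seconds, observed_latencies):
    """Lookback window for a scheduled query, in seconds.

    Assumed rule: cover one schedule interval plus the worst latency seen,
    so that late-arriving logs still fall inside some query window.
    """
    return interval_seconds + max(observed_latencies, default=0)


# The date is arbitrary; only the 12:00 and 12:08 times come from the example.
event = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
received = datetime(2024, 1, 1, 12, 8, tzinfo=timezone.utc)

lpl_seconds = log_publish_latency(event, received)
lookback = query_lookback(300, [60, lpl_seconds, 120])
```

Here lpl_seconds is 480.0, which is the 8 minutes from the example, and a query scheduled every 5 minutes would look back 780 seconds under this rule. A percentile could replace the maximum when occasional outliers would otherwise make the window too wide.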

Data Pipeline Latency

A modern data pipeline is a multi-layered architecture comprising transformations, enrichments, and normalizations. Each layer performs a specific function, often involving interactions with third-party services like threat intelligence APIs. Data pipeline latency, the time a log takes to pass through these layers, is worth monitoring not just to improve performance but to anticipate and eliminate bottlenecks that could lead to service degradation or outages.
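One way to see where latency accumulates is to time each layer separately. The Python sketch below is a minimal example under stated assumptions: the StageTimer class, the stage names, and the sample record are hypothetical, and the sleep call merely stands in for a call to a third-party service.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time spent in each named pipeline stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.durations.values())

    def slowest(self):
        return max(self.durations, key=self.durations.get)


timer = StageTimer()
record = {"SRC_IP": "203.0.113.7"}  # hypothetical log record

with timer.stage("normalize"):
    record = {key.lower(): value for key, value in record.items()}

with timer.stage("enrich"):
    time.sleep(0.01)  # stand-in for a threat intelligence API lookup
    record["reputation"] = "unknown"
```

After a run, timer.slowest() names the stage that contributed the most time, which is usually the first place to look for a bottleneck.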

Pipeline latency gives you an overview of the efficiency of your data pipeline. For example, ingesting logs faster than your pipeline can process them builds a growing backlog, which can lead to eventual failure or excessive storage costs.
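The effect of ingesting faster than the pipeline can process is simple to estimate. The following Python sketch assumes constant rates and a fixed buffer size; the function names and all of the numbers are made up for illustration.

```python
def backlog_after(ingest_per_min, process_per_min, minutes, start_backlog=0):
    """Estimated unprocessed logs after `minutes`, assuming constant rates."""
    growth = (ingest_per_min - process_per_min) * minutes
    return max(0, start_backlog + growth)


def minutes_until_full(ingest_per_min, process_per_min, capacity, start_backlog=0):
    """Minutes until a buffer of `capacity` logs fills, or None if it never does."""
    surplus = ingest_per_min - process_per_min
    if surplus <= 0:
        return None  # the pipeline keeps up with ingestion
    return (capacity - start_backlog) / surplus


# Hypothetical rates: 12,000 logs/min arriving, 10,000 logs/min processed.
backlog_after_hour = backlog_after(12_000, 10_000, 60)
time_to_full = minutes_until_full(12_000, 10_000, 1_000_000)
```

With these numbers the backlog reaches 120,000 logs after an hour, and a buffer of one million logs would fill in 500 minutes. Real traffic is bursty, so figures like these are best read as rough planning estimates.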

Schema Drift/Mismatch

One of the most underestimated challenges in managing a data pipeline is schema drift. Any discrepancy between the incoming and expected data schema can spell disaster for downstream processes. In an advanced setup, logs that deviate from the expected schema are often redirected to a Dead Letter Queue (DLQ), where teams can investigate why they failed to conform. Beyond the DLQ, security teams should be notified immediately, as schema drift will likely result in incorrect or missing data. Without the correct tooling, a schema mismatch can sometimes result in total operational downtime.
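The routing described above might look like the following Python sketch. The expected schema, the field names, the sample records, and the use of in-memory lists in place of a real queue are all assumptions for illustration; a production pipeline would typically rely on a schema registry or a validation library.

```python
# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"timestamp": str, "source": str, "event_type": str}


def schema_errors(record, schema=EXPECTED_SCHEMA):
    """List the ways `record` deviates from `schema` (empty list if it conforms)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def route(records, schema=EXPECTED_SCHEMA):
    """Split records into conforming ones and dead-letter entries with reasons."""
    valid, dead_letter = [], []
    for record in records:
        errors = schema_errors(record, schema)
        if errors:
            dead_letter.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, dead_letter


good = {"timestamp": "2024-01-01T12:00:00Z", "source": "example-idp", "event_type": "login"}
drifted = {"timestamp": 1704110400, "source": "example-idp", "eventType": "login"}

valid, dead_letter = route([good, drifted])
```

Storing the reasons alongside each dead-lettered record gives investigators a starting point, and a non-empty dead_letter list is a natural trigger for the immediate notification mentioned above.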

Conclusion

Your security data pipeline is the core of your organization’s cybersecurity framework. A lapse in any of the discussed metrics could have far-reaching implications, affecting threat detection, incident management, and overall operational efficiency.

Tarsal provides immediate access to these critical metrics straight out of the box. Not only does Tarsal enable real-time alerting based on customizable conditions, but it also offers intuitive dashboards that make monitoring a breeze. Observability isn’t just a luxury feature; it’s a must-have for any organization serious about cybersecurity.