Reinventing Your Security Data Lake — A How-To Guide

Kyle Polley & The Tarsal Team

Introduction

Way too often, security teams treat their security data lakes like junk drawers for their audit logs, hoarding high-volume and/or rarely used data for “just in case” scenarios. Meanwhile, the logging pipeline for their Security Information and Event Management (SIEM) system receives focused attention and is equipped with advanced filtering, transformation, normalization, and enrichment features.

However, we’ve reached a crossroads. SIEMs have struggled to keep pace, in both cost and scalability, with the growing volume of security audit logs. Security teams need better data infrastructure to support scalable detection and response operations like threat intelligence enrichment, anomaly detection, and machine learning.

It’s time to declutter the junk drawer and elevate your data lake infrastructure from a “just-in-case” backup to a vital operational component. In this blog post, we’ll walk you through that transformation and show how Tarsal can help every step of the way.

Revamping Your Security Data Lake

Step 1: Establish a Dedicated Data Ingestion System

As highlighted in our previous blog post, the foundation of an effective data lake lies in having a dedicated system, independent from your SIEM, that’s responsible for log collection, transformation, normalization, and enrichment. This standalone pipeline should funnel clean, standardized, and enhanced logs to your downstream systems: your SIEM, your data lake, or both. Centralizing data preparation also unifies detection logic across your SIEM and data lake.
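To make the idea concrete, here is a minimal sketch of a standalone pipeline that prepares each event once and then fans it out to every downstream system. The stage functions, field names, and in-memory sinks are all hypothetical stand-ins; this is not Tarsal’s API or any particular SIEM’s.

```python
from typing import Callable, Iterable

Event = dict
Stage = Callable[[Event], Event]
Sink = Callable[[Event], None]


def normalize(event: Event) -> Event:
    # Hypothetical normalization: lowercase keys, trim whitespace from string values.
    return {
        key.lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in event.items()
    }


def enrich(event: Event) -> Event:
    # Hypothetical enrichment: record which system prepared the event.
    return {**event, "prepared_by": "ingestion-pipeline"}


def run_pipeline(events: Iterable[Event], stages: list[Stage], sinks: list[Sink]) -> None:
    """Run every event through each stage in order, then deliver it to every sink."""
    for event in events:
        for stage in stages:
            event = stage(event)
        for sink in sinks:
            sink(event)


# In-memory lists stand in for the SIEM and the data lake.
siem_sink: list[Event] = []
lake_sink: list[Event] = []

run_pipeline(
    events=[{"User": " alice ", "Action": "login"}],
    stages=[normalize, enrich],
    sinks=[siem_sink.append, lake_sink.append],
)
```

Because both sinks receive the output of the same stages, detection logic written against one destination sees the same field names and values in the other.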

Step 2: Choose Your Schema

When structuring your data ingestion, start by choosing a schema. Many well-established schemas exist, including the Common Information Model (CIM), the Elastic Common Schema (ECS), and the Advanced Security Information Model (ASIM). Alternatively, you can design a custom schema to meet your specific requirements. At Tarsal, we advocate for the Open Cybersecurity Schema Framework (OCSF), given its expanding adoption across platforms.

Tarsal supports seamlessly transforming your logs to any of the above schemas, and even lets you implement multiple schemas. For example, you can automatically transform logs to ECS before sending them to your Elastic SIEM, and transform other logs to OCSF before sending them to your data lake. With Tarsal, such complex operations are executed in minutes.
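As a rough illustration of routing one raw log to two schemas, the sketch below applies a per-destination field mapping. The raw field names and both mapping tables are assumptions made up for this example; the target names are only loosely modeled on ECS and OCSF and should be checked against the published schemas. Real schemas also constrain value types and required fields, which this sketch ignores.

```python
def set_path(target: dict, dotted_path: str, value) -> None:
    """Write value into a nested dict, creating intermediate dicts as needed."""
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


# Hypothetical mappings from raw field name to schema field path.
ECS_STYLE = {
    "ts": "@timestamp",
    "ip": "source.ip",
    "username": "user.name",
    "action": "event.action",
}
OCSF_STYLE = {
    "ts": "time",
    "ip": "src_endpoint.ip",
    "username": "user.name",
    "action": "activity_name",
}


def to_schema(raw: dict, mapping: dict) -> dict:
    """Copy each mapped raw field into its schema location; unmapped fields are dropped."""
    out: dict = {}
    for raw_field, schema_field in mapping.items():
        if raw_field in raw:
            set_path(out, schema_field, raw[raw_field])
    return out


raw = {"ts": "2024-01-01T00:00:00Z", "ip": "203.0.113.7", "username": "alice", "action": "login"}
ecs_event = to_schema(raw, ECS_STYLE)    # destined for the SIEM
ocsf_event = to_schema(raw, OCSF_STYLE)  # destined for the data lake
```

Keeping the mappings as data rather than code makes it straightforward to add another destination schema without touching the transformation function.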

Step 3: Plan Your Enrichment Processes

Once your data normalization plan is in place, the focus shifts to enriching your data. Useful enrichments include mapping IP addresses to geolocations, running WHOIS lookups for visited domains, and attaching threat intelligence to the processes running on your servers.
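The IP-to-geolocation case could look something like the sketch below. The lookup table is a hypothetical stand-in for a real geolocation database, the networks are documentation-reserved ranges, and the locations and field names (`source_ip`, `source_geo`) are invented for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table; a real pipeline would query a geolocation database.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), {"country": "Exampleland", "city": "Sample City"}),
    (ipaddress.ip_network("198.51.100.0/24"), {"country": "Testonia", "city": "Demo Town"}),
]


def geolocate(ip: str) -> Optional[dict]:
    """Return the location for an IP address, or None if it is invalid or unknown."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return None
    for network, location in GEO_TABLE:
        if address in network:
            return location
    return None


def enrich_with_geo(event: dict) -> dict:
    """Attach a source_geo field when the event's source_ip can be located."""
    location = geolocate(event.get("source_ip", ""))
    if location is None:
        return event  # leave the event untouched rather than guessing
    return {**event, "source_geo": location}


enriched = enrich_with_geo({"source_ip": "203.0.113.7", "action": "login"})
```

Returning the event unchanged on a failed lookup keeps the enrichment optional, so a gap in the lookup data never blocks ingestion.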

It’s important to understand which enrichments you need and to gather the appropriate tooling to enable them. Tarsal lets you enable enrichments across all your log sources within minutes.

Step 4: Transition SIEM’s Data Pipeline Logic to the New Engine

To upgrade your security data lake, you have to replicate your SIEM’s transformation logic in the newly separated ETL engine. This transition can be performed gradually and without affecting your SIEM, so there’s no downtime to worry about.

First, review all the ETL logic deployed in your SIEM. This involves documenting data sources, transformations, enrichments, and any interdependencies. Once this framework is understood, you can begin replicating these operations in your new ETL pipeline.
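One way to migrate gradually is to run the same sample events through both the documented SIEM logic and its replacement, then compare the results before cutting over. The sketch below assumes both transformations can be expressed as Python functions; `legacy_transform` and `new_transform` are hypothetical stand-ins, not real SIEM or Tarsal code.

```python
from typing import Callable


def legacy_transform(event: dict) -> dict:
    # Stand-in for transformation logic documented from the SIEM.
    return {"user": event["User"].lower(), "action": event["Action"]}


def new_transform(event: dict) -> dict:
    # Stand-in for the re-implementation in the new ETL pipeline.
    return {"user": event["User"].strip().lower(), "action": event["Action"]}


def find_mismatches(
    samples: list[dict],
    old: Callable[[dict], dict],
    new: Callable[[dict], dict],
) -> list[dict]:
    """Return one record per sample where the two pipelines disagree."""
    mismatches = []
    for sample in samples:
        expected, actual = old(sample), new(sample)
        if expected != actual:
            mismatches.append({"input": sample, "expected": expected, "actual": actual})
    return mismatches


samples = [
    {"User": "Alice", "Action": "login"},
    {"User": " Bob ", "Action": "logout"},
]
mismatches = find_mismatches(samples, legacy_transform, new_transform)
```

Here the second sample surfaces a difference in whitespace handling, which you would either accept as an intentional improvement or fix before switching over.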

Tarsal makes it dead simple to replicate your SIEM’s ETL logic by supporting a wide range of out-of-the-box transformations and enrichments.

Step 5: Equip Your Security Team with Best-Of-Breed Data Lake Querying & Investigative Tools

Much of a security data lake’s power lies in its compatibility with best-of-breed data analysis tools. In a data lake, your security team can run ad-hoc queries with PopSQL, construct dashboards with Preset, and build dynamic or automated Incident Response playbooks with Jupyter Notebooks. More advanced tools, like SQL Chat, even allow AI-based interaction with your data!
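To give a flavor of the kind of ad-hoc query an analyst might run from a notebook, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a data lake query engine. The table name, columns, sample rows, and the threshold of three failures are all assumptions for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the data lake's query engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_events (ts TEXT, user_name TEXT, source_ip TEXT, outcome TEXT)"
)
conn.executemany(
    "INSERT INTO auth_events VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01T00:00:01Z", "alice", "203.0.113.7", "failure"),
        ("2024-01-01T00:00:02Z", "alice", "203.0.113.7", "failure"),
        ("2024-01-01T00:00:03Z", "bob", "203.0.113.7", "failure"),
        ("2024-01-01T00:00:04Z", "carol", "198.51.100.4", "failure"),
        ("2024-01-01T00:00:05Z", "carol", "198.51.100.4", "success"),
    ],
)

# Ad-hoc question: which source IPs have at least three failed logins?
rows = conn.execute(
    """
    SELECT source_ip, COUNT(*) AS failures
    FROM auth_events
    WHERE outcome = 'failure'
    GROUP BY source_ip
    HAVING COUNT(*) >= 3
    ORDER BY failures DESC
    """
).fetchall()
conn.close()
```

The same SQL could be pasted into a query tool or wrapped in a notebook cell as one step of an incident response playbook.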

The security data lake’s versatility far outstrips that of a SIEM, giving you the customizability you need to meet your team’s unique needs. With Tarsal, you have a partner to help make this transition smooth and impactful.

Conclusion

Revamping your security data lake may seem daunting, but with the right strategy, it becomes an achievable endeavor. By shifting the perspective on your data lake from a junk drawer to a cornerstone of your security operation, you can unlock tremendous value for your team. A well-tuned data lake enables advanced threat detection, offers enriched threat intelligence, and provides a customizable environment to adapt to your team’s specific needs.

Your transition becomes easier with Tarsal’s powerful ETL engine and data enrichment capabilities. Tarsal makes sure you have clean, useful, and accessible data ready to meet your detection and response requirements.

A robust security data lake is not just a nice-to-have; it’s an essential part of a mature security strategy. By investing in your data lake infrastructure now, you are future-proofing your organization and positioning yourself for security success in the evolving cybersecurity landscape.