The Modern Security Data Stack

Kyle Polley & The Tarsal Team

Introduction

Effectively managing high-scale data has never been more critical for cybersecurity teams. Managing logs from hundreds of critical cloud resources like Okta, AWS, Slack, and more has become the new norm. This constant influx of data, along with the need for efficient filtering, normalization, and enrichment, has intensified the challenge of detecting threats amidst the noise.

In the past, the volume of logs to manage was relatively modest. Security teams could rely on SIEM (Security Information and Event Management) tools to ingest, analyze, and alert on data. However, as cloud usage skyrockets and SaaS applications proliferate, the magnitude of log sources and throughput has mushroomed, outpacing the capabilities of traditional SIEMs.

Interestingly, this narrative mirrors the journey of data teams who once grappled with similar problems, but found refuge in the modern data stack. With one-click tools like Fivetran and data lake infrastructure like Snowflake, data teams have successfully created an infrastructure stack that offers them unparalleled scalability and flexibility.

Tarsal aims to bring the modern data stack to security teams. Tarsal is a Fivetran-like data ingest tool that integrates with resources like Okta, AWS, and Slack to filter, normalize, transform, and enrich security logs automatically. With Tarsal, teams can focus on what truly matters: analyzing and addressing threats to their organizations.

The Modern Data Stack

The modern data stack can be defined along four key areas: the data layer, the data lake, the orchestration/automation layer, and data exploration tooling. It scales with demand by relying on managed infrastructure like Snowflake, Fivetran, and serverless functions. Above all, it is modular and vendor-agnostic: it uses open standards, allowing teams to use the best tool for the job.

We discuss the modern data stack in our blog post Security Data Lake — The Future of Security Log Collection and Analysis.

The Modern Security Data Stack

Inspired by the capabilities of the Modern Data Stack, we propose a similar structure for cybersecurity: the Modern Security Data Stack. The Modern Security Data Stack is conceptually constructed around the same four areas as the Modern Data Stack:

Data Layer: A solution like Tarsal can serve as the data layer, ingesting and refining raw security logs from various sources. By automating the filtering, normalizing, transforming, and enriching of logs, it takes most of the data management burden off the team. This allows security teams to focus on analysis and on tackling security threats efficiently. Check out our previous blog post where we dive into the benefits of having a dedicated data layer such as Tarsal.
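To make the filter, normalize, and enrich steps more concrete, here is a minimal Python sketch. It is not Tarsal's implementation: the shared schema, the NOISY_ACTIONS set, the SUSPICIOUS_IPS table, and the function names are all hypothetical and exist only for illustration. The raw field names follow the general shape of Okta System Log and AWS CloudTrail events, trimmed down to a handful of fields.

```python
from typing import Any, Dict, Optional

# Hypothetical filter list and threat-intel table, for illustration only.
NOISY_ACTIONS = {"DescribeInstances"}
SUSPICIOUS_IPS = {"203.0.113.7"}  # address from a documentation-only range


def normalize(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw event from one source onto a small shared schema."""
    if source == "okta":
        return {
            "source": "okta",
            "timestamp": raw["published"],
            "actor": raw["actor"]["alternateId"],
            "action": raw["eventType"],
            "src_ip": raw["client"]["ipAddress"],
        }
    if source == "aws":
        return {
            "source": "aws",
            "timestamp": raw["eventTime"],
            "actor": raw["userIdentity"]["arn"],
            "action": raw["eventName"],
            "src_ip": raw["sourceIPAddress"],
        }
    raise ValueError(f"unsupported source: {source}")


def process(source: str, raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Normalize, then filter out noise, then enrich with an IP flag."""
    event = normalize(source, raw)
    if event["action"] in NOISY_ACTIONS:
        return None  # filtered: not worth storing or alerting on
    event["ip_flagged"] = event["src_ip"] in SUSPICIOUS_IPS
    return event
```

Once every source is mapped onto the same handful of fields, a single downstream query or detection can cover Okta and AWS events alike, which is the main payoff of normalizing before the data reaches the lake.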

Data Lake: The data lake is a cloud-based storage layer that holds vast quantities of structured and unstructured data. Tarsal supports a wide variety of destinations, including Snowflake, BigQuery, Databricks, and Azure Data Lake. These platforms were built with petabyte-scale workloads in mind, so storage and query capacity are far less likely to limit your analysis.

Orchestration/Automation Layer: Apache Airflow, widely used in data stacks, is equally potent for security stacks, providing the flexibility to author, schedule, and monitor workflows programmatically.

Managed services like Astronomer.io simplify running Apache Airflow, delivering its benefits without the hassle of setup and maintenance. These services enable teams to manage workflows as code and to adopt best practices such as unit testing and CI/CD, which makes for a more robust and efficient workflow system. Security-specific orchestration tools like Tines are also a fantastic option.

Data Exploration Tooling: The best part about the modern security data stack is that you can use whatever data analysis tooling best fits your needs. These tools help security teams dig deeper into their data, uncovering patterns, anomalies, and insights. Typically, this involves notebook-style interfaces for creating and sharing documents that contain live code, visualizations, and narrative text. In our opinion, you should set up more than one type of data exploration tool so that your team can pick the right one for each use case. We’d recommend Hex.Tech for managed notebooks and Preset for dashboards.
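As a rough sketch of the kind of question a notebook cell might answer, the Python below counts failed logins per user and lists the accounts at or above a threshold. The event fields (actor, action, outcome), the "login" action name, the function name, and the default threshold of 3 are assumptions made for illustration. In practice the rows would come from a query against your data lake rather than an in-memory list.

```python
from collections import Counter
from typing import Dict, Iterable, List


def failed_login_outliers(
    events: Iterable[Dict[str, str]], threshold: int = 3
) -> List[str]:
    """Return actors with at least `threshold` failed logins, sorted by name."""
    failures = Counter(
        event["actor"]
        for event in events
        if event["action"] == "login" and event["outcome"] == "FAILURE"
    )
    return sorted(actor for actor, count in failures.items() if count >= threshold)


# Tiny hand-written sample standing in for rows pulled from the data lake.
sample_events = [
    {"actor": "alice@example.com", "action": "login", "outcome": "FAILURE"},
    {"actor": "alice@example.com", "action": "login", "outcome": "FAILURE"},
    {"actor": "alice@example.com", "action": "login", "outcome": "FAILURE"},
    {"actor": "bob@example.com", "action": "login", "outcome": "FAILURE"},
    {"actor": "bob@example.com", "action": "login", "outcome": "SUCCESS"},
]

flagged = failed_login_outliers(sample_events)
```

A fixed threshold is the simplest possible rule. Once a version like this works in a notebook, it can be tuned against historical data or promoted into a scheduled workflow in the orchestration layer.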

Just like the Modern Data Stack, the Modern Security Data Stack promotes flexibility. Teams can mix and match tools based on their needs. Furthermore, if your organization already has a data team, it would be useful to explore the tools they use. There might be ways for your security team to leverage some of their tooling, fostering cross-functional synergy and resource optimization.

Conclusion

The Modern Security Data Stack, powered by solutions like Tarsal, can be deployed with one-click ease. It takes the stress out of worrying about log throughput, normalization, and analysis, and allows teams to concentrate on interpreting the data and securing their organizations. In this way, the Modern Security Data Stack provides a way forward to tackle the increasing complexity of cybersecurity management in today’s cloud-driven world.

As we move into the future, the Modern Security Data Stack will become the new standard for cybersecurity teams, not just for its scalability and flexibility, but for its capacity to streamline tasks and maximize focus on critical cybersecurity initiatives. With tools like Tarsal paving the way, we are redefining cybersecurity for the modern age.