Security Data Lake — The Future of Security Log Collection and Analysis
Kyle Polley & The Tarsal Team
This wasn’t always the case; not too long ago, security teams primarily dealt with three types of data sources: application logs, network logs, and host logs. The entire infrastructure was managed on-premises, and the volume of data was relatively low. Today, the scenario is dramatically different: critical business services extend well beyond on-prem servers. Business communication tools, BI dashboards, and production services are distributed across Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) providers. Each provider emits audit logs in its own format, leaving security teams to transform those logs into a shape that fits their operational needs.
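To make the transformation burden concrete, here is a minimal sketch of normalizing two differently shaped audit events into one common record. The raw field names, the target schema, and both `normalize_*` functions are hypothetical, invented for illustration; they do not reflect any specific provider's log format or Tarsal's schema.

```python
from datetime import datetime, timezone

# Hypothetical raw events. The field names are invented for this sketch and
# are not taken from any real provider's audit log schema.
SAAS_EVENT = {
    "actor_email": "alice@example.com",
    "ts": 1700000000,  # Unix epoch seconds
    "event": "login",
    "client_ip": "203.0.113.7",
}
IAAS_EVENT = {
    "user": {"name": "alice"},
    "time": "2023-11-14T22:13:20Z",  # ISO 8601 string in UTC
    "operation": "login",
    "ip": "203.0.113.7",
}


def normalize_saas(event: dict) -> dict:
    """Map the hypothetical SaaS event onto the common schema."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "username": event["actor_email"].split("@")[0],
        "action": event["event"],
        "source_ip": event["client_ip"],
    }


def normalize_iaas(event: dict) -> dict:
    """Map the hypothetical IaaS event onto the common schema."""
    ts = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "username": event["user"]["name"],
        "action": event["operation"],
        "source_ip": event["ip"],
    }


if __name__ == "__main__":
    print(normalize_saas(SAAS_EVENT))
    print(normalize_iaas(IAAS_EVENT))
```

Every new source needs another mapping like these, which is why the work grows with each service a team adopts.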
Unfortunately, traditional Security Information and Event Management (SIEM) systems struggle to keep up with the expanding data landscape, and the cost of running them places a considerable financial burden on security teams. This is exactly why we built Tarsal: to help security teams manage their data. We believe security engineers should focus on security, not on becoming experts in ETL pipelines.
Security teams are increasingly transitioning to security data lakes as they recognize the limitations of traditional SIEMs in managing the growing volume and variety of security data. Instead of deploying a single tool that handles ETL, data storage, dashboarding, detection writing, alerting, ticketing, and more, security teams can use data lake infrastructure to deploy a collection of modular systems, each of which excels at a single purpose.
The shift to a security data lake offers several advantages, including scalability, cost-effectiveness, and analytical capabilities.
Security data lakes can handle large volumes of data and are designed to scale horizontally. Data can be retained indefinitely with little additional effort. Once the audit log data pipeline is set up, the ongoing overhead and engineering effort are minimal. With Tarsal, you can ingest logs from your critical systems while Tarsal does the heavy lifting of managing your data pipeline.
Unlike traditional SIEMs, security data lakes let you pay only for what you use. Much like paying for cloud servers by the hour, a security data lake charges you only for the compute used to query and analyze data. Few, if any, servers need to run 24/7/365; compute can be created or destroyed depending on usage. You can further reduce costs by aggregating and partitioning your data by timestamp or by entities such as username, IP address, or hostname, so that queries scan only the partitions they need. With Tarsal, aggregation and partitioning come out of the box and can be customized for your unique environment.
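As a rough illustration of partitioning by timestamp and, optionally, by an entity, the sketch below derives a directory-style partition path for each event and groups events by it. The `key=value` path layout, the function names, and the event fields are assumptions made for this example, not a description of how Tarsal lays out data.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


def partition_path(event: dict, entity_field: Optional[str] = None) -> str:
    """Build a key=value partition path from the event's timestamp.

    If entity_field is given (for example "username"), it is appended as
    an extra partition level.
    """
    ts = datetime.fromisoformat(event["timestamp"]).astimezone(timezone.utc)
    parts = [f"date={ts:%Y-%m-%d}", f"hour={ts:%H}"]
    if entity_field is not None:
        parts.append(f"{entity_field}={event[entity_field]}")
    return "/".join(parts)


def group_by_partition(events: list, entity_field: Optional[str] = None) -> dict:
    """Group events by partition path, as a writer would before flushing files."""
    groups = defaultdict(list)
    for event in events:
        groups[partition_path(event, entity_field)].append(event)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"timestamp": "2023-11-14T22:13:20+00:00", "username": "alice"},
        {"timestamp": "2023-11-14T22:45:00+00:00", "username": "bob"},
        {"timestamp": "2023-11-15T01:00:00+00:00", "username": "alice"},
    ]
    for path, rows in group_by_partition(sample).items():
        print(path, len(rows))
```

A query limited to a single day or hour then only has to read the matching partitions, which is where the compute savings come from. Partitioning on a high-cardinality entity such as username can produce many small files, so it is worth weighing against how the data is actually queried.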
Unlike traditional SIEMs, security data lakes store data in an industry-standard format that can be used with any tool. Security engineers can query the data lake from Jupyter Notebooks to build investigative notebooks, build security dashboards with Apache Superset, or build workflow automation in their favorite programming language. Security engineers are no longer locked into a single vendor; they can use whichever tool best gets the job done. To aid analytics, you can add an enrichment step to your data layer to provide more context on your audit logs. For example, Tarsal customers can enable an enrichment feature that automatically extracts IP addresses from their logs and enriches them with GeoIP or threat intelligence data.
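The enrichment idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the lookup table, the threat list, and the `enrich` function are hypothetical, the location values are placeholders, and a real pipeline would query a GeoIP database and a threat intelligence feed instead of in-memory constants.

```python
import ipaddress

# Hypothetical lookup data using reserved documentation IP ranges.
# The location labels are placeholders, not real GeoIP results.
GEOIP_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "placeholder-country-a"},
    ipaddress.ip_network("198.51.100.0/24"): {"country": "placeholder-country-b"},
}
THREAT_INTEL_IPS = {"198.51.100.23"}  # hypothetical known-bad addresses


def enrich(event: dict) -> dict:
    """Return a copy of the event with GeoIP and threat intel context added."""
    ip = ipaddress.ip_address(event["source_ip"])
    # Take the first network containing the address, or None if there is none.
    geo = next((info for net, info in GEOIP_TABLE.items() if ip in net), None)
    enriched = dict(event)
    enriched["geo"] = geo
    enriched["threat_match"] = event["source_ip"] in THREAT_INTEL_IPS
    return enriched


if __name__ == "__main__":
    print(enrich({"username": "alice", "source_ip": "203.0.113.7"}))
    print(enrich({"username": "bob", "source_ip": "198.51.100.23"}))
```

Enriching at ingestion time means every downstream tool, whether a notebook, a dashboard, or an automation script, sees the same added context without repeating the lookup.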
As cybersecurity operators tackle the challenges of an expanding data landscape, the security data lake has emerged as a powerful solution. Whether you already have a SIEM or are just getting started on your data lake journey, there is no better place to start than building your data layer. The data layer is the foundation of your data lake; having robust and scalable ingestion, normalization, and enrichment tooling is critical for the success of your security data lake.
While it is certainly possible to build your own data layer, you can save time and money by purchasing a reliable security data layer solution like Tarsal. With Tarsal, integrating all of your critical services into a robust, feature-rich ETL pipeline takes only a few minutes. Instead of managing data engineering complexities such as normalization schemas and data partitions, you and your team can focus on what really matters: using the data to prevent, detect, and respond to cyber threats.