Having explored the concepts, tools, and challenges of data ingestion in previous articles, we now turn to the critical decision-making process: choosing the right ingestion method for your unique security needs.
Effective security operations are all about getting the right data in front of analysts when they need it. Data ingestion is how that happens, but an effective ingestion strategy involves more than simply connecting data feeds from external sources and sending telemetry to a central dashboard.
The right ingestion method depends on your company's technical architecture, and this article walks through the key factors to weigh and how to make the decision.
Before anything else, it's vital that you understand your specific security data needs. This includes identifying the data that you need to ingest and your data sources, which could be endpoints, networks, or hybrid clouds. It's crucial to tailor your data ingestion strategy to your organization's risk tolerance, business objectives, industry, and compliance requirements.
For instance, a global bank handling billions in daily transactions will have vastly different data ingestion needs than a small retail business. The bank's strategy must prioritize comprehensive log retention for compliance across multiple jurisdictions and real-time data parsing for immediate threat correlation across diverse systems. Its risk tolerance for data loss or missed threats is exceptionally low, demanding a high-fidelity, extensive data lake that captures and retains raw logs for forensic analysis for 12+ months.
Framing Your Ingestion Strategy Around Business Needs: What to Ask
Before planning a data ingestion strategy, here are three questions to ask your team:
1. Which data sources and metrics most directly affect revenue if a threat goes undetected?
2. How much operational disruption can the business absorb while waiting for data to become available for analysis?
3. Which compliance obligations dictate what data must be collected, and how long it must be retained?
Ultimately, understanding your data needs means aligning your ingestion strategy with how technical security metrics impact your revenue, operational disruption, and compliance risk. It's about ensuring your data feeds directly into quantifiable risk assessments and business-driven security decisions, moving beyond mere technical checkbox compliance.
Ingesting data into a centralized platform gives SecOps teams a comprehensive view of their entire security architecture, so it's vital to understand which data needs to flow into a platform like Google SecOps to provide that visibility.
There are a number of different ways to ingest data into your security operations platform. Decide which method to use based on your business needs, team size, and the speed with which your SecOps teams must act. Let’s compare how the different data ingestion methods stack up:
Batch vs. real-time ingestion is a matter of scheduling. Batch ingestion involves ingesting large amounts of data at a predefined cadence, which might be daily or weekly depending on the specific type of data being ingested. The major drawback of this method is that data is not immediately available for analysis when an incident occurs.
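To make the cadence concrete, here is a minimal Python sketch of a batch job you might run from cron once a day. The log path, the COLLECTOR_URL endpoint, and the payload shape are all illustrative assumptions, not a real SIEM API.

```python
"""Minimal batch-ingestion sketch: collect yesterday's logs, upload in bulk.

Assumptions: logs are newline-delimited JSON files under /var/log/app/,
and COLLECTOR_URL is a hypothetical bulk-ingest endpoint on your SIEM.
Running this from cron once a day gives you a predefined cadence.
"""
import glob
import json
from datetime import date, timedelta

import requests

COLLECTOR_URL = "https://siem.example.com/api/v1/logs:batch"  # hypothetical
API_TOKEN = "replace-me"

def collect_batch(day: str) -> list[dict]:
    """Read every log line written on the given day into one in-memory batch."""
    events = []
    for path in glob.glob(f"/var/log/app/*-{day}.jsonl"):
        with open(path) as fh:
            events.extend(json.loads(line) for line in fh if line.strip())
    return events

def upload_batch(events: list[dict]) -> None:
    """Send the whole day's events in a single bulk request."""
    resp = requests.post(
        COLLECTOR_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"events": events},
        timeout=60,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    upload_batch(collect_batch(yesterday))
```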
Real-time ingestion is also called "streaming" and involves ingesting security data as it is created. The data is available quickly, sometimes within seconds (depending on the system), and can be readily used for analysis or incident response. Typically, this is the domain of streaming platforms like Apache Kafka.
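As a minimal illustration of the streaming model, here is a sketch using the kafka-python client. The broker address and the "security-events" topic name are assumptions; in practice, a downstream SIEM connector would consume from that topic.

```python
"""Minimal streaming-ingestion sketch using kafka-python.

Assumptions: a Kafka broker reachable at localhost:9092 and a topic named
"security-events" that a downstream SIEM connector consumes from.
"""
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # Serialize each event dict to JSON bytes before sending.
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def emit(event: dict) -> None:
    """Publish a security event the moment it is created."""
    producer.send("security-events", event)

emit({"type": "auth_failure", "user": "alice", "ts": time.time()})
producer.flush()  # block until the event is actually on the wire
```

The flush() call matters in short-lived scripts; a long-running producer would instead rely on the client's internal batching and delivery callbacks.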
When to use batch vs. real-time ingestion:
Your choice should depend on the urgency of your use cases and how quickly your SOC needs to respond to threats. Use batch ingestion for historical or compliance-focused data that doesn't need immediate analysis. Use real-time ingestion when you need continuous monitoring, fast detection, or alerting.
Push and pull methods of data ingestion describe how data moves to the target system. With a push mechanism, the source system proactively sends data to the target; with a pull mechanism, the target system fetches data from the source on its own schedule.
The choice between push and pull can depend on how often you need each data type refreshed. A push method is often easier for real-time streaming data from external systems, whereas a pull method can be more effective for batch processing.
When to use push vs. pull mechanisms:
Push ingestion offers low latency and real-time delivery, but can strain systems if too much data floods in at once. It is ideal for high-frequency or event-driven data. Pull ingestion gives better control over timing and system load, but can introduce delays and may miss transient events if polling intervals are too broad. Use this method for scheduled syncs or batch data.
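The two models are easiest to see side by side. In this hedged Python sketch, both endpoints are hypothetical: the push function delivers each event as it occurs, while the pull loop polls on a fixed interval and therefore trades freshness for control over load.

```python
"""Sketch contrasting push and pull ingestion; all URLs are hypothetical.

Push: the source fires an HTTP POST at the collector as each event occurs.
Pull: the collector polls the source's export API on a fixed interval.
"""
import time

import requests

COLLECTOR_WEBHOOK = "https://collector.example.com/ingest"   # hypothetical
SOURCE_EXPORT_API = "https://source.example.com/api/events"  # hypothetical

def push_event(event: dict) -> None:
    """Push model: the source proactively delivers each event (low latency)."""
    requests.post(COLLECTOR_WEBHOOK, json=event, timeout=10).raise_for_status()

def pull_events_forever(interval_seconds: int = 300) -> None:
    """Pull model: the collector fetches on its own schedule (controlled load)."""
    cursor = None
    while True:
        resp = requests.get(
            SOURCE_EXPORT_API,
            params={"since": cursor} if cursor else None,
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        for event in payload.get("events", []):
            print("ingested:", event)        # hand off to parsing/normalization
        cursor = payload.get("next_cursor")  # resume point for the next poll
        time.sleep(interval_seconds)         # polling interval bounds freshness
```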
Agent-based and agentless data ingestion represent two distinct approaches to collecting data from various sources.
Agent-based ingestion means installing software on each data source to collect and transmit information; typically, this involves tools like the Chronicle Forwarder or other collection agents. Agentless ingestion, on the other hand, relies on existing infrastructure, APIs, and other non-invasive methods.
Agent-based ingestion typically captures more fine-grained telemetry, ultimately yielding more detail. Because agentless ingestion uses existing infrastructure, however, it is usually easier to set up.
When to use agent-based vs. agentless ingestion:
Use agent-based ingestion when you need detailed, continuous telemetry from endpoints or restricted systems. It offers deeper visibility but requires software deployment. Use agentless ingestion when quick setup or minimal impact is key. It's best suited for environments with API access or where installing agents isn't feasible due to policy or architecture constraints.
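To illustrate what "installing software on each data source" actually means, here is a toy sketch of the agent model: a forwarding loop that tails a local log file and ships new lines off-host. The collector URL is a hypothetical placeholder, and a production agent like the Chronicle Forwarder handles buffering, retries, and transport far more robustly than this.

```python
"""Toy sketch of a collection agent: software installed on the source host
that tails a local log file and forwards each new line to a collector.
The forwarding endpoint is a hypothetical placeholder.
"""
import time

import requests

COLLECTOR_URL = "https://collector.example.com/ingest"  # hypothetical

def tail_and_forward(path: str) -> None:
    """Follow a log file (like `tail -f`) and push each new line off-host."""
    with open(path) as fh:
        fh.seek(0, 2)  # start at end of file; only forward new activity
        while True:
            line = fh.readline()
            if not line:
                time.sleep(1.0)  # nothing new yet; check the file again shortly
                continue
            requests.post(COLLECTOR_URL, json={"raw": line.rstrip("\n")}, timeout=10)

tail_and_forward("/var/log/auth.log")
```

An agentless equivalent would skip the on-host software entirely and instead poll the source's API from the collector side, much like the pull sketch above.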
When it comes to data ingestion, you need to consider factors such as data volume, infrastructure, and any specific compliance requirements.
Data volume and velocity are important factors in deciding how to ingest data: high-volume, high-velocity sources can overwhelm batch windows and polling intervals, while streaming infrastructure may be overkill for sources that change rarely.
You also need to consider what infrastructure you already have in place and what resources can be allocated to security data ingestion. Security telemetry often arrives in high volume and at high velocity, so lacking a platform that can accept multiple data sources, or a partner familiar with data ingestion, can complicate matters.
Data ingestion can run up against regulations like the General Data Protection Regulation (GDPR) and compliance standards like HIPAA. Understanding those obligations, as well as your specific security requirements around data protection, may drive ingestion decisions as well.
Data ingestion can be a difficult process with many competing factors, and understanding the resources available to you and the compliance requirements that affect the work is a critical part of getting it right.
Moreover, knowing which types of data ingestion are available, and deciding between batch and real-time processing, push and pull mechanisms, and agent-based and agentless capture, shapes how smooth the process will be. Ingestion doesn't have to be overly complicated, but it can become lengthy and fraught with challenges if you're not careful.
This is particularly true for SecOps teams, especially those using Google SecOps. The right ingestion setup determines how quickly you gain visibility, enrich context, and respond to threats.
To understand how best to leverage hybrid cloud data and ingest it into Chronicle, make sure you check out Netenrich's Google SecOps 101 virtual bootcamp.
Netenrich helps you ingest the right telemetry, align it to Google’s Unified Data Model (UDM), and enrich it with the context your team needs from day one. We have helped enterprises cut onboarding time from weeks to hours, avoiding delays and missteps common in first-time Chronicle rollouts.
Learn more about essential Google Chronicle log types.