Enterprise SIEM solutions like Splunk and ArcSight cost six figures annually, and that’s before you factor in professional services. But here’s what most IT managers won’t tell you: you don’t need a six-figure SIEM to detect security threats, comply with regulations, and gain visibility into your infrastructure. A budget SIEM setup is not just possible; it’s an increasingly common choice for organizations that know where to allocate their security spend.
The challenge with building a budget SIEM isn’t capability; it’s operational discipline. Open-source tools are powerful, but they demand hands-on management. This guide walks you through building a production-ready SIEM on a shoestring budget, including architecture decisions, tool selection, and the operational practices that separate working setups from chaotic ones.
Understanding SIEM Fundamentals Before You Buy
Before you even consider open-source alternatives, understand what a SIEM actually does and what you genuinely need:
Core SIEM functions:
– Log aggregation — collecting logs from servers, applications, and network devices
– Normalization — converting different log formats into a standardized schema
– Indexing and search — making logs searchable and queryable
– Alerting — triggering notifications when suspicious patterns appear
– Compliance reporting — generating audit trails for regulatory requirements
– Forensic analysis — investigating incidents after they occur
The mistake most budget-conscious teams make is treating the SIEM budget decision as binary: “expensive enterprise product or nothing.” The reality is more nuanced. Map your specific requirements (compliance standards, log volume, number of data sources) before choosing components.
Questions to ask before implementation:
- How many GB of logs do you generate daily? (This drives storage costs)
- What compliance frameworks apply? (HIPAA, PCI-DSS, SOC 2, GDPR)
- How many security events require investigation monthly?
- What’s your acceptable incident detection latency?
- Do you have in-house Linux expertise?
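If you don’t know your daily volume yet, you can estimate it from event rates. This is a minimal sketch, assuming you can measure (or reasonably guess) events per second and average event size for each source type; the helper names and the sample fleet numbers are hypothetical placeholders, not measurements.

```python
SECONDS_PER_DAY = 86_400


def estimate_daily_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Rough daily ingest in decimal GB (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / 1e9


def estimate_fleet_daily_gb(sources: dict) -> float:
    """Sum per-source estimates; sources maps name -> (events/sec, avg bytes per event)."""
    return sum(estimate_daily_gb(eps, size) for eps, size in sources.values())


if __name__ == "__main__":
    # Hypothetical rates: replace with numbers measured on your own systems.
    fleet = {
        "linux_auth_and_syslog": (200, 300),
        "web_access_logs": (1_000, 500),
        "firewall": (2_000, 250),
    }
    print(f"{estimate_fleet_daily_gb(fleet):.1f} GB/day")
```

With the sample numbers this prints 91.6 GB/day. Peak rates can be several times the average, so size storage from measured daily totals once logs are flowing.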
The Architecture: Building Your SIEM on a Budget
A budget SIEM architecture typically chains together collection, processing, storage, and visualization stages. Unlike enterprise SIEMs that bundle everything, you’ll source each stage separately and glue them together.
Log Sources → Shipper → Processing Engine → Storage → Visualization
– Log sources: servers, applications, network devices
– Shipper: Filebeat, Fluentd
– Processing engine: Elasticsearch, Splunk Light
– Storage: disk or S3
– Visualization: Kibana, Grafana
Why the Elastic Stack is the Budget Winner
The Elastic Stack (Elasticsearch, Logstash, Kibana) has become the de facto standard for budget SIEM implementations. Here’s why:
Elasticsearch is horizontally scalable: you add compute capacity by adding nodes, not by replacing infrastructure. Kibana gives you visualization and dashboarding that rival enterprise products, and Logstash handles log normalization. The core stack is free to run under Elastic’s Basic license on clusters of any size; paid subscriptions unlock advanced capabilities such as machine learning-based anomaly detection.
But here’s the critical detail: Elastic’s licensing changed in 2021, and the free tier is limited to Basic features. For production SIEM work, your options are:
- Elastic Cloud (managed service) — $25-50/month for small deployments
- Self-hosted with commercial license — roughly $4-5 per GB ingested
- Other options: Splunk Light (commercial, not open source), Graylog, or pure open-source toolchains
The cost-benefit inflection point is around 500 GB of logs per day. Below that, managed solutions win. Above that, self-hosted becomes economical.
Commercial Alternative: Splunk Light with Open-Source Shippers
Many budget-conscious teams run Splunk Light with open-source log shippers. Splunk Light indexes up to 500 MB/day for free, and the Splunk Universal Forwarder is free as well, so paid licensing is driven by indexed volume above that cap. For reference, Splunk Enterprise costs $6,000-12,000 per year minimum.
Realistic monthly costs with Splunk Light:
– Splunk Light licenses: $100-500
– Compute (AWS, DigitalOcean, self-hosted): $200-500
– Storage: $50-200
– Total: $350-1,200/month
Compare that to:
– Elastic Cloud Standard: $2,000-8,000/month for equivalent capacity
– Splunk Enterprise: $6,000-18,000/month minimum
Building Your Budget SIEM: The Complete Setup
Step 1: Choose Your Core Engine
Option A: Elasticsearch + Logstash + Kibana (ELK Stack)
Pros:
– Free to run under Elastic’s Basic license, with no per-GB fees
– Massive community and plugin ecosystem
– Excellent for high-volume environments (1TB+ logs/day)
Cons:
– Operational overhead—you manage scaling, backups, security patches
– Requires Linux/DevOps expertise
– Parsing complex logs requires custom Logstash filters
Option B: Splunk Light with Open-Source Shippers
Pros:
– Excellent out-of-box parsing for common applications
– Single vendor support (Splunk Inc.)
– Simpler to operationalize for smaller teams
Cons:
– Per-GB licensing costs escalate quickly
– Less transparent cost structure
– Limited functionality in “Light” edition
Option C: Graylog Open Source
Pros:
– Purpose-built SIEM (not a general log platform adapted for security)
– Cleaner UI than ELK for security operations
– Built-in GeoIP and threat intelligence
Cons:
– Smaller community than ELK
– Enterprise features cost extra
– Less mature for very large deployments
My recommendation for most organizations: Start with ELK if you have DevOps resources, Splunk Light if you don’t.
Step 2: Set Up Log Collection
Log collection is where most budget SIEM deployments fail operationally. You need a shipper that’s lightweight (won’t bog down production systems) and reliable (won’t lose logs if the SIEM is down).
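Filebeat (part of the Elastic ecosystem) is the standard shipper for ELK deployments:

```yaml
# filebeat.yml - basic configuration
filebeat.inputs:
  - type: log   # deprecated in recent Filebeat releases in favor of "filestream"
    enabled: true
    paths:
      - /var/log/auth.log
      - /var/log/syslog
      - /var/log/apache2/access.log

# Ships directly to Elasticsearch. To route events through the Logstash pipeline
# in Step 3 instead, replace this with output.logstash on port 5044
# (Filebeat allows only one output at a time).
output.elasticsearch:
  hosts: ["elasticsearch.internal:9200"]
  protocol: "https"
  username: "filebeat_user"
  password: "${FILEBEAT_PASS}"

processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~
```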
Fluentd is another solid option if you’re collecting from containerized environments:
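```
# fluent.conf
# Tail Kubernetes container logs. Assumes Docker's json-file log format;
# containerd/CRI runtimes write a different line format and need a CRI parser.
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

# Output via the fluent-plugin-elasticsearch plugin
<match **>
  @type elasticsearch
  host elasticsearch.internal
  port 9200
  # logstash_format writes daily indices named logindex-YYYY.MM.DD
  logstash_format true
  logstash_prefix logindex
</match>
```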
Critical implementation detail: Configure log rotation and disk quotas on your log servers. A runaway application writing logs can fill a disk faster than your shipper can collect them, causing data loss.
Step 3: Deploy the Processing Engine
If using ELK, you need to decide between:
Lightweight setup (5-50 GB logs/day):
– Single Elasticsearch node, single Logstash instance
– 4 CPU, 8 GB RAM minimum
– Cost: $50-150/month on AWS
Medium setup (50-500 GB logs/day):
– 3-node Elasticsearch cluster (high availability)
– 2-3 Logstash instances with load balancing
– Storage: SSD for performance
– Cost: $300-800/month
Large setup (500 GB+ logs/day):
– 5+ node Elasticsearch cluster
– Dedicated master nodes
– Dedicated ingest nodes
– Separate Logstash pipeline servers
– Cost: $1,000-3,000/month
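If you want the tier choice to be explicit in capacity-planning scripts, the boundaries above reduce to a small lookup. This is a hypothetical helper that simply encodes the three tiers; because the published ranges overlap at 50 and 500 GB/day, it assumes those exact values go to the smaller tier.

```python
def recommend_elk_tier(daily_gb: float) -> str:
    """Map daily log volume (GB/day) to the ELK sizing tiers above.

    Volumes under 5 GB/day also land in the lightweight tier. Boundary values
    (exactly 50 or 500 GB/day) go to the smaller tier, which is an assumption.
    """
    if daily_gb < 0:
        raise ValueError("daily_gb must be non-negative")
    if daily_gb <= 50:
        return "lightweight"  # 1 Elasticsearch node, 1 Logstash, 4 CPU / 8 GB RAM
    if daily_gb <= 500:
        return "medium"       # 3-node cluster, 2-3 load-balanced Logstash instances
    return "large"            # 5+ nodes, dedicated master and ingest nodes


if __name__ == "__main__":
    for volume in (20, 120, 800):
        print(volume, "GB/day ->", recommend_elk_tier(volume))
```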
Here’s a production-ready Logstash configuration for normalization:
```
# Logstash pipeline configuration
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/logstash/certs/server.crt"
    ssl_key => "/etc/logstash/certs/server.key"
  }
}

filter {
  # Parse Apache/Nginx access logs
  if [agent][type] == "filebeat" and [log][file][path] =~ /access\.log/ {
    grok {
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
      remove_field => [ "message" ]   # only removed when the match succeeds
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => [ "timestamp" ]
    }
  }

  # Parse Linux authentication logs
  if [log][file][path] =~ /auth\.log/ {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }

  # Add GeoIP enrichment, but only for events that actually carry a source IP
  if [source][ip] {
    geoip {
      source => "[source][ip]"
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch.internal:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    user => "logstash_user"
    password => "${LOGSTASH_PASS}"
  }
}
```
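To see what the access-log branch of that pipeline does to a single event, here is a rough Python equivalent of the `grok` + `date` steps. It is a simplified regex stand-in for `HTTPD_COMBINEDLOG`, not the real grok definition, and the output field names only loosely follow ECS (an assumption; the field names Logstash actually produces depend on its ECS compatibility mode).

```python
import re
from datetime import datetime
from typing import Optional

# Simplified stand-in for grok's HTTPD_COMBINEDLOG pattern (not the real definition)
COMBINED_LOG = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def normalize_access_line(line: str) -> Optional[dict]:
    """Parse one combined-format access log line into flat, ECS-style fields."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None  # Logstash would tag the event _grokparsefailure instead
    # Same layout as the date filter's "dd/MMM/yyyy:HH:mm:ss Z"
    ts = datetime.strptime(m["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "@timestamp": ts.isoformat(),
        "source.ip": m["client"],
        "user.name": None if m["auth"] == "-" else m["auth"],
        "http.request.method": m["method"],
        "url.original": m["path"],
        "http.version": m["http_version"],
        "http.response.status_code": int(m["status"]),
        "http.response.body.bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
        "http.request.referrer": m["referrer"],
        "user_agent.original": m["user_agent"],
    }


if __name__ == "__main__":
    sample = ('203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] '
              '"GET /index.html HTTP/1.1" 200 2326 "https://example.com/" "Mozilla/5.0"')
    print(normalize_access_line(sample))
```

The point is the shape of normalization: one free-text line becomes typed, named fields with a real timestamp, which is what makes the detection rules in Step 5 possible.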
Step 4: Configure Storage and Retention
Storage cost is typically 20-30% of your total SIEM budget. Most organizations over-provision storage “just in case.” Real-world retention policies:
- Security logs (authentication, access): 90 days
- Application logs: 30 days
- Network traffic logs: 7-14 days (huge volume)
- Archived/compliance logs: 1-7 years (cold storage, not searchable)
Use Elasticsearch Index Lifecycle Management (ILM) to automate this:
```bash
curl -X PUT "elasticsearch.internal:9200/_ilm/policy/logs-policy" \
  -H "Content-Type: application/json" \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0d",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "readonly": {},
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
```
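Phase ages are measured from rollover, and rollover only applies when you write through a data stream or a rollover alias rather than to date-stamped index names. With that in place, this policy keeps indices in the hot tier (actively written, fully indexed) for their first 7 days, in the warm tier (force-merged to one segment and read-only) until day 30, and in the cold tier until day 90, then deletes them. Searchable snapshots, a common cold-tier action, require an Enterprise license. Adjust based on your compliance requirements.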
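The phase arithmetic is easy to get wrong when you tweak `min_age` values, so here is a small sketch that mirrors the policy’s thresholds and derives how long an index spends in each phase. It is illustrative only: the helper names are hypothetical, and real ILM evaluates policies on a polling interval (10 minutes by default), so transitions happen shortly after these boundaries rather than exactly on them.

```python
# min_age thresholds (days since rollover) copied from the logs-policy example
PHASES = [(0, "hot"), (7, "warm"), (30, "cold"), (90, "delete")]


def ilm_phase(days_since_rollover: float) -> str:
    """Return the phase an index should be in at a given age."""
    if days_since_rollover < 0:
        raise ValueError("age must be non-negative")
    current = PHASES[0][1]
    for min_age, phase in PHASES:
        if days_since_rollover >= min_age:
            current = phase
    return current


def phase_durations() -> dict:
    """Days spent in each phase before the next one starts (delete is terminal)."""
    return {
        phase: next_min_age - min_age
        for (min_age, phase), (next_min_age, _) in zip(PHASES, PHASES[1:])
    }


if __name__ == "__main__":
    print(phase_durations())  # {'hot': 7, 'warm': 23, 'cold': 60}
    for age in (3, 10, 45, 120):
        print(age, "days ->", ilm_phase(age))
```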
Step 5: Build Detection Rules
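A SIEM without detection rules is just expensive storage. Start with these high-impact, low-maintenance rules. They’re written in a simplified, tool-agnostic format using Elasticsearch query-string syntax; translate them into whatever rule engine you run (for example, Kibana’s threshold rules).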
Rule 1: Failed Login Brute Force
```json
{
  "name": "Multiple Failed Logins from Single Source",
  "query": "source.ip:* AND event.outcome:failure AND process.name:sshd",
  "group_by": ["source.ip"],
  "threshold": 10,
  "timeframe": "5m",
  "action": "alert"
}
```
Rule 2: Privilege Escalation Attempt
```json
{
  "name": "Sudo Usage by Unexpected User",
  "query": "process.name:sudo AND NOT user.name:(root OR serviceaccount)",
  "threshold": 1,
  "timeframe": "1m",
  "action": "alert"
}
```
Rule 3: Unusual Outbound Traffic
```json
{
  "name": "Traffic to Non-Standard Ports",
  "query": "destination.port:(>32768 AND <65535) AND event.category:network AND destination.ip:* AND NOT destination.ip:\"10.0.0.0/8\"",
  "threshold": 100,
  "timeframe": "5m",
  "action": "alert"
}
```
Start with 5-10 high-confidence rules. False positives destroy SIEM adoption. You can add more later as you understand your baseline.
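Under the hood, a threshold rule like Rule 1 is a per-key sliding-window count. The sketch below shows that logic for failed SSH logins grouped by source IP; the `AuthEvent` type and `brute_force_alerts` function are hypothetical, and a real SIEM runs the equivalent as a scheduled query rather than in application code.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass
class AuthEvent:
    timestamp: datetime
    source_ip: str
    outcome: str  # "success" or "failure", mirroring ECS event.outcome


def brute_force_alerts(events: Iterable[AuthEvent], threshold: int = 10,
                       window: timedelta = timedelta(minutes=5)) -> List[Tuple[str, datetime]]:
    """Emit (source_ip, timestamp) when one IP reaches `threshold` failures within `window`."""
    recent = defaultdict(deque)  # source_ip -> timestamps of failures still inside the window
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.outcome != "failure":
            continue
        q = recent[ev.source_ip]
        q.append(ev.timestamp)
        while ev.timestamp - q[0] > window:  # slide the window forward
            q.popleft()
        if len(q) >= threshold:
            alerts.append((ev.source_ip, ev.timestamp))
            q.clear()  # one alert per burst; later failures start a new count
    return alerts
```

Resetting the count after an alert is a design choice that turns one burst into one alert; remove the `q.clear()` call if you would rather be alerted on every further failure past the threshold.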
Building on a Budget: Cost Breakdown
| Component | Small (50 GB/day) | Medium (200 GB/day) | Large (500 GB/day) |
|---|---|---|---|
| Compute | $50-100 | $200-300 | $500-800 |
| Storage (3-month retention) | $30-50 | $100-150 | $200-400 |
| Backup/Disaster Recovery | $20-30 | $50-100 | $100-200 |
| Licensing | $0-100 | $100-500 | $500-2,000 |
| Total Monthly | $100-280 | $450-1,050 | $1,300-3,400 |
These are aggressive estimates assuming self-managed ELK on cloud VMs. AWS EC2 Reserved Instances can cut compute costs 40-50% if you commit long-term.
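The table totals are just the sum of each component’s range, and a reserved-instance discount applies only to the compute line. This sketch reproduces the Small column and applies a 40% compute discount (the low end of the 40-50% range above) to show the effect; the figures are the table’s estimates, not price quotes, and `monthly_total` is a hypothetical helper.

```python
# (low, high) monthly USD estimates from the "Small (50 GB/day)" column
SMALL = {
    "compute": (50, 100),
    "storage": (30, 50),
    "backup": (20, 30),
    "licensing": (0, 100),
}


def monthly_total(components: dict, compute_discount: float = 0.0) -> tuple:
    """Sum (low, high) ranges; the discount (0.4 = 40%) applies to compute only."""
    if not 0.0 <= compute_discount < 1.0:
        raise ValueError("compute_discount must be in [0, 1)")
    low = high = 0.0
    for name, (lo, hi) in components.items():
        factor = 1.0 - compute_discount if name == "compute" else 1.0
        low += lo * factor
        high += hi * factor
    return low, high


if __name__ == "__main__":
    print(monthly_total(SMALL))  # (100.0, 280.0)
    print(monthly_total(SMALL, compute_discount=0.4))
```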
Operational Best Practices for Budget SIEMs
Monitoring Your SIEM Itself
Your SIEM is a critical system. You need to monitor it. This is where teams fail operationally.
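```yaml
# prometheus.yml (fragment). Elasticsearch does not expose Prometheus metrics
# itself, so scrape the community elasticsearch_exporter, started with
# --es.uri=https://elasticsearch.internal:9200 (it listens on :9114 by default).
# The exporter hostname below is hypothetical.
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['elasticsearch-exporter.internal:9114']
```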
Key metrics to alert on:
– Elasticsearch cluster health: Yellow or red = investigation needed
– Heap usage: >85% = add nodes or reduce retention
– Unassigned shards: Indicates node failures
– Index failures: Logstash pipeline issues
– Disk utilization: >80% = delete old indices or add storage
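These thresholds are simple enough to encode directly. The sketch below assumes you have already fetched `GET _cluster/health`, per-node `jvm.mem.heap_used_percent` from `GET _nodes/stats/jvm`, and per-node `disk.percent` from `GET _cat/allocation?format=json`, and turns them into alert messages. The function name is hypothetical, and it does not cover Logstash index failures, which you would read from Logstash’s own node stats API.

```python
def siem_health_alerts(cluster_health: dict, heap_used_percent: dict, disk_percent: dict) -> list:
    """Turn already-fetched Elasticsearch stats into alert messages.

    cluster_health: parsed JSON body of GET _cluster/health
    heap_used_percent: node name -> jvm.mem.heap_used_percent (GET _nodes/stats/jvm)
    disk_percent: node name -> disk.percent (GET _cat/allocation?format=json);
        the _cat API returns strings and may report null for unassigned shards
    """
    alerts = []
    status = cluster_health.get("status")
    if status in ("yellow", "red"):
        alerts.append(f"cluster status is {status}")
    unassigned = cluster_health.get("unassigned_shards", 0)
    if unassigned:
        alerts.append(f"{unassigned} unassigned shards")
    for node, pct in sorted(heap_used_percent.items()):
        if pct is not None and float(pct) > 85:
            alerts.append(f"{node}: heap at {float(pct):g}%")
    for node, pct in sorted(disk_percent.items()):
        if pct is not None and float(pct) > 80:
            alerts.append(f"{node}: disk at {float(pct):g}%")
    return alerts


if __name__ == "__main__":
    health = {"status": "yellow", "unassigned_shards": 3}
    print(siem_health_alerts(health, {"es-1": 72, "es-2": 91}, {"es-1": "83", "es-2": "41"}))
```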
Automation for Operational Efficiency
Without automation, a budget SIEM becomes a full-time job. Use these to buy back operational time:
Automated index maintenance:
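```bash
#!/bin/bash
# Delete indices dated exactly 90 days ago (GNU date syntax); run daily from cron.
# Elasticsearch 8.x rejects wildcard DELETEs by default, so resolve the names first.
DAY=$(date -d '90 days ago' +%Y.%m.%d)
for idx in $(curl -s "elasticsearch.internal:9200/_cat/indices/*-${DAY}*?h=index"); do curl -s -X DELETE "elasticsearch.internal:9200/${idx}"; done
```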
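Because the script above only targets a single day, a missed cron run leaves indices behind. A more forgiving approach is to compute every index past the retention window from its date suffix. This is a hypothetical helper that assumes daily indices end in `-YYYY.MM.dd` (optionally followed by a rollover counter), as produced by the Logstash and Fluentd configs earlier; you would feed it the names from `GET _cat/indices?h=index` and delete what it returns.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List, Optional

# Matches a trailing -YYYY.MM.dd, optionally followed by a rollover counter like -000001
DATE_SUFFIX = re.compile(r"-(\d{4})\.(\d{2})\.(\d{2})(?:-\d+)?$")


def indices_past_retention(index_names: Iterable[str], retention_days: int = 90,
                           today: Optional[date] = None) -> List[str]:
    """Return index names whose date suffix is older than the retention window."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # no date suffix (e.g. .kibana indices): never touch it
        try:
            index_day = date(int(match[1]), int(match[2]), int(match[3]))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if index_day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["filebeat-8.11.0-2024.01.15", "filebeat-8.11.0-2024.04.29", ".kibana_1"]
    print(indices_past_retention(names, today=date(2024, 4, 30)))
```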
Automated backup:
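```bash
# Register an S3 snapshot repository (one-time setup). Elasticsearch needs S3
# credentials (keystore or IAM role); on 7.x, install the repository-s3 plugin first.
curl -X PUT "elasticsearch.internal:9200/_snapshot/backup" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "my-siem-backups",
      "base_path": "elasticsearch"
    }
  }'

# Take a dated snapshot into that repository (schedule daily, or use an SLM policy)
curl -X PUT "elasticsearch.internal:9200/_snapshot/backup/siem-$(date +%Y.%m.%d)"
```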
Alert noise reduction:
Use correlation rules to reduce false positives. Instead of alerting on every failed login, alert on “5+ failed logins followed by a successful login in the same 10-minute window.”
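That correlation rule can be sketched as a per-(user, source IP) queue of recent failures that is checked whenever a success arrives. The sketch assumes events arrive as `(timestamp, user, source_ip, outcome)` tuples; the function name and tuple shape are hypothetical, and the 5-failure / 10-minute values come from the example above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[datetime, str, str, str]  # (timestamp, user, source_ip, outcome)


def failed_then_success(events: Iterable[Event], min_failures: int = 5,
                        window: timedelta = timedelta(minutes=10)) -> List[tuple]:
    """Flag a successful login preceded by >= min_failures failures for the same
    (user, source_ip) pair within `window` of that success."""
    recent_failures = defaultdict(deque)  # (user, ip) -> failure timestamps
    alerts = []
    for ts, user, ip, outcome in sorted(events):
        failures = recent_failures[(user, ip)]
        # Forget failures older than the window, measured back from this event
        while failures and ts - failures[0] > window:
            failures.popleft()
        if outcome == "failure":
            failures.append(ts)
        elif outcome == "success":
            if len(failures) >= min_failures:
                alerts.append((user, ip, ts, len(failures)))
            failures.clear()  # a success ends the sequence either way
    return alerts


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [(t0 + timedelta(minutes=m), "alice", "198.51.100.20", "failure") for m in range(6)]
    sample.append((t0 + timedelta(minutes=6), "alice", "198.51.100.20", "success"))
    print(failed_then_success(sample))
```

A failed login on its own never fires here; only the sequence does, which is exactly why this kind of rule produces far fewer alerts than a per-failure rule.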
Tuning for Performance
A slow SIEM is a SIEM nobody uses. Performance tuning pays dividends:
Optimize Logstash pipelines:
– Use mutate filters to drop unnecessary fields before indexing
– Compress payloads when shipping to Elasticsearch
– Use conditional logic to skip expensive filters for irrelevant logs
Optimize Elasticsearch queries:
– Use filters (cached) instead of queries where possible
– Limit time range for interactive searches
– Create pre-aggregated dashboards instead of running expensive aggregations in real-time
Sample configuration for high-volume environments:
```yaml
# logstash.yml - pipeline tuning settings live here, not in the pipeline .conf file
pipeline.workers: 8
pipeline.batch.size: 250
pipeline.batch.delay: 5   # milliseconds
```

```
# logstash.conf - performance-optimized filter section
filter {
  # Drop noisy, low-value logs early
  if [message] =~ /health check|heartbeat/ {
    drop { }
  }
  # Use conditional logic
  if [log][file][path] =~ /application\.log/ {
    # Expensive parsing only for relevant logs
    grok {
      match => { "message" => "..." }
    }
  }
}
```
Common Pitfalls to Avoid
Pitfall 1: Under-provisioning storage
Teams forget that storage includes: raw data + replicas + temporary space for compaction. Budget 1.5x to 2x the raw data size.
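To turn that rule of thumb into a disk number, multiply daily raw volume by retention and the 1.5x to 2x factor, then leave headroom so nodes stay under the 80% disk alert threshold from the monitoring section (and below Elasticsearch’s default 85% low disk watermark, above which it stops allocating new shards to a node). A hypothetical helper, assuming the factor already covers replicas and merge space as described above:

```python
def disk_to_provision_gb(daily_raw_gb: float, retention_days: int,
                         multiplier: float = 2.0, max_fill: float = 0.80) -> float:
    """Disk to provision so that retained data fills at most `max_fill` of it.

    multiplier: the 1.5x-2x rule of thumb (replicas plus merge space), an estimate
    max_fill: 0.80 matches the disk-utilization alert threshold used for monitoring
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    data_gb = daily_raw_gb * retention_days * multiplier
    return data_gb / max_fill


if __name__ == "__main__":
    for mult in (1.5, 2.0):
        gb = disk_to_provision_gb(50, 90, multiplier=mult)
        print(f"50 GB/day, 90 days, {mult}x -> {gb:,.0f} GB")
```

Check the multiplier against your own cluster after a week of ingest: `GET _cat/indices?v&h=index,pri.store.size,store.size` shows primary size next to total size including replicas.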
Pitfall 2: Over-collecting logs
More logs ≠ better security. Debug logs from chatty applications cost money without value. Use sampling for non-critical logs.
Pitfall 3: Skipping security hardening
An unsecured SIEM is worse than no SIEM. Require TLS for all log shipping, use authentication, implement network segmentation.
Pitfall 4: Neglecting backup
Most budget SIEMs don’t have proper backups. A disk failure means data loss. Use Elasticsearch snapshots with S3 backup.
Pitfall 5: Fire-and-forget alerting
Without a process to respond to alerts, they become noise. Define who investigates what, with SLAs.
Compliance Considerations on a Budget
Regulations like HIPAA, PCI-DSS, and SOC 2 require log retention and audit trails. A budget SIEM can still be compliant:
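- Log immutability: Mark archived indices read-only (index.blocks.write) and restrict who can change index settings
- Log encryption: TLS in transit; AES-256 at rest via disk or volume encryption (self-managed Elasticsearch does not encrypt stored data itself)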
- Access controls: RBAC using Elasticsearch roles
- Retention: Use ILM to enforce policy automatically
- Reporting: Use Kibana dashboards to generate compliance reports
For SOC 2 Type II, the SIEM itself becomes an audit point. Document your configuration, backup procedures, and access logs.
Migration Strategy: Moving from Spreadsheets
If you’re currently tracking security events in spreadsheets or manual logs, here’s your migration path:
Phase 1 (Month 1): Deploy log collectors and basic aggregation. Get logs flowing into your SIEM.
Phase 2 (Month 2-3): Normalize logs and build basic dashboards. Team becomes familiar with tool.
Phase 3 (Month 4): Implement first detection rules. Start investigating alerts.
Phase 4 (Month 5-6): Continuous tuning based on false positives and blind spots.
Most organizations are operationally ready after 3 months. Avoid the temptation to build a “perfect” SIEM before going live. Imperfect data flowing continuously beats perfect data that never launches.
Scaling Your Budget SIEM
As your organization grows, your SIEM architecture will need to evolve:
Stage 1 (0-100 GB/day): Single server, everything on one machine. Works, but fragile.
Stage 2 (100-500 GB/day): Separate Elasticsearch and Logstash. Add redundancy.
Stage 3 (500GB+/day): Elasticsearch cluster, multiple Logstash pipelines, message queue (Kafka) to decouple ingestion from processing.
You can spend $5,000+ per month on infrastructure for a large SIEM. The key is scaling linearly with need, not over-engineering upfront.
Final Verdict: Is a Budget SIEM Right for You?
A budget SIEM is worth building if:
– Your compliance requirements include log auditing
– You have > 2-3 TB of logs per month
– Your organization has at least one person who can manage Linux systems
– You want to detect security threats faster than “when the breach is discovered”
A budget SIEM is not worth building if:
– You have < 100 GB of logs per month (cheaper to use SaaS)
– You have zero DevOps/Linux expertise and no budget to hire
– You need white-glove support from a vendor
– Your compliance requirements are extremely strict (regulated financial institutions)
For most mid-sized organizations (50-500 employees), a budget SIEM built on ELK or Splunk Light costs $300-1,500 per month and delivers 80% of the value of a $10,000+ per month enterprise solution.
The operational discipline matters more than the tool. A well-tuned budget SIEM beats an under-utilized enterprise SIEM every time.