How to Set Up a SIEM on a Budget

Enterprise SIEM solutions like Splunk and ArcSight can cost six figures annually, and that’s before you factor in professional services. But here’s what most IT managers won’t tell you: you don’t need a six-figure SIEM to detect security threats, comply with regulations, and gain visibility into your infrastructure. A budget SIEM setup is not just possible; for organizations that know where to allocate their security spend, it is a realistic alternative.

The challenge with building a budget SIEM isn’t capability; it’s operational discipline. Open-source tools are powerful, but they demand hands-on management. This guide walks you through building a production-ready SIEM on a shoestring budget, including architecture decisions, tool selection, and the operational practices that separate working setups from chaotic ones.

Understanding SIEM Fundamentals Before You Buy

Before you even consider open-source alternatives, understand what a SIEM actually does and what you genuinely need:

Core SIEM functions:
– Log aggregation — collecting logs from servers, applications, and network devices
– Normalization — converting different log formats into a standardized schema
– Indexing and search — making logs searchable and queryable
– Alerting — triggering notifications when suspicious patterns appear
– Compliance reporting — generating audit trails for regulatory requirements
– Forensic analysis — investigating incidents after they occur

The mistake most budget-conscious teams make is treating the SIEM decision as binary: “expensive enterprise product or nothing.” Reality is different. Map your specific requirements (compliance standards, log volume, number of data sources) before choosing components.

Questions to ask before implementation:

How many GB of logs do you generate daily? (This drives storage costs)
What compliance frameworks apply? (HIPAA, PCI-DSS, SOC 2, GDPR)
How many security events require investigation monthly?
What’s your acceptable incident detection latency?
Do you have in-house Linux expertise?

The Architecture: Building Your SIEM on a Budget

A budget SIEM architecture typically consists of three layers: collection, processing and storage, and visualization. Unlike enterprise SIEMs that bundle everything, you’ll source each layer separately and glue them together.

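Log Sources → Shipper → Processing Engine → Storage → Visualization
– Log sources: servers, applications, network devices
– Shipper: Filebeat, Fluentd
– Processing engine: Elasticsearch (with Logstash for parsing) or Splunk Light
– Storage: local disk or S3
– Visualization: Kibana, Grafana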

Why the Elastic Stack is the Budget Winner

The Elastic Stack (Elasticsearch, Logstash, Kibana) has become the de facto standard for budget SIEM implementations. Here’s why:

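Elasticsearch is horizontally scalable: you add capacity by adding nodes, not by replacing infrastructure. Kibana gives you visualization and dashboarding that rivals enterprise products, and Logstash handles log normalization. You can self-host the whole stack, including multi-node clusters, on Elastic’s free Basic tier; paid subscriptions add advanced features and vendor support.

But here’s the critical detail: Elastic’s licensing changed in 2021, when Elasticsearch and Kibana moved from Apache 2.0 to the Elastic License and SSPL (an AGPL option was added in 2024). The free Basic tier covers core features such as TLS, role-based access control, and Kibana’s security detection rules, but capabilities like machine-learning anomaly detection and searchable snapshots require a paid subscription. For production SIEM work, your options are: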

Elastic Cloud (managed service): $25-50/month for small deployments
Self-hosted with commercial license: roughly $4-5 per GB ingested
Alternatives: Splunk Light (proprietary), Graylog, or fully open-source toolchains

As a rough rule of thumb, the cost-benefit inflection point sits around 500 GB of logs per day: below that, managed solutions usually win; above it, self-hosting becomes economical.

Alternative: Splunk Light with Open-Source Shippers

Many budget-conscious teams pair Splunk Light with open-source log shippers. Note that the free 500 MB/day indexing limit applies to Splunk Free rather than Splunk Light, which is a paid edition, and that Splunk’s Universal Forwarder is free, so forwarders add no license cost. For reference, Splunk Enterprise costs $6,000-12,000 per year minimum.

Realistic monthly costs with Splunk Light:
– Splunk Light licenses: $100-500
– Compute (AWS, DigitalOcean, self-hosted): $200-500
– Storage: $50-200
– Total: $350-1,200/month

Compare that to:
– Elastic Cloud Standard: $2,000-8,000/month for equivalent capacity
– Splunk Enterprise: $6,000-18,000/month for equivalent capacity

Building Your Budget SIEM: The Complete Setup

Step 1: Choose Your Core Engine

Option A: Elasticsearch + Logstash + Kibana (ELK Stack)

Pros:
– Completely open-source and free-tier capable
– Massive community and plugin ecosystem
– Excellent for high-volume environments (1TB+ logs/day)

Cons:
– Operational overhead—you manage scaling, backups, security patches
– Requires Linux/DevOps expertise
– Parsing complex logs requires custom Logstash filters

Option B: Splunk Light with Open-Source Shippers

Pros:
– Excellent out-of-box parsing for common applications
– Single vendor support (Splunk Inc.)
– Simpler to operationalize for smaller teams

Cons:
– Per-GB licensing costs escalate quickly
– Less transparent cost structure
– Limited functionality in “Light” edition

Option C: Graylog Open Source

Pros:
– Purpose-built SIEM (not a general log platform adapted for security)
– Cleaner UI than ELK for security operations
– Built-in GeoIP and threat intelligence

Cons:
– Smaller community than ELK
– Enterprise features cost extra
– Less mature for very large deployments

My recommendation for most organizations: Start with ELK if you have DevOps resources, Splunk Light if you don’t.

Step 2: Set Up Log Collection

Log collection is where most budget SIEM deployments fail operationally. You need a shipper that’s lightweight (won’t bog down production systems) and reliable (won’t lose logs if the SIEM is down).

Filebeat (part of the Elastic ecosystem) is the standard shipper for the Elastic Stack:

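# filebeat.yml - Basic configuration
filebeat.inputs:
- type: filestream            # replaces the "log" input, deprecated since 7.16
  id: siem-system-logs        # each filestream input should have a unique id
  enabled: true
  paths:
    - /var/log/auth.log
    - /var/log/syslog
    - /var/log/apache2/access.log

# Send events through Logstash (Step 3) so its filters normalize them.
# Filebeat allows only one output; the hostname and CA path are placeholders.
output.logstash:
  hosts: ["logstash.internal:5044"]
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]

# Without Logstash, ship straight to Elasticsearch instead:
# output.elasticsearch:
#   hosts: ["elasticsearch.internal:9200"]
#   protocol: "https"
#   username: "filebeat_user"
#   password: "${FILEBEAT_PASS}"

processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~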

Fluentd is another solid option if you’re collecting from containerized environments:

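# fluent.conf (the elasticsearch output needs the fluent-plugin-elasticsearch gem)
<source>
  @type tail
  # Kubernetes symlinks container logs here; lines are JSON only with Docker's json-file driver
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

<match **>
  @type elasticsearch
  host elasticsearch.internal
  port 9200
  scheme https
  # placeholder account; the password is read from an environment variable
  user fluentd_user
  password "#{ENV['FLUENTD_PASS']}"
  # logstash_format writes daily indices (logindex-YYYY.MM.DD); a plain index_name
  # does not expand %Y.%m.%d unless the buffer is keyed on time
  logstash_format true
  logstash_prefix logindex
</match>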

Critical implementation detail: Configure log rotation and disk quotas on your log servers. A runaway application writing logs can fill a disk faster than your shipper can collect them, causing data loss.

Step 3: Deploy the Processing Engine

If using ELK, you need to decide between:

Lightweight setup (5-50 GB logs/day):
– Single Elasticsearch node, single Logstash instance
– 4 CPU, 8 GB RAM minimum
– Cost: $50-150/month on AWS

Medium setup (50-500 GB logs/day):
– 3-node Elasticsearch cluster (high availability)
– 2-3 Logstash instances with load balancing
– Storage: SSD for performance
– Cost: $300-800/month

Large setup (500 GB+ logs/day):
– 5+ node Elasticsearch cluster
– Dedicated master nodes
– Dedicated ingest nodes
– Separate Logstash pipeline servers
– Cost: $1,000-3,000/month
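As a quick decision aid, the tiers above can be encoded in a small hypothetical Python helper. It simply mirrors the article’s volume ranges (treating exactly 50 GB/day, and anything below 5 GB/day, as lightweight); real sizing should also account for query load and retention.

```python
def recommend_tier(daily_gb: float) -> str:
    """Map daily log volume (GB/day) to the deployment tiers described above."""
    if daily_gb < 0:
        raise ValueError("daily_gb must be non-negative")
    if daily_gb <= 50:
        return "lightweight"  # 1 Elasticsearch node, 1 Logstash instance
    if daily_gb <= 500:
        return "medium"       # 3-node cluster, 2-3 load-balanced Logstash instances
    return "large"            # 5+ nodes, dedicated master and ingest nodes


for volume in (20, 200, 800):
    print(f"{volume} GB/day -> {recommend_tier(volume)}")
```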

Here’s a Logstash configuration for normalization that you can use as a starting point:

# Logstash pipeline configuration (e.g., /etc/logstash/conf.d/siem.conf)
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/logstash/certs/server.crt"
    ssl_key => "/etc/logstash/certs/server.key"
  }
}

filter {
  # Parse Apache/Nginx access logs (=~ takes a regex literal; escape the dot)
  if [agent][type] == "filebeat" and [log][file][path] =~ /access\.log$/ {
    grok {
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
      remove_field => [ "message" ]
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => [ "timestamp" ]
    }
  }

  # Parse Linux authentication logs
  if [log][file][path] =~ /auth\.log$/ {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }

  # Add GeoIP enrichment only when a source IP exists, so other events
  # are not tagged with _geoip_lookup_failure
  if [source][ip] {
    geoip {
      source => "[source][ip]"
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch.internal:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    user => "logstash_user"
    password => "${LOGSTASH_PASS}"
  }
}
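To see what the access-log branch of that filter does, here is a rough Python equivalent: a regular expression approximating the Apache/Nginx combined format that `%{HTTPD_COMBINEDLOG}` matches, followed by the same timestamp layout the `date` filter uses. It is a sketch for testing sample lines, not a replacement for grok; the output field names (`client_ip`, `user_agent`, and so on) are illustrative and differ from the ECS names grok produces.

```python
import re
from datetime import datetime
from typing import Optional

# Approximation of the Apache/Nginx "combined" format that %{HTTPD_COMBINEDLOG} matches
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line: str) -> Optional[dict]:
    match = COMBINED.match(line)
    if match is None:
        return None  # in Logstash, grok would tag the event with _grokparsefailure
    event = match.groupdict()
    # Same layout as the date filter's "dd/MMM/yyyy:HH:mm:ss Z"
    event["@timestamp"] = datetime.strptime(event.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    return event


line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"'
event = parse_access_line(line)
print(event["client_ip"], event["status"], event["@timestamp"].isoformat())
```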

Step 4: Configure Storage and Retention

Storage cost is typically 20-30% of your total SIEM budget, and many organizations over-provision it “just in case.” Typical retention policies:

Security logs (authentication, access): 90 days
Application logs: 30 days
Network traffic logs: 7-14 days (huge volume)
Archived/compliance logs: 1-7 years (cold storage, not searchable)
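Retention drives most of the storage bill, so it helps to turn the list above into a number. The sketch below assumes indexed size roughly equals raw log size, one replica per shard, and a 1.2x headroom factor for segment merges and free-disk margin; all three are assumptions to replace with measurements from your own cluster. Archived compliance logs are left out because they sit outside the searchable cluster.

```python
# Retention periods (days) from the list above; archived logs in cold
# storage are excluded because they live outside the searchable cluster.
RETENTION_DAYS = {"security": 90, "application": 30, "network": 14}


def searchable_storage_gb(daily_gb_by_type: dict, replicas: int = 1,
                          headroom: float = 1.2) -> float:
    """Estimate cluster disk for tiered retention.

    Assumes indexed size roughly equals raw log size, `replicas` extra copies
    of every shard, and a `headroom` multiplier for merges and free-disk margin.
    """
    raw_gb = sum(daily_gb * RETENTION_DAYS[log_type]
                 for log_type, daily_gb in daily_gb_by_type.items())
    return raw_gb * (1 + replicas) * headroom


# Example: 20 GB/day security, 20 GB/day application, 10 GB/day network logs
estimate = searchable_storage_gb({"security": 20, "application": 20, "network": 10})
print(f"{estimate:,.0f} GB")
```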

Use Elasticsearch Index Lifecycle Management (ILM) to automate this:

PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0d",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

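This configuration keeps data in the hot tier for 7 days, in the warm tier for the next 23 days (force-merged, which also makes the index read-only), and in the cold tier until deletion at 90 days. Phase ages are measured from rollover, so indices must be written through a data stream or rollover alias rather than named by date. Searchable snapshots in the cold phase require an Enterprise license, so a budget policy should stick to actions such as set_priority, forcemerge, and delete. Adjust the ages to your compliance requirements.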
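If you need several policies (for example 90 days for security indices and 30 for application indices), generating the request body keeps them consistent. This hypothetical Python helper builds a body for `PUT _ilm/policy/<name>` using only free-tier actions (rollover, set_priority, forcemerge, delete); the function name and defaults are illustrative.

```python
import json


def build_ilm_policy(warm_after_days: int = 7, cold_after_days: int = 30,
                     delete_after_days: int = 90, max_shard_gb: int = 50) -> dict:
    """Return a request body for PUT _ilm/policy/<name> using free-tier actions only."""
    if not warm_after_days < cold_after_days < delete_after_days:
        raise ValueError("phase ages must increase: warm < cold < delete")
    return {
        "policy": {
            "phases": {
                "hot": {"min_age": "0d", "actions": {
                    "rollover": {"max_primary_shard_size": f"{max_shard_gb}gb"},
                    "set_priority": {"priority": 100}}},
                "warm": {"min_age": f"{warm_after_days}d", "actions": {
                    "set_priority": {"priority": 50},
                    "forcemerge": {"max_num_segments": 1}}},
                "cold": {"min_age": f"{cold_after_days}d", "actions": {
                    "set_priority": {"priority": 0}}},
                "delete": {"min_age": f"{delete_after_days}d", "actions": {"delete": {}}},
            }
        }
    }


# Shorter-lived variant for 30-day application logs
app_policy = build_ilm_policy(warm_after_days=7, cold_after_days=14, delete_after_days=30)
print(json.dumps(app_policy, indent=2))
```

The resulting JSON can be pasted into Kibana Dev Tools after the `PUT _ilm/policy/<name>` line or sent with any HTTP client.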

Step 5: Build Detection Rules

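A SIEM without detection rules is just expensive storage. Start with these high-impact, low-maintenance rules. They are written in a tool-agnostic format, so translate them into your engine’s rule syntax (for example, Elastic Security threshold rules):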

Rule 1: Failed Login Brute Force

{
  "name": "Multiple Failed Logins from Single Source",
  "query": "source.ip:* AND event.outcome:failure AND process.name:sshd",
  "group_by": ["source.ip"],
  "threshold": 10,
  "timeframe": "5m",
  "action": "alert"
}
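Under the hood, a threshold rule like Rule 1 is a per-source sliding window. This Python sketch shows the logic on already-parsed events; the tuple layout stands in for the ECS fields `@timestamp`, `source.ip`, and `event.outcome`, and it is meant for understanding or offline testing of a rule, not as a replacement for your SIEM’s detection engine.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def brute_force_sources(events, threshold=10, window=timedelta(minutes=5)):
    """Return source IPs with at least `threshold` failures inside any `window`.

    `events` is a time-sorted iterable of (timestamp, source_ip, outcome) tuples,
    a simplified stand-in for the ECS fields @timestamp, source.ip, event.outcome.
    """
    recent = defaultdict(deque)  # source IP -> timestamps of its recent failures
    flagged = set()
    for ts, ip, outcome in events:
        if outcome != "failure":
            continue
        failures = recent[ip]
        failures.append(ts)
        while ts - failures[0] > window:  # slide the window forward
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(ip)
    return flagged


base = datetime(2024, 1, 1, 12, 0, 0)
# 10 failures in 3 minutes (should fire) vs. 10 failures spread over 9 minutes (should not)
events = [(base + timedelta(seconds=20 * i), "198.51.100.9", "failure") for i in range(10)]
events += [(base + timedelta(seconds=60 * i), "203.0.113.5", "failure") for i in range(10)]
events.sort()
flagged = brute_force_sources(events)
print(flagged)
```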

Rule 2: Privilege Escalation Attempt

{
  "name": "Sudo Usage by Unexpected User",
  "query": "process.name:sudo AND NOT user.name:(root OR serviceaccount)",
  "threshold": 1,
  "timeframe": "1m",
  "action": "alert"
}

Rule 3: Unusual Outbound Traffic

{
  "name": "Traffic to Non-Standard Ports",
  "query": "event.category:network AND destination.port:[32768 TO 65535] AND NOT destination.ip:(\"10.0.0.0/8\" OR \"172.16.0.0/12\" OR \"192.168.0.0/16\")",
  "threshold": 100,
  "timeframe": "5m",
  "action": "alert"
}
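Rule 3’s per-event condition can be expressed in a few lines of Python with the standard `ipaddress` module. This sketch treats the three RFC 1918 ranges as internal and ports 32768-65535 as “high”; both are assumptions to adapt to your network, and counting matches against the 100-per-5-minutes threshold would be a separate windowed aggregation.

```python
import ipaddress

# RFC 1918 private ranges, treated as "internal" (an assumption; add your own ranges)
INTERNAL_NETS = [ipaddress.ip_network(cidr)
                 for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_suspicious_outbound(dest_ip: str, dest_port: int) -> bool:
    """True for a connection to a high port (32768-65535) on a non-internal address."""
    addr = ipaddress.ip_address(dest_ip)
    if any(addr in net for net in INTERNAL_NETS):  # IPv6 addresses never match these
        return False
    return 32768 <= dest_port <= 65535


for ip, port in [("203.0.113.10", 40000), ("192.168.1.5", 40000), ("203.0.113.10", 443)]:
    print(ip, port, is_suspicious_outbound(ip, port))
```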

Start with 5-10 high-confidence rules. False positives destroy SIEM adoption. You can add more later as you understand your baseline.

Building on a Budget: Cost Breakdown

Component (USD/month)	Small (50 GB/day)	Medium (200 GB/day)	Large (500 GB/day)
Compute	$50-100	$200-300	$500-800
Storage (3-month retention)	$30-50	$100-150	$200-400
Backup/Disaster Recovery	$20-30	$50-100	$100-200
Licensing	$0-100	$100-500	$500-2,000
Total Monthly	$100-280	$450-1,050	$1,300-3,400
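A quick way to sanity-check the table, or to plug in your own quotes, is to sum the component ranges. The figures below are copied from the table; the annual totals are simple 12-month arithmetic and assume flat monthly pricing.

```python
# Monthly cost ranges in USD, (low, high), copied from the table above
COSTS = {
    "small": {"compute": (50, 100), "storage": (30, 50), "backup": (20, 30), "licensing": (0, 100)},
    "medium": {"compute": (200, 300), "storage": (100, 150), "backup": (50, 100), "licensing": (100, 500)},
    "large": {"compute": (500, 800), "storage": (200, 400), "backup": (100, 200), "licensing": (500, 2000)},
}


def total_range(components: dict) -> tuple:
    """Sum the low and high ends of each component's cost range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


for size, components in COSTS.items():
    low, high = total_range(components)
    print(f"{size}: ${low:,}-{high:,}/month, ${low * 12:,}-{high * 12:,}/year")
```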

These are aggressive estimates assuming self-managed ELK on cloud VMs. AWS EC2 Reserved Instances can cut compute costs 40-50% if you commit long-term.

Operational Best Practices for Budget SIEMs

Monitoring Your SIEM Itself

Your SIEM is a critical system. You need to monitor it. This is where teams fail operationally.

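# prometheus.yml - Elasticsearch has no native Prometheus endpoint, so scrape
# prometheus-community/elasticsearch_exporter (default port 9114), started with
# --es.uri pointing at the cluster. The exporter hostname is a placeholder.
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['elasticsearch-exporter.internal:9114']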

Key metrics to alert on:
– Elasticsearch cluster health: Yellow or red = investigation needed
– Heap usage: >85% = add nodes or reduce retention
– Unassigned shards: Indicates node failures
– Index failures: Logstash pipeline issues
– Disk utilization: >80% = delete old indices or add storage
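The thresholds above translate directly into a check you could run from cron or a monitoring hook. This hypothetical Python function takes the parsed JSON from `GET _cluster/health` (which reports `status` and `unassigned_shards`) plus heap and disk percentages you would read from node stats and `_cat/allocation`; fetching those values and checking Logstash index failures are left out to keep the sketch self-contained.

```python
def health_alerts(health: dict, heap_used_pct: float, disk_used_pct: float) -> list:
    """Apply the alert thresholds listed above.

    `health` is the parsed JSON from GET _cluster/health; heap and disk
    percentages would come from node stats (jvm.mem.heap_used_percent) and
    _cat/allocation (disk.percent) for the busiest node.
    """
    alerts = []
    status = health.get("status", "unknown")
    if status != "green":
        alerts.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        alerts.append(f"{unassigned} unassigned shards")
    if heap_used_pct > 85:
        alerts.append(f"heap at {heap_used_pct:.0f}%: add nodes or reduce retention")
    if disk_used_pct > 80:
        alerts.append(f"disk at {disk_used_pct:.0f}%: delete old indices or add storage")
    return alerts


sample = {"cluster_name": "siem", "status": "yellow", "unassigned_shards": 5}
for alert in health_alerts(sample, heap_used_pct=90, disk_used_pct=50):
    print(alert)
```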

Automation for Operational Efficiency

Without automation, a budget SIEM becomes a full-time job. Use these to buy back operational time:

Automated index maintenance:

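#!/bin/bash
# Delete the daily indices dated exactly 90 days ago (GNU date; run daily from cron). ES 8 rejects
# wildcard DELETEs by default, so resolve index names first. Host and credentials are placeholders.
for idx in $(curl -s -u "admin:${ES_PASS}" "https://elasticsearch.internal:9200/_cat/indices/*-$(date -d '90 days ago' +%Y.%m.%d)*?h=index"); do
  curl -s -u "admin:${ES_PASS}" -X DELETE "https://elasticsearch.internal:9200/${idx}"
done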

Automated backup:

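# 1) Register an S3 snapshot repository (one-time). S3 credentials belong in the
#    Elasticsearch keystore, not in this request; admin/ES_PASS are placeholders.
curl -X PUT "https://elasticsearch.internal:9200/_snapshot/backup" \
  -u "admin:${ES_PASS}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "my-siem-backups",
      "base_path": "elasticsearch"
    }
  }'

# 2) Take a snapshot for disaster recovery (run daily from cron)
curl -X PUT "https://elasticsearch.internal:9200/_snapshot/backup/siem-$(date +%Y.%m.%d)" \
  -u "admin:${ES_PASS}"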

Alert noise reduction:
Use correlation rules to reduce false positives. Instead of alerting on every failed login, alert on “5+ failed logins followed by a successful login in the same 10-minute window.”
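The correlation rule above can be sketched in Python as follows. It assumes events are already parsed into time-sorted `(timestamp, user, source_ip, outcome)` tuples and keys the window on the user and source IP pair, which is one reasonable reading of “same window”; the function name and tuple layout are illustrative.

```python
from datetime import datetime, timedelta


def failures_then_success(events, min_failures=5, window=timedelta(minutes=10)):
    """Yield (user, source_ip, success_time) when a successful login follows
    at least `min_failures` failures for the same user and source within `window`.

    `events` is a time-sorted iterable of (timestamp, user, source_ip, outcome)
    tuples, where outcome is "failure" or "success" (like ECS event.outcome).
    """
    failures = {}  # (user, source_ip) -> recent failure timestamps
    for ts, user, ip, outcome in events:
        key = (user, ip)
        recent = [t for t in failures.get(key, []) if ts - t <= window]
        if outcome == "failure":
            recent.append(ts)
            failures[key] = recent
        elif outcome == "success":
            if len(recent) >= min_failures:
                yield user, ip, ts
            failures[key] = []  # start counting afresh after any success


base = datetime(2024, 1, 1, 9, 0, 0)
# alice: 6 failures, then success 4 minutes in (should alert)
events = [(base + timedelta(seconds=30 * i), "alice", "198.51.100.20", "failure") for i in range(6)]
events.append((base + timedelta(minutes=4), "alice", "198.51.100.20", "success"))
# carol: 5 failures, but the success comes 16 minutes after the last one (no alert)
events += [(base + timedelta(minutes=i), "carol", "203.0.113.8", "failure") for i in range(5)]
events.append((base + timedelta(minutes=20), "carol", "203.0.113.8", "success"))
events.sort()
alerts = list(failures_then_success(events))
print(alerts)
```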

Tuning for Performance

A slow SIEM is a SIEM nobody uses. Performance tuning pays dividends:

Optimize Logstash pipelines:
– Use mutate filters to drop unnecessary fields before indexing
– Compress payloads when shipping to Elasticsearch
– Use conditional logic to skip expensive filters for irrelevant logs

Optimize Elasticsearch queries:
– Use filters (cached) instead of queries where possible
– Limit time range for interactive searches
– Create pre-aggregated dashboards instead of running expensive aggregations in real-time
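To illustrate the first two query tips, here is a hypothetical helper that builds an Elasticsearch search body with the exact-match condition and a bounded time range inside `bool.filter`, which Elasticsearch can cache and does not score. The field names follow ECS and are assumptions about your mapping; the body can be sent to `GET <index>/_search` with any HTTP client.

```python
import json


def security_search(field: str, value: str, hours: int = 24, size: int = 100) -> dict:
    """Build a search body that keeps both clauses in filter context (cacheable, unscored)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}},                              # exact match
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},  # bounded time range
                ]
            }
        },
        "size": size,
    }


body = security_search("event.outcome", "failure", hours=4)
print(json.dumps(body, indent=2))
```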

Sample configuration for high-volume environments:

# logstash.yml - pipeline settings (these are not valid inside a pipeline .conf file)
pipeline.workers: 8
pipeline.batch.size: 250
pipeline.batch.delay: 5

# pipeline .conf - performance-minded filter section
filter {
  # Drop noisy, low-value logs early
  if [message] =~ /health check|heartbeat/ {
    drop { }
  }

  # Use conditional logic
  if [log][file][path] =~ /application\.log$/ {
    # Expensive parsing only for relevant logs
    grok {
      match => { "message" => "..." }  # placeholder pattern; substitute your own
    }
  }
}

Common Pitfalls to Avoid

Pitfall 1: Under-provisioning storage
Teams forget that storage includes: raw data + replicas + temporary space for compaction. Budget 1.5x to 2x the raw data size.

Pitfall 2: Over-collecting logs
More logs ≠ better security. Debug logs from chatty applications cost money without value. Use sampling for non-critical logs.
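For sampling, one option is to decide per event from a hash of a stable identifier, so that every shipper or pipeline worker makes the same keep/drop decision. This Python sketch assumes each event has such an ID; apply it only to non-critical streams such as debug logs, never to authentication or audit events.

```python
import hashlib


def keep_event(event_id: str, sample_rate: float) -> bool:
    """Keep roughly `sample_rate` of events, decided by a hash of the event ID.

    Hashing (rather than random.random()) makes the decision deterministic, so
    every shipper or pipeline worker agrees on which events are kept.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < sample_rate * 2**64


kept = sum(keep_event(f"debug-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 debug events")
```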

Pitfall 3: Skipping security hardening
An unsecured SIEM is worse than no SIEM. Require TLS for all log shipping, use authentication, implement network segmentation.

Pitfall 4: Neglecting backup
Most budget SIEMs don’t have proper backups. A disk failure means data loss. Use Elasticsearch snapshots with S3 backup.

Pitfall 5: Fire-and-forget alerting
Without a process to respond to alerts, they become noise. Define who investigates what, with SLAs.

Compliance Considerations on a Budget

Regulations like HIPAA, PCI-DSS, and SOC 2 require log retention and audit trails. A budget SIEM can still be compliant:

Log immutability: Use Elasticsearch’s read-only mode for archived indices
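Log encryption: TLS in transit; AES-256 at rest through disk or volume encryption, since self-managed Elasticsearch does not encrypt its data files itself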
Access controls: RBAC using Elasticsearch roles
Retention: Use ILM to enforce policy automatically
Reporting: Use Kibana dashboards to generate compliance reports

For SOC 2 Type II, the SIEM itself becomes an audit point. Document your configuration, backup procedures, and access logs.

Migration Strategy: Moving from Spreadsheets

If you’re currently tracking security events in spreadsheets or manual logs, here’s your migration path:

Phase 1 (Month 1): Deploy log collectors and basic aggregation. Get logs flowing into your SIEM.

Phase 2 (Month 2-3): Normalize logs and build basic dashboards. Team becomes familiar with tool.

Phase 3 (Month 4): Implement first detection rules. Start investigating alerts.

Phase 4 (Month 5-6): Continuous tuning based on false positives and blind spots.

Most organizations are operationally ready after 3 months. Avoid the temptation to build a “perfect” SIEM before going live. Imperfect data flowing continuously beats perfect data that never launches.

Scaling Your Budget SIEM

As your organization grows, your SIEM architecture will need to evolve:

Stage 1 (0-100 GB/day): Single server, everything on one machine. Works, but fragile.

Stage 2 (100-500 GB/day): Separate Elasticsearch and Logstash. Add redundancy.

Stage 3 (500 GB+/day): Elasticsearch cluster, multiple Logstash pipelines, and a message queue (Kafka) to decouple ingestion from processing.

You can spend $5,000+ per month on infrastructure for a large SIEM. The key is scaling linearly with need, not over-engineering upfront.

Final Verdict: Is a Budget SIEM Right for You?

A budget SIEM is worth building if:
– Your compliance requirements include log auditing
– You have > 2-3 TB of logs per month
– Your organization has at least one person who can manage Linux systems
– You want to detect security threats faster than “when the breach is discovered”

A budget SIEM is not worth building if:
– You have < 100 GB of logs per month (cheaper to use SaaS)
– You have zero DevOps/Linux expertise and no budget to hire
– You need white-glove support from a vendor
– Your compliance requirements are extremely strict (regulated financial institutions)

For most mid-sized organizations (50-500 employees), a budget SIEM built on ELK or Splunk Light costs $300-1,500 per month and can deliver most of the value of a $10,000+ per month enterprise solution.

The operational discipline matters more than the tool. A well-tuned budget SIEM beats an under-utilized enterprise SIEM every time.

Affiliate Disclosure: This article may contain affiliate links. If you purchase through these links, TechChimney may earn a commission at no extra cost to you. We only recommend products we believe provide genuine value.