Building Real-World Event Processing Systems
What You'll Build Today
Today you're building a production-grade log generator that demonstrates how real companies handle millions of events per second. By the end of this lesson, you'll have:
Multi-source log generator with configurable throughput (1-100k events/second)
Rate limiting and backpressure mechanisms using Redis-backed sliding window counters
Event sourcing foundation with Kafka integration for downstream processing
Circuit breaker patterns with Resilience4j for handling downstream failures
Comprehensive observability with Prometheus metrics and distributed tracing
Why This Matters: From Development to Netflix Scale
Every distributed system starts with events. Whether it's user interactions on Netflix (generating billions of viewing events), Uber ride requests (creating location streams), or Amazon order processing (triggering inventory updates), the ability to generate, throttle, and reliably deliver events is foundational.
Today's log generator isn't just about creating sample data. It's about understanding how event-driven architectures handle load, implement backpressure, and maintain system stability under stress. The patterns we implement today scale directly to systems processing millions of events per second. Netflix's viewing data pipeline, for instance, uses similar rate limiting and circuit breaker patterns to ensure their event ingestion doesn't overwhelm downstream analytics systems.
The architectural decisions we make around event generation—batching strategies, rate limiting algorithms, and failure handling—directly impact system reliability at scale. A poorly designed event generator can bring down entire data pipelines, while a well-architected one enables horizontal scaling and fault tolerance.
Understanding how to implement these patterns correctly at small scale provides the foundation for scaling to enterprise levels through horizontal partitioning and service replication.
System Design Deep Dive: Five Core Distributed Patterns
1. Configurable Rate Limiting with Sliding Window Algorithm
Traditional rate limiting often uses simple token buckets, but production systems require more sophisticated approaches. We implement a Redis-backed sliding window counter that provides precise rate control while allowing controlled bursts.
Trade-off Analysis
Fixed window algorithms can cause thundering herd problems at window boundaries. Token buckets provide smooth rate limiting but don't handle burst traffic well. Our sliding window approach gives precise rate control while allowing controlled bursts, at the cost of additional Redis operations.
Implementation Pattern
Each rate limit key maintains a sorted set in Redis with timestamps as scores. We remove expired entries and count current entries to determine if new requests can proceed. This pattern scales horizontally as multiple generator instances can coordinate through the shared Redis state.
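The algorithm can be sketched in plain Java. This is a minimal in-memory stand-in for illustration only: in the real system the sorted set lives in Redis (ZADD to record a request, ZREMRANGEBYSCORE to expire old entries, ZCARD to count) so multiple generator instances share state; the class and method names here are assumptions, not the lesson's codebase.

```java
import java.util.TreeMap;

// In-memory sketch of the sliding window counter; Redis commands that play
// each role are noted inline.
public class SlidingWindowLimiter {
    private final TreeMap<Long, Integer> events = new TreeMap<>(); // timestamp -> count
    private final long windowMillis;
    private final int limit;

    public SlidingWindowLimiter(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Drop entries older than the window (Redis: ZREMRANGEBYSCORE key 0 now-window).
        events.headMap(nowMillis - windowMillis, true).clear();
        // Count what remains in the window (Redis: ZCARD key).
        int current = events.values().stream().mapToInt(Integer::intValue).sum();
        if (current >= limit) return false;
        // Record this request (Redis: ZADD key now member).
        events.merge(nowMillis, 1, Integer::sum);
        return true;
    }
}
```

Because expiry is computed from timestamps rather than fixed window boundaries, the count is always over the trailing window, which is what avoids the boundary bursts of fixed-window counters.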
Real-world Application
Variants of this pattern underpin Twitter's API rate limiting and GitHub's API quotas, where precise control over request rates is critical for maintaining service stability.
2. Event Sourcing with Immutable Log Streams
Our log generator produces events that follow event sourcing principles—each generated log entry is an immutable fact about something that happened. This isn't just academic purity; it's essential for building systems that can replay, debug, and scale.
Architectural Insight
Event sourcing solves the dual-write problem inherent in systems that need to update databases and send messages atomically. By treating the event stream as the source of truth, we eliminate complex two-phase commit scenarios.
Trade-off Analysis
Event sourcing provides complete audit trails and enables time-travel debugging, but increases storage requirements and adds complexity to query patterns. For log processing systems, this trade-off heavily favors the event sourcing approach.
Scale Implications
Netflix's data platform processes over 1 trillion events daily using event sourcing patterns. The ability to replay event streams enables their recommendation algorithms to be retrained on historical data without complex data migration.
3. Circuit Breaker Pattern for Downstream Protection
Our generator implements circuit breakers around Kafka publishing and Redis operations. This prevents cascading failures when downstream systems become unavailable.
Pattern Implementation
We use Resilience4j's circuit breaker with three states: CLOSED (normal operation), OPEN (failing fast), and HALF_OPEN (testing recovery). The circuit breaker monitors failure rates and response times, automatically transitioning between states.
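Resilience4j manages this state machine for us, but the transitions are worth seeing explicitly. Below is a hand-rolled sketch of the three states; the thresholds and names are illustrative, and the real library additionally tracks slow-call rates, sliding windows of outcomes, and permitted calls in HALF_OPEN.

```java
// Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN transitions described above.
public class MiniCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;
    private final int failureThreshold;
    private final long recoveryTimeoutMillis;

    public MiniCircuitBreaker(int failureThreshold, long recoveryTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMillis = recoveryTimeoutMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= recoveryTimeoutMillis) {
            state = State.HALF_OPEN; // recovery window elapsed: let a trial call through
        }
        return state != State.OPEN;  // OPEN means fail fast, no call attempted
    }

    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;        // a successful trial call closes the breaker
    }

    public synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;      // trip: subsequent calls fail fast
            openedAtMillis = nowMillis;
            consecutiveFailures = 0;
        }
    }

    public synchronized State state() { return state; }
}
```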
Failure Mode Analysis
Without circuit breakers, a slow Kafka cluster would cause thread pool exhaustion in our generators, creating system-wide failures. With circuit breakers, we fail fast and preserve system resources for recovery.
Production Considerations
Uber's event generation system uses similar circuit breaker patterns to handle the variable load between their different geographical regions. When one region's Kafka cluster experiences issues, generators automatically degrade gracefully rather than letting failures cascade across regions.
4. Backpressure and Flow Control
High-throughput event generation requires sophisticated flow control to prevent overwhelming downstream systems. We implement multiple backpressure mechanisms.
Adaptive Batching
Our generator dynamically adjusts batch sizes based on downstream latency. When Kafka responds quickly, we increase batch sizes for efficiency. When latency increases, we reduce batch sizes to maintain responsiveness.
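The feedback loop can be sketched in a few lines. This is an illustrative version with assumed names and thresholds (multiplicative increase/decrease against a target latency), not the project's exact tuning logic.

```java
// Sketch of latency-driven batch sizing: grow the batch while the downstream
// is fast, shrink it when latency rises past the target.
public class AdaptiveBatchSizer {
    private int batchSize;
    private final int minBatch, maxBatch;
    private final long targetLatencyMillis;

    public AdaptiveBatchSizer(int initial, int min, int max, long targetLatencyMillis) {
        this.batchSize = initial;
        this.minBatch = min;
        this.maxBatch = max;
        this.targetLatencyMillis = targetLatencyMillis;
    }

    // Called after each Kafka send with the observed round-trip latency.
    public int adjust(long observedLatencyMillis) {
        if (observedLatencyMillis < targetLatencyMillis) {
            batchSize = Math.min(maxBatch, batchSize * 2); // fast downstream: batch more
        } else {
            batchSize = Math.max(minBatch, batchSize / 2); // slow downstream: back off
        }
        return batchSize;
    }
}
```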
Queue Depth Monitoring
We monitor internal queue depths and implement load shedding when queues exceed thresholds. This prevents memory exhaustion and maintains system stability under extreme load.
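A minimal sketch of threshold-based shedding, with an assumed class name and a bounded queue standing in for the generator's internal buffers: rather than blocking when the queue is full, we drop the event and count the drop, keeping memory bounded under extreme load.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Bounded queue that sheds instead of blocking when full.
public class SheddingQueue<T> {
    private final BlockingQueue<T> queue;
    private long shedCount = 0;

    public SheddingQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    public synchronized boolean offerOrShed(T event) {
        if (queue.offer(event)) return true; // non-blocking; false when full
        shedCount++; // in the real system this would increment a Prometheus counter
        return false;
    }

    public synchronized long shedCount() { return shedCount; }
}
```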
Cooperative Scheduling
Multiple generator threads coordinate through shared metrics to ensure fair resource allocation and prevent resource starvation.
5. Observability and Performance Monitoring
Production event generators require comprehensive observability to understand performance characteristics and debug issues at scale.
Metrics Strategy
We expose key performance indicators through Micrometer: events per second, batch sizes, circuit breaker states, queue depths, and latency percentiles. These metrics feed into Prometheus for aggregation and alerting.
Distributed Tracing
Each generated event carries trace context that flows through the entire processing pipeline. This enables end-to-end latency tracking and bottleneck identification across distributed services.
Performance Implications
Our monitoring adds roughly 5% overhead to event generation throughput—an acceptable trade-off for the operational visibility it provides. LinkedIn's Kafka-based event platform uses similar observability patterns to maintain their multi-petabyte data streams.
Implementation Walkthrough: Building for Scale
Core Architecture Decisions
Our log generator follows a multi-threaded producer pattern with configurable thread pools. Each thread operates independently, coordinating only through shared rate limiting and circuit breaker state. This design enables horizontal scaling by running multiple generator instances.
Thread Pool Configuration
We size thread pools based on CPU cores and expected I/O latency. For Kafka-heavy workloads, we typically configure 2-4 threads per CPU core to account for I/O wait time.
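The sizing rule above is simple enough to encode directly; the multiplier is a tunable assumption for I/O-bound work, not a fixed constant from the lesson.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: 2-4 threads per core for I/O-heavy Kafka workloads.
public class PoolSizing {
    public static int ioBoundPoolSize(int cores, int ioMultiplier) {
        return cores * ioMultiplier; // e.g. 4 cores x 3 = 12 threads
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(ioBoundPoolSize(cores, 3));
        pool.shutdown();
    }
}
```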
Event Schema Design
Events follow a consistent schema with correlation IDs, timestamps, and configurable payload sizes. This enables downstream systems to implement efficient partitioning and indexing strategies.
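A hypothetical shape for such an event, as a Java 17 record; the field names are illustrative, not the project's exact schema.

```java
import java.time.Instant;
import java.util.UUID;

// Illustrative event schema: correlation ID for tracing, timestamp for
// windowing, source/level for partitioning and filtering downstream.
public record LogEvent(String correlationId, Instant timestamp,
                       String source, String level, String payload) {

    public static LogEvent create(String source, String level, String payload) {
        return new LogEvent(UUID.randomUUID().toString(), Instant.now(),
                source, level, payload);
    }
}
```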
Configuration and Extensibility
The generator accepts YAML-based configuration for:
Event generation rates (per thread, per instance)
Payload templates and randomization strategies
Kafka topic routing and partitioning
Circuit breaker thresholds and recovery timeouts
Rate limiting windows and burst allowances
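A configuration covering those knobs might look like the following; the key names here are assumptions for illustration, not the project's exact schema.

```yaml
# Hypothetical generator configuration sketch
generation:
  events-per-second: 5000        # per instance
  threads: 8
payload:
  template: "apache-access"
  random-fields: [ip, userAgent, path]
kafka:
  topic: log-events
  partitions: 12
circuit-breaker:
  failure-rate-threshold: 50     # percent
  wait-duration-in-open-state: 30s
rate-limit:
  window: 1s
  burst-allowance: 200
```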
This configuration-driven approach enables the same generator codebase to simulate different load patterns without code changes—essential for load testing and capacity planning.
Error Handling and Recovery
Production systems fail in unexpected ways. Our generator implements comprehensive error handling:
Transient Failure Recovery
Automatic retry with exponential backoff for temporary Kafka unavailability.
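The retry-with-backoff loop can be sketched as follows; the attempt counts and delays are illustrative defaults, not the project's configured values.

```java
import java.util.concurrent.Callable;

// Retry a task, doubling the delay after each failure (100ms, 200ms, 400ms, ...).
public class Retry {
    public static <T> T withBackoff(Callable<T> task, int maxAttempts, long baseDelayMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(baseDelayMillis << attempt); // exponential backoff
            }
        }
        throw last; // all attempts exhausted: surface the final failure
    }
}
```

Production implementations usually add jitter to the delay so that many retrying producers do not hammer a recovering broker in lockstep.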
Poison Event Handling
Dead letter queues for events that consistently fail processing.
Resource Exhaustion Protection
Graceful degradation when memory or CPU limits are approached.
Production Considerations: Performance and Monitoring
Performance Characteristics
Our generator achieves 50,000+ events/second per instance on commodity hardware (4 CPU, 8GB RAM). Key performance factors:
Batch Size Optimization
Larger batches improve throughput but increase latency.
Serialization Efficiency
Avro serialization provides better performance than JSON for high-volume scenarios.
Connection Pooling
Kafka producer connection reuse eliminates connection overhead.
Monitoring and Alerting Strategy
Critical alerts include:
Event generation rate falling below configured thresholds
Circuit breaker open states lasting longer than recovery windows
Memory usage exceeding 80% of allocated heap space
Kafka producer queue depth growing continuously
These alerts enable proactive response to issues before they impact downstream systems.
Failure Scenarios and Recovery
We've tested several failure scenarios:
Kafka Cluster Unavailability
The generator's circuit breaker opens; it continues accepting requests but fails them fast instead of blocking threads on an unreachable cluster.
Redis Unavailability
Rate limiting falls back to local token buckets with reduced accuracy.
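The local fallback can be sketched as a classic token bucket. Because it is per-instance only, a fleet of generators no longer coordinates through shared state, which is exactly the "reduced accuracy" noted above; names and parameters here are illustrative.

```java
// Per-instance token bucket: refills continuously at a fixed rate, up to capacity.
public class LocalTokenBucket {
    private final double capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefillMillis;

    public LocalTokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMillis = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full
        this.lastRefillMillis = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMillis);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```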
Memory Pressure
Automatic load shedding maintains service availability at reduced throughput.
Scale Connection: FAANG-Level Patterns
The patterns implemented in our log generator directly mirror those used in large-scale production systems. Netflix's data platform, Uber's event streaming infrastructure, and Amazon's order processing pipelines all use similar rate limiting, circuit breaker, and event sourcing patterns. The key difference is scale—while our generator handles thousands of events per second, these systems handle millions. However, the architectural patterns remain fundamentally the same.
Next Steps: Collection and Processing
Tomorrow we'll build the log collector service that consumes our generated events. We'll explore stream processing patterns, windowing operations, and building the first stages of our distributed log processing pipeline. The events we generate today become the foundation for implementing complex stream processing patterns that power real-time analytics at scale.
Getting Started: Build Your System
Prerequisites
Before starting, make sure you have:
Java 17 or higher installed
Maven 3.8 or higher
Docker and Docker Compose
A text editor (VS Code, IntelliJ, or similar)
curl for testing APIs
Step 1: Generate Your Project
Run the project generation script:
This creates your complete project structure with all source files, configuration, and infrastructure setup.
Step 2: Understanding the Project Structure
Take a moment to explore what was created:
Step 3: Start Infrastructure Services
Launch all supporting services (Kafka, Redis, PostgreSQL, Prometheus, Grafana):
This script will:
Start Docker containers for all infrastructure
Build your Java applications
Create necessary Kafka topics
Launch the log generator and API gateway
Wait about 2-3 minutes for all services to fully start.
Step 4: Verify Installation
Check that all services are running:
You should see healthy status responses from both services.
Step 5: Start Generating Logs
Trigger log generation through the API:
Your system is now generating events and sending them to Kafka.
Step 6: Monitor Your System
Open your browser and explore the monitoring dashboards:
Grafana Dashboard (admin/admin)
Prometheus Metrics
Zipkin Distributed Tracing
Step 7: Check Generation Status
Monitor the generator's performance:
You'll see output like:
Step 8: Run Integration Tests
Validate your system is working correctly:
This automated test will:
Verify all services are healthy
Start log generation
Confirm events are being produced
Check metrics are being collected
Validate the complete data flow
Step 9: Load Testing
Test your system under stress:
This runs a 60-second load test and monitors:
Events generated per second
Rate limiting behavior
System resource usage
Circuit breaker states
Step 10: Explore Kafka Messages
View the actual events being generated:
You'll see JSON events with correlation IDs, timestamps, and metadata.
Understanding What You Built
Data Flow
Here's what happens when you start generation:
API Gateway receives your start request and checks rate limits in Redis
Log Generator spawns multiple worker threads
Each worker thread generates batches of log events
Rate limiter checks Redis to ensure we don't exceed configured throughput
Circuit breaker monitors Kafka health before sending
Events are batched and sent to Kafka for downstream processing
PostgreSQL stores events for audit trail
Metrics are exported to Prometheus for monitoring
Grafana visualizes system performance in real-time
Key Components Explained
LogGenerationService.java
The heart of event generation. Creates worker threads that continuously generate events at configured rates. Uses adaptive batching to optimize throughput based on downstream latency.
RateLimitingService.java
Implements sliding window rate limiting using Redis sorted sets. Prevents system overload by enforcing configurable event generation limits.
KafkaProducerService.java
Handles Kafka communication with circuit breaker protection. Batches events for efficiency and handles failures gracefully.
Circuit Breakers
Monitor Kafka and Redis health. When failures exceed thresholds, they "open" and fail fast to prevent cascading failures.
Common Issues and Solutions
Issue: Services Won't Start
Problem: Docker containers fail to start or crash immediately.
Solution: Ensure ports 8080, 8090, 9092, 6379, 5432, 9090, 3000, and 9411 are not already in use. Stop any conflicting services.
Issue: No Events Being Generated
Problem: Status shows 0 events generated.
Solution: Check that Kafka is running (docker ps | grep kafka) and that the log-events topic was created successfully.
Issue: Rate Limiting Not Working
Problem: Events exceed configured rate limits.
Solution: Verify Redis is accessible (docker ps | grep redis). Check the Redis connection settings in application.yml.
Issue: Circuit Breaker Always Open
Problem: Circuit breaker stays in OPEN state.
Solution: Check Kafka logs for errors. Increase circuit breaker thresholds in application.yml if Kafka is slower in your environment.
Experiments to Try
Experiment 1: Increase Throughput
Modify the configuration to generate more events:
Rebuild and observe how the system handles higher load.
Experiment 2: Simulate Kafka Failure
Observe how the system fails gracefully without crashing.
Experiment 3: Test Rate Limiting
Send many requests rapidly:
Check how many requests were rate-limited in Redis.
Experiment 4: Monitor Resource Usage
Watch system resources while generating events:
Observe CPU and memory usage under different load levels.
What You Learned
By completing this lesson, you now understand:
How to implement production-ready rate limiting using sliding window algorithms
Why circuit breakers prevent cascading failures in distributed systems
How event sourcing provides audit trails and replay capabilities
The importance of backpressure and flow control in high-throughput systems
How to build observable systems with metrics and distributed tracing
Real-world patterns used by companies like Netflix, Uber, and Amazon
How to scale event generation horizontally across multiple instances
Cleanup
When you're done exploring:
Looking Ahead: Day 3
Tomorrow we'll build the log collector service that reads from Kafka and processes these events. You'll learn about:
Stream processing with windowing operations
Real-time aggregations and analytics
Event filtering and routing patterns
Building the next stage of the distributed pipeline
The foundation you built today powers everything that comes next.