Production-Ready Log Generator

Lesson 2 (2-3 hours)

Building Real-World Event Processing Systems

Component Architecture

Log Generator System Architecture: Production-Ready Event Processing (diagram summary)

Components and data flow:

  • External clients → Load Balancer → API Gateway (port 8090): rate limiting and load balancing, roughly 1K req/s of REST traffic

  • Log Generator (port 8080): 50K+ events/sec, protected by circuit breakers

  • Apache Kafka (port 9092): event streaming across 3 partitions

  • Redis Cache (port 6379): rate limiting and session storage

  • PostgreSQL (port 5432): event storage and audit trail

  • Monitoring stack: Prometheus (:9090), Grafana (:3000), Zipkin (:9411)

  • Failure scenario: when a circuit breaker is open, the generator fails fast

Scalability and performance characteristics:

  • Throughput: 50,000+ events/second, configurable batch sizes, multi-threaded generation

  • Rate limiting: sliding window algorithm, Redis-backed coordination, burst traffic handling

  • Fault tolerance: circuit breaker patterns, graceful degradation, automatic recovery

  • Observability: real-time metrics, distributed tracing, performance dashboards

  • Horizontal scaling: multiple instances, load balancing, auto-scaling ready

Enterprise-scale design patterns:

  • Event Sourcing: immutable event streams, complete audit trail, time-travel debugging, replay capability, CQRS compatibility (used by Netflix, Uber)

  • Rate Limiting: sliding window algorithm, distributed coordination, burst traffic handling, fair resource allocation, backpressure control (used by Twitter, GitHub)

  • Circuit Breaker: fail-fast mechanism, prevention of cascading failures, automatic recovery, resource protection, graceful degradation (used by Amazon, Microsoft)

  • Backpressure: adaptive batching, queue depth monitoring, load shedding, flow control, memory protection (used by LinkedIn, Spotify)

  • Observability: metrics collection, distributed tracing, real-time dashboards, alerting system, performance insights (used by Google, Facebook)

What You'll Build Today

Today you're building a production-grade log generator that demonstrates how real companies handle millions of events per second. By the end of this lesson, you'll have:

  • Multi-source log generator with configurable throughput (1-100k events/second)

  • Rate limiting and backpressure mechanisms using Redis-backed sliding window counters

  • Event sourcing foundation with Kafka integration for downstream processing

  • Circuit breaker patterns with Resilience4j for handling downstream failures

  • Comprehensive observability with Prometheus metrics and distributed tracing


Why This Matters: From Development to Netflix Scale

Every distributed system starts with events. Whether it's user interactions on Netflix (generating billions of viewing events), Uber ride requests (creating location streams), or Amazon order processing (triggering inventory updates), the ability to generate, throttle, and reliably deliver events is foundational.

Today's log generator isn't just about creating sample data. It's about understanding how event-driven architectures handle load, implement backpressure, and maintain system stability under stress. The patterns we implement today scale directly to systems processing millions of events per second. Netflix's viewing data pipeline, for instance, uses similar rate limiting and circuit breaker patterns to ensure their event ingestion doesn't overwhelm downstream analytics systems.

The architectural decisions we make around event generation—batching strategies, rate limiting algorithms, and failure handling—directly impact system reliability at scale. A poorly designed event generator can bring down entire data pipelines, while a well-architected one enables horizontal scaling and fault tolerance.



System Design Deep Dive: Five Core Distributed Patterns

1. Configurable Rate Limiting with Sliding Window Algorithm

Traditional rate limiting often uses simple token buckets, but production systems require more sophisticated approaches. We implement a Redis-backed sliding window counter that provides precise rate control while allowing controlled bursts.

Trade-off Analysis

Fixed window algorithms can cause thundering herd problems at window boundaries. Token buckets provide smooth rate limiting but don't handle burst traffic well. Our sliding window approach gives precise rate control while allowing controlled bursts, at the cost of additional Redis operations.

Implementation Pattern

Each rate limit key maintains a sorted set in Redis with timestamps as scores. We remove expired entries and count current entries to determine if new requests can proceed. This pattern scales horizontally as multiple generator instances can coordinate through the shared Redis state.
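The core loop can be sketched in plain Java, with an in-memory deque standing in for the Redis sorted set (comments note the corresponding Redis commands); the class name is illustrative, not the actual service's:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory sketch of a sliding-window rate limiter. The production version
 *  keeps the timestamps in a Redis sorted set (ZADD / ZREMRANGEBYSCORE / ZCARD)
 *  so that multiple generator instances coordinate through shared state. */
public class SlidingWindowLimiter {
    private final long windowMillis;
    private final int maxRequests;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    /** Returns true if a request arriving at nowMillis is allowed. */
    public synchronized boolean tryAcquire(long nowMillis) {
        // Evict entries that have slid out of the window (ZREMRANGEBYSCORE).
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        // Count current entries (ZCARD) and admit only if under the limit.
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(nowMillis); // record this request (ZADD)
            return true;
        }
        return false;
    }
}
```

With a 1-second window and a limit of 3, a fourth request inside the window is rejected, but becomes admissible once the oldest timestamp expires.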

Real-world Application

This exact pattern is used by Twitter's API rate limiting and GitHub's API quotas, where precise control over request rates is critical for maintaining service stability.

2. Event Sourcing with Immutable Log Streams

Our log generator produces events that follow event sourcing principles—each generated log entry is an immutable fact about something that happened. This isn't just academic purity; it's essential for building systems that can replay, debug, and scale.

Architectural Insight

Event sourcing solves the dual-write problem inherent in systems that need to update databases and send messages atomically. By treating the event stream as the source of truth, we eliminate complex two-phase commit scenarios.
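A minimal in-memory sketch makes the idea concrete: the only write is an append, and any state can be rebuilt by replaying the stream. Class and method names here are illustrative, not part of the actual generator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Minimal sketch of an append-only event log: events are immutable facts,
 *  and current state is always derived by replaying them in order. */
public class EventLog {
    public record Event(long sequence, String type, String payload) {}

    private final List<Event> events = new ArrayList<>();

    /** Append is the only write operation; events are never updated or deleted. */
    public synchronized Event append(String type, String payload) {
        Event e = new Event(events.size(), type, payload);
        events.add(e);
        return e;
    }

    /** Replay the full stream (or a suffix) into any consumer — the basis for
     *  rebuilding state, time-travel debugging, and reprocessing history. */
    public synchronized void replay(long fromSequence, Consumer<Event> consumer) {
        events.stream()
              .filter(e -> e.sequence() >= fromSequence)
              .forEach(consumer);
    }
}
```

Because consumers derive state from the log rather than a separately updated database row, there is no second write to keep atomic with the first.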

Trade-off Analysis

Event sourcing provides complete audit trails and enables time-travel debugging, but increases storage requirements and adds complexity to query patterns. For log processing systems, this trade-off heavily favors the event sourcing approach.

Scale Implications

Netflix's data platform processes over 1 trillion events daily using event sourcing patterns. The ability to replay event streams enables their recommendation algorithms to be retrained on historical data without complex data migration.

3. Circuit Breaker Pattern for Downstream Protection

Our generator implements circuit breakers around Kafka publishing and Redis operations. This prevents cascading failures when downstream systems become unavailable.

Pattern Implementation

We use Resilience4j's circuit breaker with three states: CLOSED (normal operation), OPEN (failing fast), and HALF_OPEN (testing recovery). The circuit breaker monitors failure rates and response times, automatically transitioning between states.
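The state machine Resilience4j manages for us can be sketched in a few dozen lines. This simplified version is count-based and ignores Resilience4j's sliding failure-rate windows and half-open call limits; it exists only to make the transitions visible:

```java
/** Simplified sketch of the CLOSED -> OPEN -> HALF_OPEN state machine;
 *  count-based and single-probe for clarity. */
public class CircuitBreakerSketch {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openTimeoutMillis; // how long to fail fast before probing
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public CircuitBreakerSketch(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    /** Returns false while OPEN (fail fast); after the timeout, allow one probe. */
    public boolean allowRequest(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt >= openTimeoutMillis) {
                state = State.HALF_OPEN; // test whether the downstream recovered
                return true;
            }
            return false;
        }
        return true; // CLOSED or HALF_OPEN
    }

    public void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // a successful probe closes the breaker
    }

    public void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
            consecutiveFailures = 0;
        }
    }

    public State state() { return state; }
}
```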

Failure Mode Analysis

Without circuit breakers, a slow Kafka cluster would cause thread pool exhaustion in our generators, creating system-wide failures. With circuit breakers, we fail fast and preserve system resources for recovery.

Production Considerations

Uber's event generation system uses similar circuit breaker patterns to handle the variable load between their different geographical regions. When one region's Kafka cluster experiences issues, generators degrade gracefully rather than letting failures cascade across regions.

4. Backpressure and Flow Control

High-throughput event generation requires sophisticated flow control to prevent overwhelming downstream systems. We implement multiple backpressure mechanisms.

Adaptive Batching

Our generator dynamically adjusts batch sizes based on downstream latency. When Kafka responds quickly, we increase batch sizes for efficiency. When latency increases, we reduce batch sizes to maintain responsiveness.
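The adjustment policy can be sketched as a small feedback controller; the thresholds and doubling/halving steps below are illustrative, not the generator's actual tuning:

```java
/** Sketch of latency-driven adaptive batching: grow the batch while the
 *  downstream responds quickly, shrink it when latency creeps up. */
public class AdaptiveBatcher {
    private final int minBatch, maxBatch;
    private final long targetLatencyMillis;
    private int batchSize;

    public AdaptiveBatcher(int minBatch, int maxBatch, long targetLatencyMillis) {
        this.minBatch = minBatch;
        this.maxBatch = maxBatch;
        this.targetLatencyMillis = targetLatencyMillis;
        this.batchSize = minBatch;
    }

    /** Call after each send with the observed round-trip latency. */
    public void recordLatency(long observedMillis) {
        if (observedMillis < targetLatencyMillis) {
            batchSize = Math.min(maxBatch, batchSize * 2); // fast: batch more for throughput
        } else {
            batchSize = Math.max(minBatch, batchSize / 2); // slow: back off to stay responsive
        }
    }

    public int currentBatchSize() { return batchSize; }
}
```

Multiplicative increase/decrease reacts quickly to a slow broker while still converging toward large batches when Kafka is healthy.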

Queue Depth Monitoring

We monitor internal queue depths and implement load shedding when queues exceed thresholds. This prevents memory exhaustion and maintains system stability under extreme load.
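In its simplest form, load shedding is a bounded queue that rejects work instead of growing without limit — a sketch of the idea, not the generator's actual queue class:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of queue-depth-based load shedding: a bounded queue that rejects
 *  submissions when full, protecting memory under extreme load. */
public class SheddingQueue<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong shedCount = new AtomicLong();

    public SheddingQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking submit: returns false (and counts a shed) when over capacity. */
    public boolean submit(T item) {
        if (queue.offer(item)) {
            return true;
        }
        shedCount.incrementAndGet(); // shed load rather than exhaust memory
        return false;
    }

    public T poll() { return queue.poll(); }
    public int depth() { return queue.size(); }
    public long shed() { return shedCount.get(); }
}
```

The shed counter is exactly the kind of metric the monitoring stack should alert on: a steadily climbing value means the system is persistently over capacity.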

Cooperative Scheduling

Multiple generator threads coordinate through shared metrics to ensure fair resource allocation and prevent resource starvation.

5. Observability and Performance Monitoring

Production event generators require comprehensive observability to understand performance characteristics and debug issues at scale.

Metrics Strategy

We expose key performance indicators through Micrometer: events per second, batch sizes, circuit breaker states, queue depths, and latency percentiles. These metrics feed into Prometheus for aggregation and alerting.

Distributed Tracing

Each generated event carries trace context that flows through the entire processing pipeline. This enables end-to-end latency tracking and bottleneck identification across distributed services.

Performance Implications

Our monitoring adds roughly 5% overhead to event generation throughput—an acceptable trade-off for the operational visibility it provides. LinkedIn's Kafka-based event platform uses similar observability patterns to maintain their multi-petabyte data streams.


Implementation Walkthrough: Building for Scale

Core Architecture Decisions

Our log generator follows a multi-threaded producer pattern with configurable thread pools. Each thread operates independently, coordinating only through shared rate limiting and circuit breaker state. This design enables horizontal scaling by running multiple generator instances.

Thread Pool Configuration

We size thread pools based on CPU cores and expected I/O latency. For Kafka-heavy workloads, we typically configure 2-4 threads per CPU core to account for I/O wait time.
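This heuristic is the classic sizing formula threads ≈ cores × (1 + wait/compute); with wait-to-compute ratios between 1 and 3, it yields the 2-4 threads per core mentioned above. A sketch:

```java
/** Sketch of I/O-aware thread pool sizing: threads ~= cores * (1 + wait/compute). */
public class PoolSizing {
    public static int threadsFor(int cores, double waitTimeMillis, double computeTimeMillis) {
        return (int) Math.round(cores * (1 + waitTimeMillis / computeTimeMillis));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // e.g. 3 ms waiting on Kafka for every 1 ms of CPU work -> 4 threads per core
        int poolSize = threadsFor(cores, 3.0, 1.0);
        System.out.println("cores=" + cores + " poolSize=" + poolSize);
    }
}
```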

Event Schema Design

Events follow a consistent schema with correlation IDs, timestamps, and configurable payload sizes. This enables downstream systems to implement efficient partitioning and indexing strategies.
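As an illustration, such a schema might look like the record below. The field names are assumptions for this sketch, not the generator's actual event class:

```java
import java.time.Instant;
import java.util.UUID;

/** Illustrative event schema: every event carries a correlation ID for tracing,
 *  a timestamp for windowing, and a partition key for downstream routing. */
public record LogEvent(
        UUID correlationId,   // links this event to its trace across services
        Instant timestamp,    // event time, used for windowed aggregations
        String source,        // which generator/service emitted it
        String level,         // e.g. INFO, WARN, ERROR
        String payload        // configurable-size body
) {
    public static LogEvent of(String source, String level, String payload) {
        return new LogEvent(UUID.randomUUID(), Instant.now(), source, level, payload);
    }

    /** Keying on source keeps each source's events ordered within one Kafka partition. */
    public String partitionKey() { return source; }
}
```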

Configuration and Extensibility

The generator accepts YAML-based configuration for:

  • Event generation rates (per thread, per instance)

  • Payload templates and randomization strategies

  • Kafka topic routing and partitioning

  • Circuit breaker thresholds and recovery timeouts

  • Rate limiting windows and burst allowances

This configuration-driven approach enables the same generator codebase to simulate different load patterns without code changes—essential for load testing and capacity planning.

Error Handling and Recovery

Production systems fail in unexpected ways. Our generator implements comprehensive error handling:

Transient Failure Recovery

Automatic retry with exponential backoff for temporary Kafka unavailability.
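The backoff schedule can be sketched as follows; jitter is omitted for brevity, though production systems usually add it to avoid synchronized retry storms:

```java
/** Sketch of retry with capped exponential backoff. */
public class Backoff {
    /** Delay before the given attempt (0-based): base * 2^attempt, capped. */
    public static long delayMillis(long baseMillis, long maxMillis, int attempt) {
        long delay = baseMillis << Math.min(attempt, 30); // bit-shift doubling, overflow-guarded
        return Math.min(delay, maxMillis);
    }

    /** Retries a flaky action, sleeping between attempts; false if all attempts fail. */
    public static boolean retry(java.util.function.BooleanSupplier action,
                                int maxAttempts, long baseMillis, long maxMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (action.getAsBoolean()) return true;       // success: stop retrying
            try {
                Thread.sleep(delayMillis(baseMillis, maxMillis, attempt));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // give up cleanly if interrupted
                return false;
            }
        }
        return false;
    }
}
```

With a 100 ms base and a 5 s cap, the delays run 100, 200, 400, 800 ms and so on until the cap is reached.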

Poison Event Handling

Dead letter queues for events that consistently fail processing.

Resource Exhaustion Protection

Graceful degradation when memory or CPU limits are approached.


Production Considerations: Performance and Monitoring

Performance Characteristics

Our generator achieves 50,000+ events/second per instance on commodity hardware (4 CPU, 8GB RAM). Key performance factors:

Batch Size Optimization

Larger batches improve throughput but increase latency.

Serialization Efficiency

Avro serialization provides better performance than JSON for high-volume scenarios.

Connection Pooling

Kafka producer connection reuse eliminates connection overhead.

Monitoring and Alerting Strategy

Critical alerts include:

  • Event generation rate falling below configured thresholds

  • Circuit breaker open states lasting longer than recovery windows

  • Memory usage exceeding 80% of allocated heap space

  • Kafka producer queue depth growing continuously

These alerts enable proactive response to issues before they impact downstream systems.

Failure Scenarios and Recovery

We've tested several failure scenarios:

Kafka Cluster Unavailability

The generator's circuit breaker opens; it continues accepting requests but fails fast rather than blocking threads on an unreachable cluster.

Redis Unavailability

Rate limiting falls back to local token buckets with reduced accuracy.
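A token bucket of the kind used in this fallback can be sketched in a few lines; it smooths the rate per instance, but without Redis there is no cross-instance coordination, hence the reduced accuracy:

```java
/** Sketch of the local token-bucket fallback: smooth per-instance limiting,
 *  with no coordination across generator instances. */
public class TokenBucket {
    private final double capacity;       // max tokens, i.e. allowed burst size
    private final double refillPerMilli; // tokens added per millisecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full
        this.lastRefill = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Refill proportionally to elapsed time, never beyond capacity.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefill) * refillPerMilli);
        lastRefill = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```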

Memory Pressure

Automatic load shedding maintains service availability at reduced throughput.


Scale Connection: FAANG-Level Patterns

The patterns implemented in our log generator directly mirror those used in large-scale production systems. Netflix's data platform, Uber's event streaming infrastructure, and Amazon's order processing pipelines all use similar rate limiting, circuit breaker, and event sourcing patterns. The key difference is scale—while our generator handles thousands of events per second, these systems handle millions. However, the architectural patterns remain fundamentally the same.

Understanding how to implement these patterns correctly at small scale provides the foundation for scaling to enterprise levels through horizontal partitioning and service replication.


Next Steps: Collection and Processing

Tomorrow we'll build the log collector service that consumes our generated events. We'll explore stream processing patterns, windowing operations, and building the first stages of our distributed log processing pipeline. The events we generate today become the foundation for implementing complex stream processing patterns that power real-time analytics at scale.


Getting Started: Build Your System

Prerequisites

Before starting, make sure you have:

  • Java 17 or higher installed

  • Maven 3.8 or higher

  • Docker and Docker Compose

  • A text editor (VS Code, IntelliJ, or similar)

  • curl for testing APIs

Step 1: Generate Your Project

Run the project generation script:

bash
chmod +x generate_system_files.sh
./generate_system_files.sh
cd log-processing-system

This creates your complete project structure with all source files, configuration, and infrastructure setup.

Step 2: Understanding the Project Structure

Take a moment to explore what was created:

Code
log-processing-system/
├── log-generator/          # Main event generation service
│   ├── src/main/java/     # Java source code
│   └── src/main/resources/ # Configuration files
├── api-gateway/            # Gateway with rate limiting
├── docker-compose.yml      # Infrastructure setup
├── monitoring/             # Prometheus and Grafana config
└── integration-tests/      # Automated tests

Step 3: Start Infrastructure Services

Launch all supporting services (Kafka, Redis, PostgreSQL, Prometheus, Grafana):

bash
./setup.sh

This script will:

  1. Start Docker containers for all infrastructure

  2. Build your Java applications

  3. Create necessary Kafka topics

  4. Launch the log generator and API gateway

Wait about 2-3 minutes for all services to fully start.

Step 4: Verify Installation

Check that all services are running:

bash
# Check API Gateway health
curl http://localhost:8090/actuator/health

# Check Log Generator health
curl http://localhost:8080/actuator/health

You should see healthy status responses from both services.

Step 5: Start Generating Logs

Trigger log generation through the API:

bash
curl -X POST http://localhost:8090/api/v1/generator/start

Your system is now generating events and sending them to Kafka.

Step 6: Monitor Your System

Open your browser and explore the monitoring dashboards:

Grafana Dashboard (admin/admin)

Code
http://localhost:3000

Prometheus Metrics

Code
http://localhost:9090

Zipkin Distributed Tracing

Code
http://localhost:9411

Step 7: Check Generation Status

Monitor the generator's performance:

bash
curl http://localhost:8090/api/v1/generator/status

You'll see output like:

json
{
  "running": true,
  "totalGenerated": 45230,
  "totalRateLimited": 0,
  "currentRate": 980
}

Step 8: Run Integration Tests

Validate your system is working correctly:

bash
./integration-tests/system-test.sh

This automated test will:

  • Verify all services are healthy

  • Start log generation

  • Confirm events are being produced

  • Check metrics are being collected

  • Validate the complete data flow

Step 9: Load Testing

Test your system under stress:

bash
./load-test.sh

This runs a 60-second load test and monitors:

  • Events generated per second

  • Rate limiting behavior

  • System resource usage

  • Circuit breaker states

Step 10: Explore Kafka Messages

View the actual events being generated:

bash
# Connect to Kafka container
docker exec -it kafka bash

# Consume messages from the log-events topic
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic log-events --from-beginning --max-messages 10

You'll see JSON events with correlation IDs, timestamps, and metadata.


Understanding What You Built

Data Flow

Here's what happens when you start generation:

  1. API Gateway receives your start request and checks rate limits in Redis

  2. Log Generator spawns multiple worker threads

  3. Each worker thread generates batches of log events

  4. Rate limiter checks Redis to ensure we don't exceed configured throughput

  5. Circuit breaker monitors Kafka health before sending

  6. Events are batched and sent to Kafka for downstream processing

  7. PostgreSQL stores events for audit trail

  8. Metrics are exported to Prometheus for monitoring

  9. Grafana visualizes system performance in real-time

Key Components Explained

LogGenerationService.java

The heart of event generation. Creates worker threads that continuously generate events at configured rates. Uses adaptive batching to optimize throughput based on downstream latency.

RateLimitingService.java

Implements sliding window rate limiting using Redis sorted sets. Prevents system overload by enforcing configurable event generation limits.

KafkaProducerService.java

Handles Kafka communication with circuit breaker protection. Batches events for efficiency and handles failures gracefully.

Circuit Breakers

Monitor Kafka and Redis health. When failures exceed thresholds, they "open" and fail fast to prevent cascading failures.


Common Issues and Solutions

Issue: Services Won't Start

Problem: Docker containers fail to start or crash immediately.

Solution: Ensure ports 8080, 8090, 9092, 6379, 5432, 9090, 3000, and 9411 are not already in use. Stop any conflicting services.

Issue: No Events Being Generated

Problem: Status shows 0 events generated.

Solution: Check that Kafka is running (docker ps | grep kafka) and that the log-events topic was created successfully.

Issue: Rate Limiting Not Working

Problem: Events exceed configured rate limits.

Solution: Verify Redis is accessible (docker ps | grep redis). Check Redis connection in application.yml.

Issue: Circuit Breaker Always Open

Problem: Circuit breaker stays in OPEN state.

Solution: Check Kafka logs for errors. Increase circuit breaker thresholds in application.yml if Kafka is slower in your environment.


Experiments to Try

Experiment 1: Increase Throughput

Modify the configuration to generate more events:

yaml
# log-generator/src/main/resources/application.yml
app:
  generator:
    rate-per-second: 5000  # Increase from 1000
    threads: 8             # Increase thread count

Rebuild and observe how the system handles higher load.

Experiment 2: Simulate Kafka Failure

bash
# Stop Kafka
docker stop kafka

# Watch the circuit breaker activate
curl http://localhost:8090/api/v1/generator/status

Observe how the system fails gracefully without crashing.

Experiment 3: Test Rate Limiting

Send many requests rapidly:

bash
for i in {1..100}; do
  curl -X POST http://localhost:8090/api/v1/generator/start &
done

Check how many requests were rate-limited in Redis.

Experiment 4: Monitor Resource Usage

Watch system resources while generating events:

bash
docker stats

Observe CPU and memory usage under different load levels.


What You Learned

By completing this lesson, you now understand:

  1. How to implement production-ready rate limiting using sliding window algorithms

  2. Why circuit breakers prevent cascading failures in distributed systems

  3. How event sourcing provides audit trails and replay capabilities

  4. The importance of backpressure and flow control in high-throughput systems

  5. How to build observable systems with metrics and distributed tracing

  6. Real-world patterns used by companies like Netflix, Uber, and Amazon

  7. How to scale event generation horizontally across multiple instances


Cleanup

When you're done exploring:

bash
# Stop all services
./stop.sh

# Remove all data volumes
docker-compose down -v

Looking Ahead: Day 3

Tomorrow we'll build the log collector service that reads from Kafka and processes these events. You'll learn about:

  • Stream processing with windowing operations

  • Real-time aggregations and analytics

  • Event filtering and routing patterns

  • Building the next stage of the distributed pipeline

The foundation you built today powers everything that comes next.