Part 8: Production Best Practices

Everything you need to run event-sourced systems in production.

Table of contents

  1. Snapshots
    1. When to Snapshot
  2. Optimistic Concurrency
    1. Retry with Reload
  3. Schema Evolution
    1. Strategy: Weak Schema
    2. Best Practices
  4. Monitoring
    1. Key Metrics
    2. Health Checks
  5. Disaster Recovery
    1. The Event Store is Truth
    2. Recovery
  6. Production Checklist
    1. Before Launch
    2. Monitoring
  7. Conclusion
    1. What You’ve Learned
    2. Key Principles

This is Part 8 of an 8-part series on Event Sourcing and CQRS with Go.


Snapshots

For aggregates with thousands of events, snapshots prevent slow replay:

Events:     [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
                         ^
                    Snapshot at v5

Load: Snapshot(v5) + Events [6, 7, 8, 9, 10]

When to Snapshot

// Every N events
if order.Version() % 100 == 0 {
    snapshotStore.Save(ctx, Snapshot{
        AggregateID: order.AggregateID(),
        Version:     order.Version(),
        State:       serialize(order),
    })
}

Optimistic Concurrency

Retry with Reload

func ExecuteWithRetry(ctx context.Context, id string, action func(*Order) error) error {
    for attempt := 0; attempt < 3; attempt++ {
        order := NewOrder(id)
        store.LoadAggregate(ctx, order)

        if err := action(order); err != nil {
            return err  // Business error
        }

        err := store.SaveAggregate(ctx, order)
        if err == nil {
            return nil
        }
        if !errors.Is(err, mink.ErrConcurrencyConflict) {
            return err
        }
        // Retry...
    }
    return ErrMaxRetries
}

Schema Evolution

Strategy: Weak Schema

func (o *Order) ApplyEvent(event interface{}) error {
    switch e := event.(type) {
    case OrderCreated:
        o.CustomerID = e.CustomerID
        // Handle optional new fields gracefully
        if e.CustomerName != "" {
            o.CustomerName = e.CustomerName
        }
    }
    return nil
}

Best Practices

  1. Never remove fields from events
  2. Make new fields optional
  3. Version your events in metadata
  4. Test with old events

Monitoring

Key Metrics

  • Events appended per second
  • Command success/failure rate
  • Projection lag
  • Append latency (p50, p95, p99)

Health Checks

func HealthCheck(engine *mink.ProjectionEngine) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        for _, name := range engine.ProjectionNames() {
            status := engine.GetStatus(name)
            if status.State == mink.ProjectionStateFaulted {
                w.WriteHeader(http.StatusServiceUnavailable)
            }
        }
    }
}

Disaster Recovery

The Event Store is Truth

  • Events: Regular PostgreSQL backups
  • Snapshots: Can be regenerated
  • Read models: Can be rebuilt

Recovery

# Lost read model? Reset checkpoint and rebuild
psql -c "DELETE FROM checkpoints WHERE projection_name = 'OrderList'"
psql -c "TRUNCATE order_list"
# Restart - projection rebuilds automatically

Production Checklist

Before Launch

  • PostgreSQL adapter with connection pooling
  • All events registered
  • Snapshots for high-traffic aggregates
  • Checkpoint store configured
  • Metrics exposed
  • Health check endpoints
  • Backup strategy

Monitoring

  • Dashboard for command/event rates
  • Alerts for projection lag
  • Alerts for error rates
  • Storage growth trending

Conclusion

Congratulations! You’ve completed this 8-part series on Event Sourcing and CQRS with go-mink.

What You’ve Learned

  1. Event Sourcing Fundamentals: Storing events instead of state
  2. Getting Started: Setting up go-mink
  3. Aggregates: Encapsulating business logic
  4. Event Store: Streams, versioning, metadata
  5. CQRS: Separating reads from writes
  6. Middleware: Cross-cutting concerns
  7. Projections: Building read models
  8. Production: Snapshots, monitoring, operations

Key Principles

  1. Events are facts: Immutable records of what happened
  2. State is derived: Always rebuildable from events
  3. Aggregates guard consistency: Single unit of change
  4. Commands express intent: Validated before execution
  5. Projections serve queries: Optimized for specific needs
  6. Observability is essential: Know what your system is doing

Thank you for reading. Happy event sourcing!


← Part 7: Projections Back to Blog