Part 8: Production Best Practices
Everything you need to run event-sourced systems in production.
Table of contents
- Snapshots
- Optimistic Concurrency
- Schema Evolution
- Monitoring
- Disaster Recovery
- Production Checklist
- Conclusion
This is Part 8 of an 8-part series on Event Sourcing and CQRS with Go.
Snapshots
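For aggregates with thousands of events, replaying the full stream on every load gets slow. A snapshot stores the aggregate's state at a known version, so a load only replays the events recorded after it: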
Events: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
                         ^
                         Snapshot at v5
Load: Snapshot(v5) + Events [6, 7, 8, 9, 10]
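The load path in the diagram can be sketched with a toy aggregate. Everything here (`Event`, `Snapshot`, `Account`, `Load`, `History`) is a hypothetical stand-in for illustration, not go-mink's API. A real store would query only the events after the snapshot version rather than filter them in memory.

```go
package main

import "fmt"

// Event is a hypothetical event: a version number and an amount to add.
type Event struct {
	Version int
	Amount  int
}

// Snapshot is a hypothetical saved state at a given version.
type Snapshot struct {
	Version int
	Total   int
}

// Account is a toy aggregate whose state is a running total.
type Account struct {
	Version int
	Total   int
}

// Apply folds one event into the aggregate state.
func (a *Account) Apply(e Event) {
	a.Total += e.Amount
	a.Version = e.Version
}

// Load restores an account from an optional snapshot, then replays only
// the events newer than the snapshot. It returns the account and the
// number of events it replayed.
func Load(snap *Snapshot, events []Event) (Account, int) {
	var acct Account
	if snap != nil {
		acct = Account{Version: snap.Version, Total: snap.Total}
	}
	replayed := 0
	for _, e := range events {
		if e.Version <= acct.Version {
			continue // already covered by the snapshot
		}
		acct.Apply(e)
		replayed++
	}
	return acct, replayed
}

// History builds n events with versions 1..n, each adding its own version.
func History(n int) []Event {
	events := make([]Event, 0, n)
	for v := 1; v <= n; v++ {
		events = append(events, Event{Version: v, Amount: v})
	}
	return events
}

func main() {
	events := History(10)
	full, fullCount := Load(nil, events)
	fast, fastCount := Load(&Snapshot{Version: 5, Total: 15}, events)
	fmt.Println(full.Total, fullCount) // full replay of all 10 events
	fmt.Println(fast.Total, fastCount) // snapshot at v5 plus events 6..10
}
```

Both loads end in the same state; the snapshot path just gets there by applying half as many events.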
When to Snapshot
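// Every N events (here N = 100), checked after a successful save
if order.Version()%100 == 0 {
	err := snapshotStore.Save(ctx, Snapshot{
		AggregateID: order.AggregateID(),
		Version:     order.Version(),
		State:       serialize(order),
	})
	if err != nil {
		// A snapshot is only an optimization: log the failure and carry on.
		log.Printf("snapshot save failed: %v", err)
	}
}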
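One caveat with a modulo check: it only fires when the version lands exactly on a multiple of the interval. If a single save appends several events (say, version 98 to 101), the boundary is skipped. A minimal sketch of a check that compares the version before and after the save instead; `ShouldSnapshot` and its parameters are hypothetical names, not part of go-mink.

```go
package main

import "fmt"

// ShouldSnapshot reports whether a save that moved the aggregate from
// prevVersion to newVersion reached or passed a snapshot boundary.
// This helper is illustrative only.
func ShouldSnapshot(prevVersion, newVersion, interval int) bool {
	if interval <= 0 || newVersion <= prevVersion {
		return false
	}
	// Integer division gives the index of the interval each version falls
	// in. A higher index after the save means a multiple of interval was
	// reached or passed.
	return newVersion/interval > prevVersion/interval
}

func main() {
	fmt.Println(ShouldSnapshot(98, 101, 100))  // true: passed 100 without landing on it
	fmt.Println(ShouldSnapshot(99, 100, 100))  // true: landed exactly on 100
	fmt.Println(ShouldSnapshot(100, 150, 100)) // false: still inside the same interval
}
```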
Optimistic Concurrency
Retry with Reload
func ExecuteWithRetry(ctx context.Context, id string, action func(*Order) error) error {
	for attempt := 0; attempt < 3; attempt++ {
		// Reload on every attempt so the action runs against the latest version.
		order := NewOrder(id)
		if err := store.LoadAggregate(ctx, order); err != nil {
			return err
		}
		if err := action(order); err != nil {
			return err // Business error: retrying will not help
		}
		err := store.SaveAggregate(ctx, order)
		if err == nil {
			return nil
		}
		if !errors.Is(err, mink.ErrConcurrencyConflict) {
			return err
		}
		// Concurrency conflict: another writer saved first, so loop and retry.
	}
	return ErrMaxRetries
}
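The retry loop relies on the store rejecting a save when the stream has moved on since the aggregate was loaded. A minimal in-memory sketch of that expected-version check, assuming a hypothetical `Stream` type and `ErrConflict` error that stand in for the real store and its conflict error:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict stands in for the store's concurrency-conflict error.
var ErrConflict = errors.New("concurrency conflict")

// Stream is a hypothetical in-memory event stream for one aggregate.
type Stream struct {
	mu     sync.Mutex
	events []string
}

// Version returns the number of events currently in the stream.
func (s *Stream) Version() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.events)
}

// Append adds events only if the stream is still at expectedVersion,
// the version the caller saw when it loaded the aggregate.
func (s *Stream) Append(expectedVersion int, events ...string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.events) != expectedVersion {
		return fmt.Errorf("expected version %d, stream is at %d: %w",
			expectedVersion, len(s.events), ErrConflict)
	}
	s.events = append(s.events, events...)
	return nil
}

func main() {
	s := &Stream{}
	// Two writers both load the stream at version 0.
	seenByA, seenByB := s.Version(), s.Version()

	fmt.Println(s.Append(seenByA, "OrderCreated")) // first writer wins
	err := s.Append(seenByB, "OrderCancelled")     // second writer is stale
	fmt.Println(errors.Is(err, ErrConflict))

	// After reloading, the second writer sees version 1 and succeeds.
	fmt.Println(s.Append(s.Version(), "OrderCancelled"))
}
```

Nothing is locked while the business logic runs; the version comparison at append time is the only coordination, which is why a conflict is resolved by reloading and trying again.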
Schema Evolution
Strategy: Weak Schema
func (o *Order) ApplyEvent(event interface{}) error {
	switch e := event.(type) {
	case OrderCreated:
		o.CustomerID = e.CustomerID
		// Handle optional new fields gracefully
		if e.CustomerName != "" {
			o.CustomerName = e.CustomerName
		}
	}
	return nil
}
Best Practices
- Never remove fields from events
- Make new fields optional
- Version your events in metadata
- Test with old events
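To illustrate "make new fields optional" and "test with old events", here is a small sketch using the standard `encoding/json` package. The JSON field names and the `DecodeOrderCreated` helper are assumptions made for this example; your serializer and event shapes may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderCreated is the current shape of the event. CustomerName was added
// later, so payloads written before that change do not contain it.
type OrderCreated struct {
	CustomerID   string `json:"customer_id"`
	CustomerName string `json:"customer_name,omitempty"`
}

// DecodeOrderCreated decodes a stored JSON payload. Fields absent from the
// payload keep their zero value, which is what lets old events still load.
func DecodeOrderCreated(payload string) (OrderCreated, error) {
	var e OrderCreated
	err := json.Unmarshal([]byte(payload), &e)
	return e, err
}

func main() {
	oldPayload := `{"customer_id":"c-1"}`                       // written before the field existed
	newPayload := `{"customer_id":"c-2","customer_name":"Ada"}` // current shape
	for _, p := range []string{oldPayload, newPayload} {
		e, err := DecodeOrderCreated(p)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("%+v\n", e)
	}
}
```

Keeping a few real payloads from each historical shape as test fixtures is a cheap way to catch a change that breaks replay.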
Monitoring
Key Metrics
- Events appended per second
- Command success/failure rate
- Projection lag
- Append latency (p50, p95, p99)
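Two of these metrics are simple to compute by hand, which helps when sanity-checking a dashboard. The sketch below uses hypothetical helpers (`Percentile`, `ProjectionLag`) and raw samples in milliseconds. A production setup would more likely export histograms to a metrics system than keep raw samples in memory.

```go
package main

import (
	"fmt"
	"sort"
)

// Percentile returns the p-th percentile (1..100) of samples using the
// nearest-rank method. It returns 0 for an empty slice.
func Percentile(samples []float64, p int) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the input is not reordered
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// ProjectionLag is how many events a projection still has to process:
// the store's latest position minus the projection's checkpoint.
func ProjectionLag(headPosition, checkpoint uint64) uint64 {
	if checkpoint >= headPosition {
		return 0
	}
	return headPosition - checkpoint
}

func main() {
	latenciesMs := []float64{4, 2, 9, 3, 5, 7, 1, 8, 6, 10}
	fmt.Println(Percentile(latenciesMs, 50), Percentile(latenciesMs, 95), Percentile(latenciesMs, 99))
	fmt.Println(ProjectionLag(1200, 1150)) // events the projection is behind
}
```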
Health Checks
func HealthCheck(engine *mink.ProjectionEngine) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, name := range engine.ProjectionNames() {
			status := engine.GetStatus(name)
			if status.State == mink.ProjectionStateFaulted {
				// Stop at the first faulted projection so the status
				// header is written only once.
				http.Error(w, "projection faulted", http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}
Disaster Recovery
The Event Store is Truth
- Events: Regular PostgreSQL backups
- Snapshots: Can be regenerated
- Read models: Can be rebuilt
Recovery
# Lost read model? Stop the projection worker, then reset its checkpoint and data
psql -c "DELETE FROM checkpoints WHERE projection_name = 'OrderList'"
psql -c "TRUNCATE order_list"
# Restart the worker: with no checkpoint, the projection rebuilds from the first event
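Why this works can be shown with a toy projection: once the checkpoint and rows are gone, catching up from the start of the event log reproduces the same read model. The types and methods below (`OrderEvent`, `OrderList`, `CatchUp`, `Reset`) are hypothetical and only model the idea, not go-mink's projection engine.

```go
package main

import "fmt"

// OrderEvent is a hypothetical stored event with a global position.
type OrderEvent struct {
	Position int
	OrderID  string
	Status   string
}

// OrderList is a toy read model: order ID to latest status, plus the
// checkpoint of the last event it has processed.
type OrderList struct {
	Rows       map[string]string
	Checkpoint int
}

// NewOrderList returns an empty read model with its checkpoint at zero.
func NewOrderList() *OrderList {
	return &OrderList{Rows: map[string]string{}}
}

// CatchUp applies every event past the checkpoint and advances it.
func (m *OrderList) CatchUp(events []OrderEvent) {
	for _, e := range events {
		if e.Position <= m.Checkpoint {
			continue // already processed
		}
		m.Rows[e.OrderID] = e.Status
		m.Checkpoint = e.Position
	}
}

// Reset mirrors the DELETE and TRUNCATE commands: drop checkpoint and rows.
func (m *OrderList) Reset() {
	m.Rows = map[string]string{}
	m.Checkpoint = 0
}

func main() {
	events := []OrderEvent{
		{Position: 1, OrderID: "o-1", Status: "created"},
		{Position: 2, OrderID: "o-2", Status: "created"},
		{Position: 3, OrderID: "o-1", Status: "shipped"},
	}
	m := NewOrderList()
	m.CatchUp(events)
	m.Reset()         // read model lost or corrupted
	m.CatchUp(events) // replay from the first event
	fmt.Println(m.Rows["o-1"], m.Rows["o-2"], m.Checkpoint)
}
```

The checkpoint also makes catch-up idempotent: running it again over the same events changes nothing.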
Production Checklist
Before Launch
- PostgreSQL adapter with connection pooling
- All events registered
- Snapshots for high-traffic aggregates
- Checkpoint store configured
- Metrics exposed
- Health check endpoints
- Backup strategy
Monitoring
- Dashboard for command/event rates
- Alerts for projection lag
- Alerts for error rates
- Storage growth trending
Conclusion
Congratulations! You’ve completed this 8-part series on Event Sourcing and CQRS with go-mink.
What You’ve Learned
- Event Sourcing Fundamentals: Storing events instead of state
- Getting Started: Setting up go-mink
- Aggregates: Encapsulating business logic
- Event Store: Streams, versioning, metadata
- CQRS: Separating reads from writes
- Middleware: Cross-cutting concerns
- Projections: Building read models
- Production: Snapshots, monitoring, operations
Key Principles
- Events are facts: Immutable records of what happened
- State is derived: Always rebuildable from events
- Aggregates guard consistency: Single unit of change
- Commands express intent: Validated before execution
- Projections serve queries: Optimized for specific needs
- Observability is essential: Know what your system is doing
Thank you for reading. Happy event sourcing!