Monitoring & Logging Guide
Overview
The Nom Database implements comprehensive monitoring and structured logging using zerolog and custom metrics collection. This guide covers how to use and interpret the monitoring features.
Structured Logging
Log Format
The application uses zerolog for high-performance structured logging with the following features:
- Structured Fields: All log entries include structured key-value pairs
- Request Correlation: Unique request IDs for tracing requests
- Performance Metrics: Request duration, response size, status codes
- Color-Coded Console: Easy-to-read colored output in development
- JSON Output: Structured JSON logs for production (set LOG_FORMAT=json)
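
As a sketch of how such an entry is produced, the zerolog call below emits the same structured fields shown in the output examples later in this guide; the actual logger wiring in the codebase may differ.

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// zerolog writes JSON by default; wrap os.Stdout in a
	// zerolog.ConsoleWriter for the colored development format.
	logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

	// One structured entry with the same key-value fields shown below.
	logger.Info().
		Str("request_id", "62d30079-6774-46f6-b623-83680479d9a7").
		Str("method", "GET").
		Str("path", "/api/categories").
		Int("status", 200).
		Int("bytes", 1124).
		Float64("duration", 4.333917).
		Msg("HTTP request completed")
}
```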
Log Levels
- DEBUG: Detailed information for debugging (enabled with DEBUG=true)
- INFO: General informational messages
- WARN: Warning messages for non-critical issues
- ERROR: Error messages for failures
- FATAL: Critical errors that cause application exit
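
A minimal sketch of how the DEBUG and LOG_FORMAT environment variables could be wired to zerolog's level and writer; the real startup code may organise this differently.

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	// Default to INFO; DEBUG=true lowers the threshold so DEBUG entries appear.
	zerolog.SetGlobalLevel(zerolog.InfoLevel)
	if os.Getenv("DEBUG") == "true" {
		zerolog.SetGlobalLevel(zerolog.DebugLevel)
	}

	// LOG_FORMAT=json keeps the default JSON writer; anything else
	// switches to the human-readable console writer.
	if os.Getenv("LOG_FORMAT") != "json" {
		log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})
	}

	log.Debug().Msg("debug logging enabled") // only visible with DEBUG=true
	log.Info().Msg("logger configured")
}
```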
Log Output Examples
Console Format (Development)
2025-12-30 20:13:44 INF HTTP request completed bytes=1124 duration=4.333917 ip=192.168.65.1:55864 method=GET path=/api/categories request_id=62d30079-6774-46f6-b623-83680479d9a7 status=200

JSON Format (Production)
{
"level": "info",
"time": "2025-12-30T20:13:44Z",
"message": "HTTP request completed",
"bytes": 1124,
"duration": 4.333917,
"ip": "192.168.65.1:55864",
"method": "GET",
"path": "/api/categories",
"request_id": "62d30079-6774-46f6-b623-83680479d9a7",
"status": 200
}

Structured Log Fields
Each HTTP request log includes:
| Field | Description | Example |
|---|---|---|
| request_id | Unique UUID for request tracing | 62d30079-6774-46f6-b623-83680479d9a7 |
| method | HTTP method | GET |
| path | Request path | /api/categories |
| ip | Client IP address | 192.168.65.1:55864 |
| duration | Request processing time in milliseconds | 4.333917 |
| status | HTTP status code | 200 |
| bytes | Response size in bytes | 1124 |
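
For illustration, a net/http middleware along the following lines could record these fields. The statusRecorder type and package name are hypothetical, and it assumes an upstream middleware has already set X-Request-ID on the response (see the request correlation section below); the project's actual middleware may differ.

```go
package middleware // hypothetical package name

import (
	"net/http"
	"time"

	"github.com/rs/zerolog/log"
)

// statusRecorder captures the status code and response size so the
// middleware can log them after the handler returns.
type statusRecorder struct {
	http.ResponseWriter
	status int
	bytes  int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	n, err := r.ResponseWriter.Write(b)
	r.bytes += n
	return n, err
}

// RequestLogger logs one structured entry per request with the fields
// listed in the table above.
func RequestLogger(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}

		next.ServeHTTP(rec, r)

		log.Info().
			Str("request_id", w.Header().Get("X-Request-ID")).
			Str("method", r.Method).
			Str("path", r.URL.Path).
			Str("ip", r.RemoteAddr).
			Float64("duration", float64(time.Since(start))/float64(time.Millisecond)).
			Int("status", rec.status).
			Int("bytes", rec.bytes).
			Msg("HTTP request completed")
	})
}
```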
Configuring Logging
Development Mode
Enable debug logging:
docker compose up -d
# or
DEBUG=true go run ./cmd/server

Production Mode
Use JSON format for log aggregation:
LOG_FORMAT=json go run ./cmd/server

Or in docker-compose.yml:
environment:
  LOG_FORMAT: json
  DEBUG: false

Request Correlation IDs
Every request is assigned a unique UUID that appears in:
- Response Headers: X-Request-ID header
- Log Entries: request_id field
- Error Messages: Contextual error logging
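
A minimal sketch of request-ID middleware that behaves as described; the function and package names are hypothetical, and the real implementation may differ.

```go
package middleware // hypothetical package name

import (
	"net/http"

	"github.com/google/uuid"
)

// RequestID honours a client-supplied X-Request-ID and otherwise
// generates a fresh UUID, echoing it back on the response.
func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = uuid.New().String()
		}
		// Expose the ID to the client and to downstream handlers/loggers.
		w.Header().Set("X-Request-ID", id)
		r.Header.Set("X-Request-ID", id)
		next.ServeHTTP(w, r)
	})
}
```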
Using Request IDs
From Client
# Server generates ID automatically
curl http://localhost:8080/api/restaurants -v
# Or provide your own
curl -H "X-Request-ID: my-custom-id" http://localhost:8080/api/restaurantsTracing Requests
# Filter logs by request ID
docker compose logs backend | grep "62d30079-6774-46f6-b623-83680479d9a7"Metrics Collection
Real-Time Metrics
Access current metrics at: http://localhost:8080/api/metrics
Metrics Endpoint
curl http://localhost:8080/api/metrics

Response:
{
"total_requests": 1234,
"total_errors": 5,
"requests_by_method": {
"GET": 800,
"POST": 300,
"PUT": 100,
"DELETE": 34
},
"requests_by_path": {
"/api/restaurants": 450,
"/api/categories": 200,
"/api/ratings": 150
},
"requests_by_status": {
"200": 1100,
"201": 50,
"400": 20,
"404": 50,
"500": 14
},
"avg_response_time": "2.5ms",
"p50_response_time": "1.2ms",
"p95_response_time": "8.5ms",
"p99_response_time": "15.3ms",
"uptime": "2h15m30s"
}

Metrics Explained
| Metric | Description |
|---|---|
| total_requests | Total number of HTTP requests processed |
| total_errors | Number of 5xx server errors |
| requests_by_method | Request count by HTTP method |
| requests_by_path | Request count by API path |
| requests_by_status | Request count by HTTP status code |
| avg_response_time | Average request duration |
| p50_response_time | 50th percentile (median) response time |
| p95_response_time | 95th percentile response time |
| p99_response_time | 99th percentile response time |
| uptime | Time since metrics collection started |
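
As a rough sketch, an in-memory collector along the following lines could back these fields; the type and method names here are assumptions, not the project's actual code.

```go
package metrics // hypothetical package name, for illustration only

import (
	"encoding/json"
	"net/http"
	"strconv"
	"sync"
	"time"
)

// Collector is an in-memory store roughly mirroring the fields returned
// by /api/metrics. It is reset whenever the process restarts.
type Collector struct {
	mu        sync.Mutex
	start     time.Time
	total     int64
	errors    int64
	byMethod  map[string]int64
	byPath    map[string]int64
	byStatus  map[string]int64
	durations []time.Duration
}

func NewCollector() *Collector {
	return &Collector{
		start:    time.Now(),
		byMethod: map[string]int64{},
		byPath:   map[string]int64{},
		byStatus: map[string]int64{},
	}
}

// Record is called once per request, e.g. from the logging middleware.
func (c *Collector) Record(method, path string, status int, d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.total++
	if status >= 500 {
		c.errors++
	}
	c.byMethod[method]++
	c.byPath[path]++
	c.byStatus[strconv.Itoa(status)]++
	c.durations = append(c.durations, d)
}

// Handler serves the current snapshot as JSON, in the shape shown above.
// avg/p50/p95/p99 would be derived from c.durations (see the percentile
// sketch in the next section).
func (c *Collector) Handler(w http.ResponseWriter, r *http.Request) {
	c.mu.Lock()
	defer c.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]any{
		"total_requests":     c.total,
		"total_errors":       c.errors,
		"requests_by_method": c.byMethod,
		"requests_by_path":   c.byPath,
		"requests_by_status": c.byStatus,
		"uptime":             time.Since(c.start).String(),
	})
}
```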
Periodic Metrics Logging
Metrics are automatically logged every 5 minutes with structured fields:
INFO 📊 Metrics Summary total_requests=1234 total_errors=5 avg_response_time=2.5ms ...

Performance Monitoring
Response Time Percentiles
- p50 (Median): Half of requests are faster than this
- p95: 95% of requests are faster than this (target for SLA)
- p99: 99% of requests are faster than this (outlier detection)
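
For illustration, a nearest-rank calculation over the recorded durations is one way such percentiles can be derived; the project's actual method may differ.

```go
package metrics

import (
	"math"
	"sort"
	"time"
)

// percentile returns the p-th quantile (0 < p <= 1) of the recorded
// durations using the nearest-rank method on a sorted copy.
func percentile(durations []time.Duration, p float64) time.Duration {
	if len(durations) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	idx := int(math.Ceil(p*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

// Example: p95 := percentile(collector.durations, 0.95)
```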
Interpreting Metrics
Healthy Application
{
"avg_response_time": "2.5ms",
"p50_response_time": "1.2ms",
"p95_response_time": "8.5ms",
"p99_response_time": "15.3ms"
}

Performance Issues
{
"avg_response_time": "150ms",
"p50_response_time": "80ms",
"p95_response_time": "500ms",
"p99_response_time": "2s"
}

Action: Investigate database queries, add caching, optimize handlers
High Error Rate
{
"total_requests": 1000,
"total_errors": 150,
"requests_by_status": {
"500": 150
}
}

Action: Check logs for error details, investigate failing endpoints
Viewing Logs
Docker Compose
# View all logs
docker compose logs backend
# Follow logs in real-time
docker compose logs -f backend
# Last 100 lines
docker compose logs --tail=100 backend
# Filter by log level
docker compose logs backend | grep "ERR\|FATAL"
# Filter by request ID
docker compose logs backend | grep "request_id=abc123"
# Filter by endpoint
docker compose logs backend | grep "/api/restaurants"Log Analysis
Find slow requests (>100ms)
docker compose logs backend | grep "duration=" | awk -F'duration=' '{print $2}' | sort -nCount requests by status code
docker compose logs backend | grep "status=" | awk -F'status=' '{print $2}' | cut -d' ' -f1 | sort | uniq -cFind errors
docker compose logs backend | grep "ERR"Production Recommendations
Log Aggregation
For production, use a log aggregation service:
- JSON Format: Set LOG_FORMAT=json
- Shipping: Use Fluentd, Filebeat, or CloudWatch agent
- Storage: Send to Elasticsearch, Splunk, or CloudWatch Logs
- Analysis: Use Kibana, Splunk, or CloudWatch Insights
Example: CloudWatch Logs
# docker-compose.yml
services:
  backend:
    logging:
      driver: awslogs
      options:
        awslogs-region: us-east-1
        awslogs-group: nomdb-backend
        awslogs-stream: backend

Example: Elasticsearch + Filebeat
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
processors:
  - add_docker_metadata: ~
  - decode_json_fields:
      fields: ["message"]
      target: ""
output.elasticsearch:
  hosts: ["elasticsearch:9200"]

Alerting
Set up alerts based on metrics:
- High Error Rate: total_errors / total_requests > 0.05 (5% error rate)
- Slow Response: p95_response_time > 500ms
- High Request Count: requests_by_path["/api/endpoint"] > 1000/min (potential DoS)
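
As a sketch, the first two rules can be evaluated by polling the metrics endpoint. The snapshot struct below is hypothetical and only parses the fields shown in the example response; it is not part of the project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// metricsSnapshot picks out just the fields the alert rules need.
type metricsSnapshot struct {
	TotalRequests   float64 `json:"total_requests"`
	TotalErrors     float64 `json:"total_errors"`
	P95ResponseTime string  `json:"p95_response_time"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/api/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var m metricsSnapshot
	if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
		panic(err)
	}

	// High Error Rate: more than 5% of requests failed with 5xx.
	if m.TotalRequests > 0 && m.TotalErrors/m.TotalRequests > 0.05 {
		fmt.Println("ALERT: error rate above 5%")
	}

	// Slow Response: p95 above 500ms. Percentiles are reported as Go
	// duration strings such as "8.5ms", which time.ParseDuration reads.
	if p95, err := time.ParseDuration(m.P95ResponseTime); err == nil && p95 > 500*time.Millisecond {
		fmt.Println("ALERT: p95 response time above 500ms")
	}
}
```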
Monitoring Dashboards
Create dashboards with:
- Request rate over time
- Error rate trends
- Response time percentiles (p50, p95, p99)
- Top endpoints by request count
- Top endpoints by errors
- Geographic distribution of requests
Debugging with Logs
Example: Tracing a Slow Request
- Make a request and capture the request ID from headers:
curl -v http://localhost:8080/api/restaurants 2>&1 | grep -i X-Request-ID
# X-Request-ID: abc123-def456-789

- Find all logs for that request:
docker compose logs backend | grep "abc123-def456-789"- Analyze the logs:
- Check duration
- Look for database queries
- Identify bottlenecks
Example: Debugging an Error
- Find recent errors:
docker compose logs backend | grep "ERR" | tail -10- Look for stack traces and context
- Check request_id to trace the full request lifecycle
- Investigate related requests with similar patterns
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DEBUG | false | Enable debug logging |
| LOG_FORMAT | console | Log format: console or json |
| PORT | 8080 | Server port |
Best Practices
- Always include request_id in error reports
- Monitor p95/p99 response times, not just averages
- Set up alerts for error rates and slow responses
- Use JSON format in production for log aggregation
- Keep logs for at least 30 days for debugging
- Analyze metrics regularly to identify trends
- Correlate logs with metrics for full observability
Troubleshooting
No Logs Appearing
Check if debug mode is enabled:
docker compose logs backend | grep "Debug mode"Metrics Reset to Zero
Metrics are reset when the application restarts. For persistent metrics, integrate with Prometheus or similar.
High Memory Usage from Logs
Logs are written to stdout/stderr. If memory is an issue:
- Use log rotation
- Ship logs to external service
- Limit log retention
Next Steps
For full production monitoring, consider:
- Prometheus Integration - Time-series metrics
- Grafana Dashboards - Visual monitoring
- Alert Manager - Automated alerting
- Distributed Tracing - OpenTelemetry/Jaeger
- Error Tracking - Sentry integration
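
If Prometheus is adopted, a minimal exporter built on the official client_golang library might look like the following; the metric names and the :2112 listen address are illustrative only, not part of the current codebase.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Counter partitioned the same way as requests_by_method/path/status.
	httpRequests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "nomdb_http_requests_total",
			Help: "Total HTTP requests by method, path and status.",
		},
		[]string{"method", "path", "status"},
	)

	// Histogram from which p50/p95/p99 can be derived in Prometheus.
	httpDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "nomdb_http_request_duration_seconds",
			Help:    "HTTP request duration in seconds.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "path"},
	)
)

func main() {
	prometheus.MustRegister(httpRequests, httpDuration)

	// Record one sample the way request middleware would on each request.
	httpRequests.WithLabelValues("GET", "/api/restaurants", "200").Inc()
	httpDuration.WithLabelValues("GET", "/api/restaurants").Observe(0.0043)

	// Expose the standard Prometheus scrape endpoint.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```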