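Monitoring and Alerting
This document describes the monitoring configuration and alert settings for the florist management system.
Monitoring Architecture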
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│    Application    │────▶│    Prometheus     │────▶│      Grafana      │
│    (Actuator)     │     │ (scrapes metrics) │     │  (visualization)  │
└───────────────────┘     └───────────────────┘     └───────────────────┘
                                    │
                                    ▼
                          ┌───────────────────┐
                          │   AlertManager    │
                          │  (notifications)  │
                          └───────────────────┘
Spring Boot Actuator
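Enabling Endpoints
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator
  endpoint:
    health:
      show-details: when_authorized
  metrics:
    export:
      prometheus:
        # Spring Boot 2.x property; in Spring Boot 3.x the key is
        # management.prometheus.metrics.export.enabled
        enabled: true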
Available Endpoints
| Endpoint | Description |
|---|---|
| /actuator/health | Health status |
| /actuator/info | Application information |
| /actuator/metrics | List of metrics |
| /actuator/prometheus | Metrics in Prometheus format |
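To check these endpoints from outside the application, a small probe can request /actuator/health and read the reported status. The following is a minimal sketch that uses only the JDK's java.net.http client. The class name ActuatorHealthProbe and the regex-based status extraction are assumptions made for illustration (a production client would use a JSON parser), and the sketch relies on the top-level "status" field appearing first in the response body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the florist system itself.
final class ActuatorHealthProbe {

    // Matches the first "status" field in the body, e.g. {"status":"UP"}.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Returns the first status found in the health JSON, or "UNKNOWN" if none is present. */
    static String extractStatus(String healthJson) {
        Matcher m = STATUS.matcher(healthJson);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    /** Calls the given health endpoint and returns the status it reports. */
    static String fetchStatus(URI healthUri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return extractStatus(response.body());
    }
}
```

For example, `ActuatorHealthProbe.fetchStatus(URI.create("http://localhost:8080/actuator/health"))` would be expected to return `UP` for a healthy instance, assuming the application listens on that (hypothetical) address.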
Health Checks
Built-in Health Indicators
- db: database connection status
- diskSpace: disk space
- mail: mail service status
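The overall status returned by /actuator/health is derived from the individual indicators. The sketch below illustrates the default ordering Spring Boot applies when aggregating (DOWN, then OUT_OF_SERVICE, then UP, then UNKNOWN). It is a plain-Java approximation for explanation only: the class HealthAggregation is hypothetical and is not the framework's own implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of health status aggregation.
final class HealthAggregation {

    // Most severe first: the first status in this list reported by any component wins.
    private static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Returns the most severe status among the components, or UNKNOWN if none is recognized. */
    static String aggregate(Map<String, String> componentStatuses) {
        for (String status : ORDER) {
            if (componentStatuses.containsValue(status)) {
                return status;
            }
        }
        return "UNKNOWN";
    }
}
```

With the three built-in indicators above, a failing mail service is enough to turn the overall status to DOWN even when db and diskSpace are UP.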
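Custom Health Indicators
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class CacheHealthIndicator implements HealthIndicator {

    // CacheManager is assumed to be an application-specific type that exposes
    // isHealthy() and size(); Spring's org.springframework.cache.CacheManager does not.
    private final CacheManager cacheManager;

    public CacheHealthIndicator(CacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

    @Override
    public Health health() {
        if (cacheManager.isHealthy()) {
            return Health.up()
                    .withDetail("cacheSize", cacheManager.size())
                    .build();
        }
        return Health.down()
                .withDetail("error", "Cache unavailable")
                .build();
    }
}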
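Metrics Collection
Key Metrics
| Metric | Description |
|---|---|
| http_server_requests_seconds | HTTP request latency |
| jvm_memory_used_bytes | JVM memory usage |
| hikaricp_connections_active | Active connections in the database connection pool |
| cache_gets_total | Cache hit/miss counts |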
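These metrics are served as plain text by /actuator/prometheus, one sample per line, optionally with labels in braces. As a rough illustration of that format, the sketch below sums all samples of one metric across its label sets. The class PrometheusText is hypothetical, and the parser is deliberately simplified: it does not handle special values such as +Inf and is not a full implementation of the exposition format.

```java
// Hypothetical, simplified reader for Prometheus text output.
final class PrometheusText {

    /** Sums every sample of the given metric across all of its label sets. */
    static double sum(String exposition, String metricName) {
        double total = 0.0;
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            // The metric name ends at the label block if there is one, else at the first space.
            int nameEnd = (brace >= 0 && (space < 0 || brace < space)) ? brace : space;
            if (nameEnd < 0 || !line.substring(0, nameEnd).equals(metricName)) {
                continue;
            }
            // The value follows the label block (if any); an optional timestamp may come after it.
            int valueStart = (nameEnd == brace) ? line.lastIndexOf('}') + 1 : nameEnd;
            String[] rest = line.substring(valueStart).trim().split("\\s+");
            total += Double.parseDouble(rest[0]);
        }
        return total;
    }
}
```

Summing `http_server_requests_seconds_count` this way gives the total number of HTTP requests handled since startup, regardless of method, status, or URI.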
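Custom Metrics
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("orders.created")
                .description("Number of orders created")
                .register(registry);
        this.orderTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        // Time the whole operation and count each order that is created
        return orderTimer.record(() -> {
            // ... order creation logic (elided); buildOrder is a hypothetical placeholder for it
            Order order = buildOrder(request);
            orderCounter.increment();
            return order;
        });
    }
}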
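Meter names registered with dots show up under different names in Prometheus, which matters when writing queries and alert rules. The sketch below approximates the usual mapping: dots become underscores, counters gain a `_total` suffix, and timers gain a `_seconds` base unit (with `_count`, `_sum`, and `_max` series derived from it). The class PrometheusNaming is hypothetical and simplified; the actual naming convention in Micrometer covers more cases, such as camelCase conversion.

```java
// Hypothetical approximation of how dotted meter names appear in Prometheus.
final class PrometheusNaming {

    enum MeterType { COUNTER, TIMER, GAUGE }

    /** Returns the approximate Prometheus base name for a dotted meter name. */
    static String prometheusName(String meterName, MeterType type) {
        String name = meterName.replace('.', '_');
        switch (type) {
            case COUNTER:
                // Counters are exposed with a _total suffix
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are exposed in seconds
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }
}
```

Under this mapping, the counter `orders.created` would be queried as `orders_created_total`, and the timer `orders.processing.time` as `orders_processing_time_seconds_count` and `orders_processing_time_seconds_sum`.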
Prometheus Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'florist-api'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api.florist.leandev.io:8080']
  - job_name: 'florist-web-host'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['florist.leandev.io:8080']
Alert Rules
Prometheus AlertManager
# alert.rules.yml
groups:
  - name: florist-alerts
    rules:
      - alert: HighErrorRate
        # 5xx responses per second, per series, averaged over the last 5 minutes
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
      - alert: HighMemoryUsage
        expr: jvm_memory_used_bytes / jvm_memory_max_bytes > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "JVM memory usage above 90%"
      - alert: DatabaseConnectionPoolExhausted
        expr: hikaricp_connections_active / hikaricp_connections_max > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
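Each rule combines an expression with a `for:` duration: the expression must stay true across consecutive evaluations for that long before the alert fires. The sketch below walks through both parts in plain Java. It is a simplified model under stated assumptions: the class AlertRuleSketch is hypothetical, the rate calculation ignores counter resets and the window extrapolation that Prometheus performs, and a fixed evaluation interval is assumed.

```java
import java.util.List;

// Hypothetical, simplified model of rule evaluation.
final class AlertRuleSketch {

    /** Simplified rate(): per-second increase between the first and last sample in the window. */
    static double perSecondRate(double firstValue, double lastValue, double windowSeconds) {
        return (lastValue - firstValue) / windowSeconds;
    }

    /**
     * Simplified "for:" clause: returns true once the condition has held on
     * consecutive evaluations spanning at least forSeconds.
     */
    static boolean fires(List<Boolean> evaluations, int intervalSeconds, int forSeconds) {
        int heldSeconds = -1; // -1 means the condition is not currently active
        for (boolean conditionTrue : evaluations) {
            if (!conditionTrue) {
                heldSeconds = -1; // a false evaluation resets the pending state
                continue;
            }
            // The first true evaluation starts the clock at 0; later ones extend it.
            heldSeconds = (heldSeconds < 0) ? 0 : heldSeconds + intervalSeconds;
            if (heldSeconds >= forSeconds) {
                return true;
            }
        }
        return false;
    }
}
```

For HighErrorRate, a 5xx counter that grows from 100 to 160 over a 300-second window yields 0.2 errors per second, which exceeds the 0.1 threshold. With a 60-second evaluation interval and `for: 5m`, the alert would fire on the sixth consecutive true evaluation.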
Grafana Dashboard
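Recommended Panels
- Overview
  - Total requests
  - Error rate
  - Average response time
- JVM
  - Memory usage
  - GC count and duration
  - Thread count
- Database
  - Connection pool utilization
  - Query latency
- Business metrics
  - Order count
  - Active users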
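Several of these panels are ratios of two series rather than raw values. The sketch below shows the arithmetic behind three of them, assuming the inputs are increases of the corresponding counters over the panel's time range. The class PanelMath and its method names are hypothetical and only illustrate the calculations a dashboard query would perform.

```java
// Hypothetical helper showing the arithmetic behind common dashboard panels.
final class PanelMath {

    /** Error rate: share of requests that returned 5xx, as a fraction between 0 and 1. */
    static double errorRatio(double serverErrorCount, double totalCount) {
        return totalCount == 0 ? 0.0 : serverErrorCount / totalCount;
    }

    /** Average response time in seconds: increase of _sum divided by increase of _count. */
    static double averageSeconds(double sumIncrease, double countIncrease) {
        return countIncrease == 0 ? 0.0 : sumIncrease / countIncrease;
    }

    /** Connection pool utilization: active connections over the pool maximum. */
    static double utilization(double active, double max) {
        return max <= 0 ? 0.0 : active / max;
    }
}
```

For instance, 1.5 seconds of accumulated request time across 40 requests gives an average response time of 0.0375 seconds, and 9 active connections in a pool of 10 gives a utilization of 0.9.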
Log Integration
Structured Logging
# application.yml
logging:
  pattern:
    console: "%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n"
  level:
    io.leandev: DEBUG
    org.springframework: INFO
Log Aggregation
We recommend using the ELK Stack or Loki for log aggregation.
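Whichever aggregator is chosen, it has to split each console line back into fields. The sketch below parses a line written with the console pattern configured above, assuming Logback renders `%d{ISO8601}` as `yyyy-MM-dd HH:mm:ss,SSS`. The class LogLineParser, the field names, and the sample logger name are hypothetical; real pipelines would typically express the same idea as a grok or regex stage in the aggregator's own configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines of the form: timestamp [thread] LEVEL logger - message
final class LogLineParser {

    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) \\[(.+?)\\] (\\w+)\\s+(\\S+) - (.*)$");

    /** Returns the parsed fields, or an empty map if the line does not match the pattern. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return fields; // e.g. a stack-trace continuation line
        }
        fields.put("timestamp", m.group(1));
        fields.put("thread", m.group(2));
        fields.put("level", m.group(3));
        fields.put("logger", m.group(4));
        fields.put("message", m.group(5));
        return fields;
    }
}
```

A line such as `2024-01-15 10:30:00,123 [http-nio-8080-exec-1] INFO  io.leandev.florist.OrderService - Order created` (an invented sample) would be split into its timestamp, thread, level, logger, and message, while stack-trace continuation lines return an empty map and need separate multiline handling.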