
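Monitoring and Alerting

This document describes the monitoring configuration and alert settings for the florist management system.

Monitoring Architecture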

┌────────────────────┐     ┌────────────────────┐     ┌────────────────────┐
│    Application     │────▶│     Prometheus     │────▶│      Grafana       │
│     (Actuator)     │     │ (collects metrics) │     │  (visualization)   │
└────────────────────┘     └────────────────────┘     └────────────────────┘
                                     │
                                     ▼
                           ┌────────────────────┐
                           │    AlertManager    │
                           │   (sends alerts)   │
                           └────────────────────┘

Spring Boot Actuator

Enabling Endpoints

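# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator

  endpoint:
    health:
      show-details: when_authorized

  # Spring Boot 2.x property name; Spring Boot 3.x uses
  # management.prometheus.metrics.export.enabled instead
  metrics:
    export:
      prometheus:
        enabled: true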

Available Endpoints

Endpoint               Description
/actuator/health       Health status
/actuator/info         Application information
/actuator/metrics      List of metrics
/actuator/prometheus   Metrics in Prometheus format
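
As a quick way to exercise the health endpoint from outside the application, the following is a minimal sketch using the JDK's built-in HTTP client (Java 11+). The class name `ActuatorProbe` and the base URL passed to it are illustrative assumptions, not part of the system. It also assumes that the first `"status"` field in the response body is the overall status.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the florist codebase.
final class ActuatorProbe {

    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Builds the URL of an Actuator endpoint under the /actuator base path. */
    static URI endpoint(String baseUrl, String name) {
        return URI.create(baseUrl + "/actuator/" + name);
    }

    /** Returns the first "status" value found in a health response body, or "UNKNOWN". */
    static String firstStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    /** Calls /actuator/health and reports whether the application answers 200 with status UP. */
    static boolean isUp(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(endpoint(baseUrl, "health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && "UP".equals(firstStatus(response.body()));
    }
}
```

Calling `ActuatorProbe.isUp("http://localhost:8080")` against a locally running instance would return `true` only when the endpoint responds with HTTP 200 and an overall status of `UP`. The host and port here are placeholders.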

Health Checks

Built-in Health Indicators

  • db: database connection status
  • diskSpace: disk space
  • mail: mail service status

Custom Health Indicator

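import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Component;

@Component
public class CacheHealthIndicator implements HealthIndicator {

    private final CacheManager cacheManager;

    public CacheHealthIndicator(CacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

    @Override
    public Health health() {
        try {
            // Spring's CacheManager interface has no isHealthy() or size();
            // listing the cache names serves as a simple availability check.
            int cacheCount = cacheManager.getCacheNames().size();
            return Health.up()
                    .withDetail("cacheCount", cacheCount)
                    .build();
        } catch (RuntimeException e) {
            return Health.down(e)
                    .withDetail("error", "Cache unavailable")
                    .build();
        }
    }
}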
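
Each indicator, built-in or custom, contributes one component status, and the health endpoint reports a single aggregated status. The sketch below illustrates the aggregation rule using Spring Boot's default severity order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN). It is a simplified stand-in for illustration, not Spring Boot's own implementation, and the class name `HealthAggregation` is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration of health status aggregation; not Spring Boot code.
final class HealthAggregation {

    /** Status codes in Spring Boot's default severity order, most severe first. */
    static final List<String> ORDER =
            Arrays.asList("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Returns the most severe known status; UNKNOWN if none is recognized. */
    static String aggregate(List<String> componentStatuses) {
        String worst = "UNKNOWN";
        for (String status : componentStatuses) {
            int rank = ORDER.indexOf(status);
            // Statuses outside the configured order are ignored.
            if (rank >= 0 && rank < ORDER.indexOf(worst)) {
                worst = status;
            }
        }
        return worst;
    }
}
```

With this rule, a single failing component such as the cache indicator is enough to make the overall status DOWN even when db, diskSpace, and mail all report UP.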

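Metrics Collection

Key Metrics

Metric                          Description
http_server_requests_seconds    HTTP request latency
jvm_memory_used_bytes           JVM memory usage
hikaricp_connections_active     Active connections in the database connection pool
cache_gets_total                Cache hit and miss counts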
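
These metrics are served by /actuator/prometheus in the Prometheus text format, one sample per line: a metric name, optional labels in braces, and a numeric value. To make that shape concrete, here is a minimal parsing sketch. The class name `PrometheusSample` is hypothetical. The parser is deliberately simplified: it leaves escape sequences in label values as they are and does not handle special values such as `+Inf`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified parser for one sample line of the Prometheus text format.
final class PrometheusSample {

    // name, optional {labels}, value, optional integer timestamp
    private static final Pattern LINE = Pattern.compile(
            "^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)(?:\\s+-?\\d+)?$");
    private static final Pattern LABEL = Pattern.compile(
            "([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private PrometheusSample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    /** Parses a single sample line; comment lines (# HELP, # TYPE) are not accepted. */
    static PrometheusSample parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher l = LABEL.matcher(m.group(2));
            while (l.find()) {
                labels.put(l.group(1), l.group(2)); // escapes are kept verbatim
            }
        }
        return new PrometheusSample(m.group(1), labels, Double.parseDouble(m.group(3)));
    }
}
```

For example, parsing the made-up line `jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2582912E7` yields the name `jvm_memory_used_bytes`, the labels `area` and `id`, and the numeric value.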

Custom Metrics

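import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderService(MeterRegistry registry) {
        // Counts every order that is created
        this.orderCounter = Counter.builder("orders.created")
                .description("Number of orders created")
                .register(registry);

        // Measures how long order processing takes
        this.orderTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        // record() times the supplier and returns its result
        return orderTimer.record(() -> {
            // ... order creation logic (buildOrder is a placeholder name)
            Order order = buildOrder(request);
            orderCounter.increment();
            return order;
        });
    }
}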
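
The meter names registered above use dots, but they appear under different names on the Prometheus side. The sketch below approximates the renaming that Micrometer's Prometheus naming convention applies to counters and timers: dots become underscores, counters gain a `_total` suffix, and timers gain a `_seconds` base unit with `_count`, `_sum`, and `_max` series. It is a simplified illustration with a hypothetical class name, and the exact output may differ between Micrometer versions.

```java
// Simplified approximation of Micrometer's Prometheus naming rules; not Micrometer code.
final class PrometheusNames {

    /** Name under which a Micrometer counter is exposed, e.g. orders_created_total. */
    static String counter(String meterName) {
        String base = meterName.replace('.', '_');
        return base.endsWith("_total") ? base : base + "_total";
    }

    /** Series names under which a Micrometer timer is exposed. */
    static String[] timer(String meterName) {
        String base = meterName.replace('.', '_');
        if (!base.endsWith("_seconds")) {
            base += "_seconds";
        }
        return new String[] { base + "_count", base + "_sum", base + "_max" };
    }
}
```

Under these rules, `orders.created` would be queried as `orders_created_total`, and `orders.processing.time` as `orders_processing_time_seconds_count`, `orders_processing_time_seconds_sum`, and `orders_processing_time_seconds_max`.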

Prometheus Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'florist-api'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api.florist.leandev.io:8080']

  - job_name: 'florist-web-host'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['florist.leandev.io:8080']

Alert Rules

Prometheus AlertManager

# alert.rules.yml
groups:
  - name: florist-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      - alert: HighMemoryUsage
        expr: jvm_memory_used_bytes / jvm_memory_max_bytes > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "JVM memory usage above 90%"

      - alert: DatabaseConnectionPoolExhausted
        expr: hikaricp_connections_active / hikaricp_connections_max > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
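
Each rule combines a threshold expression with a `for` duration: the alert stays pending while the expression is true and only fires once it has been true continuously for that long. The following sketch models that behavior for the connection pool rule. The class `AlertState` is a hypothetical illustration of the semantics, not Prometheus code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of how a rule with a "for" duration moves between states.
final class AlertState {

    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor;
    private Instant activeSince; // null while the condition is false

    AlertState(Duration holdFor) {
        this.holdFor = holdFor;
    }

    /** Evaluates the rule once, as Prometheus does on each evaluation interval. */
    State evaluate(boolean conditionTrue, Instant now) {
        if (!conditionTrue) {
            activeSince = null; // any false evaluation resets the timer
            return State.INACTIVE;
        }
        if (activeSince == null) {
            activeSince = now;
        }
        boolean heldLongEnough =
                Duration.between(activeSince, now).compareTo(holdFor) >= 0;
        return heldLongEnough ? State.FIRING : State.PENDING;
    }

    /** Mirrors the expression hikaricp_connections_active / hikaricp_connections_max > 0.9. */
    static boolean poolNearExhaustion(double active, double max) {
        return active / max > 0.9;
    }
}
```

With `for: 5m`, a pool that stays above 90% usage is reported as pending for the first five minutes and firing afterwards. A single evaluation below the threshold returns the alert to inactive and restarts the wait.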

Grafana Dashboard

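Suggested Panels

  1. Overview

    • Total requests
    • Error rate
    • Average response time
  2. JVM

    • Memory usage
    • GC count and time
    • Thread count
  3. Database

    • Connection pool utilization
    • Query latency
  4. Business metrics

    • Order count
    • Active users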

Log Integration

Structured Logging

# application.yml
logging:
  pattern:
    console: "%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n"
  level:
    io.leandev: DEBUG
    org.springframework: INFO

Log Aggregation

ELK Stack or Loki is recommended for log aggregation.

Next Steps