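Monitoring and Alerting

This document describes the monitoring configuration and alert setup for the florist management system.

Monitoring Architecture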

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Application   │────▶│   Prometheus    │────▶│    Grafana      │
│   (Actuator)    │     │ (metric scrape) │     │ (visualization) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │
                                ▼
                        ┌─────────────────┐
                        │  AlertManager   │
                        │ (notifications) │
                        └─────────────────┘

Spring Boot Actuator

Enabling Endpoints

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator

  endpoint:
    health:
      show-details: when_authorized

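  # Spring Boot 2.x property path; Spring Boot 3.x moved it to management.prometheus.metrics.export.enabled
  metrics:
    export:
      prometheus:
        enabled: true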

Available Endpoints

Endpoint	Description
`/actuator/health`	Health status
`/actuator/info`	Application information
`/actuator/metrics`	List of metrics
`/actuator/prometheus`	Metrics in Prometheus format
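
As a rough illustration of how a client might consume `/actuator/health`, the sketch below fetches the endpoint and reads the top-level status. The class name `HealthStatusParser`, the base URL passed on the command line, and the regex-based extraction are assumptions made for this example; a real client would normally use a JSON library. It also assumes the top-level `status` field appears first in the body, as in Actuator's default output. With `show-details: when_authorized`, unauthenticated callers receive only that top-level status.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the florist system.
class HealthStatusParser {

    // Matches the first "status" field in the body, e.g. {"status":"UP"}.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns the first status value found, or empty if the body has none.
    static Optional<String> extractStatus(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Fetches the health endpoint under the given base URL (assumed, e.g. http://localhost:8080).
    static String fetchBody(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Without an argument, a canned response is used so the sketch runs offline.
        String body = args.length > 0
                ? fetchBody(args[0])
                : "{\"status\":\"DOWN\",\"components\":{\"db\":{\"status\":\"DOWN\"},\"diskSpace\":{\"status\":\"UP\"}}}";
        System.out.println(extractStatus(body).orElse("UNKNOWN"));
    }
}
```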

Health Checks

Built-in Health Indicators

db: database connection status
diskSpace: disk space
mail: mail service status

Custom Health Indicator

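import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// CacheManager is assumed to be the application's own type: isHealthy() and size()
// are not defined on Spring's org.springframework.cache.CacheManager interface.
@Component
public class CacheHealthIndicator implements HealthIndicator {

    private final CacheManager cacheManager;

    // Constructor injection; the final field must be assigned for the class to compile.
    public CacheHealthIndicator(CacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

    @Override
    public Health health() {
        // Report UP together with the current cache size when the cache responds.
        if (cacheManager.isHealthy()) {
            return Health.up()
                    .withDetail("cacheSize", cacheManager.size())
                    .build();
        }
        // Otherwise report DOWN so the overall health status reflects the failure.
        return Health.down()
                .withDetail("error", "Cache unavailable")
                .build();
    }
}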
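
The individual indicators (db, diskSpace, mail, and any custom one) are combined into the single status returned by `/actuator/health`. The sketch below imitates Spring Boot's default aggregation order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN, most severe first) in plain Java. The class name `HealthAggregationSketch` and the string-based status map are assumptions for illustration; the real logic lives inside Actuator and also supports custom ordering.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java imitation of the default status aggregation.
class HealthAggregationSketch {

    // Default severity order, most severe first.
    private static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    // Returns the most severe status present among the components,
    // or UNKNOWN when no known status is present.
    static String aggregate(Map<String, String> componentStatuses) {
        return ORDER.stream()
                .filter(componentStatuses::containsValue)
                .findFirst()
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        // Component names mirror the indicators listed in this document; "cache" is the custom one.
        Map<String, String> components = Map.of(
                "db", "UP",
                "diskSpace", "UP",
                "mail", "UP",
                "cache", "DOWN");
        System.out.println(aggregate(components));
    }
}
```

In this model a single failing indicator is enough to turn the overall status to DOWN, which is why a custom indicator should only report DOWN for conditions that really make the service unusable.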

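Metrics Collection

Key Metrics

Metric	Description
`http_server_requests_seconds`	HTTP request latency (exposed as `_count`, `_sum`, and `_max` series)
`jvm_memory_used_bytes`	JVM memory usage
`hikaricp_connections_active`	Active connections in the database connection pool
`cache_gets_total`	Cache hits and misses (distinguished by the `result` label)

Custom Metrics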

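import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderTimer;

    // Both meters are registered once, when the service is constructed.
    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("orders.created")
                .description("Number of orders created")
                .register(registry);

        this.orderTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        // Timer.record measures the duration of the supplier and returns its result.
        return orderTimer.record(() -> {
            Order order = // ... order creation logic
            orderCounter.increment();
            return order;
        });
    }
}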
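
The meter names above use Micrometer's dotted style, while Prometheus queries and alert rules need the exported names. The sketch below approximates the renaming applied by Micrometer's Prometheus registry: dots become underscores, counters gain a `_total` suffix, and timers gain a `_seconds` suffix. The class `PrometheusNameSketch` is a simplification written for this document, not the library's own code, and it ignores character sanitization and custom suffixes.

```java
// Simplified approximation of Micrometer's Prometheus naming convention.
class PrometheusNameSketch {

    // Counters: dots to underscores, then "_total" unless already present.
    static String counterName(String meterName) {
        String base = meterName.replace('.', '_');
        return base.endsWith("_total") ? base : base + "_total";
    }

    // Timers: dots to underscores, then "_seconds" unless already present.
    static String timerName(String meterName) {
        String base = meterName.replace('.', '_');
        return base.endsWith("_seconds") ? base : base + "_seconds";
    }

    public static void main(String[] args) {
        System.out.println(counterName("orders.created"));
        System.out.println(timerName("orders.processing.time"));
    }
}
```

Under this convention the counter would be queried as `orders_created_total`, and the timer as `orders_processing_time_seconds_count`, `_sum`, and `_max`, in the same way that `http.server.requests` appears as `http_server_requests_seconds`.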

Prometheus Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'florist-api'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api.florist.leandev.io:8080']

  - job_name: 'florist-web-host'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['florist.leandev.io:8080']

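Alert Rules

Prometheus AlertManager

Prometheus evaluates the rules below and sends firing alerts to AlertManager, which handles notification.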

# alert.rules.yml
groups:
  - name: florist-alerts
    rules:
      - alert: HighErrorRate
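        # Fires per series when 5xx responses exceed 0.1 requests per second (an absolute rate, not a share of all requests).
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.1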
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      - alert: HighMemoryUsage
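        # Evaluated per memory pool; pools without a configured maximum report jvm_memory_max_bytes as -1 and never match.
        expr: jvm_memory_used_bytes / jvm_memory_max_bytes > 0.9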
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "JVM memory usage above 90%"

      - alert: DatabaseConnectionPoolExhausted
        expr: hikaricp_connections_active / hikaricp_connections_max > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
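
To make the thresholds concrete, the sketch below reproduces the arithmetic behind the three expressions with made-up sample values. It is a simplification: Prometheus's `rate()` also handles counter resets and extrapolates to the window edges, and the `for:` clause requires the condition to hold continuously for the stated duration before the alert fires. The class `AlertThresholdSketch` and all numbers are assumptions for illustration.

```java
// Worked arithmetic for the alert expressions; not how Prometheus is implemented.
class AlertThresholdSketch {

    // Simplified per-second rate: counter increase divided by the window length.
    static double perSecondRate(double earlier, double later, double windowSeconds) {
        return (later - earlier) / windowSeconds;
    }

    // HighErrorRate: 5xx request rate over a 5-minute (300 s) window above 0.1/s.
    static boolean highErrorRate(double earlier5xxCount, double later5xxCount) {
        return perSecondRate(earlier5xxCount, later5xxCount, 300.0) > 0.1;
    }

    // HighMemoryUsage and DatabaseConnectionPoolExhausted: used / max above 0.9.
    // A non-positive max (such as -1 for an unbounded pool) never matches.
    static boolean aboveNinetyPercent(double used, double max) {
        return max > 0 && used / max > 0.9;
    }

    public static void main(String[] args) {
        // 45 errors in 300 s is 0.15 per second, above the 0.1 threshold.
        System.out.println(highErrorRate(120, 165));
        // 950 MiB used of 1024 MiB is about 92.8%.
        System.out.println(aboveNinetyPercent(950, 1024));
        // 8 of 10 connections active is 80%, below the threshold.
        System.out.println(aboveNinetyPercent(8, 10));
    }
}
```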

Grafana Dashboard

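Recommended Panels

Overview
- Total requests
- Error rate
- Average response time
JVM
- Memory usage
- GC count and duration
- Thread count
Database
- Connection pool utilization
- Query latency
Business metrics
- Order count
- Active users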

Log Integration

Structured Logging

# application.yml: the console pattern below emits plain-text lines with a fixed field order
logging:
  pattern:
    console: "%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n"
  level:
    io.leandev: DEBUG
    org.springframework: INFO

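Log Aggregation

ELK Stack or Loki is recommended for log aggregation.

Next Steps

Troubleshooting - solving common problems
Production Deployment - production environment configuration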
