Daniel Lyons' Notes

What to monitor in a Database

Monitoring checklist (dashboard 1):

  1. TPS and (optional but also desired) QPS
  2. Latency (query duration) — at least average. Better: histogram, percentiles
  3. Connections (sessions) — stacked graph of session counts by state (first of all: active and idle-in-transaction; also interesting: idle, others) and how far the sum is from max_connections (+pool size for PgBouncer).
  4. Longest transactions (max transaction age or top-n transactions by age), excluding autovacuum activity
  5. Commits vs rollbacks — how many transactions are rolled back
  6. Transactions left till transaction ID wraparound
  7. Replication lags / bytes in replication slot / unused replication slots
  8. Count of WAL segments waiting to be archived (archiving lag)
  9. WAL generation rates
  10. Locks and deadlocks
  11. Basic query analysis graph (top-n by total_time or by mean_time?)
  12. Basic wait event analysis (a.k.a. “active session analysis” or “performance insights”)
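A few of the items above (1, 3, 5 and 6) reduce to simple arithmetic on counters PostgreSQL already exposes. The following is a minimal sketch, not a full collector: the SQL strings read the standard `pg_stat_database`, `pg_stat_activity` and `pg_database` catalogs, while the names `XactSample`, `tps`, `rollback_ratio`, `connection_headroom` and `xids_until_wraparound` are hypothetical helpers introduced here for illustration. Running the queries needs a database driver, which is assumed and left out.

```python
from dataclasses import dataclass

# Queries against standard PostgreSQL catalogs and statistics views.
# Executing them needs a driver, which this sketch deliberately omits.
XACT_COUNTERS_SQL = """
SELECT sum(xact_commit) AS commits, sum(xact_rollback) AS rollbacks
FROM pg_stat_database
"""

# backend_type exists in PostgreSQL 10 and later; drop the filter on older versions.
SESSIONS_BY_STATE_SQL = """
SELECT coalesce(state, 'other') AS state, count(*) AS sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY 1
"""

XID_AGE_SQL = "SELECT max(age(datfrozenxid)) AS max_xid_age FROM pg_database"


@dataclass(frozen=True)
class XactSample:
    """One reading of the cumulative commit/rollback counters."""
    taken_at: float  # seconds, e.g. from time.monotonic()
    commits: int
    rollbacks: int


def tps(prev: XactSample, curr: XactSample) -> float:
    """Transactions per second between two samples (commits plus rollbacks)."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    # The counters are cumulative; a statistics reset between samples would
    # make this delta negative, which a real collector has to handle.
    delta = (curr.commits - prev.commits) + (curr.rollbacks - prev.rollbacks)
    return delta / elapsed


def rollback_ratio(prev: XactSample, curr: XactSample) -> float:
    """Share of transactions in the interval that were rolled back."""
    commits = curr.commits - prev.commits
    rollbacks = curr.rollbacks - prev.rollbacks
    total = commits + rollbacks
    return rollbacks / total if total else 0.0


def connection_headroom(sessions_by_state: dict[str, int], max_connections: int) -> int:
    """How far the sum of sessions is from max_connections."""
    return max_connections - sum(sessions_by_state.values())


def xids_until_wraparound(max_xid_age: int) -> int:
    """Rough upper bound on transactions left: 2^31 minus the oldest XID age.

    PostgreSQL stops accepting writes some margin before this point,
    so treat the result as optimistic.
    """
    return 2**31 - max_xid_age
```

Sampling `XACT_COUNTERS_SQL` every few seconds and feeding consecutive readings to `tps` and `rollback_ratio` gives the graphs for items 1 and 5; the result of `SESSIONS_BY_STATE_SQL` as a dict plus the `max_connections` setting gives the headroom for item 3.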

Sources
