Database Credential Rotation Incident


End-to-end incident response for a realistic outage: the Postgres password was rotated while the application still used the old secret, so DB-backed routes returned 500s. I scoped the incident by timestamp, reproduced it once, correlated logs, mitigated safely, validated recovery, and wrote a short rotation checklist to prevent repeats.

Stack

Docker • Nginx • Flask • Postgres • Linux

What I Did

  • Captured baseline behavior & timestamp window
  • Rotated the DB password to simulate an outage
  • Correlated 500s on /api/users with FATAL password-authentication errors in the app logs
  • Mitigated by restoring the original DB password, or by updating the app's secret and restarting the app
  • Validated 200s and a clean log window after recovery
  • Published a DB-secret rotation checklist

Incident Timeline

  • Baseline: routes 200
  • Rotate DB password → 500s on users API
  • Logs show Postgres authentication failures
  • Roll back DB password, or update app secret → app restart
  • Recovery validated; log window clean

Incident Response Story

1) Baseline & Scope

Confirm all services are healthy and take a quick baseline (/api/users returns 200). Note the response's Date header to anchor the timestamp window used to align log evidence with later requests.

Baseline: docker compose ps shows all services up before any credential changes.
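To turn the baseline's Date header into a window start for log queries, a small shell helper can do the conversion. This is a sketch, not part of the original setup: the function name `date_header_to_rfc3339` is hypothetical, and it assumes the server sends the standard IMF-fixdate form (for example `Tue, 02 Jan 2024 13:23:37 GMT`, which Nginx emits). The older RFC 850 and asctime date formats are not handled.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print the Date header
# as an RFC 3339 UTC timestamp (YYYY-MM-DDTHH:MM:SSZ), usable with `docker compose logs --since`.
# Assumes the IMF-fixdate format "Day, DD Mon YYYY HH:MM:SS GMT"; other formats are rejected.
date_header_to_rfc3339() {
  # HTTP header lines end in CRLF, so strip the CRs before parsing.
  tr -d '\r' | awk '
    tolower($1) == "date:" {
      # Fields: $2 weekday, $3 day, $4 month name, $5 year, $6 time, $7 "GMT".
      i = index("JanFebMarAprMayJunJulAugSepOctNovDec", $4)
      if (i == 0 || (i - 1) % 3 != 0 || $7 != "GMT") next
      printf "%s-%02d-%02dT%sZ\n", $5, (i + 2) / 3, $3, $6
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }   # non-zero status when no usable Date header was seen
  '
}

# Example (assumes the lab stack from this write-up is running):
#   start=$(curl -s -D - -o /dev/null http://localhost:8080/api/users | date_header_to_rfc3339)
#   docker compose logs --timestamps --since "$start" app
```

Parsing the fixed IMF-fixdate layout in awk keeps the helper independent of GNU `date -d`, which BSD and BusyBox systems do not support in the same way.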

2) Introduce Change → Reproduce Failure

Rotate the DB password in Postgres while the app still uses the old secret. DB-backed routes flip to 500; capture the failures in the same timestamp window as the credential change.

Outage introduced: ALTER USER postgres WITH PASSWORD rotates the Postgres password while the app still has the old secret.
Failure window: GET /api/users returns 500, and the app logs show GET /users 500 alongside FATAL: password authentication failed from Postgres.

3) Mitigation

The quickest mitigation is to restore the previous credential so the app and DB match again.

Mitigation: restore the original DB password with the rollback command.
Rollback confirmed in Postgres: ALTER ROLE completed successfully.
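The other mitigation path keeps the new database password and updates the app's copy of the secret instead. How the app receives its secret isn't shown here; assuming it is injected from an env file such as `.env` that Compose interpolates into the app container's environment, a small helper can rewrite one key without hand-editing. The function name `set_env_var` and the `DB_PASSWORD` key below are hypothetical.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing KEY= line
# or appending one if absent. Values are passed through the environment (not awk -v)
# so characters such as &, /, # and backslashes are written verbatim.
set_env_var() {
  local key="$1" value="$2" file="$3" tmp
  # Temp file next to the target so the final mv is a same-filesystem rename.
  # Note: mktemp creates it owner-only (0600), and the rewritten file keeps that mode.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if KEY="$key" VALUE="$value" awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"] }
    index($0, key "=") == 1 { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example (hypothetical key name; the compose file must pass it to the app container):
#   set_env_var DB_PASSWORD 'WrongNow#1' .env && docker compose up -d --build app
```

Recreating the container with `docker compose up -d` (rather than `docker compose restart`) is what makes Compose re-read the interpolated environment.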

4) Recovery Validation

Re-test /api/users to confirm 200s, and tail logs to ensure the window is clean (no new auth failures). Document the incident and add the rotation checklist so future password changes don't cause surprise outages.

Clean window after fix: follow-up requests succeed, and the post-fix log tail shows no new DB authentication failures.
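Re-testing by hand works; a scripted check makes "recovered" a little stricter by requiring several consecutive 200s rather than one lucky request. This is a hedged sketch: the function names, the streak length, and the `PROBE_INTERVAL` variable are illustrative choices, not part of the original runbook.

```shell
# Hypothetical helper: run a status-printing command until it prints 200 on NEED
# consecutive attempts (success) or MAX attempts are used up (failure).
# Usage: wait_for_consecutive_200 NEED MAX COMMAND [ARGS...]
wait_for_consecutive_200() {
  local need="$1" max="$2" attempt=0 streak=0 code
  shift 2
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    code=$("$@")
    if [ "$code" = "200" ]; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$need" ]; then return 0; fi
    else
      streak=0   # any non-200 resets the streak
    fi
    sleep "${PROBE_INTERVAL:-1}"   # seconds between attempts
  done
  return 1
}

# Print only the HTTP status code of a GET request ("000" if no response was received).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Example against this lab's endpoint:
#   wait_for_consecutive_200 5 30 http_status http://localhost:8080/api/users && echo recovered
```

Passing the probe as a command keeps the retry logic separate from curl, so the same loop can wrap any other health check.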

Key Commands Used

Repro & Evidence

# Baseline
curl -i http://localhost:8080/api/users

# Introduce outage (DB rotation only)
docker compose exec db psql -U postgres -d appdb -c "ALTER USER postgres WITH PASSWORD 'WrongNow#1';"

# Failure & logs (aligned by timestamp)
curl -i http://localhost:8080/api/users      # expect 500
docker compose logs --timestamps --tail=50 app | grep -Ei "FATAL|auth|psycopg2"
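The grep above matches auth errors anywhere in the last 50 lines. To keep the evidence inside the incident window, the `--timestamps` prefix can be filtered as well. This is a sketch with a hypothetical function name (`auth_failures_since`): it reuses the same case-insensitive FATAL|auth|psycopg2 pattern, assumes the window start is an RFC 3339 UTC timestamp, and compares only to whole-second precision.

```shell
# Hypothetical helper: read `docker compose logs --timestamps` output on stdin and print
# only lines stamped at or after the given UTC start (YYYY-MM-DDTHH:MM:SSZ) that match
# the same pattern as the grep above (FATAL|auth|psycopg2, case-insensitive).
auth_failures_since() {
  awk -v since="$1" '
    BEGIN { since = substr(since, 1, 19) }   # whole-second precision: YYYY-MM-DDTHH:MM:SS
    {
      # Skip any "app-1  |" prefix and find the first field that looks like a timestamp.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
          # Same-format UTC timestamps sort lexically, so string comparison orders them in time.
          if (substr($i, 1, 19) >= since && tolower($0) ~ /fatal|auth|psycopg2/) print
          break
        }
      }
    }
  '
}

# Example, with $start holding the window start as an RFC 3339 UTC timestamp:
#   docker compose logs --timestamps app | auth_failures_since "$start"
```

An empty result after the fix is the "clean window" check from the recovery step.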

Mitigation & Validation

# Fast rollback
docker compose exec db psql -U postgres -d appdb -c "ALTER USER postgres WITH PASSWORD 'postgres';"

# OR update the app's DB secret to the new password, then recreate the app container:
docker compose up -d --build app

# Validate recovery
curl -i http://localhost:8080/api/users      # expect 200
docker compose logs --timestamps --since=2m app

Outcome & Prevention