Degraded performance for Uptime Checks
Incident Report for OnlineOrNot
Resolved
On 2025-01-16, between 18:05 UTC and 21:32 UTC, OnlineOrNot experienced degraded performance in its uptime checker.

During this incident, uptime checks took longer than usual to complete, often leaving several minutes between checks instead of the usual 30 seconds.

The root cause was a loss of connectivity between our Cloudflare Workers service and the AWS database that stores metadata about uptime checks. A replica system was ready to run in AWS; however, due to a misconfiguration, it could not start automatically.

The misconfiguration has been fixed, and OnlineOrNot will run regular monthly drills to ensure the fallback system functions as expected.
Posted Jan 16, 2025 - 18:00 CET