On the morning of Wednesday, May 14, we began receiving reports from Clearly Cloud users in parts of the US of intermittent issues with registrations and call completion. The engineering team began investigating immediately and closely monitored the performance of all related systems over the next several hours, identifying the cause of the problem.
A full analysis confirmed that call signaling messages were delayed during three timeframes, lasting approximately 13 minutes, 6 minutes, and 5 minutes respectively. The cause of the problem was a replica (spare) database server that was unable to keep up with synchronization from the primary systems, creating widespread delays and backlogs in tasks such as registration handling.
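For readers interested in the general pattern, the sketch below shows threshold-based monitoring of replication lag, the kind of signal that reveals a replica falling behind its primary. It is illustrative only: the fetch_replica_lag_seconds helper, threshold, and polling cadence are assumptions, not ClearlyIP's actual tooling.

```python
# Minimal sketch of threshold-based replication-lag monitoring.
# The lag source, threshold, and alerting are hypothetical placeholders;
# a real deployment would read lag from the database's replication status
# and page the on-call engineer instead of printing.
import random
import time

LAG_THRESHOLD_SECONDS = 30   # hypothetical alerting threshold
POLL_INTERVAL_SECONDS = 5    # hypothetical polling cadence


def fetch_replica_lag_seconds() -> float:
    """Stand-in for querying the replica's reported lag behind the primary."""
    return random.uniform(0, 60)  # simulated lag for illustration only


def check_replica_once() -> None:
    lag = fetch_replica_lag_seconds()
    if lag > LAG_THRESHOLD_SECONDS:
        # A lagging replica builds a backlog of unapplied changes; alerting
        # early allows it to be pulled from the topology before it affects
        # tasks that depend on timely synchronization.
        print(f"ALERT: replica lag {lag:.0f}s exceeds {LAG_THRESHOLD_SECONDS}s")
    else:
        print(f"OK: replica lag {lag:.0f}s")


if __name__ == "__main__":
    for _ in range(3):          # bounded demo loop
        check_replica_once()
        time.sleep(POLL_INTERVAL_SECONDS)
```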
Once the cause was identified as a non-production server, our team took immediate action, disabling its replica role and disconnecting it from the production systems while evaluating the underlying performance problem. This prevented any issues beyond those already experienced. Metrics and logs indicated a likely hardware fault. Thankfully, ClearlyIP's recent investments in improving its Central US datacenter operations meant our team could quickly move the replica database to this newer environment.
The migration was completed several hours after the issues began, placing the replica server in the new environment, and it was restored to service late that afternoon. No similar Clearly Cloud issues were observed after the replica server was taken out of service, nor since it was restored. Our teams will continue to proactively monitor the performance of these systems and evaluate additional improvements to minimize the impact of similar circumstances in the future.