A scheduled data center facility network maintenance event occurred during the evening of 10/29 and triggered a variety of anticipated monitoring alarms while it was underway. The data center facility provided the expected updates on the maintenance activity, and no issues were immediately noted following its conclusion.
At approximately 8 AM ET on 10/30, a customer reported that trunk registrations to the ClearlyIP US-East trunking servers were failing, while US-Central trunks were operating normally. Support escalated the issue to engineering, as it indicated potential problems with inbound calling mapped to the US-East region and would also affect customers who had not configured redundancy with ClearlyIP US-Central trunking. After analysis, engineering identified database connectivity issues within the affected data center as the likely cause.
The engineering team conducted a full review of the affected equipment and began work to restore access to the unresponsive database servers. This required a careful review of replication and synchronization states before any steps were taken to reactivate each of the affected servers. By approximately 8:45 AM ET, the team had determined that data integrity was intact and started operations to restore synchronization between the US-East database nodes. At approximately 9 AM ET, US-East trunking servers were reopened to customer traffic.
Systems monitoring and review took place immediately following US-East reactivation, and support teams were advised to monitor customer equipment for proper re-registration with US-East servers. During final analysis, several process and monitoring improvements were identified for implementation, which should help avoid similar monitoring ambiguity (and potential disruption) during service provider maintenance events.