Some stations on "UK West" are experiencing longer connection times or buffering/reconnections
Incident Report for Broadcast Radio
Postmortem

We're sorry for the inconvenience caused by data centre issues earlier today.

This morning, a data centre engineer received an alarm that one of the hard drives in part of the UKWest streaming infrastructure was degraded. This had no impact in itself, but the drive would need to be replaced to prevent future issues. In the vast majority of server hardware, including here, hard drives can be “hot swapped”, meaning the hard drive can be replaced without shutting down the server or causing any other noticeable impact.

The data centre engineer prepared a backup/mirror copy of the affected UKWest server, as a spare, in case any issues arose during the replacement. During this process, this spare server was prepared/configured onto the data centre's network.

However, the engineer mistakenly did not apply the correct firewall rules to the spare server - instead applying a default firewall policy. This policy only allowed traffic on ports 80 and 443.

Because this spare had been configured with the same IP address as the UKWest server it would replace if needed, its firewall policy was pushed out to routers and devices across the data centre. This had the immediate and noticeable effect of blocking new connections to customer Icecast ports.
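
To illustrate why a policy limited to ports 80 and 443 affects streaming, here is a minimal Python sketch of an allowlist check. It is only a model of the behaviour described above, not the data centre's actual firewall configuration. The function name and the example stream ports are assumptions made for illustration: 8000 is the port Icecast commonly uses by default, and 8010 stands in for an arbitrary customer port.

```python
# Simplified model of a port allowlist, for illustration only.
# The allowed ports (80 and 443) come from the report above; everything
# else here (names, example ports) is a hypothetical assumption.
DEFAULT_ALLOWED_PORTS = frozenset({80, 443})


def is_new_connection_allowed(port, allowed_ports=DEFAULT_ALLOWED_PORTS):
    """Return True if a new inbound connection to `port` passes the allowlist."""
    return port in allowed_ports


if __name__ == "__main__":
    # 8000 and 8010 are assumed example stream ports, not real customer ports.
    for port in (80, 443, 8000, 8010):
        status = "allowed" if is_new_connection_allowed(port) else "blocked"
        print(f"port {port}: {status}")
```

Under these assumptions, web traffic on ports 80 and 443 passes the check, while any stream port outside that pair is refused until the correct rules are restored.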

Following the restoration of the correct firewall rules, the data centre team have assured us that they will learn from this incident and adapt their processes to ensure this cannot happen again.

Posted Nov 23, 2023 - 13:23 GMT

Resolved
This incident has been resolved.
Posted Nov 23, 2023 - 10:54 GMT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 23, 2023 - 10:45 GMT
Investigating
We're currently seeing a high number of streaming disconnections and reconnections on our UK West streaming server.

We are in direct contact with the datacentre engineering team, and they have confirmed they are working on a networking and hosting connectivity issue within the datacentre that is affecting part of the infrastructure we are in.

We do not currently expect this to last long, and expect it to be resolved very shortly.
Posted Nov 23, 2023 - 10:35 GMT
This incident affected: Streaming, Web hosting, Apps and smart speaker skills (Streaming Services).