it's not the runners, it's the orchestration service that's the problem
been working to move all our workflows to self-hosted, on-demand ephemeral runners. it was severely delayed when I found out how slipshod the Actions Runner Service is, and I had to redesign to handle out-of-order or outright missing webhook events. jobs would start running before the corresponding workflow_job event was even delivered
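One defensive pattern for out-of-order deliveries is to treat job statuses as a monotonic state machine and drop any event that would move a job backwards. A minimal sketch, assuming an in-memory store; `JobTracker` is a hypothetical name and the event dict mirrors the shape of GitHub's `workflow_job` payload only loosely:

```python
# Rank each job status so a late-arriving "queued" event can never
# roll a job back from "in_progress" or "completed".
STATUS_RANK = {"queued": 0, "in_progress": 1, "completed": 2}

class JobTracker:
    """Idempotent tracker for workflow_job webhook events (illustrative)."""

    def __init__(self):
        self.jobs = {}  # job_id -> last accepted status

    def handle_event(self, event):
        """Apply one workflow_job event; ignore stale or duplicate deliveries."""
        job_id = event["workflow_job"]["id"]
        status = event["action"]  # "queued", "in_progress", or "completed"
        current = self.jobs.get(job_id)
        # Reject anything that doesn't advance the job's state forward.
        if current is not None and STATUS_RANK[status] <= STATUS_RANK[current]:
            return False  # out-of-order or duplicate delivery: dropped
        self.jobs[job_id] = status
        return True
```

Since deliveries can also go missing entirely, a sketch like this would still need a reconciliation pass (e.g. periodically polling the jobs API) as a backstop rather than trusting webhooks alone.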
we've got it to the point where we can detect a GitHub Actions outage and let them know by opening a support ticket, before the status page updates
That’s not hard: the status page is updated manually, and they wait for support tickets to confirm an issue before they update it. (Users are a far better monitoring service than any automated product.)
Webhook deliveries do suffer sometimes, which sucks, but that’s not the fault of the Actions orchestration.
I'm seeing wonky webhook deliveries for Actions service events, sometimes dropped completely, while other webhooks work just fine. I struggle to see what else could be responsible for that behaviour. the Actions service has to be emitting the events that trigger these webhook deliveries, and sometimes it messes them up.