Root Cause Analysis (RCA)
Service Disruptions Caused by AWS Outage
26 October 2025
1. Overview
On October 20, 2025, Signiant services experienced disruptions due to a widespread Amazon Web Services (AWS) outage in the US-East-1 region. The outage affected DNS services within AWS, which triggered cascading failures across other AWS services. These failures in turn caused intermittent failures in several Signiant products, including Media Shuttle, Media Engine, and Jet.
During this time, some customers were unable to log in, transfer files, or complete scheduled automations. Our teams worked continuously to restore services, rerouting systems to an unaffected AWS region and supporting customers until full recovery was confirmed.
All services were fully restored once the AWS DNS resolution stabilized, all affected AWS services were back online, and the Signiant system failover was completed.
2. Products Affected and Customer Impact
| Product | Impact |
| Jet | Automated transfers scheduled between 3:00 AM and 4:30 AM failed. Some required manual retries or later system fixes. |
| Media Shuttle | Users could not load the interface, log in, connect to storage endpoints, or transfer files. Scheduled auto delivery jobs between 3:00 AM and 4:30 AM failed. |
| Media Engine | Users could not load the interface, log in, connect to storage endpoints, or transfer files. |
3. What Happened (Root Cause)
The outage was caused by a failure in AWS’s US-East-1 region that impacted DNS resolution, a core Internet service that translates service names into the network addresses used to reach them. As a result, AWS compute, storage, and monitoring services became unavailable or behaved inconsistently.
Because Signiant’s primary systems are hosted in this AWS region, the outage affected:
- Authentication and transfer workflows
- Internal failover automation tools
- Third-party communication systems (e.g., Zoom, Slack, Status Page)
Once it became clear that the outage would not resolve quickly, Signiant initiated a failover to an unaffected AWS region to restore service availability.
4. Timeline of Events
| Time (ET) | Event |
| 2:57 AM | AWS US-East-1 outage begins, impacting DNS services and multiple AWS systems. Signiant services begin experiencing failures. |
| 3:03 AM | First customer reports issues accessing Media Shuttle. |
| 4:20 AM | Initial service restoration efforts begin; most Jet transfers resume. |
| 5:15 AM | Signiant shuts down affected systems in US-East-1 to stabilize performance. |
| 5:34 AM | Services are successfully migrated to US-West-2. Customer updates begin. |
| 7:02 AM | The status page is updated with outage details. |
| 7:36 AM | Per AWS guidance, Engineering recommends that customers flush their DNS cache if they are still experiencing issues. |
| 10:33 AM | Issue identified with failed Jet jobs; workaround shared with affected customers. |
| 3:59 PM | Permanent fix deployed for Jet scheduling issues. |
| 6:00 PM | The majority of Signiant customers’ functionality is confirmed to be restored. |
| 7:40 PM | All services are operating normally. AWS confirms resolution of their outage. |
5. How Customers Were Affected
- Inability to log in to or access most of the Signiant platform, including Media Shuttle, Media Engine, and Jet
- Failed or delayed file transfers and automations, including Jet jobs and Media Shuttle auto delivery jobs
- Slower communication updates due to impact on internal systems
- Increased need for manual workarounds (e.g., pausing/resuming transfers, DNS cache flushing)
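For the DNS cache flushing workaround listed above, one quick way to tell whether a machine is still seeing resolution failures is to attempt a lookup before and after the flush. The following is a minimal Python sketch, not a Signiant tool. The `can_resolve` helper name is hypothetical, and the hostname to check is whichever service hostname you normally connect to.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address.

    Hypothetical helper for illustration only.
    """
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed (for example, a stale cached failure).
        # UnicodeError: the hostname is malformed and cannot be encoded.
        return False
```

If `can_resolve` returns `False` before the flush and `True` afterwards, the earlier failures were likely caused by stale cached DNS results on that machine.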
6. How We Resolved the Issue
- Redirected services from AWS US-East-1 to US-West-2
- Restored access to Media Shuttle and Jet workflows as DNS issues stabilized
- Shared workaround instructions and provided direct customer support
- Deployed a permanent fix for Jet jobs impacted by the outage window
- Fully restored service and confirmed platform stability
7. What We’re Doing Going Forward
| Improvement | Description |
| More resilient DNS & failover strategy | Improving “live-live” failover processes so they require fewer manual steps to complete |
| More comprehensive health checks and better status reporting | Implementing more thorough health checks that verify the health of supporting services |
| Review status and error handling | Investigating and evaluating improvements in error handling, status reporting, and communication tools |
| Faster customer communications | Enhancing customer notifications during third-party outages |
| Updated response processes | Creating new cross-team procedures to streamline incident response and customer updates |
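To illustrate the health-check and failover improvements in the table above, the sketch below shows one way a check could aggregate the status of supporting services and flag when a failover might be warranted. It is a hypothetical Python example, not Signiant's implementation. The dependency names (`dns`, `auth`, `storage`), the functions `evaluate_health` and `should_fail_over`, and the threshold of three consecutive failures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthReport:
    healthy: bool       # True only if every dependency check passed
    failed: List[str]   # names of dependencies that failed, sorted


def evaluate_health(checks: Dict[str, Callable[[], bool]]) -> HealthReport:
    """Run each dependency check; a check that raises counts as a failure."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return HealthReport(healthy=not failed, failed=sorted(failed))


def should_fail_over(history: List[HealthReport], threshold: int = 3) -> bool:
    """Suggest failover only after `threshold` consecutive unhealthy reports."""
    if threshold < 1 or len(history) < threshold:
        return False
    return all(not report.healthy for report in history[-threshold:])


# Example usage with stand-in checks (real checks would probe real services).
report = evaluate_health({
    "dns": lambda: True,
    "auth": lambda: False,
    "storage": lambda: True,
})
```

Requiring several consecutive unhealthy reports before suggesting a failover is one possible design choice: it avoids reacting to a single transient error while still removing the need for someone to decide manually that an outage is not resolving.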
8. Our Commitment to You
We understand how critical our services are to your operations, and we sincerely apologize for the impact this outage had on your teams. While the root cause was outside of Signiant’s control, it is our responsibility to ensure resilience, rapid recovery, and clear communication.
We are taking concrete steps to strengthen our infrastructure and improve our response in the future.
Please reach out to your Signiant account representative or support team if you have any questions or would like to discuss this further.