Root Cause Analysis (RCA)

Service Disruptions Caused by AWS Outage

26 October 2025

1. Overview

On October 20, 2025, Signiant services experienced disruptions due to a widespread outage impacting Amazon Web Services (AWS) in the US-East-1 region. This outage affected DNS services within AWS, which triggered cascading failures across other AWS services. These cascading failures disrupted several Signiant products, including Media Shuttle, Media Engine, and Jet.

During this time, some customers were unable to log in, transfer files, or complete scheduled automations. Our teams worked continuously to restore services, rerouting systems to an unaffected AWS region and supporting customers until full recovery was confirmed.

All services were fully restored by 7:40 PM ET, once AWS DNS resolution had stabilized, all affected AWS services were back online, and the Signiant system failover was complete.

2. Products Affected and Customer Impact

Product	Impact
Jet	Automated transfers scheduled between 3:00 AM and 4:30 AM ET failed. Some required manual retries; others were resolved by a subsequent fix.
Media Shuttle	Users could not load the interface, log in, connect to storage endpoints, or transfer files. Auto delivery jobs scheduled between 3:00 AM and 4:30 AM ET failed.
Media Engine	Users could not load the interface, log in, connect to storage endpoints, or transfer files.

3. What Happened (Root Cause)

The outage was caused by a failure in AWS’s US-East-1 region that impacted DNS resolution, the core Internet service that translates hostnames into the addresses used to route traffic. This caused AWS compute, storage, and monitoring services to become unavailable or behave inconsistently.

Because Signiant’s primary systems are hosted in this AWS region, and several third-party tools we rely on also depend on it, the outage affected:

Authentication and transfer workflows
Internal failover automation tools
Third-party communication systems (e.g., Zoom, Slack, Status Page)

Once it became clear that the outage would not resolve quickly, Signiant initiated a failover to an unaffected AWS region (US-West-2) to restore service availability.

4. Timeline of Events

Time (ET)	Event
2:57 AM	AWS US-East-1 outage begins, impacting DNS services and multiple AWS systems. Signiant services begin experiencing failures.
3:03 AM	First customer reports issues accessing Media Shuttle.
4:20 AM	Initial service restoration efforts begin; most Jet transfers resume.
5:15 AM	Signiant shuts down affected systems in US-East-1 to stabilize performance.
5:34 AM	Services are successfully migrated to US-West-2. Customer updates begin.
7:02 AM	The status page is updated with outage details.
7:36 AM	Following AWS guidance, Engineering recommends that customers flush their DNS cache if they are still experiencing issues.
10:33 AM	Issue identified with failed Jet jobs; workaround shared with affected customers.
3:59 PM	Permanent fix deployed for Jet scheduling issues.
6:00 PM	Functionality is confirmed restored for the majority of Signiant customers.
7:40 PM	All services are operating normally. AWS confirms resolution of its outage.

5. How Customers Were Affected

Inability to log in to or access most of the Signiant platform, including Media Shuttle, Media Engine, and Jet
Failed or delayed file transfers and automations, including Jet jobs and Media Shuttle auto delivery jobs
Slower status updates because internal and third-party communication tools were also affected
Increased need for manual workarounds (e.g., pausing/resuming transfers, DNS cache flushing)

6. How We Resolved the Issue

Redirected services from AWS US-East-1 to US-West-2
Restored access to Media Shuttle and Jet workflows as DNS issues stabilized
Shared workaround instructions and provided direct customer support
Deployed a permanent fix for Jet jobs impacted by the outage window
Fully restored service and confirmed platform stability

7. What We’re Doing Going Forward

Improvement	Description
More resilient DNS & failover strategy	Improving “live-live” failover processes so they require fewer manual steps to complete.
More comprehensive health checks and better status reporting	Implementing more thorough checks on the health of supporting services.
Review status and error handling	Evaluating improvements to error handling, status reporting, and communication tools.
Faster customer communications	Enhancing customer notifications during third-party outages.
Updated response processes	Creating new cross-team procedures to streamline incident response and customer updates.

8. Our Commitment to You

We understand how critical our services are to your operations, and we sincerely apologize for the impact this outage had on your teams. While the root cause was outside Signiant’s control, it is our responsibility to ensure resilience, rapid recovery, and clear communication.

We are taking concrete steps to strengthen our infrastructure and improve our response in the future.

Please reach out to your Signiant account representative or support team if you have any questions or would like to discuss this further.