PyPI Outage
Incident Report for Python Infrastructure
Resolved
Our mitigation is deployed and the search cluster is fully healthy. We'll publish an Incident Report shortly with details on the incident and steps taken to avoid a similar cascading failure in the future.
Posted Jul 23, 2018 - 19:01 UTC
Monitoring
The mitigation is being deployed, and the search cluster has completed its maintenance. We're monitoring the stability of the service until the deployment completes.
Posted Jul 23, 2018 - 18:51 UTC
Update
We've identified a change in our search client library that introduced retry behavior, which is causing the search issues to cascade. We have identified a fix and are deploying it. It should restore PyPI's main interface while the search cluster completes its maintenance.
Posted Jul 23, 2018 - 18:45 UTC
Update
The search cluster restart inadvertently caused all incoming search requests to time out, overloading our backend processes and causing a wider outage. Uploads, the Simple Index, package file hosting, and the JSON APIs are still operational, but pypi.org's main UI will remain unavailable until the search cluster recovers.
Posted Jul 23, 2018 - 18:21 UTC
Update
PyPI search is down again as the search cluster restarts.
Posted Jul 23, 2018 - 18:13 UTC
Identified
The failed search cluster node was automatically removed from the cluster, and search is operational again. We're working to restore the failed node and bring the cluster fully back online.
Posted Jul 23, 2018 - 18:08 UTC
Investigating
PyPI's search infrastructure is currently unavailable. We are investigating.
Posted Jul 23, 2018 - 18:04 UTC
This incident affected: PyPI (pypi.org - CDN).