PyPI Intermittent Outages
Incident Report for Python Infrastructure
Postmortem

On August 15th, an Audit Logging feature was added to PyPI that records events when Users and Projects are modified. The feature retains the event history for Projects indefinitely. Because of a misconfiguration in how the feature was implemented, the event logs for a Project were loaded from the database every time that Project was fetched by the PyPI codebase. The additional load grew steadily with the size of the events table until it reached a tipping point at which database queries spilled over to disk. This dramatically degraded performance and led to a significant outage.

This was resolved by ensuring that Audit Log events are loaded from the database only when they are needed for display to users and administrators. Database load and application response times quickly dropped to levels similar to those seen before the feature was shipped.
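
The difference between the two loading behaviours described above can be illustrated with a minimal sketch. This is not PyPI's actual code: it uses Python's built-in sqlite3 module in place of the production database, and every name in it (CountingConnection, make_db, EagerProject, LazyProject, the projects and project_events tables) is hypothetical and chosen only for illustration. The eager class reads a project's events on every fetch, as in the misconfigured behaviour; the lazy class defers that query until the events are first accessed, as in the fix.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts the queries sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def query(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(event_count=1000):
    """Build an in-memory database with one project and its audit events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE project_events "
        "(id INTEGER PRIMARY KEY, project_id INTEGER, tag TEXT)"
    )
    conn.execute("INSERT INTO projects (id, name) VALUES (1, 'example')")
    conn.executemany(
        "INSERT INTO project_events (project_id, tag) VALUES (1, ?)",
        [(f"event-{i}",) for i in range(event_count)],
    )
    conn.commit()
    return CountingConnection(conn)


class EagerProject:
    """Loads the full event history every time a project is fetched."""

    def __init__(self, db, project_id):
        self.name = db.query(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        )[0][0]
        # This query runs on every fetch, whether or not events are used.
        self.events = db.query(
            "SELECT tag FROM project_events WHERE project_id = ?", (project_id,)
        )


class LazyProject:
    """Loads the event history only when it is first accessed."""

    def __init__(self, db, project_id):
        self._db = db
        self._id = project_id
        self._events = None
        self.name = db.query(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        )[0][0]

    @property
    def events(self):
        # Deferred: the events query runs once, on first access only.
        if self._events is None:
            self._events = self._db.query(
                "SELECT tag FROM project_events WHERE project_id = ?",
                (self._id,),
            )
        return self._events


if __name__ == "__main__":
    eager_db = make_db()
    EagerProject(eager_db, 1)
    print("eager fetch queries:", eager_db.query_count)

    lazy_db = make_db()
    project = LazyProject(lazy_db, 1)
    print("lazy fetch queries:", lazy_db.query_count)
    project.events  # for example, when rendering the audit log page
    print("after reading events:", lazy_db.query_count)
```

In this sketch the cost of the eager version grows with the number of rows in project_events on every fetch, while the lazy version pays that cost only on code paths that actually display the events.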

Posted Oct 22, 2019 - 13:30 UTC

Resolved
This incident has been resolved.
Posted Oct 22, 2019 - 13:16 UTC
Monitoring
Our attempt to resolve the performance issues has significantly improved response times. We are monitoring to ensure stability.
Posted Oct 22, 2019 - 13:08 UTC
Update
We've identified an issue that may have been degrading PyPI performance and have begun to deploy a change to address it.
Posted Oct 22, 2019 - 12:53 UTC
Investigating
We are currently investigating this issue.
Posted Oct 22, 2019 - 12:22 UTC
This incident affected: PyPI (pypi.org - CDN).