PyPI Intermittent Outages

Incident Report for Python Infrastructure

Postmortem

On August 15th, an Audit Logging feature was added to PyPI that records events when Users and Projects are modified. The feature retains the event history for Projects indefinitely. Due to a misconfiguration in how the feature was implemented, the event logs for a Project were loaded from the database every time that Project was fetched by the PyPI codebase. The additional load grew steadily with the size of the events table until it reached a tipping point at which database queries spilled over to disk. This dramatically degraded performance and led to a significant outage.

The issue was resolved by ensuring that Audit Log events are loaded from the database only when they are needed for display to users and administrators. Database load and application response times quickly dropped to levels similar to those seen before the feature was shipped.
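
The difference between loading events on every Project fetch and loading them only on demand can be illustrated with a minimal sketch. This is not PyPI's actual code: the table names, the `EagerProject` and `LazyProject` classes, and the `QueryCounter` helper are all hypothetical, and the example uses Python's standard-library `sqlite3` module rather than PyPI's real database layer.

```python
import sqlite3


class QueryCounter:
    """Hypothetical wrapper that counts statements sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def make_db(n_events):
    """Build an in-memory database with one project and n_events events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE project_events "
        "(id INTEGER PRIMARY KEY, project_id INTEGER, tag TEXT)"
    )
    conn.execute("INSERT INTO projects (id, name) VALUES (1, 'example')")
    conn.executemany(
        "INSERT INTO project_events (project_id, tag) VALUES (1, ?)",
        [(f"event-{i}",) for i in range(n_events)],
    )
    return conn


class EagerProject:
    """Loads the full event history every time the project is fetched."""

    def __init__(self, db, project_id):
        row = db.execute(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        ).fetchone()
        self.name = row[0]
        # Cost grows with the size of the events table on every fetch.
        self.events = db.execute(
            "SELECT tag FROM project_events WHERE project_id = ?",
            (project_id,),
        ).fetchall()


class LazyProject:
    """Defers the events query until the events are actually read."""

    def __init__(self, db, project_id):
        self._db = db
        self._id = project_id
        row = db.execute(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        ).fetchone()
        self.name = row[0]
        self._events = None

    @property
    def events(self):
        # Query once, on first access, then reuse the cached rows.
        if self._events is None:
            self._events = self._db.execute(
                "SELECT tag FROM project_events WHERE project_id = ?",
                (self._id,),
            ).fetchall()
        return self._events


if __name__ == "__main__":
    conn = make_db(1000)

    eager_db = QueryCounter(conn)
    EagerProject(eager_db, 1)
    print("eager queries per fetch:", eager_db.count)

    lazy_db = QueryCounter(conn)
    project = LazyProject(lazy_db, 1)
    print("lazy queries per fetch:", lazy_db.count)
    len(project.events)  # e.g. when rendering an admin history page
    print("lazy queries after reading events:", lazy_db.count)
```

In this sketch, code paths that only need the project's name never touch the events table, so their cost stays flat as the event history grows. In an ORM-based codebase the same effect is usually achieved by changing a relationship's loading strategy instead of hand-writing a property.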

Posted Oct 22, 2019 - 13:30 UTC

Resolved

This incident has been resolved.
Posted Oct 22, 2019 - 13:16 UTC

Monitoring

Our attempt to resolve the performance issues has significantly improved response times. We are monitoring to ensure stability.
Posted Oct 22, 2019 - 13:08 UTC

Update

We've identified an issue that may have been degrading PyPI performance and have begun to deploy a change to address it.
Posted Oct 22, 2019 - 12:53 UTC

Investigating

We are currently investigating this issue.
Posted Oct 22, 2019 - 12:22 UTC
This incident affected: PyPI (pypi.org - CDN).