PyPI Emails Not Sending
Incident Report for Python Infrastructure
Postmortem

Summary

At 20:36 UTC on 2023-03-17, PyPI’s email provider enforced a suspension against our account. After our remediation efforts and response, our account was restored at 00:36 UTC 2023-03-18.

In response, the PyPI Administrators did the following:

  • Audited our email logs, including delivery and complaint status, to determine the underlying factor that led to our suspension.
  • Rolled out updates to our signup form to mitigate the behavior that was leading to complaint emails.
  • Notified our provider of our remediation efforts.
  • Rolled out updates to our task retry logic so that exceptions it had previously swallowed are now reported.
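
The last item can be illustrated with a minimal sketch. This is not PyPI's actual task code: `run_with_retries`, `TransientSendError`, and the `report` hook are hypothetical names chosen for the example. The idea is only that each failed attempt is reported before it is retried, so monitoring sees failures even when a later attempt succeeds.

```python
import logging
import time

logger = logging.getLogger("email_tasks")


class TransientSendError(Exception):
    """Hypothetical error type for a send failure that is worth retrying."""


def run_with_retries(task, *, attempts=3, delay=0.0, report=logger.warning):
    """Call task() up to `attempts` times, reporting every failure.

    Each exception is passed to `report` before the retry, so failures
    stay visible to monitoring even when a later attempt succeeds.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except TransientSendError as exc:
            report("send attempt %d/%d failed: %r", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            time.sleep(delay)
```

In a real deployment the `report` hook would presumably forward to whatever error tracker is in use; the logging default here is an assumption made to keep the sketch self-contained.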

The following was determined in the course of our response:

  • Our monitoring was blind to email failures because retry logic swallowed the apparently transient errors, so our response was delayed by nearly 15 minutes.
  • Our provider initiated an account review on 2021-09-04 due to a marginal complaint rate. PyPI Administrators considered this at the time and, somewhat ironically, began sending more valid notification emails wherever reasonable to improve our reputation. PyPI’s email sending pattern is consistently low volume aside from routine account signup and password resets. After these changes rolled out, it wasn’t clear whether “enough” had been done, and the issue eventually fell out of the collective mind of the PyPI Administrators.
  • Our China-accessible bot detection honeypot (used instead of reCaptcha, which is blocked there) had been letting just enough bad signups through to maintain a 0.6% complaint rate from victims of whatever malfeasance the bots were enabling.
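
For readers unfamiliar with honeypot-style bot detection, the general technique looks roughly like the sketch below. It is illustrative only and not PyPI's implementation: the function name and the `confirm_email` field name are hypothetical. The form includes a field that is hidden from people, and a submission that fills it in is treated as automated.

```python
def looks_like_bot(form_data, honeypot_field="confirm_email"):
    """Return True when the hidden honeypot field was filled in.

    The field is rendered in the form but hidden from people (for
    example with CSS), so a non-empty value suggests automation.
    """
    return bool(form_data.get(honeypot_field, "").strip())
```

A check like this needs no third-party service, which is why it remains usable where reCaptcha is blocked. It is also easy to evade: a bot that leaves unrecognized fields empty passes, which fits the "relaxed" detection described in this report.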

Impact

Actions that triggered emails during the outage window from 2023-03-17T20:36Z to 2023-03-18T00:36Z were delayed in sending. All emails enqueued during that time were delivered by retry before 2023-03-18T01:00Z (up to 4.5 hours delayed).

Timeline

  • 2018-03-08: Issue filed regarding inability to sign up for PyPI from China due to reCaptcha being blocked.
  • 2018-03-21: Implementation of a relaxed bot detection in place of reCaptcha merged and deployed.
  • 2021-09-04: Report of account review received from our email provider.
  • 2023-03-17: Account suspension enforced by our email provider.
  • 2023-03-17: reCaptcha re-enabled for our signup form.
  • 2023-03-17: Error reporting for exceptions in failed tasks merged and deployed.

Future Work

  • Investigate bot detection mechanisms that are robust and available to users in China.
  • Research backup mail providers for failover.
  • Send more email to retain reputation.
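
As a rough illustration of the failover item above, the sketch below tries a list of providers in order and falls back when one fails. Nothing here reflects a chosen design or a real provider API: `send_with_failover`, `ProviderError`, and the `(name, send_callable)` pairs are all assumptions made for the example.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider rejects or fails a send."""


def send_with_failover(message, providers):
    """Try each provider in order; return the name of the one that sent.

    `providers` is a list of (name, send_callable) pairs.
    """
    errors = []
    for name, send in providers:
        try:
            send(message)
            return name
        except ProviderError as exc:
            errors.append((name, exc))  # keep for reporting, then try the next
    raise ProviderError(f"all providers failed: {errors!r}")
```

Any real failover would also have to account for sender reputation and domain authentication with the backup provider, which this sketch ignores.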
Posted Mar 18, 2023 - 11:59 UTC

Resolved
Our vendor has reinstated our sending ability after reviewing our remediation. Emails that were enqueued during the outage will send over the next 1-2 hours.
Posted Mar 18, 2023 - 01:00 UTC
Update
We have validated that failed email sends are queuing for delivery when our account suspension is lifted. For the moment, we are holding off on mitigating by failing over to another provider. If we do not see resolution in 12 hours, we will begin work to add another provider as a failover. Regardless, we plan to reassess our resilience against email provider outages in the next work week.
Posted Mar 17, 2023 - 23:12 UTC
Monitoring
We have submitted our response with remediation efforts to the provider and are awaiting their response/resolution.

Detail: PyPI replaced reCaptcha on our signup form with a less robust form of automation defense in 2018 after users in China reported being unable to sign up. We have temporarily re-enabled reCaptcha in order to reduce inbox-bombing signups against PyPI. We will revisit ways to improve our defenses in the future without needlessly denying users in China access to PyPI.
Posted Mar 17, 2023 - 22:28 UTC
Update
We have identified the contributing factor that led to our complaint rate going above the acceptable threshold of our provider and are working to implement a fix that will satisfy their requirements.
Posted Mar 17, 2023 - 21:47 UTC
Update
We are working through the remediation process requested by our provider.
Posted Mar 17, 2023 - 21:34 UTC
Identified
PyPI is currently unable to send any email. Our email provider has suspended our account. We are currently determining how to proceed.
Posted Mar 17, 2023 - 21:04 UTC
This incident affected: PyPI (pypi.org - Email).