[PyPI] Hosting Provider Reboots
Scheduled Maintenance Report for Python Infrastructure
Postmortem

Hosting Provider Reboots

The infrastructure that runs PyPI was scheduled by our hosting provider for rolling reboots of all hosts, beginning on 2014-09-28 at 11:00 UTC and ending on 2014-09-29 at 11:01 UTC.

Ultimately, because our provider gave no specific timing for individual hosts, the PSF Infrastructure team chose the lowest-risk path: taking down the active backends ahead of the rolling reboots to ensure a trouble-free return to 100% availability.

The PyPI infrastructure currently comprises:

  • 2x GlusterFS nodes which store and serve packages in an active-active pair.
  • 2x PostgreSQL Servers in a primary-hot-standby cluster.
  • 2x Redis servers for download counts and caching.
  • 3x Python Web servers.
  • 3x Geo-Distributed mirrors which offer the full /simple, /packages, and /serversig trees.
  • Various administrative/backup servers.
  • A stellar Global CDN.

The PSF Infrastructure team has tooling in place to switch easily from the active web backends, which provide 100% of PyPI's functionality, to the static mirror network behind our CDN.

In static mode, all package installers using the /simple index are fully supported. This service is considered the #1 priority of PyPI.
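
As a rough illustration of what static mode covers, the sketch below classifies request paths by whether the static mirror network could answer them. The prefix list comes from the trees the mirrors are described as offering above; the function name `served_in_static_mode` and the example paths are hypothetical and are not part of PyPI's actual tooling.

```python
# Hypothetical sketch: which request paths a static mirror could answer.
# The prefixes match the mirrored trees listed above; everything else is
# assumed to need the active web backends.
STATIC_PREFIXES = ("/simple", "/packages", "/serversig")


def served_in_static_mode(path: str) -> bool:
    """Return True if `path` falls under one of the mirrored trees."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in STATIC_PREFIXES
    )


if __name__ == "__main__":
    # Example paths are illustrative only.
    for path in ("/simple/requests/", "/packages/source/r/requests/", "/pypi"):
        print(path, served_in_static_mode(path))
```

Matching on a whole path segment, rather than a bare string prefix, keeps an unrelated path such as `/simplex` from being treated as part of the `/simple` tree.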

Ultimately, three factors contributed to our choice to go static for the lengthy scheduled maintenance window:

  • Possible unforeseen complexities that could have arisen with an unmanaged reboot in the GlusterFS and PostgreSQL clusters, however unlikely.
  • The static mirror network meets the #1 priority of the Python Package Index without needing active backends.
  • The lead on the PyPI infrastructure had been managing similar maintenance windows with other providers for other services, and would have been approaching 28 hours awake at the start of PyPI's scheduled maintenance.

We felt that leaving PyPI exposed to intermittent or hard-to-recover outages caused by rolling reboots would have been regrettable. Instead, we chose to go static.

Posted Sep 28, 2014 - 21:48 UTC

Completed
PyPI is 100% online and available! Thanks for your patience and understanding.
Posted Sep 28, 2014 - 17:17 UTC
Verifying
The maintenance for the class of instances we host PyPI on has finished and we are verifying the state of our clusters. Should be online soon!
Posted Sep 28, 2014 - 17:12 UTC
Update
PyPI is now in full static mode ahead of the reboots.

We will monitor the status of our hosting provider's reboot progress and update as necessary.
Posted Sep 28, 2014 - 07:58 UTC
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Sep 28, 2014 - 07:00 UTC
Scheduled
Our hosting provider will be performing a rolling reboot of the datacenter in which the Python Package Index is hosted, beginning at 2014-09-28 11:00 UTC.

To explicitly ensure the safety of the distributed systems within the PyPI infrastructure, PyPI will enter a read-only static mode at 2014-09-28 07:00 UTC.

Package installs via `pip` and downloads will still function during the maintenance.

During the maintenance window all uploaded packages will remain available via the `/simple` and `/packages` endpoints. The web UI and XMLRPC interfaces will be unavailable. No package registration, upload, or search functionality will be available.

Once we have verified that reboots are complete, we will bring PyPI back into full service. Estimated return to service is 2014-09-28 19:00 UTC.
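
For a sense of scale, the planned timings in this notice can be worked out directly. The snippet below is only an arithmetic sketch using the three timestamps stated above; the variable names are ours.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Timestamps as stated in this notice.
static_start = datetime(2014, 9, 28, 7, 0, tzinfo=UTC)       # read-only static mode begins
reboots_start = datetime(2014, 9, 28, 11, 0, tzinfo=UTC)     # provider reboots begin
estimated_return = datetime(2014, 9, 28, 19, 0, tzinfo=UTC)  # estimated return to service

lead_time = reboots_start - static_start         # buffer before the first reboot
planned_window = estimated_return - static_start # total planned time in static mode

print(lead_time)       # 4:00:00
print(planned_window)  # 12:00:00
```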

In preparation, we have built out a new global triad of internal mirrors, which we are closely monitoring for consistency and freshness before going into maintenance mode.
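
One way such a consistency and freshness check could look is sketched below. This is not the monitoring we actually run: the `MirrorState` record, the idea of comparing a change serial against the primary, the mirror names, and the 15-minute freshness threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class MirrorState(NamedTuple):
    name: str            # hypothetical mirror label
    serial: int          # hypothetical counter of the last change applied
    last_sync: datetime  # when the mirror last finished syncing (UTC)


def flag_mirrors(
    primary_serial: int,
    mirrors: List[MirrorState],
    now: datetime,
    max_age: timedelta = timedelta(minutes=15),  # assumed threshold
) -> List[str]:
    """Return names of mirrors that lag the primary or have not synced recently."""
    flagged = []
    for mirror in mirrors:
        behind = mirror.serial < primary_serial   # consistency: missing changes
        stale = now - mirror.last_sync > max_age  # freshness: last sync too old
        if behind or stale:
            flagged.append(mirror.name)
    return flagged


if __name__ == "__main__":
    now = datetime(2014, 9, 28, 6, 30, tzinfo=timezone.utc)
    mirrors = [
        MirrorState("mirror-a", 100, now - timedelta(minutes=2)),
        MirrorState("mirror-b", 98, now - timedelta(minutes=2)),    # behind primary
        MirrorState("mirror-c", 100, now - timedelta(minutes=40)),  # stale sync
    ]
    print(flag_mirrors(100, mirrors, now))
```

A switch to static mode would only proceed when a check like this returns an empty list, so that no mirror is serving an out-of-date index.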
Posted Sep 27, 2014 - 20:59 UTC