April 08, 2013

How the world ended up in Costa Rica

Even though I haven't had much time to dedicate to http.debian.net lately, it has been up and running, or should I say serving?

Part of its job is to detect mirrors that have temporary issues or are entirely gone, down, unavailable. It does this, and many other things, by monitoring the so-called "trace files", a very important one being the "master" (or "origin") trace file.

With the recent integration of backports into the main archive, the master trace file of the backports mirrors also changed. Long story short, this change caused up-to-date backports mirrors to no longer be considered as candidates by the mirror redirector.

After the usual mirror synchronisation delay, more and more mirrors were disabled and subsets of "up to date" candidates re-calculated. This reached a critical point when only one mirror was left in the database. The mirror had not been synchronised for a couple of weeks.

This mirror is located in Costa Rica, and as the only candidate left in the database it was the only one used to serve requests for the backports archive, no matter where in the world the client was located.
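To make the failure mode concrete, here is a minimal sketch in Python. It is not the redirector's actual code: the `Mirror` class, the host names, the trace file names and the distance-based selection are all assumptions made up for illustration. It only shows how a candidate filter keyed on the old master trace file leaves the single stale mirror as the answer for every client.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class Mirror:
    host: str            # hypothetical host name
    lat: float
    lon: float
    traces: set = field(default_factory=set)  # trace file names seen on the mirror


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def candidates(mirrors, master_trace):
    """Keep only the mirrors that carry the expected master trace file."""
    return [m for m in mirrors if master_trace in m.traces]


def pick(mirrors, master_trace, client_lat, client_lon):
    """Return the closest candidate, or None if no mirror qualifies."""
    pool = candidates(mirrors, master_trace)
    if not pool:
        return None
    return min(pool, key=lambda m: distance_km(client_lat, client_lon, m.lat, m.lon))


# Hypothetical data: two mirrors already synchronised (new trace name),
# one stale mirror still carrying the old trace name.
MIRRORS = [
    Mirror("fresh.de.example", 50.1, 8.7, {"new-master.example"}),
    Mirror("fresh.jp.example", 35.7, 139.7, {"new-master.example"}),
    Mirror("stale.cr.example", 9.9, -84.1, {"old-master.example"}),
]

# The filter still expects the old trace name, so a client in Berlin
# is sent to the stale mirror in Costa Rica.
berlin_choice = pick(MIRRORS, "old-master.example", 52.5, 13.4)
```

Once the expected trace name is updated to the new one, the same call picks the nearby mirror again, which matches how things recovered after the fix.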

The issue was later noticed and the necessary updates to the mirrors master list were made. Mirrors started to be re-considered as they were re-checked (with some delay due to the rate limiter) and the subsets were re-calculated. Within a few hours everything was back to normal.

Correctness and fault-tolerance don't always get along very well...