October 31, 2012

rsync is not enough

Ask how one can create a Debian mirror and you will get a dozen different responses. Those who are used to mirroring content, whether it is distribution packages, software in general, documents, etc., will usually come up with one answer: rsync.

Truth is: for Debian's archive, rsync is not enough.

Due to the design of the archive, a single call to rsync leaves the mirror in an inconsistent state during most of the sync process.
The explanation is simple: index files are stored in dists/, package files are stored in pool/. Sort those directories by name (just like rsync does) and you will see that the indexes are updated before the actual packages are downloaded.
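
To make the ordering problem concrete, here is a tiny Python sketch. It only assumes what is described above, that a single recursive transfer walks the top-level directories in sorted name order; it does not call rsync at all.

```python
# Top-level directories of the archive that matter for this argument.
top_level = ["pool", "dists"]

# Assumption (taken from the text above): one recursive run processes
# entries sorted by name.
transfer_order = sorted(top_level)

# The indexes in dists/ come first, so until pool/ has been transferred
# they may reference packages that the mirror does not have yet.
indexes_first = transfer_order.index("dists") < transfer_order.index("pool")

print(transfer_order, indexes_first)
```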

There are multiple scripts out there that do exactly that, namely call rsync just once; one of them is in Ubuntu's wiki. Plenty more if you search the web.

Now, addressing that issue shouldn't be so difficult, right? After all, all the index files are in dists/, so syncing in two stages should be enough. It's not that simple.

With the dists/ directory containing over 8.5 GB worth of indexes and, erm, installer files, even a two-stage sync will usually leave the mirror in an inconsistent state for a while.

How about deferring only the bare minimum to the second stage? I hear you ask.
That is the current approach, but it leads to some errors when new index files are added and used. The fact that people insist on writing their own scripts doesn't help.
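
As a rough illustration of the idea (not of what any particular script does), the Python sketch below builds the two rsync command lines of a two-stage sync: the first run skips a short list of index files, the second run transfers everything and deletes stale files at the end. The upstream URL, the target path and the list of deferred patterns are made-up placeholders; a real list is longer and changes over time, which is exactly the problem described above.

```python
import shlex

# Hypothetical values: replace with a real upstream mirror and local path.
UPSTREAM = "rsync://mirror.example.org/debian/"
TARGET = "/srv/mirror/debian/"

# Illustrative patterns for the index files deferred to the second stage.
DEFERRED = ["Packages*", "Sources*", "Release*", "InRelease"]


def stage_commands(upstream, target, deferred):
    """Return the rsync argument lists for stage 1 and stage 2."""
    base = ["rsync", "--recursive", "--links", "--hard-links", "--times"]
    # Stage 1: fetch everything except the deferred indexes, delete nothing.
    stage1 = base + [f"--exclude={p}" for p in deferred] + [upstream, target]
    # Stage 2: fetch the indexes too, and remove stale files only at the end.
    stage2 = base + ["--delete", "--delete-after", upstream, target]
    return stage1, stage2


# Print the commands instead of running them, so the sketch is harmless.
for cmd in stage_commands(UPSTREAM, TARGET, DEFERRED):
    print(shlex.join(cmd))
```

Any index file that is not covered by the deferred patterns is updated in the first stage, before the packages it refers to, so the mirror is still briefly inconsistent for that file.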

Hopefully, some ideas like moving the installer stuff out of dists/ and overhauling the repository layout are being considered. An alternative is to make the users of the mirrors more robust and fault-tolerant, but we would be talking about tens if not hundreds of tools that would need to be improved.

In any case, the one script that is actively maintained, rather portable, and improved from time to time is the ftpsync script. Please, do yourself and your users a favour: don't attempt to reinvent the wheel (and forget about calling rsync just once).

October 22, 2012

Where to get checkbashisms from (community service)

Lately I've been spending some time checking the Debian archive for bashisms in preparation for the release of Debian wheezy. This requires running checkbashisms against every /bin/sh script, checking the results by hand to discard some false positives, and filing bug reports about the bashisms found.
And of course, fixing and improving checkbashisms; some of that work to be published soon.
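
For a single source tree, the first step can be sketched in Python as below. This is only an illustration, not the tooling actually used for the archive-wide check: the helper names are made up, the script assumes that checkbashisms is installed and on the PATH, and it treats any non-zero exit status as "needs a look by hand".

```python
import subprocess
from pathlib import Path


def is_bin_sh_script(path):
    """Return True if the first line of the file is a #!/bin/sh shebang."""
    try:
        with open(path, "rb") as fh:
            first = fh.readline(256)
    except OSError:
        return False
    if not first.startswith(b"#!"):
        return False
    parts = first[2:].split()
    return bool(parts) and parts[0] == b"/bin/sh"


def check_tree(root):
    """Run checkbashisms on every /bin/sh script below root.

    Returns a list of (path, output) pairs for scripts that need review.
    """
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and is_bin_sh_script(path):
            result = subprocess.run(
                ["checkbashisms", str(path)],
                capture_output=True,
                text=True,
            )
            # Assumption: a non-zero status means something was reported.
            if result.returncode != 0:
                flagged.append((path, result.stdout + result.stderr))
    return flagged
```

Every flagged script still has to be read by a human, since some of the reports will be false positives.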

Funnily enough, fixing some parsing errors leads to regressions in the form of false negatives caused by other parsing errors... oh well.

However, while looking around the web for references about checkbashisms, I noticed that somebody created a SourceForge project under that same name. It is a fork of an old version of checkbashisms, and hasn't seen an update in over a year. It even appears that a FreeBSD port is based on it.

If you are looking for the latest checkbashisms, please get it either from the latest version of devscripts, or from devscripts' git repository.

October 13, 2012

Debian mirrors map

Ever wondered what the Earth would look like if you added markers for every mirror that is part of Debian's mirror network?
[Image: Debian mirrors map]

(the bigger the shadow of the marker, the larger the number of mirrors in that zone)

Mirrors map generated with Leaflet, using OpenStreetMap.org tiles; mirrors were geolocated using GeoLite data created by MaxMind, available from maxmind.com.