In 2011, I wrote about how small files could consume a lot of space. I meant to do a follow-up on the savings but I forgot about it until now.
In 2.5.7, we started compressing some of the collected data files. Some of these are ridiculously compressible (#664794). Even better, compressing them is sometimes faster than writing them directly to the disk, so in some cases it is a pure win/win. For lintian.d.o, we also see a vast reduction in the overall size of the laboratory.
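To get a feel for how well repetitive text compresses, here is a minimal sketch. It is not how Lintian does it: the file content is invented for illustration, and gzip(1) is just an assumption standing in for whatever compressor is actually used.

```shell
#!/bin/sh
# Illustrative only: the content below is made up, not real Lintian data,
# and gzip is an assumed stand-in for the actual compressor.
set -e

tmpdir=$(mktemp -d)
sample="$tmpdir/index.txt"

# Write 100000 near-identical lines, loosely imitating a repetitive index file.
awk 'BEGIN {
    for (i = 0; i < 100000; i++)
        print "-rw-r--r-- root/root 1024 usr/share/doc/example/file-" i
}' > "$sample"

# Byte counts before and after gzip; $(( )) strips any padding from wc output.
raw_size=$(( $(wc -c < "$sample") ))
gz_size=$(( $(gzip -c "$sample" | wc -c) ))

echo "raw:   $raw_size bytes"
echo "gzip:  $gz_size bytes"
echo "ratio: $(( raw_size / gz_size ))x"

rm -rf "$tmpdir"
```

The exact ratio depends on the data, but files where nearly every line repeats the same structure tend to shrink to a small fraction of their original size.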
I have taken a few samples along the way. The samples were done with du(1):
$ du -csh [--apparent-size] laboratory/*
| Version (date) | du -csh | du -csh --apparent-size |
| N/A – around 20 Mar 2012 (#664794) | 16G | 13G |
| 2.5.6 (Fri Apr 27 2012) | 14G | N/A |
| 2.5.6 (Mon Jun 04 2012) | N/A | 12G |
| 220.127.116.11 (Fri Sep 21 2012) | 12G | 8.3G |
| 2.5.11 (Wed Jan 2 2013) | 10G | 6.1G |
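If the difference between plain du and du --apparent-size is unclear, the following sketch may help. It creates a throwaway directory of tiny files (a hypothetical stand-in for a laboratory, not the real thing) and measures it both ways. It assumes GNU du, since --apparent-size is a GNU option.

```shell
#!/bin/sh
# Illustrative only: requires GNU du for --apparent-size.
set -e

labdir=$(mktemp -d)   # hypothetical stand-in for a laboratory directory

# Create 1000 files of 10 bytes each.
i=0
while [ "$i" -lt 1000 ]; do
    printf '0123456789' > "$labdir/file-$i"
    i=$(( i + 1 ))
done

# Disk usage counts allocated blocks; apparent size counts the bytes in the files.
disk_kb=$(du -sk "$labdir" | cut -f1)
apparent_kb=$(du -sk --apparent-size "$labdir" | cut -f1)

echo "disk usage:    ${disk_kb}K"
echo "apparent size: ${apparent_kb}K"

rm -rf "$labdir"
```

On many file systems the first number comes out far larger than the second, because each tiny file still occupies at least one full block. That is the same effect that makes a laboratory full of small files consume a lot of space.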
And the most awesome part of this? The comparison is quite biased against the 2.5.11 entry, which is the only entry to also process experimental (approx. 10% extra packages). Some of the early entries (2.5.6 and “older”) might also have suffered from the “too many links” issue. I only wish I had been better at collecting data points, so I could have made a proper graph of it. 🙂
It sounds almost too good to be true, but if you look at the size of one of the linux-image packages, the space usage dropped from 27M to 15M between 2.5.5 and 2.5.9. Currently it is squeezed down to 14M (tested with the head of the git master branch).
I believe about 5-10% fewer binary packages were processed for those runs.