Performance tuning of lintian, take 3

About 7 months ago, I wrote about we had improved Lintian’s performance. In 2.5.41, we are doing another memory reduction, where we primarily reduce the memory consumption of data about ELF binaries.  Like previously, memory reductions follows the “less is more” pattern.

My initial test subject was linux-image-4.4.0-trunk-rt-686-pae_4.4-1~exp1_i386.deb. It had a somewhat unique property that the ELF data made up a little over half the cache.

  • We do away with a lot of unnecessary default values [f4c57bb, 470875f]
    • That removed about ~3MB (out of 10.56MB) of that ELF data cache
  • Discard section information we do not use [3fd98d9]
    • This reduced the ELF data cache to 2MB (down from the 7MB).
  • Then we stop caching the output of file(1) twice [7c2bee4]
    • While a fairly modest reduction (only 0.80MB out of 16MB total), it also affects packages without ELF binaries.

At this point, we had reduced the total memory usage from 18.35MB to 8.92MB (the ELF data going from 10.56MB to 1.98MB)[1]. At this point, I figured that I was happy with the improvement and discarded my test subject.

While impressive, the test subject was unsurprisingly a special case.  The improvement in “regular” packages[2] (with ELF binaries) were closer to 8% in total.  Not being satisfied with that, I pulled one more trick.

  • Keep only “UND” and “.text” symbols [2b21621]
    • This brought coreutils (just the lone deb) another 10% memory reduction in total.

In the grand total, coreutils 8.24-1 amd64 went from 4.09MB to 3.48MB.  The ELF data cache went from 3.38MB to 2.84MB.  Similarly, libreoffice/4.2.5-1 (including its ~170 binaries) has also seen a 8.5% reduction in total cache size[3] and is now down to 260.48MB (from 284.83MB).

 

[1] If you are wondering why I in 3fd98d9 wrote “The total “cache” memory usage is approaching 1/3 of the original for that package”, then you are not alone.  I am not sure myself any more, but it seems obviously wrong.

[2] FTR: The sample size of “regular packages” is 2 in this case.  Of which one of them being coreutils…

[3] Admittedly, since “take 2” and not since 2.5.40.2 like the rest.

Posted in Debian, Lintian | 1 Comment

Lintian 2.5.40 – now with less output

You have probably tried to run lintian (-EIL +pedantic) on your packages only to watch lintian drown your terminal.  If you have, you would certainly not be the first.

A concrete example with lintian 2.5.40.2:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb | wc -l
85

Notably, at least 45 of these appeared in 2.5.40 (the hardening-no-bindnow tag):

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb \
  --tags hardening-no-bindnow | wc -l
45

In a single release, we have over doubled the number of tags in the given package.  I very much doubt this is the first time such a thing happened. Therefore, we have implemented a “per package” tag filter in 2.5.40.

The filter is applied automatically when stdout is a tty and restricts lintian to emitting no more than 3 concrete instances of a given tag per package.  If a fourth tag would have been emitted, the filter replaces it with a “how to see all instances” message and suppresses further instances in that package.

Accordingly, lintian “only” emits 25 lines (instead of 85) for the example package.  It looks something like this:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb 
I: 389-ds-base: spelling-error-in-binary usr/bin/dbscan-bin conents contents
X: 389-ds-base: hardening-no-bindnow usr/bin/dbscan-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/dsktune-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/infadd-bin
X: 389-ds-base: hardening-no-bindnow ... use --no-tag-display-limit to see all (or pipe to a file/program)
I: 389-ds-base: spelling-error-in-binary usr/lib/x86_64-linux-gnu/dirsrv/libns-dshttpd.so.0.0.0 occured occurred
[...]

With this very simple filter in place, the entire lintian output for that single binary now fits on my screen.  I am pretty sure the filter could do with additional smarts, but I believe it is a good start.

 

Posted in Debian, Lintian | 3 Comments

Tor enabled MTA

As I posted earlier, I have migrated to use tor on my machine.  Though I had a couple of unsolved issues back then.  One of them being my Mail Transport Agent (MTA) did not support tor.

A regular user might not have a lot of use for a MTA on their laptop.  However, it is needed for a lot of Debian development scripts (bts, mass-bug, nmudiff), if they are to file/manipulate bugs for you.

I have some requirements for my MTA

  • tor support (or at least “torsocks”-able)
  • support end-to-end encryption with my provider (STARTTLS)
  • verify that it is talking to my provider.
  • rewrite my “From” if it is not correct (otherwise the mail will just be rejected)
  • support the auth mechanisms of my provider
  • it should be simple to configure

I also have some non-requirements:

  • Local mail delivery is not required
  • The MTA will not be used as a general mail relay.
    • One target relay
    • No relaying from other hosts
  • Mail delivery queue is nice to have but not a strict requirement.

Originally, I used postfix, which supported most of these requirements.  Except:

  • My attempt to make it use tor failed.  The best suggestion I found was to divert its smtp handler and then replace it with a torsocks call to the original handler.  Sadly, it just seg. faulted.
  • While postfix is almost certainly able to verify it is talking with my provider, I never got it configured to do that.  In the end, postfix was to complicated for what I was ready to put up with.

 

Per suggestion of Jakub Wilk, I tried msmtp, which turned out do what I wanted.

  • There is a trivial config file example to start with.  I did not need to read any manuals or extended documentation to figure out what they were doing.
  • You probably also want to specify tls_priorities (assuming msmtp is linked against gnutls)
    • A code dive suggests it defaults to “NORMAL:-VERS-SSL3.0″ if not set.  It is probably not too bad, but could be better. 🙂
    • From a quick look at the gnutls manual “PFS:%PROFILE_<name>” seems like decent value (requires gnutls >= 3.2.4 and that your provider has decent/modern SSL setup).
    • You probably want to have a look at the values for the %PROFILE_<name> before deciding on one.
  • The msmtp program supports connecting through SOCKS proxies and even has a sample config snippet for using it with tor.
    • Of course, by the time I had discovered that I had already been using “torsocks /usr/sbin/sendmail” a couple of times.  🙂

The only feature I will probably miss is having a local queue, which can be rate limited.  But all in all, I am quite happy with it so far. 🙂

Posted in Debian | 2 Comments

“dput change-all-of-debian.changes”

Lucas Nussbaum recently did a blog post called “Debian is still changing“.  I found it a very welcome continuation of his previous blog post on the same topic.  I find the graphs very interesting and was very happy to learn that he included relative graphs this time.

Now I can with relatively ease say that 69% of all Debian packages are using a dh-style build (source).  We have another 15% using classic debhelper, which means that at least 84% of all packages uses debhelper directly.  Assuming all CDBS based packages rely on the “debhelper class”, we are at 99%!  The latter is certainly an assumption, although I suspect it is probably pretty accurate[1].

 

Now, it is very cute to have “world dominance”, but that is not my primary interest in these numbers.  Instead, we can use these numbers to determine that:

  • We can deploy changes to up to 99% of all source packages via existing debhelper tools
  • We can deploy changes to up to 84% of all sources packages via debhelper + CDBS if it requires a new debhelper tool.

Such as automatic dbgsym packages, indexable build-id from dbg(sym) packages via Packages files[2], and replacing maintscripts with ldconfig triggers. All of these changes happen to be changes that could be trivially deployed with very little risk and very high efficiency[3].  Notably, none of them required a compat bump (or a new debhelper tool).

Of course, I do not intend to say that every change can (or should) be deployed via debhelper and much less into an existing “dh_cmd”-tool.  Notably, dh_strip is reaching its breaking point for content.  And if we were to require a compat bump for your change, we can now at least see the adoption rate via lintian. 🙂

Nevertheless, it is nice to know that (politics aside) there is some agility in the Debian build system! 🙂

 

[1] I would very much love to see numbers to (dis)prove my assumption about CDBS + debhelper.  In fact, an absolute number of packages not using debhelper (indirectly) in Debian would be very intriguing.

[2] New fields by default end up the Packages file.  See e.g. the Packages.xz file on the debug mirror or your apt-cache via:

apt-cache show mscgen-dbgsym | grep ^Build-Ids

The latter assumes that you have the debug mirror in your sources list.

[3] Efficiency here being features people rarely override/disable.

Posted in Debhelper, Debian | Leave a comment

Debian, please plan for Stretch

In the 4th quarter of 2016, we will freeze Debian Stretch.  If you are hoping to do any larger changes for Stretch, please consider starting on them now.  This also includes features that need to be in APT/dpkg (etc.) in Stretch, so we can start using them for Buster.

Even something as “trivial” as the automatic dbgsym packages took over 8 months to “complete” (from the prototype was announced in April).  I call it “trivial” because:

  • The specs were simple and were fairly easy to implement
    • Not to mention, the basic idea was already implemented before in e.g. Ubuntu (albeit differently).
  • The chosen implementation only had 3 primary tools affected that truly blocked deploying dbgsym packages in Debian.
    • dak
    • debhelper
    • dpkg
  • I have yet to hear anyone being against the idea itself.
    • There were some concerns about various implementation details.  Fortunately almost all of them had trivial or “obvious” solutions.
  • We could deploy dbgsym packages immediately once the primary tools had been patched in/for unstable.
    • Compared to Multi-Arch, Build-Profiles etc., where we had to wait till the next release before using the feature.
    • It also meant we could immediately test that the feature worked as intended (rather than discovering bugs post release).

NB: There were certainly other parties involved.  But these were the most important ones.

Mind you, the dbgsym saga is not complete yet.  We are still lacking support for migrating dbgsym packages to testing (and, by extension, the next stable release as well).  Meanwhile, you can pull the dbgsym packages from snapshot.debian.org.

 

In summary: If you want a larger change to land in Debian Stretch, please start already now. 🙂

Posted in Debian, Release-Team | 3 Comments

There is nothing like (missing) iptables (rules) to make you use tor

I have been fiddling with setting up both iptables and tor on my local machine.  Most of it was fairly easy to do, once I dedicated the time to actually do it. Configuring both “at the same time” also made things easier for me, but YMMV.  Regardless, it did take quite a while researching, tweaking and testing – most of that time was spent on the iptables front for me.

I ended up doing this incrementally.  The major 5 steps I went through were:

  1. Created a basic incoming (INPUT) firewall – enforcing
  2. Installed tor + torsocks and aliased a few commands to run with torsocks
  3. Created a basic outgoing (OUTPUT) firewall – permissive
  4. Make the outgoing firewall enforcing
  5. Migrate the majority of programs and services to use tor.

Some of these overlapped time-wise and I certainly revisited the configuration a couple of times.  A couple of things, that I learned:

  • You probably want to have a look at “netstat --listen -put --numeric” when you write your INPUT firewall.
  • The tor developers have tried a lot to make things easy.  It is scary how often “torsocks program [args]” just works(tm).
    • That said, it does not always work.
  • Tor and iptables (OUTPUT) can have a synergy effect on each other.
    • Notably, when it is easier to just “torsocks” a program than adding the necessary iptables rules.
  • Writing iptables rules become a lot easier once:
    • You learn how to iptables’s LOG rule
    • You use sensible-editor + iptables-restore or something like puppet’s firewall module
Posted in Debian | Tagged , | 2 Comments

With 3 months of automatic decrufting in unstable

After 3 months of installing an automatic decrufter in DAK, it:

  • has removed 689 cruft items from unstable and experimental
    • average removal rate being just shy of 230 cruft items/month
  • has become the “top 11th remover”.
  • is expected to become top 10 in 6 days from now and top 9 in 10 days.
    • This is assuming a continued average removal rate of 7.6 cruft items per day

On a related note, the FTP masters have removed 28861 items between 2001 and now.  The average being 2061 items a year (not accounting for the current year still being open). Though, intriguingly, in 2013 and 2014 the FTP masters removed 3394 and 3342 items.  With the (albeit limited) stats from the auto-decrufter, we can estimate that about 2700 of those were cruft items.

One could certainly also check the removal messages and check for the common “tags” used in cruft removals.  I leave that as an exercise to the curious readers, who are not satisfied with my estimate. 🙂

Posted in Debian, Release-Team | 2 Comments

The gcc-5 transition is coming to testing tonight

Thanks to hard work of Adam, Julien, Jonathan, Matthias, Scott, Simon and many others, the GCC-5/libstdc++ transition has progressed to a state, where we are ready to migrate the bulk of it to testing.

It should be a mostly smooth ride.  However, there will a few packages that are going to be uninstallable in testing for a few days and some packages will be temporarily removed from testing.  If APT is unable to provide you with an upgrade for all of your packages, please try again in a few days.  We apologise for the inconvenience.

Currently, we expect about 36 binary packages to become temporarily uninstallable on amd64 and 34 on i386.  This involves Britney accepting at least 4800 “change items” on testing (some of these are removals).  Many thanks to Julien for providing a proposed set of hints and Adam extending them.

Update: We now got a list of the packages being removed and a list of packages becoming uninstallable.  It will be available on debian-devel within 20 minutes from now.

Posted in Debian | 1 Comment

I accidentally dak

So, yesterday, I “unbroke” dak – twice even! It is of course slightly less awesome that one of the broken parts was in code written by yours truly.  Anyhow:

 

Unbreaking the dak auto-decrufter

You may remember the auto-decrufter, which I added to dak.  As a safety measure, it bails out when in doubt about which removal breaks what package.  Turns out it was often in doubt, because the code had a bug.  Of course, nothing that could not be solved with a patch.

Thanks to Ansgar for merging this. 🙂

Unbreaking dak generate-releases

As a part of migrating apt-file to use APTs new acquire system (from experimental), I learned APT really likes having checksums for everything.  Now including checksums for both the compressed file and the uncompressed file.

Sadly, dak had optimised out the uncompressed checksums for Contents files, but even after removing that optimisation (and Ganneff unbreaking my dinstall breakage) some Contents files still did not have an checksum for the uncompressed Contents file.  After some sophisticated debugging (read: “printf-debugging”), I finally discovered the issue and submitted a patch.

Thanks to Ansgar and Ganneff for merging (and fixing my dinstall breakage).

 

Posted in Debian | 1 Comment

Performance tuning of lintian, take 2

The other day, I wrote about our recent performance tuning in lintian.  Among other things, we reduced the memory usage by ~33%.  The effect was also reproducible on libreoffice (4.2.5-1 plus its 170-ish binaries, arch amd64), which started at ~515 MB and was reduced to ~342 MB.  So this is pretty great in its own right…

But at this point, I have seen what was in “Pandora’s box”. By which, I mean the two magical numbers 1.7kB per file and 2.2kB per directory in the package (add +250-300 bytes per entry in binary packages).  This is before even looking at data from file(1), readelf, etc.  Just the raw index of the package.

Depending on your point of view, 1.7-2.2kB might not sound like a lot.  But for the lintian source with ~1 500 directories and ~3 300 non-directories, this sums up to about 6.57MB out of the (then) usage at 12.53MB.  With the recent changes, it dropped to about 1.05kB for files and 1.5kB for dirs.  But even then, the index is still 4.92MB (out of 8.48MB).

This begs the question, what do you get for 1.05kB in perl? The following is a dump of the fields and their size in perl for a given entry:

lintian/vendors/ubuntu/main/data/changes-file/known-dists: 1077.00 B
  _path_info: 24.00 B
  date: 44.00 B
  group: 42.00 B
  name: 123.00 B
  owner: 42.00 B
  parent_dir: 24.00 B
  size: 42.00 B
  time: 42.00 B
  (overhead): 694.00 B

With time, date, owner and group being fixed sized strings (at most 15 characters).  The size and _path_info fields being integers, parent_dir a reference (nulled).  Finally, the name being a variable length string.  Summed the values take less than half of the total object size.  The remainder of ~700 bytes is just “overhead”.

Time for another clean up:

  • The ownership fields are usually always “root/root” (0/0).  So let’s just omit them when they satisfy said assumption. [f627ef8]
    • This is especially true for source packages where lintian ignores the actual value and just uses “root/root”.
  • The Lintian::Path API has always had a “cop-out” on the size field for non-files and it happens to be 0 for these.  Let’s omit the field if the value was zero and save 0.17MB on lintian. [5cd2c2b]
    • Bonus: Turns out we can save 18 bytes per non-zero “size” by insisting on the value being an int.
  • Unsurprisingly, the date and time fields can trivially be merged into one.  In fact, that makes “time” redundant as nothing outside Lintian::Path used its value.  So say goodbye to “time” and good day to 0.36MB more memory. [f1a7826]

Which leaves us now with:

lintian/vendors/ubuntu/main/data/changes-file/known-dists: 698.00 B
  _path_info: 24.00 B
  date_time: 56.00 B
  name: 123.00 B
  parent_dir: 24.00 B
  size: 24.00 B
  (overhead): 447.00 B

Still a ~64% overhead, but at least we reduced the total size by 380 bytes (585 bytes for entries in binary packages).  With these changes, the memory used for the lintian source index is now down to 3.62MB.  This brings the total usage down to 7.01MB, which is a reduction to 56% of the original usage (a.k.a. “the-almost-but-not-quite-50%-reduction”).

But at least the results also carried over to libreoffice, which is now down to 284.83 MB (55% of original).  The chromium-browser (source-only, version 32.0.1700.123-2) is down to 111.22MB from 179.44MB (61% of original, better results expected if processed with binaries).

 

In closing, Lintian 2.5.34 will use slightly less memory than 2.5.33.

 

Posted in Debian, Lintian | 2 Comments