Performance tuning of lintian, take 3

About 7 months ago, I wrote about how we had improved Lintian’s performance. In 2.5.41, we are doing another memory reduction, primarily targeting the memory consumed by data about ELF binaries.  As before, the memory reductions follow the “less is more” pattern.

My initial test subject was linux-image-4.4.0-trunk-rt-686-pae_4.4-1~exp1_i386.deb. It had the somewhat unusual property that the ELF data made up a little over half of the cache.

  • We do away with a lot of unnecessary default values [f4c57bb, 470875f]
    • That removed about 3MB (out of 10.56MB) of that ELF data cache
  • Discard section information we do not use [3fd98d9]
    • This reduced the ELF data cache to 2MB (down from 7MB).
  • Then we stop caching the output of file(1) twice [7c2bee4]
    • While a fairly modest reduction (only 0.80MB out of 16MB total), it also affects packages without ELF binaries.

At this point, we had reduced the total memory usage from 18.35MB to 8.92MB (the ELF data going from 10.56MB to 1.98MB)[1]. I figured that I was happy with the improvement and discarded my test subject.

While impressive, the test subject was unsurprisingly a special case.  The improvement for “regular” packages[2] (with ELF binaries) was closer to 8% in total.  Not being satisfied with that, I pulled one more trick.

  • Keep only “UND” and “.text” symbols [2b21621]
    • This brought coreutils (just the lone deb) another 10% memory reduction in total.

In total, coreutils 8.24-1 amd64 went from 4.09MB to 3.48MB, with its ELF data cache going from 3.38MB to 2.84MB.  Similarly, libreoffice/4.2.5-1 (including its ~170 binaries) has also seen an 8.5% reduction in total cache size[3] and is now down to 260.48MB (from 284.83MB).
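
For readers who want to check the relative reductions behind these figures, here is a minimal sketch in Python. The sizes are copied from this post; the helper name `reduction_pct` and the labels are made up for illustration and have nothing to do with Lintian's own (Perl) code.

```python
# Sizes (in MB) are taken from the post; everything else is illustrative.

def reduction_pct(before_mb: float, after_mb: float) -> float:
    """Return the relative reduction from before_mb to after_mb, in percent."""
    return (before_mb - after_mb) / before_mb * 100.0


measurements = {
    "linux-image total": (18.35, 8.92),
    "linux-image ELF data": (10.56, 1.98),
    "coreutils total": (4.09, 3.48),
    "coreutils ELF data": (3.38, 2.84),
    "libreoffice total": (284.83, 260.48),
}

for label, (before, after) in measurements.items():
    pct = reduction_pct(before, after)
    print(f"{label}: {before:.2f}MB -> {after:.2f}MB ({pct:.1f}% smaller)")
```

The kernel package comes out at roughly half of its original size, while coreutils and libreoffice land in the much more modest range discussed above.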

 

[1] If you are wondering why I in 3fd98d9 wrote “The total “cache” memory usage is approaching 1/3 of the original for that package”, then you are not alone.  I am not sure myself any more, but it seems obviously wrong.

[2] FTR: The sample size of “regular packages” is 2 in this case, one of them being coreutils…

[3] Admittedly, since “take 2” and not since 2.5.40.2 like the rest.

Posted in Debian, Lintian | 1 Comment

Lintian 2.5.40 – now with less output

You have probably tried to run lintian (-EIL +pedantic) on your packages only to watch lintian drown your terminal.  If you have, you would certainly not be the first.

A concrete example with lintian 2.5.40.2:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb | wc -l
85

Notably, at least 45 of these appeared in 2.5.40 (the hardening-no-bindnow tag):

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb \
  --tags hardening-no-bindnow | wc -l
45

In a single release, we more than doubled the number of tags emitted for the given package.  I very much doubt this is the first time such a thing has happened. Therefore, we have implemented a “per package” tag filter in 2.5.40.

The filter is applied automatically when stdout is a tty and restricts lintian to emitting no more than 3 concrete instances of a given tag per package.  If a fourth instance would have been emitted, the filter replaces it with a “how to see all instances” message and suppresses further instances of that tag in that package.

Accordingly, lintian “only” emits 25 lines (instead of 85) for the example package.  It looks something like this:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb 
I: 389-ds-base: spelling-error-in-binary usr/bin/dbscan-bin conents contents
X: 389-ds-base: hardening-no-bindnow usr/bin/dbscan-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/dsktune-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/infadd-bin
X: 389-ds-base: hardening-no-bindnow ... use --no-tag-display-limit to see all (or pipe to a file/program)
I: 389-ds-base: spelling-error-in-binary usr/lib/x86_64-linux-gnu/dirsrv/libns-dshttpd.so.0.0.0 occured occurred
[...]

With this very simple filter in place, the entire lintian output for that single binary now fits on my screen.  I am pretty sure the filter could do with additional smarts, but I believe it is a good start.
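
The logic of the filter is small enough to sketch. The following is a Python illustration of the behaviour described above, not Lintian's actual implementation (which is written in Perl); the tuple layout and the names `limit_tags` and `DISPLAY_LIMIT` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

DISPLAY_LIMIT = 3  # instances of one tag shown per package, as described above

# Hypothetical record layout: (severity code, package, tag name, extra info)
Tag = Tuple[str, str, str, str]


def limit_tags(tags: Iterable[Tag], limit: int = DISPLAY_LIMIT) -> Iterator[str]:
    """Yield output lines, showing at most `limit` instances per (package, tag)."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    for code, package, tag, extra in tags:
        key = (package, tag)
        seen[key] += 1
        if seen[key] <= limit:
            # Within the limit: emit the instance unchanged.
            yield f"{code}: {package}: {tag} {extra}".rstrip()
        elif seen[key] == limit + 1:
            # First instance over the limit: replace it with the hint.
            yield (f"{code}: {package}: {tag} ... use --no-tag-display-limit "
                   "to see all (or pipe to a file/program)")
        # Any later instance of this tag in this package is dropped silently.


if __name__ == "__main__":
    sample = [("X", "389-ds-base", "hardening-no-bindnow", f"usr/bin/tool{i}")
              for i in range(5)]
    for line in limit_tags(sample):
        print(line)
```

A caller would only wrap the tag stream in such a filter when the output goes to a terminal (in Python, `sys.stdout.isatty()`), which is why piping to a file or program still shows everything.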

 

Posted in Debian, Lintian | 3 Comments

Tor enabled MTA

As I posted earlier, I have migrated to using tor on my machine, though I had a couple of unsolved issues back then.  One of them was that my Mail Transport Agent (MTA) did not support tor.

A regular user might not have a lot of use for an MTA on their laptop.  However, it is needed for a lot of Debian development scripts (bts, mass-bug, nmudiff), if they are to file/manipulate bugs for you.

I have some requirements for my MTA:

  • tor support (or at least “torsocks”-able)
  • support end-to-end encryption with my provider (STARTTLS)
  • verify that it is talking to my provider.
  • rewrite my “From” if it is not correct (otherwise the mail will just be rejected)
  • support the auth mechanisms of my provider
  • it should be simple to configure

I also have some non-requirements:

  • Local mail delivery is not required
  • The MTA will not be used as a general mail relay.
    • One target relay
    • No relaying from other hosts
  • Mail delivery queue is nice to have but not a strict requirement.

Originally, I used postfix, which supported most of these requirements.  Except:

  • My attempt to make it use tor failed.  The best suggestion I found was to divert its smtp handler and then replace it with a torsocks call to the original handler.  Sadly, it just segfaulted.
  • While postfix is almost certainly able to verify it is talking with my provider, I never got it configured to do that.  In the end, postfix was too complicated for what I was ready to put up with.

 

At the suggestion of Jakub Wilk, I tried msmtp, which turned out to do what I wanted.

  • There is a trivial config file example to start with.  I did not need to read any manuals or extended documentation to figure out what they were doing.
  • You probably also want to specify tls_priorities (assuming msmtp is linked against gnutls)
    • A code dive suggests it defaults to “NORMAL:-VERS-SSL3.0” if not set.  It is probably not too bad, but could be better. :)
    • From a quick look at the gnutls manual, “PFS:%PROFILE_<name>” seems like a decent value (requires gnutls >= 3.2.4 and that your provider has a decent/modern SSL setup).
    • You probably want to have a look at the values for the %PROFILE_<name> before deciding on one.
  • The msmtp program supports connecting through SOCKS proxies and even has a sample config snippet for using it with tor.
    • Of course, by the time I had discovered that, I had already used “torsocks /usr/sbin/sendmail” a couple of times.  :)
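
Put together, the points above could translate into a configuration along these lines.  This is only a sketch, not my actual configuration: the account name, host, user, password command and CA bundle path are placeholders, and the %PROFILE value is picked purely as an example, so check it against the gnutls documentation and the msmtp manual for your version.

```
# ~/.msmtprc -- illustrative sketch; every host, user and path is a placeholder
defaults
# Require TLS via STARTTLS and verify the server certificate
tls            on
tls_starttls   on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
# Example value only; review the gnutls profiles before choosing one
tls_priorities PFS:%PROFILE_MEDIUM
# Connect through the local tor SOCKS proxy (tor listens on port 9050 by default)
proxy_host     127.0.0.1
proxy_port     9050

account        provider
host           smtp.example.org
port           587
auth           on
user           me@example.org
passwordeval   "gpg --quiet --decrypt ~/.msmtp-password.gpg"
from           me@example.org

account default : provider
```

With the proxy settings in the config file, wrapping the sendmail call in torsocks should no longer be necessary.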

The only feature I will probably miss is having a local queue, which can be rate limited.  But all in all, I am quite happy with it so far. :)

Posted in Debian | 2 Comments

“dput change-all-of-debian.changes”

Lucas Nussbaum recently did a blog post called “Debian is still changing”.  I found it a very welcome continuation of his previous blog post on the same topic.  I find the graphs very interesting and was very happy to learn that he included relative graphs this time.

Now I can with relative ease say that 69% of all Debian packages are using a dh-style build (source).  We have another 15% using classic debhelper, which means that at least 84% of all packages use debhelper directly.  Assuming all CDBS based packages rely on the “debhelper class”, we are at 99%!  The latter is certainly an assumption, although I suspect it is probably pretty accurate[1].

 

Now, it is very cute to have “world dominance”, but that is not my primary interest in these numbers.  Instead, we can use these numbers to determine that:

  • We can deploy changes to up to 99% of all source packages via existing debhelper tools
  • We can deploy changes to up to 84% of all source packages via debhelper + CDBS if it requires a new debhelper tool.

Examples of such changes include automatic dbgsym packages, indexable build-ids from dbg(sym) packages via Packages files[2], and replacing maintscripts with ldconfig triggers. All of these happen to be changes that could be trivially deployed with very little risk and very high efficiency[3].  Notably, none of them required a compat bump (or a new debhelper tool).

Of course, I do not intend to say that every change can (or should) be deployed via debhelper and much less into an existing “dh_cmd”-tool.  Notably, dh_strip is reaching its breaking point for content.  And if we were to require a compat bump for your change, we can now at least see the adoption rate via lintian. :)

Nevertheless, it is nice to know that (politics aside) there is some agility in the Debian build system! :)

 

[1] I would very much love to see numbers to (dis)prove my assumption about CDBS + debhelper.  In fact, an absolute number of packages not using debhelper (indirectly) in Debian would be very intriguing.

[2] New fields by default end up in the Packages file.  See e.g. the Packages.xz file on the debug mirror or your apt-cache via:

apt-cache show mscgen-dbgsym | grep ^Build-Ids

The latter assumes that you have the debug mirror in your sources list.

[3] Efficiency here being features people rarely override/disable.

Posted in Debhelper, Debian | Leave a comment

Debian, please plan for Stretch

In the 4th quarter of 2016, we will freeze Debian Stretch.  If you are hoping to make any larger changes for Stretch, please consider starting on them now.  This also includes features that need to be in APT/dpkg (etc.) in Stretch, so we can start using them for Buster.

Even something as “trivial” as the automatic dbgsym packages took over 8 months to “complete” (counting from when the prototype was announced in April).  I call it “trivial” because:

  • The specs were simple and were fairly easy to implement
    • Not to mention, the basic idea was already implemented before in e.g. Ubuntu (albeit differently).
  • The chosen implementation only had 3 primary tools affected that truly blocked deploying dbgsym packages in Debian.
    • dak
    • debhelper
    • dpkg
  • I have yet to hear anyone being against the idea itself.
    • There were some concerns about various implementation details.  Fortunately almost all of them had trivial or “obvious” solutions.
  • We could deploy dbgsym packages immediately once the primary tools had been patched in/for unstable.
    • Compared to Multi-Arch, Build-Profiles etc., where we had to wait till the next release before using the feature.
    • It also meant we could immediately test that the feature worked as intended (rather than discovering bugs post release).

NB: There were certainly other parties involved.  But these were the most important ones.

Mind you, the dbgsym saga is not complete yet.  We are still lacking support for migrating dbgsym packages to testing (and, by extension, the next stable release as well).  Meanwhile, you can pull the dbgsym packages from snapshot.debian.org.

 

In summary: If you want a larger change to land in Debian Stretch, please start now. :)

Posted in Debian, Release-Team | 3 Comments

There is nothing like (missing) iptables (rules) to make you use tor

I have been fiddling with setting up both iptables and tor on my local machine.  Most of it was fairly easy to do, once I dedicated the time to actually do it. Configuring both “at the same time” also made things easier for me, but YMMV.  Regardless, it did take quite a while researching, tweaking and testing – most of that time was spent on the iptables front for me.

I ended up doing this incrementally.  The 5 major steps I went through were:

  1. Created a basic incoming (INPUT) firewall – enforcing
  2. Installed tor + torsocks and aliased a few commands to run with torsocks
  3. Created a basic outgoing (OUTPUT) firewall – permissive
  4. Made the outgoing firewall enforcing
  5. Migrated the majority of programs and services to use tor.

Some of these overlapped time-wise and I certainly revisited the configuration a couple of times.  A couple of things I learned:

  • You probably want to have a look at “netstat --listen -put --numeric” when you write your INPUT firewall.
  • The tor developers have tried a lot to make things easy.  It is scary how often “torsocks program [args]” just works(tm).
    • That said, it does not always work.
  • Tor and iptables (OUTPUT) can have a synergy effect on each other.
    • Notably, when it is easier to just “torsocks” a program than adding the necessary iptables rules.
  • Writing iptables rules becomes a lot easier once:
    • You learn how to use iptables’s LOG rule
    • You use sensible-editor + iptables-restore or something like puppet’s firewall module
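
As an illustration of the last two points, here is a minimal rules file of the kind you could edit with sensible-editor and load with iptables-restore.  It is a sketch of an enforcing INPUT chain and a permissive-but-logging OUTPUT chain, not my actual rule set; the file name and log prefixes are made up, and you would still have to add rules for any service you actually want reachable.

```
# Load with: iptables-restore < rules.v4   (the file name is a placeholder)
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Enforcing INPUT: allow loopback and replies to connections we initiated
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Log (rate limited) whatever is about to hit the DROP policy
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "INPUT-drop: "
# Permissive OUTPUT: log new outgoing connections, but let them through for now
-A OUTPUT -o lo -j ACCEPT
-A OUTPUT -m conntrack --ctstate NEW -j LOG --log-prefix "OUTPUT-new: "
COMMIT
```

The logged lines end up in the kernel log, which makes it much easier to see which rules are still missing before switching the OUTPUT policy from ACCEPT to DROP.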

Posted in Debian | 2 Comments

With 3 months of automatic decrufting in unstable

3 months after we installed an automatic decrufter in DAK, it:

  • has removed 689 cruft items from unstable and experimental
    • average removal rate being just shy of 230 cruft items/month
  • has become the “top 11th remover”.
  • is expected to become top 10 in 6 days from now and top 9 in 10 days.
    • This is assuming a continued average removal rate of 7.6 cruft items per day

On a related note, the FTP masters have removed 28861 items between 2001 and now.  That averages out to 2061 items a year (not counting the current year, which is still open). Intriguingly, though, in 2013 and 2014 the FTP masters removed 3394 and 3342 items respectively.  With the (albeit limited) stats from the auto-decrufter, we can estimate that about 2700 of those yearly removals were cruft items.
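
The arithmetic behind these rates can be sketched as follows.  The item counts come from this post; the number of days in the 3 months and the number of full years are assumptions on my part, chosen because they reproduce the quoted averages.

```python
# Counts are from the post; `days` and `full_years` are assumptions.
removed_by_decrufter = 689   # cruft items removed in the first 3 months
months = 3
days = 91                    # assumption: 3 months taken as roughly 91 days

per_month = removed_by_decrufter / months   # just shy of 230 items/month
per_day = removed_by_decrufter / days       # roughly 7.6 items/day
per_year_estimate = per_month * 12          # naive yearly extrapolation

ftp_master_total = 28861     # removals between 2001 and the time of writing
full_years = 14              # assumption: 2001 through 2014, open year excluded
average_per_year = ftp_master_total / full_years

print(f"decrufter: {per_month:.1f}/month, {per_day:.1f}/day, "
      f"~{per_year_estimate:.0f}/year if the rate holds")
print(f"FTP masters: {average_per_year:.1f} removals/year on average")
```

The naive extrapolation lands slightly above the “about 2700” estimate, which is close enough given that it rests on only 3 months of data.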

One could certainly also check the removal messages and check for the common “tags” used in cruft removals.  I leave that as an exercise to the curious readers, who are not satisfied with my estimate. :)

Posted in Debian, Release-Team | 1 Comment