Partial rewrite of lintian’s reporting setup

I had the mixed pleasure of doing a partial rewrite of lintian’s reporting framework.  It started as a problem with generating the graphs, which turned out to be “not enough memory”. On the plus side, I am actually quite pleased with the end result.  I managed to “scope-creep” myself quite a bit and I ended up getting rid of a lot of old issues.

The major changes in summary:

  • A lot of logic was moved out of harness, meaning it is now closer to becoming a simple “dumb” task scheduler.  With the logic being moved out in separate processes, harness now hogs vastly less memory that I cannot convince perl to release to the OS.  On lilburn.debian.org “vastly less” is on the order of reducing “700ish MB” to “32 MB”.
  • All important metadata was moved into the “harness state-cache”, which is a simple YAML file. This means that “Lintian laboratory” is no longer a data store. This change causes a lot of very positive side effects.
  • With all metadata now stored in a single file, we can now do atomic updates of the data store. That said, this change itself does not enable us to run multiple lintian’s in parallel.
  • As the lintian laboratory is no longer a data store, we can now do our processing in “throw away laboratories” like the regular lintian user does.  As the permanent laboratory is the primary source of failure, this removes an entire class of possible problems.

There are also some nice minor “features”:

  • Packages can now be “up to date” in the generated reports.  Previously, they would always be listed as “out of date” even if they were up to date.  This is the only end user/website-visitor visible change in all of this (besides the graphs are now working again \o/).
  • The size of the harness work list is no longer based on the number of changes to the archive.
  • The size of the harness work list can now be changed with a command line option and is no longer hard coded to 1024.  However, the “time limit” remains hard coded for now.
  • The “full run” (and “clean run”) now simply marks everything “out-of-date” and processes its (new) backlog over the next (many) harness runs.  Accordingly, a full-run no longer causes lintian to run 5-7 days on lilburn.d.o before getting an update to the website.  Instead we now get incremental updates.
  • The “harness.log” now features status updates from lintian as they happen with “processed X successfully” or “error processing Y” plus a little wall time benchmark.  With this little feature I filed no less than 3 bugs against lintian – 2 of which are fixed in git.  The last remains unfixed but can only be triggered in Debian stable.
  • It is now possible with throw-away labs to terminate the lintian part of a reporting run early with minimal lost processing.  Since the lintian-harness is regular fed status updates from lintian, we can now mark successfully completed entries as done even if lintian does not complete its work list.  Caveat: There may be minor inaccuracies in the generated report for the particular package lintian was processing when it was interrupted.  This will fix itself when the package is reprocessed again.
  • It is now vastly easier to collect new meta data to be used in the reports.  Previously, they had to be included in the laboratory and extracted from there.  Now, we just have to fit it into a YAML file.  In fact, I have been considering to add the “wall time” and make a “top X slowest” page.
  • It is now possible to generate the html pages with only a “state-cache” and the “lintian.log” file.  Previously, it also required a populated lintian laboratory.

As you can probably tell, I am quite pleased with the end result.  The reporting framework lacks behind in development, since it just “sits there and takes care of itself”.  Also with the complete lack of testing, it also suffers from the “if it is not broken, then do not fix it” paradigm (because we will not notice if we broke until it is too late).

Of course, I managed to break the setup a couple of times in the process.  However, a bonus feature of the reporting setup is that if you break it, it simply leaves an outdated report on the website.

Anyway, enjoy. 🙂

This entry was posted in Debian, Lintian. Bookmark the permalink.

1 Response to Partial rewrite of lintian’s reporting setup

  1. Pingback: Unseen changes to lintian.d.o | nthykier

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.