Improving bulk performance in debhelper

Since debhelper/10.3, there have been a number of performance-related changes.  The vast majority primarily improve bulk performance or only have visible effects at larger “input” sizes.

The most visible cases are:

  • dh + dh_* now scale a lot better for large numbers of binary packages.  Even more so with parallel builds.
  • Most dh_* tools are now a lot faster when creating many directories or installing files.
  • dh_prep and dh_clean now perform removals in bulk.
  • dh_install can now bulk some installations.  As a concrete corner case, libssl-doc went from approximately 11 seconds to less than a second.  This optimization is implicitly disabled by --exclude (among other options).
  • dh_installman now scales a lot better with many manpages.  Even more so with parallel builds.
  • dh_installman has regained its former performance under fakeroot (a regression introduced in 10.2.2)

 

For debhelper, this mostly involved:

  • avoiding fork+exec of commands for things doable natively in perl, especially when each fork+exec only processes one file or dir (see the sketch after this list).
  • where fork+exec is still used, bulking as many files/dirs into the call as possible.
  • caching/memoizing slow calls (e.g. in parts of pkgfile inside Dh_Lib)
  • adding an internal API for dh to do bulk checks for pkgfiles; this is useful when dh checks whether it should optimize out a helper.
  • and, of course, doing things in parallel where trivially possible.
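
As a simplified illustration of the first two points, here is a small before/after sketch.  It is not debhelper’s actual code: the directory list is invented, and install_dir accepting multiple directories is my reading of current Dh_Lib (the single-directory form works either way):

    use Debian::Debhelper::Dh_Lib;

    my @dirs = ('debian/foo/usr/bin', 'debian/foo/usr/share/foo');

    # Before: one fork+exec of install(1) per directory.
    doit('install', '-d', $_) for @dirs;

    # After: the directories are created natively in perl, in one call.
    install_dir(@dirs);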

 

How to take advantage of these improvements in tools that use Dh_Lib:

  • If you use install_{file,prog,lib,dir}, then you get this out of the box.  These functions are available in Debian/stable.  On a related note, if you use “doit” to call “install” (or “mkdir”), then please consider migrating to these functions instead (see the sketches after this list).
  • If you need to reset owner+mode (chown 0:0 FILE + chmod MODE FILE), consider using reset_perm_and_owner.  This is also available in Debian/stable.
    • CAVEAT: It is not recursive and YMMV if you do not need the chown call (due to fakeroot).
  • If you have a lot of items to be processed by an external tool, consider using xargs().  Since 10.5.1, it is possible to insert the items anywhere in the command rather than just at the end (see the sketches after this list).
  • If you need to remove files, consider using the new rm_files function.  It removes files and silently ignores files that do not exist. It has also been available since 10.5.1.
  • If you need to create symlinks, please consider using make_symlink (available in Debian/stable) or make_symlink_raw_target (since 10.5.1).  The former creates policy-compliant symlinks (e.g. it fixes up absolute symlinks that should have been relative).  The latter is closer to an “ln -s” call (see the sketches after this list).
  • If you need to rename a file, please consider using rename_path (since 10.5).  It behaves mostly like “mv -f” but requires dest to be a (non-existing) file.
  • Have a look at whether on_pkgs_in_parallel() / on_items_in_parallel() would be suitable for enabling parallelization in your tool (see the sketches after this list).
    • The emphasis for these functions is on making parallelization easy to add with minimal code changes.  They pre-distribute the items, which can lead to unbalanced workloads where some processes are idle while a few keep working.
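
For the install_* helpers and reset_perm_and_owner, here is a minimal sketch of a dh_*-style tool.  The file names and target paths are invented for the example, and the exact calling conventions (e.g. install_dir taking several directories, reset_perm_and_owner taking the mode first) reflect my reading of current Dh_Lib, so double-check them against the debhelper version you target:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Debian::Debhelper::Dh_Lib;

    init();

    foreach my $package (@{$dh{DOPACKAGES}}) {
        my $tmp = tmpdir($package);

        # Created natively in perl; no fork+exec of mkdir(1) or install(1).
        install_dir("$tmp/usr/bin", "$tmp/usr/share/foo");

        # install_file is for data (mode 0644), install_prog for executables
        # (mode 0755); both replace doit('install', '-m...', SRC, DEST).
        install_file('debian/foo.conf', "$tmp/usr/share/foo/foo.conf");
        install_prog('debian/foo-helper', "$tmp/usr/bin/foo-helper");

        # Non-recursive reset of ownership (0:0) and mode on an existing path.
        reset_perm_and_owner(0644, "$tmp/usr/share/foo/foo.conf");
    }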
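
For rm_files and xargs, a small sketch; the file names are invented, and the placeholder constant for splicing the items into the middle of the command (XARGS_INSERT_PARAMS_HERE) is how I read Dh_Lib, so verify it against your debhelper version:

    use Debian::Debhelper::Dh_Lib;

    # Remove stale files in one call; files that are already gone are ignored.
    rm_files('debian/foo/usr/share/foo/stale.conf',
             'debian/foo/usr/share/foo/old.cache');

    # Bulk many items into as few invocations of the external tool as possible.
    # Since 10.5.1 the items can be spliced anywhere into the command line,
    # not just appended at the end.
    my @files = glob('debian/foo/usr/share/foo/examples/*');
    xargs(\@files, 'cp', '-a', XARGS_INSERT_PARAMS_HERE,
          'debian/foo/usr/share/doc/foo/examples/') if @files;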
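
For symlinks and renames, another hedged sketch; the paths are invented, and the argument order (make_symlink takes the link name first, then what it should point to, then an optional tmpdir prefix; make_symlink_raw_target mirrors “ln -s TARGET LINKNAME”; rename_path takes source then destination) is my reading of Dh_Lib, so verify it before relying on it:

    use Debian::Debhelper::Dh_Lib;

    my $tmp = 'debian/foo';

    # Policy-compliant symlink: the target is normalised for you (e.g.
    # absolute links are turned into relative ones where required).
    make_symlink('usr/share/foo/default.conf', 'etc/foo/foo.conf', $tmp);

    # Raw symlink, closer to "ln -s"; no normalisation of the target.
    make_symlink_raw_target('foo.conf', "$tmp/usr/share/foo/compat.conf");

    # Rename; behaves mostly like "mv -f", but the destination must be the
    # full path to a (non-existing) file.
    rename_path("$tmp/usr/share/foo/example.conf.dh-new",
                "$tmp/usr/share/foo/example.conf");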
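
For the parallel helpers, a final sketch; the manpage glob is invented, and the callback receiving its batch of items in @_ is my reading of on_items_in_parallel, so verify it against your debhelper version:

    use Debian::Debhelper::Dh_Lib;

    my @manpages = glob('debian/foo/usr/share/man/man?/*');

    # The items are pre-distributed into batches up front, so the workload
    # can end up unbalanced if some items are much slower than others.
    on_items_in_parallel(\@manpages, sub {
        # @_ is the batch of items handed to this worker.
        xargs(\@_, 'gzip', '-9n') if @_;
    });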

Credits:

I would like to thank the following for reporting performance issues or regressions and/or providing patches.  The list is in no particular order:

  • Helmut Grohne
  • Kurt Roeckx
  • Gianfranco Costamagna
  • Iain Lane
  • Sven Joachim
  • Adrian Bunk
  • Michael Stapelberg

Should I have missed your contribution, please do not hesitate to let me know.

 
