Software Archeology with git

A couple of useful git commands for studying big git repositories:

Rank contributors by contributed lines of code in HEAD

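# List tracked text files, blame each one, keep the local part of the
# author's e-mail (any characters up to the @), then count lines per author.
git grep --cached -zIle '' |
    xargs -0n1 git blame -e |
    sed -n 's/^[^(]*(<\([^@>]*\)@.*/\1/p' |
    sort | uniq -c | sort -rn >authors-by-line.txt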
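
To see what the pipeline above extracts, here is a minimal Python sketch of the same counting step. It assumes the usual `git blame -e` line layout (hash, then the author's e-mail in `(<...>`); the sample lines, the `count_authors` name and the regular expression are illustrative assumptions, not part of the original command.

```python
import re
from collections import Counter

# Captures the local part of the e-mail that `git blame -e` prints
# right after the opening "(<" of each annotated line.
BLAME_EMAIL = re.compile(r"^[^(]*\(<([^@>]*)@")


def count_authors(blame_lines):
    """Return (author, line_count) pairs, most prolific author first."""
    counts = Counter()
    for line in blame_lines:
        match = BLAME_EMAIL.match(line)
        if match:  # lines without a recognizable e-mail are skipped
            counts[match.group(1)] += 1
    return counts.most_common()


# Hypothetical blame output, for illustration only.
sample = [
    "1a2b3c4d (<alice@example.com> 2014-01-05 10:00:00 +0100 1) #!/bin/sh",
    "5e6f7a8b (<bob.smith@example.com> 2014-02-01 09:30:00 +0100 2) echo hi",
    "1a2b3c4d (<alice@example.com> 2014-01-05 10:00:00 +0100 3) exit 0",
]
print(count_authors(sample))  # [('alice', 2), ('bob.smith', 1)]
```

The shell version does the same thing with `sort | uniq -c | sort -rn`, which is usually all you need on the command line.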

Sort branches by last modified date in the remote repository

git for-each-ref --sort=-committerdate \
    --format='%(committerdate:short) %(authorname) %(refname:short)' \
    refs/remotes >branches-by-date.txt

How Lenovo, Dell, HP and Fujitsu messed up with their most lucrative clients for 1½ years

It's no coincidence that notebook docking stations are sold only with high-end laptops. My Dell e7740 cost over 2000 €. When I bought it in January 2014, I couldn't imagine I'd have trouble making it work under Linux. After all, it was all-Intel, already well-supported hardware. The trouble came from a usually dumb piece of hardware: the docking station. I run two 1920x1200 screens in portrait mode, but this freaking docking station "intelligently" merged the two screens into a virtual 3840x1200 screen and presented just that to the notebook.

The bug-o-feature responsible for this is called DisplayPort Multi-Stream Transport (MST), and it was intended to drive multiple displays over one cable using daisy-chaining. When the first docking stations with MST support appeared at the end of 2013, nobody was prepared for it. It took well over a year before MST support landed in the mainline 3.17 kernel. And it will take another year until all major distributions move to 3.17 or later.

Red Hat itself found out only when they ran into the problem, and it took 6 months before David Airlie came up with a patch to fix this hardware bug. Check out this talk for a good overview of the story.

How to detect bad web design

Whether you order designs from 99designs or from an in-house designer, use this simple rule of thumb:

Your most valuable content should be the most contrasted.

Here's why. Bad designs are wildly different, but all good designs share one thing in common: they put your content first. As I am writing this, I realize that the text I am typing appears in charcoal gray or #4d4f51 while the title is all black. This is inconsistent at best, but not everything is lost for LinkedIn — the list of my last posts on the right is in a lighter gray #96999c and the editor buttons are even lighter than that.

The poster boy of good design, the Boston Globe, displays text as black #000000 on an all-white background #ffffff, while its menu items have varying levels of grayness, from a rather dark #464646 on white to the same dark #464646 on neutral gray #eeeeee. This is good design: the most important thing the Boston Globe has is news, and its news enjoys the most contrast you can get, all-black text on an all-white background.

However, many websites have their main content in a rather worn-out pale black, while surrounding elements are flashy and attractive. Check for instance, their content is shown in dark gray #404040 while the navigation is in all-black #000000.
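
If you'd rather measure than eyeball, the rule of thumb can be checked with numbers. The sketch below uses the WCAG 2.x contrast-ratio formula, which is my choice of metric rather than something the sites above prescribe, and it assumes a plain white page background wherever the background color isn't stated in the text.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color such as '#4d4f51'."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Colors quoted in the text; white backgrounds are an assumption
# except for the #eeeeee menu background.
pairs = [
    ("main text, black on white", "#000000", "#ffffff"),
    ("menu item on white", "#464646", "#ffffff"),
    ("menu item on gray", "#464646", "#eeeeee"),
    ("charcoal body text", "#4d4f51", "#ffffff"),
    ("dark gray content", "#404040", "#ffffff"),
    ("light gray sidebar", "#96999c", "#ffffff"),
]
for label, fg, bg in pairs:
    print(f"{label}: {contrast_ratio(fg, bg):.2f}")
```

A design passes the rule of thumb when the ratio for the main content is the highest number on the page.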

Zen layout demystified

I absolutely love the zen layout for its flexibility. What other layout can handle fixed-width sidebars for banners and a liquid content area? However, figuring out how it works can be challenging. Here's a minimal zen layout, coded along the explanation given here.


Here is its CSS…


The European Commission repeats the mistake of the '09 Microsoft deal

We've already seen it in 2009. The European Commission could have built an exemplary anti-trust case against Microsoft for forcing manufacturers to bundle its operating system with their computers. Instead, it targeted the smaller issue of tying the web browser to the operating system. Since then, Microsoft has lost the browser war on technical grounds, and Internet Explorer has become known as a tool for downloading Google Chrome on a fresh computer.

The only real impact of the Commission's 2009 decision on consumers was negative. The sudden drop in the quality of email rendering in Microsoft Outlook 2010 and all later versions comes from the fact that Microsoft tore the Internet Explorer component out of Microsoft Office and replaced it with a much older and less capable library, developed in the early 80s and initially used to display RTF documents. So, whenever you see bulky fonts and ugly formatting in your Microsoft Outlook, don't blame the sender, blame the European Commission.

Now the old story repeats with Google. The Commission took on the irrelevant issue of Google Shopping search results, instead of picking one of the well-publicized issues:

  * abuse of control over the Android ecosystem
  * Gmail interoperability with small independent mail servers
  * Gtalk and Google Hangouts interoperability with third-party XMPP servers
  * Google Single Sign-On interoperability with regard to industry standards

…the list can go on and on.

Forget Gmail filters, let Google sort your inbox using Machine Learning algorithms

Among the multitude of programming APIs provided by Google lies a jewel called the Prediction API. It has a high-quality classifier that allows for continuous learning with model updates.

Let's quickly use it to automatically sort incoming mail into your existing labels. The most tedious part is configuration:


  1. Create a new Blank Project in Google Apps Script and enable the Prediction API in the Resources/Advanced Google services… menu.
  2. Sign up for the Google Developers Console and take the $300 free credit. Then, create your first Developers Console project and enable the Prediction API in its API & auth section.
  3. Switch back to your newly created Google Apps Script and link it with your new Developers Console project through the Resources/Developers Console Project menu.

We are done configuring. Now, there are only two functions to implement: one to train the model and the other to classify incoming mail.


The function GmailApp.getUserLabels() ❶ gets all the labels that you defined in Gmail and disregards standard labels such as Inbox, All Mail or Spam. Messages in Gmail are organized into threads, so once you get a handle on a label, you have to get all of its threads ❷, then grab the individual messages in each thread. We'll use the first message of each thread for this simple exercise ❸.

Quick notes on running MacOS X, NetBSD and Windows 7 together under KVM on Linux

As a reminder for myself, mostly:

  1. Only Mavericks works; Yosemite is not yet working under KVM.
  2. KVM SLiRP networking does not work with MacOS X, and bridging is hard to set up for wireless networks, so it is better to use the NAT versions of qemu-ifup and qemu-ifdown.
  3. It looks so cool on screenshots.

Data-mining users in a screenful of code


Select like-minded users from a local community website.


  1. A Drupal website with the votingapi module enabled and at least a few dozen votes by registered users.
  2. A working installation of the R language.

Extract data

For each user, select all other users that voted on the same nodes and comments:

SELECT v1.uid uid1, v2.uid uid2, u1.name name1, u2.name name2,
  v2.entity_id entity_id, v1.value value1, v2.value value2
FROM votingapi_vote v1
JOIN (votingapi_vote v2, users u1, users u2)
 ON (v1.uid != v2.uid AND v1.entity_id=v2.entity_id
   AND v1.entity_type=v2.entity_type AND v1.uid=u1.uid AND v2.uid=u2.uid)
WHERE v1.uid 

This produces a table