A subtle allusion to the f-word in Microsoft's EU coding week banners

I just stumbled upon a fancy Microsoft banner advertising its Embrace and Extend from childhood program.

For the record: the only reason Microsoft supports this "Coding in the classroom" initiative is that they want to push their products onto kids. That's a problem, but a bigger problem is that Microsoft have long striven to make computing an elite profession by introducing inconsistencies and complexity into the most basic abstractions: a character, a file, a block device... Their products are designed to fail pupils who want to understand how computers work. And this design is intentional, because the fewer people understand computing, the less competition their business faces... and the higher the profits.

Thus, taking money from Microsoft to promote coding in the classroom is akin to taking money from Philip Morris to promote a healthy lifestyle. Shameful.

All permutations of a string for the teacher's sake

Today my kid brought home a school assignment: guess the words hidden in bags of letters. It took a mum and a programmer to solve all six. I've left one for you, though: guess what word C D E E I M R R spells.

P.S. It's a classic programming interview question: generating all permutations of a string. Generate them all, grep -f them against /usr/share/dict/french, and you'll get the answer.
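A minimal bash sketch of that brute-force approach. The function name permute and the LETTERS/DICT variables are made up for illustration, and it assumes a French word list exists at /usr/share/dict/french (on Debian-like systems it comes from a package such as wfrench). Repeated letters produce duplicate permutations, hence sort -u; grep -x keeps only dictionary lines that match a candidate exactly, and -F treats candidates as fixed strings rather than regexes.

```bash
#!/usr/bin/env bash
# Print every arrangement of the letters in $2, each prefixed with $1.
# Naive recursion: n letters give n! lines, duplicates included.
permute() {
  local prefix=$1 rest=$2 i
  if [ -z "$rest" ]; then
    printf '%s\n' "$prefix"
    return
  fi
  for (( i = 0; i < ${#rest}; i++ )); do
    # take letter i as the next one, recurse on the remaining letters
    permute "$prefix${rest:i:1}" "${rest:0:i}${rest:i+1}"
  done
}

LETTERS=${LETTERS:-cdeeimrr}            # lowercase, to match the word list
DICT=${DICT:-/usr/share/dict/french}    # assumed location of a French word list

if [ -r "$DICT" ]; then
  # dedupe the candidates, then keep dictionary lines equal to one of them
  grep -x -F -f <(permute "" "$LETTERS" | sort -u) "$DICT" \
    || echo "no matching word in $DICT" >&2
else
  echo "no word list found at $DICT" >&2
fi
```

For eight letters that is 8! = 40 320 candidates, or 10 080 distinct ones once sort -u removes the duplicates caused by the two E's and two R's: small enough for brute force.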

Reducing the size of a codebase 20 or 30 times is possible: I've done it… twice.

Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs. © Bill Gates

I once rewrote 30 000 lines of C++ code as 1 000 lines of Ruby. Years passed, and shit hit the fan again. Today, I rewrote 424 lines of Java+Spring+Hibernate as 18 lines of bash. This is less glorious, but compare the size of the deliverables: 39 MB for the J2EE webapp against… 772 bytes for the shell script.

P.S. It is probably safe to say now, after 5+ years, that the C++ code was TopiEngine and my rewrite was tm4r. The latest version of TopiEngine on Launchpad has 67 279 lines of code; it has more than doubled in size since I rewrote it in Ruby. tm4r now stands at 1 227 lines of code.

P.P.S. Of course, these rewrites are not exact functional replicas. tm4r is an in-memory engine while TopiEngine uses SQLite underneath, so their usage patterns may differ wildly. Same with the Java → bash rewrite. But for the task at hand, there was always a reason to rewrite, and that reason was directly related to code bloat, modifiability and maintainability.

P.P.P.S. Both TopiEngine and tm4r have little practical value. Topic Maps are dead.

How to coerce your team into creating branches and tags while using Subversion

Remember the standard structure of Subversion repositories? The one you create with mkdir project/{trunk,tags,branches}? I've finally figured out why people create so few branches and tags with this layout: they check out at the project/trunk level rather than at the project level, for fear of getting essentially the same code multiple times. And if you are in project/trunk, you can't easily work with project/branches or project/tags.

But there's a solution! Use the --depth option of svn checkout and the --set-depth option of svn update. For instance, check out a repository in two steps. First, check out only the {trunk, tags, branches} folders, but nothing below them:

svn co --depth immediates http://example.com/svn/project

then, change to project/trunk and get the rest of the codebase from trunk:

cd project/trunk
svn up --set-depth infinity

See how it helps? You can now cherry-pick only the branches you want, and get rid of them by setting the depth back to immediates.

Processing csv reports from your KBC Online Banking: just use awk, dude!

KBC exports bank statements in an awkward format. For instance, there's no structured field for the correspondent's bank account number: it is lumped together with the description field. Fortunately, awk can come to your rescue.

Here, for instance, is the code that prints your balance:

 BEGIN {
   FS = ";"    # this is the field separator
   RS = "\r"   # carriage return is the record separator, they are probably still using an AS400
 }
 {
   if (NF == 0) next                  # there's usually some garbage at the end of the file
   total += gensub(/,/, "", "g", $9)  # amounts use a decimal comma: strip it and sum in cents
 }
 END {
   # put the decimal comma back before the last two digits (gensub() needs GNU awk)
   print "TOTAL", gensub(/(.+)(..)/, "\\1,\\2", "g", total)
 }
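Since the correspondent's account number is buried in the description, here is a sketch of pulling it back out with awk's match(). It has not been checked against a real export: it assumes the description is field 7 (the gas script in this post matches on $7), the amount is field 9, and the account is a Belgian IBAN (BE, two check digits, twelve digits, possibly grouped by four). The function name extract_ibans is hypothetical. It only uses POSIX awk features, so gawk isn't required.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each correspondent's Belgian IBAN next to the amount.
# ASSUMPTIONS: field 7 holds the free-text description, field 9 the amount,
# fields are separated by ';' and records end with a carriage return.
extract_ibans() {
  awk -v desc=7 -v amount=9 '
    BEGIN { FS = ";"; RS = "\r" }
    {
      # BE + 2 check digits + 12 digits, optionally grouped by four with spaces
      if (match($desc, /BE[0-9][0-9]( ?[0-9][0-9][0-9][0-9])( ?[0-9][0-9][0-9][0-9])( ?[0-9][0-9][0-9][0-9])/)) {
        iban = substr($desc, RSTART, RLENGTH)
        gsub(/ /, "", iban)    # normalise to the compact form without spaces
        print iban, $amount
      }
    }' "$@"
}
```

Usage: extract_ibans statement.csv, or pipe the export into it. Adjust desc and amount if your columns differ.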

Want to know how much you spent on gas? Here is the code to do just that:

 BEGIN {
   FS = ";"    # this is the field separator
   RS = "\r"   # carriage return is the record separator
 }
 {
   if ($7 ~ /PAIEMENT CARBURANT/) $11 = "GAS"  # group fuel payments under GAS
   totals[$11] += gensub(/,/, "", "g", $9)     # sum the amounts (in cents) per value of $11
 }
 END {
   for (n in totals)
     print n, gensub(/(.+)(..)/, "\\1,\\2", "g", totals[n])
 }
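gensub() is a GNU awk extension, so these scripts fail under mawk (the default awk on some Debian and Ubuntu installs) or BSD awk. Below is a sketch of the gas totals using only POSIX awk: gsub() strips the decimal comma, and sprintf() plus sub() puts it back. It keeps the same assumptions as the script it mirrors ($7 description, $9 amount with a decimal comma, $11 as the grouping key); the function name category_totals is made up.

```bash
#!/usr/bin/env bash
# Sketch: per-category totals in POSIX awk (no gensub), e.g. for mawk or BSD awk.
# ASSUMPTIONS: ';' field separator, carriage-return record separator,
# $7 = description, $9 = amount with a decimal comma, $11 = grouping key.
category_totals() {
  awk '
    BEGIN { FS = ";"; RS = "\r" }
    NF == 0 { next }                  # skip trailing garbage
    {
      key = ($7 ~ /PAIEMENT CARBURANT/) ? "GAS" : $11
      cents = $9
      gsub(/,/, "", cents)            # 12,34 -> 1234: sum integer cents, no float drift
      totals[key] += cents
    }
    END {
      for (k in totals) {
        s = sprintf("%.2f", totals[k] / 100)
        sub(/\./, ",", s)             # back to a decimal comma
        print k, s
      }
    }' "$@"
}
```

The for (k in totals) loop visits keys in no particular order, so pipe the output through sort if you want a stable listing.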

Facebook says Ukraine is 81.25% Russian-speaking

Let's not make this political or drag the civil war into the discussion. I was just playing around with Facebook's statistics for advertisers and found that 81.25% of Ukrainians use Facebook in Russian. Anyone who has published ads on Facebook can check this in the Ads Manager. By the way, 92.86% of Belarusians use Facebook in Russian. And since there's no Belarusian among the language choices, the rest is presumably English, plus a fraction of a percent for Polish.