Reducing the size of the codebase by 20 or 30 times is possible, I've done it… twice.
Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs. © Bill Gates
I once rewrote 30 000 lines of C++ code in 1000 lines of Ruby. Years passed, and shit hit the fan again. Today, I rewrote 424 lines of Java+Spring+Hibernate in 18 lines of bash. This is less glorious, but if you compare the size of the deliverable, it's 39 MB for the J2EE webapp against… 772 bytes for the shell script.
P.S. It is probably safe to say now, after 5+ years, that the C++ code was TopiEngine and my rewrite was tm4r. The latest version of TopiEngine on Launchpad has 67 279 lines of code. It has doubled in size since I rewrote it in Ruby. My tm4r now counts 1 227 lines of code.
P.P.S. Of course, these rewrites are not exact functional replicas. tm4r is an in-memory engine, TopiEngine uses sqlite underneath, so their usage patterns may differ wildly. Same with the Java → bash rewrite. But for the task at hand, there was always a reason to rewrite, and the reason was directly related to the code bloat, modifiability and maintainability.
P.P.P.S. Both TopiEngine and tm4r have little practical value. Topic Maps are dead.
Block trolls by cookie, not by IP
If your troll has a dynamic IP address, send him a cookie and check for it in all subsequent page requests, something along the following lines:
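global $user;
// 12345 is the troll's user ID: tag his browser with a cookie that lasts about a year
if ($user->uid == 12345) {
    setcookie("_utmc_c", "fs442428977", time() + 31557600);
}
// any later request carrying the cookie gets a fake database error, logged in or not
if (!empty($_COOKIE["_utmc_c"])) {
    echo "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)";
    exit();
}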
Don't know how to repair an Italian car? Call your woman for help!



That piece of cover detached and fell onto my feet while I was driving this brand new Italian car today. I fixed it with my wife's hairpin. Don't think you can fix a German car this way.
How to coerce your team into creating branches and tags while using Subversion
Remember the standard structure of Subversion repositories? The one that you create with mkdir -p project/{trunk,tags,branches}? I have now figured out why people create so few branches and tags in this configuration: they check out at the project/trunk level and not at the project level, for fear of getting essentially the same code multiple times. And if you are at project/trunk, you can't really work with project/branches or project/tags easily.
But there's a solution! Use the --depth and --set-depth options of the svn checkout and svn update commands. For instance, when checking out a repository, do it in two steps. First, check out only the {trunk, tags, branches} folders, but nothing below them:
svn co --depth immediates http://example.com/svn/project
Then change to project/trunk and get the rest of the codebase from trunk:
cd project/trunk
svn up --set-depth infinity
See how it helps? You can now cherry-pick only the branches you want, and get rid of them again by setting the depth back to immediates.
Processing csv reports from your KBC Online Banking: just use awk, dude!
KBC exports bank statements in an awkward format. For instance, there's no structured field for the correspondent's bank account number: this information is lumped together with the description field. Fortunately, awk can come to your rescue.
Here, for instance, is the code that prints your balance:
BEGIN {
    FS = ";"    # this is the field separator
    RS = "\r"   # carriage return is the record separator, they are probably using AS400 still
}
{
    if (NF == 0) next                     # there's usually some garbage at the end of the file
    total += gensub(/,/, "", "g", $9)     # strip the decimal comma, then add up (in cents)
}
END {
    # put the decimal comma back before the last two digits (gensub needs gawk)
    print "TOTAL", gensub(/(.+)(..)/, "\\1,\\2", "g", total)
}
Want to know how much you spent on gas? Here is the code to do just that:
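BEGIN {
    FS = ";"    # this is the field separator
    RS = "\r"   # carriage return is the record separator, they are probably using AS400 still
}
{
    if (NF == 0) next                              # skip the garbage at the end of the file
    if ($7 ~ /PAIEMENT CARBURANT/) $11 = "GAS"     # relabel fuel payments
    totals[$11] += gensub(/,/, "", "g", $9)        # strip the decimal comma, sum per label (in cents)
}
END {
    for (n in totals) {
        # put the decimal comma back before the last two digits (gensub needs gawk)
        print n, gensub(/(.+)(..)/, "\\1,\\2", "g", totals[n])
    }
}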
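If you'd rather cross-check the awk numbers in Python, here is a minimal sketch of the same per-category totals. It assumes the layout the awk scripts rely on (semicolon-separated fields, carriage-return-separated records, description in field 7, amount in field 9, category in field 11) and amounts written with a decimal comma and no thousands separator. The function names are mine and purely illustrative, not part of any KBC tooling.

```python
from collections import defaultdict
from decimal import Decimal

# Assumed column positions, taken from the awk scripts above
# (awk counts from 1, Python from 0): $7 description, $9 amount, $11 category.
DESCRIPTION, AMOUNT, CATEGORY = 6, 8, 10


def parse_amount(text):
    """Turn a decimal-comma amount such as '-45,20' into an exact Decimal."""
    return Decimal(text.strip().replace(",", "."))


def totals_by_category(export_text):
    """Sum the amounts per category, relabelling fuel payments as GAS."""
    totals = defaultdict(Decimal)
    for record in export_text.split("\r"):        # mirrors RS="\r"
        fields = record.strip("\n").split(";")    # mirrors FS=";"
        if len(fields) <= CATEGORY:
            continue  # blank or truncated record, e.g. trailing garbage
        category = fields[CATEGORY]
        if "PAIEMENT CARBURANT" in fields[DESCRIPTION]:
            category = "GAS"
        totals[category] += parse_amount(fields[AMOUNT])
    return dict(totals)
```

Summing the values of the returned dictionary gives the overall balance, so one function covers both awk scripts. Using Decimal instead of stripping the comma keeps amounts exact without assuming every amount has exactly two decimals.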
Facebook says, Ukraine is 81.25% Russian-speaking

Dutch is losing to English as the second most used language in the Brussels region
I came across some interesting stats while advertising on Facebook. People living in Brussels Capital +20km use the following three languages:
French 780 000
Dutch 240 000
English 240 000
Looks like Dutch is being phased out by English.
An incomplete inventory of ways to share code among Drupal integrators

Whenever a team starts working on a new Drupal-based website, there's an inevitable discussion on how to organize collaboration. Three questions come up regularly:
- How and when to use Features, Features Extra and Features Override modules
- How to organize production, testing and development environments and where to develop — on a shared server or locally
- How to share source code.
Out of the three, the question of the source code is the most contentious. One reason is that while everyone and their friends are already on git, most of the teams that implement Drupal websites do not really need a version control system where developers can cherry-pick and merge changes or analyze bugs by navigating commit histories. Instead, teams are usually interested in incremental backup systems where each team member can be sure that he can roll back his own and other people's changes until everything works again.
Below is the inventory of ways to share source code in Drupal projects.
Incremental backup
Incremental backup with drush
One way of ensuring incremental backups without the overhead of git or another version control system is to use drush. Drush keeps a backup of previous module versions in ~/drush-backups, with enough info to revert manually to the last known good state. This setup is ideal for small projects with a handful of custom modules that can be kept in their own git repositories.
When in doubt, don't put your Drupal project in git. Use drush and its built-in backup mechanism.
Ansible playbook for a smallish and very simple Drupal cluster
The cluster runs on apt-based systems. It is designed for high-availability: failure of one server is not critical. However, there is no automatic failover configuration. Instead, manual recovery is possible within minutes.
- Load balancers run varnish. Its configuration file takes into account the context_breakpoint cookie that's used to implement responsive delivery. The same server also has memcached.
- Application servers run nginx and a recent php5-fpm through unix sockets. There's also drush. The filesystem is shared through glusterfs.
- MariaDB is configured in master-slave mode on database servers. They also run Apache Solr.
All servers set up exim to work as a smarthost, sending mail through gmail. In practice, gmail limits outgoing emails to a few thousand per day, so it is better to replace it with a dedicated solution such as Mailchimp. There's also newrelic for server monitoring on all servers.
The playbook assumes that all servers have public IPs on eth0 and sit in the private network on eth1.
For the rest, check the code on GitHub.