
A programmer's joke

A CS student shows his lab assignment to the professor. The code works and even produces the correct output, but the professor mutters that the code is not OK:

— You have to choose variable names wisely. Names like i, j, foo and bar make your code unreadable. By next time, please use long, mnemonic variable names.

A few weeks later, the same student completes a new assignment and brings it to the professor. This time, his code is full of long_mnemonic_variable_i, long_mnemonic_variable_j, long_mnemonic_variable_foo and long_mnemonic_variable_bar.

Reducing the size of the codebase by 20 or 30 times is possible, I've done it… twice.

Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs. © Bill Gates

I once rewrote 30 000 lines of C++ code in 1 000 lines of Ruby. Years passed, and shit hit the fan again. Today, I rewrote 424 lines of Java+Spring+Hibernate in 18 lines of bash. This is less glorious, but if you compare the size of the deliverable, it's 39 MB for the J2EE webapp against… 772 bytes for the shell script.

P.S. It is probably safe to say now, after 5+ years, that the C++ code was TopiEngine and my rewrite was tm4r. The latest version of TopiEngine on Launchpad has 67 279 lines of code. It has more than doubled in size since I rewrote it in Ruby. My tm4r now counts 1 227 lines of code.

P.P.S. Of course, these rewrites are not exact functional replicas. tm4r is an in-memory engine, TopiEngine uses sqlite underneath, so their usage patterns may differ wildly. Same with the Java → bash rewrite. But for the task at hand, there was always a reason to rewrite, and the reason was directly related to the code bloat, modifiability and maintainability.

P.P.P.S. Both TopiEngine and tm4r have little practical value. Topic Maps are dead.

How to coerce your team into creating branches and tags while using Subversion

Remember the standard structure of Subversion repositories? The one that you create with mkdir project/{trunk,tags,branches}? I have finally figured out why people create so few branches and tags in this configuration: they check out at the project/trunk level and not at the project level, for fear of getting essentially the same code multiple times. And if your working copy starts at project/trunk, you can't easily work with project/branches or project/tags.

But there's a solution! Use the --depth and --set-depth options of the svn checkout and svn update commands. For instance, when checking out a repository, do it in two steps. First, check out only the trunk, tags and branches folders, but nothing below them:

svn co --depth immediates http://example.com/svn/project

Then change to project/trunk and get the rest of the codebase from trunk:

cd project/trunk
svn up --set-depth infinity

See how it helps? You can now cherry-pick only the branches you want, and get rid of them again by setting the depth back to immediates.
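
To make the cherry-picking concrete, here is a minimal sketch you can run in a sandbox. It assumes Subversion 1.6 or later. The throwaway local repository, the README file and the branch name feature-x are all made up for illustration; they stand in for http://example.com/svn/project. To drop the branch it uses --set-depth exclude, a variation on the depth trick above that removes the directory from the working copy entirely.

```shell
#!/bin/sh
# Sandbox sketch: a temporary local repository plays the role of the
# real server. All names below (seed, README, feature-x) are hypothetical.
set -e
work=$(mktemp -d)
svnadmin create "$work/repo"
url="file://$work/repo/project"

# Build the standard layout: trunk with one file, plus tags and branches.
mkdir "$work/seed"
echo "hello" > "$work/seed/README"
svn import -q -m "Seed trunk" "$work/seed" "$url/trunk"
svn mkdir -q -m "Add tags and branches" "$url/tags" "$url/branches"
svn copy -q -m "Create branch" "$url/trunk" "$url/branches/feature-x"

# Step 1: check out only trunk, tags and branches, with nothing below them.
svn co -q --depth immediates "$url" "$work/project"

# Step 2: pull in the full contents of trunk.
cd "$work/project/trunk"
svn up -q --set-depth infinity

# Cherry-pick a single branch by name; the other branches stay on the server.
cd "$work/project/branches"
svn up -q --set-depth infinity feature-x
ls feature-x

# Done with the branch: drop it from the working copy (not from the repository).
svn up -q --set-depth exclude feature-x
```

Nothing here touches the repository history beyond the initial setup commits; the last command only changes what your working copy contains, so the branch can be fetched again later with the same --set-depth infinity call.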

An example of project management workflow

Here's an example of a software project management workflow that I use daily.

Each project is split into two uneven parts:

  • Definition of project scope and objectives
  • Project execution and follow-up.

The main difference is that the former is document-based while the latter uses an issue tracker.

For definition of project scope and objectives, we edit a series of documents:

  • Personas — a list of the different kinds of people who may use the product, e.g. readers, clients, journalists, etc.
  • User stories — a description of how these personas would use the product
  • Functional requirements — a non-technical list of features derived from user stories
  • Technical requirements — a list of technical features, derived from functional requirements
  • Budget estimation — based on technical requirements and team skills.

There's also an optional Traceability matrix — a many-to-many mapping between functional requirements and technical requirements.

For project execution and follow-up, we use the issue tracker Assembla to define:

  • Milestones — e.g. "Freeze data structures"
  • Tickets — for individual tasks, such as "Update the logo"
  • Components — to group tickets by origin or subproduct, e.g. "Marketing & Sales requests", "Newsletters"
  • Time — estimated and invested time, whether tied to a ticket or not.