drupal

Data-mining users in a screenful of code

Objective

Select like-minded users from a local community website.

Pre-requisites

  1. A Drupal website with the votingapi module enabled and at least a few dozen votes by registered users.
  2. A working installation of the R language.

Exract data

For each user, select all other users that voted on same node and comments:

SELECT v1.uid uid1, v2.uid uid2, u1.name name1, u2.name name2,
  v2.entity_id entity_id, v1.value value1, v2.value value2
FROM votingapi_vote v1
JOIN (votingapi_vote v2, users u1, users u2)
 ON (v1.uid != v2.uid AND v1.entity_id=v2.entity_id
   AND v1.entity_type=v2.entity_type AND v1.uid=u1.uid AND v2.uid=u2.uid)
WHERE v1.uid 

This produces a table

An incomplete inventory of ways to share code among Drupal initegrators

Whenever a team starts working on a new Drupal-based website, there's an inevitable discussion on how to organize collaboration. Three questions come up regularly:

  1. How and when to use Features, Features Extra and Features Override modules
  2. How to organize production, testing and development environments and where to develop — on a shared server or locally
  3. How to share source code.

Out of the three, the question of the source code is the most contentious. One reason is that while everyone and their friends are already on git, most of the teams that implement Drupal websites do not really need a version control system where developers can cherry-pick and merge changes or analyze bugs by navigating commit histories. Instead, teams are usually interested in incremental backup systems where each team member can be sure that he can roll back his own and other people's changes until everything works again.

Below is the inventory of ways to share source code in Drupal projects.

Incremental backup

Incremental backup with drush

One way of ensuring incremental backups without the overhead of git or other version control system is to use drush. Drush keeps a backup of previous module versions in ~/drush-backups — there is enough info to revert manually to the previously known good state. This setup is ideal for small projects with a handful of custom modules that can be kept in their own git repositories.

When in doubt, don't put your Drupal project in git. Use drush and its build-in backup mechanism.

Ansible playbook for a smallish and very simple Drupal cluster

The cluster runs on apt-based systems. It is designed for high-availability: failure of one server is not critical. However, there is no automatic failover configuration. Instead, manual recovery is possible within minutes.

  • Load balancers run varnish. Its configuration file takes into account the context_breakpoint cookie that's used to implement responsive delivery. The same server also has memcached.
  • Application servers run nginx and a recent php5-fpm through unix sockets. There's also drush. The filesystem is shared through glusterfs.
  • MariaDb is configured in master-slave mode on database servers. They also run Apache Solr.

All servers set up exim to work as smarthost, sending mails through gmail. In practice, gmail limits outgoing emails to a few thousands per day, so it is better to replace it by a dedicated solution such as Mailchimp. There's also newrelic for server monitoring on all servers.

The playbook assumes that all servers have public IPs on eth0 and sit in the private network on eth1.

For the rest, check the code in github.

A small firewall of China

There's been quite a few spam{bots,turks} lately passing through Drupal Captcha , reCAPTHA, Honeypot module… After a bit of research, I decided to block China from accessing my site. It turned out to be easy:

# install geoip filter into iptables
apt-get install xtables-addons-common
# download the existing geoip database
/usr/lib/xtables-addons/xt_geoip_dl
# set up csv parser in perl
apt-get install libtext-csv-xs-perl
# create a directory for geoip filter's database
mkdir /usr/share/xt_geoip
# build the database
/usr/lib/xtables-addons/xt_geoip_build  -D /usr/share/xt_geoip  *.csv
# load the geoip filter module
modprobe xt_geoip
# block China
iptables -A INPUT -m geoip --src-cc CN -j DROP

PROFIT!!!

P.S. Next in sight is US. US an CH account for 97% of Drupal spam.

Hosting Drupal on bare metal vs. cloud (Acquia)

We hosted Drupal websites at Hetzner for a few years. While it's unbeatable on price, it requires a skilled Linux sysadmin, which weights on personnel costs. Our guesstimate was that we'll pay a third more for a Drupal hosting, but our sysadmin costs would go down by ⅔. Add in non-material considerations, such as lower personnel turnover risks and better DDoS protection, and alternatives to Hetzner start to look almost attractive.

So, we decided to check Drupal cloud hosting solutions and asked for quotes from Pantheon and Acquia. Pantheon was less expensive and had more features. It also edged out Acquia on technology by using Linux containers instead of Amazon Web Services as underlying infrastructure. Unfortunately, Pantheon servers were located near Chicago, while most of our readers are from Europe, so we had no choice but to go for Acquia.

How to force Drupal users into answering a poll

Here, the poll is hardcoded at node/38566 and the patch applies to index.php in the root of Drupal installation.

Tested on Drupal 6.14, but shall work on many other versions as well.

--- index-original.php      2007-12-26 09:46:48.000000000 +0100
+++ index.php   2009-11-23 16:12:11.000000000 +0100
@@ -33,6 +33,12 @@
 }
 elseif (isset($return)) {
   // Print any value (including an empty string) except NULL or undefined:
+  global $user;
+  $result = db_fetch_object(db_query('SELECT nid FROM {poll_votes} WHERE nid = 38566 AND uid = %d', $user->uid));
+  if ($user->uid > 0 && !$result && $_GET['q'] != 'node/38566') {
+    drupal_goto('node/38566', NULL, NULL, 301);
+  }
+
   print theme('page', $return);
 }
Tags: