Mikkel Høgh

Coding the web since 1999

How to configure Octopress for Drupal Planet syndication

Since I moved my blog to Octopress, I’ve been struggling with my blog posts not getting picked up by Planet Drupal.

When I started using Octopress, it only had a site-wide Atom feed, and despite having little experience with Ruby or Liquid, I managed to hack together category-specific feeds for Octopress. These were merged into Octopress core, so if you get the latest version from the master branch, you should have these.

Sadly, despite my feed being valid Atom 1.0, Planet Drupal does not parse it properly, and my blog posts were still not included in its feed. This might not be all that surprising to someone more familiar with the history of aggregator.module, the feed aggregation module that ships with Drupal and that Planet Drupal uses for its syndication.

Politely said, this module is not one of the parts of Drupal core that gets the most attention, and its feed parsing code is not exactly state-of-the-art. In fact, it only supports a subset of the Atom spec and a particular version of the older and inferior RSS. Now, I won’t get into the nasty history of web syndication technology, but suffice to say that this is one of the instances where I find myself wishing that the Drupal community wasn’t so NIH-prone. (that goes for you, too, project.module).

So, long story short: if you want to be on Planet Drupal, you have to implement a feed format it understands. I made an old-school RSS feed that should correspond almost exactly to what Drupal itself outputs when it generates feeds.

The template

“Drupal Planet feed template” (planet_drupal.xml) download
---
layout: nil
---
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="{{ site.url }}" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <title>Planet Drupal | {{ site.title | xml_escape }}</title>
  <link>{{ site.url }}/planet_drupal.xml</link>
  <atom:link href="{{ site.url }}/planet_drupal.xml" rel="self" type="application/rss+xml" />
  <language>en</language>
  <generator>Octopress</generator>
  <description>Planet Drupal RSS feed for {{ site.url }}</description>

{% for post in site.categories['Drupal'] limit: 10 %}
  <item>
    <title>{{ post.title | xml_escape }}</title>
    <link>{{ site.url }}{{ post.url }}</link>
    <dc:creator>{{ site.author | xml_escape }}</dc:creator>
    <guid isPermaLink="true">{{ site.url }}{{ post.url }}</guid>
    <pubDate>{{ post.date | date: "%a, %d %b %Y %H:%M:%S %z" }}</pubDate>
    <description><![CDATA[{{ post.content | expand_urls: site.url | cdata_escape }}]]></description>
  </item>
{% endfor %}

</channel>
</rss>

How to do it

  1. Download the feed template attached above.
  2. Put it somewhere in the source folder of your Octopress site.

    I have it at source/planet_drupal.xml, but the name and location should make no difference at all. Since I have it at the root of the source tree, my feed is available at http://mikkel.hoegh.org/planet_drupal.xml

  3. Change the {% for post in site.categories['Drupal'] limit: 10 %} line to match the name of the category you want to make a feed of. In my case, the category name is Drupal. This is probably case sensitive, so be sure to be consistent when categorizing your posts.
  4. If you haven’t done so already, tag your posts with the right category to have them included in the feed.

    This is how this post is tagged, for reference. This is standard Octopress post metadata.

     ---
     layout: post
     title: "How to configure Octopress for Drupal Planet syndication"
     date: 2012-08-12 22:32
     comments: true
     categories: [ Drupal, Octopress ]
     ---
     [post body here]
    
  5. Redirect your current feed address or report your new feed address to the Drupal.org webmasters.
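
Before reporting the new address, it can be worth checking that the generated file really looks like the RSS 2.0 the template is meant to produce. The following is only a rough sketch in Python, not part of Octopress: the function name check_rss_feed and the sample feed are made up for illustration, and the list of required fields is simply what the template above emits, not anything Planet Drupal documents.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Fields the template above writes for every item (an assumption about
# what the aggregator needs, based only on the template).
REQUIRED_ITEM_FIELDS = ("title", "link", "guid", "pubDate", "description")

# Hypothetical sample; a real check would read the generated planet_drupal.xml.
SAMPLE_FEED = """<rss version="2.0">
<channel>
  <title>Planet Drupal | Example</title>
  <link>http://example.com/planet_drupal.xml</link>
  <description>Example feed</description>
  <item>
    <title>A post</title>
    <link>http://example.com/a-post/</link>
    <guid isPermaLink="true">http://example.com/a-post/</guid>
    <pubDate>Sun, 12 Aug 2012 22:32:00 +0200</pubDate>
    <description><![CDATA[<p>Hello</p>]]></description>
  </item>
</channel>
</rss>"""


def check_rss_feed(xml_text):
    """Return a list of human-readable problems found in an RSS 2.0 feed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "rss" or root.get("version") != "2.0":
        problems.append('root element is not <rss version="2.0">')
    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]
    items = channel.findall("item")
    if not items:
        problems.append("feed contains no <item> elements")
    for index, item in enumerate(items, start=1):
        for field in REQUIRED_ITEM_FIELDS:
            if not (item.findtext(field) or "").strip():
                problems.append(f"item {index}: missing <{field}>")
        pub_date = item.findtext("pubDate")
        if pub_date:
            try:
                # RSS dates use the RFC 822 style that the email module parses.
                parsedate_to_datetime(pub_date)
            except (TypeError, ValueError):
                problems.append(f"item {index}: unparseable pubDate {pub_date!r}")
    return problems


print(check_rss_feed(SAMPLE_FEED) or "no problems found")
```

An empty list only means the structure is plausible; whether the aggregator accepts the feed is something only the aggregator can tell you.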

Varnish as reverse proxy with nginx as web server and SSL terminator

Or, if you like, the nginx-Varnish-nginx sandwich.

Why?

This is, admittedly, a bit unorthodox. But here’s my rationale:

Since time immemorial (ie. more than a couple of years of Internet time), we at Reveal IT have been deploying Varnish in front of the web sites we build for our customers, with the dual purpose of a faster (and thus better) user experience and conservation of server resources.

In our previous Varnish setups, only standard HTTP would be passed through Varnish, and HTTPS traffic would be passed directly to nginx. However, we’re seeing increasing demand for TLS/SSL, and more sites are going HTTPS-only.

Varnish itself does not support TLS/SSL (and with good reason), so we need another program to provide the secure connection. I’ve tried a couple of commonly recommended TLS/SSL terminators, namely Pound and stud, but I’ve yet to succeed in getting either to work on my server setup. And since I didn’t feel like spending an entire workday getting to know either tool well enough to deploy it with confidence, I was reminded of the old quote:

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.

In this case, I have an excellent hammer, nginx, so I decided to treat this problem like a nail. Here’s my current setup:

How?

This is all running on a single machine, “rajka”. Varnish listens on port 80 and passes uncached (or uncacheable) requests on to nginx on port 8080.

Now to the interesting parts. I’ve set up an nginx virtual host for the TLS terminator work:

“nginx TLS terminator” (nginx-tls-proxy.conf) download
server {
  listen 443 ssl;

  server_name revealit.dk;

  ssl_certificate /etc/ssl/revealit.dk/cert_chain.pem;
  ssl_certificate_key /etc/ssl/revealit.dk/key.pem;

  location / {
    # Pass the request on to Varnish.
    proxy_pass  http://127.0.0.1;

    # Pass a bunch of headers to the downstream server, so they'll know what's going on.
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Most web apps can be configured to read this header and understand that the current session is actually HTTPS.
    proxy_set_header X-Forwarded-Proto https;

    # We expect the downstream servers to redirect to the right hostname, so don't do any rewrites here.
    proxy_redirect     off;
  }
}

While the nginx virtual host should be pretty self-explanatory, the Varnish configuration is a bit more tricky. I suggest you take the time to review our entire varnishconf, but I have extracted the most relevant parts here:

“Varnish VCL” (varnish-tls-proxy.vcl) download
# List of upstream proxies we trust to set X-Forwarded-For correctly.
acl upstream_proxy {
  "127.0.0.1";
}

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}

sub vcl_recv {
  # Set the X-Forwarded-For header so the backend can see the original
  # IP address. If one is already set by an upstream proxy, we'll just re-use that.
  if (client.ip ~ upstream_proxy && req.http.X-Forwarded-For) {
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For;
  } else {
    set req.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
  }
}

sub vcl_hash {
  # URL and hostname/IP are the default components of the vcl_hash
  # implementation. We add more below.
  hash_data(req.url);
  if (req.http.host) {
      hash_data(req.http.host);
  } else {
      hash_data(server.ip);
  }

  # Include the X-Forwarded-Proto header, since we want to treat HTTPS
  # requests differently, and make sure this header is always passed
  # properly to the backend server.
  if (req.http.X-Forwarded-Proto) {
    hash_data(req.http.X-Forwarded-Proto);
  }

  return (hash);
}

The main issue here is the X-Forwarded-For header which is used by the web application to determine the IP address of the actual client, not any intermediary proxies. Since the X-Forwarded-For can be used for IP address spoofing, it is important to configure this securely.

For your web application to get the correct IP address for the client user, it needs to be configured to trust the X-Forwarded-For header for requests coming from the Varnish server. Most web applications have built-in support for this.

However, this means that we need to be completely sure that we do not pass on malicious X-Forwarded-For headers from the client. The easiest way to accomplish this is to simply set the header yourself. But in this case, we have two levels of proxying. So nginx always sets the X-Forwarded-For header to the client’s IP address. Varnish normally does the same, but if, and only if, the request is coming from 127.0.0.1 (the IP of the TLS terminator), we use the value it provided instead.

Now, this is not strictly compliant with how X-Forwarded-For is supposed to be used (we should actually append the IP address of the TLS terminator to its value), but since Drupal uses the right-most (i.e. last added) IP address, that would not actually work in our case.
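
To make the trust rule easier to reason about, here is the same decision written as a small Python function. This is only a model of the vcl_recv logic above, not something Varnish runs; the function name forwarded_for and the example addresses are made up for illustration.

```python
import ipaddress

# Mirrors the upstream_proxy ACL in the VCL: only the local TLS terminator.
TRUSTED_UPSTREAM_PROXIES = {ipaddress.ip_address("127.0.0.1")}


def forwarded_for(client_ip, incoming_header=None, trusted=TRUSTED_UPSTREAM_PROXIES):
    """Return the X-Forwarded-For value that should reach the backend.

    The incoming header is kept only when the request comes from a trusted
    proxy and the header is actually set; otherwise it is overwritten with
    the address of the connecting client.
    """
    if ipaddress.ip_address(client_ip) in trusted and incoming_header:
        return incoming_header
    return client_ip


# Request that came in through the TLS terminator: keep what it told us.
print(forwarded_for("127.0.0.1", "203.0.113.7"))
# Request straight from the internet with a spoofed header: overwrite it.
print(forwarded_for("198.51.100.9", "10.0.0.1"))
```

The point of the model is the second call: a header supplied by an untrusted client never survives.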

Lastly, by including X-Forwarded-Proto in the vcl_hash function, we ensure that HTTP and HTTPS requests are cached separately, so a user visiting the site via HTTPS will get pages where the links are also HTTPS. This does reduce the efficiency of the cache, since it’ll leave two copies of everything in the cache, including images and other static files that are not protocol-sensitive.

Fixing that issue is left as an exercise to the reader.
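
For what it’s worth, one possible direction is to leave the protocol out of the hash for static files. The sketch below models that idea in Python rather than VCL, so treat it as an illustration of the cache key and nothing more. The function name cache_key and the extension list are assumptions of mine, and the approach is only safe if those files really are identical over HTTP and HTTPS (a stylesheet with absolute http:// URLs in it would not be).

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical list of file types assumed to be protocol-insensitive.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def cache_key(url, host, forwarded_proto=None):
    """Build a cache key from URL and host, adding the protocol only for
    responses that might differ between HTTP and HTTPS."""
    parts = [url, host]
    path = urlsplit(url).path.lower()
    if forwarded_proto and not path.endswith(STATIC_EXTENSIONS):
        parts.append(forwarded_proto)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# Pages get separate cache entries per protocol...
print(cache_key("/about", "revealit.dk", "https") != cache_key("/about", "revealit.dk"))
# ...while a static file shares a single entry.
print(cache_key("/logo.png?v=2", "revealit.dk", "https") == cache_key("/logo.png?v=2", "revealit.dk"))
```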

Caveat lector

Though I’ve had this idea for a while, I’ve only had a working, in production, implementation of this for ~8 hours. It may yet turn out that this was a horrible idea, but so far it’s working great.

Deploying LedgerSMB with nginx and Plack on FreeBSD

I have recently been looking at LedgerSMB for Drupal Danmark’s accounting needs, but I’ve struggled to get it set up. It is a classic CGI app, which is not something we usually run on our servers, especially since we use nginx, which does not support classic CGI. nginx does support FastCGI (and plain HTTP proxying), so we need some sort of wrapper around LedgerSMB to get it running.

Fortunately, the wonderful Perl community has a solution for that, namely Plack.

After digging around a bit and getting helpful tips from the LedgerSMB mailing list, I managed to get a working solution together:

  1. Preamble

    Since we use FreeBSD in this example, I’m going to use the ports collection to install the required CPAN packages for this to work. Here we assume that you have LedgerSMB and its dependencies installed already.

  2. Installation

    The installation is as simple as running this command:

     # portmaster www/p5-Plack www/p5-CGI-Emulate-PSGI www/p5-CGI-Compile
    

    This will install Plack and its dependencies.

  3. Starting the Plack server

    This is fairly simple, in fact. Just proceed to the folder where you have LedgerSMB installed and run the following command:

     % plackup -MPlack::App::CGIBin -e 'Plack::App::CGIBin->new(root => "/usr/local/ledgersmb", exec_cb => sub { 1 })->to_app'
    

    Make sure you adjust the root path in the command to point to your LedgerSMB folder.

    Unless you have added the LedgerSMB folder to your Perl library path, you need to run this command from that folder.

  4. Configuring nginx

    This is very similar to what we do for our PHP-FPM apps, and it seems to work as it should:

     server {
       listen       443 ssl;
       server_name  ledger.example.com;
    
       keepalive_timeout 300;
    
       ssl_certificate      /etc/ssl/ledger.example.com.crt;
       ssl_certificate_key  /etc/ssl/ledger.example.com.key;
    
       root   /usr/local/ledgersmb;
    
       # Serve the login page at the root.
       location = / {
         rewrite ^ /login.pl redirect;
       }
    
       location ~ \.pl$ {
         proxy_set_header Host $http_host;
         proxy_set_header X-Forwarded-Host $http_host;
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Port 443; #this is important for Catalyst Apps!
         proxy_pass http://localhost:5000; #changed from http://localhost:5000/ which was causing double forward slash problems in the url
       }
    
       # Deny access to configuration and other nasty places.
       location ~ \.conf$ { deny all; }
       location /users { deny all; }
       location /bin { deny all; }
       location /utils { deny all; }
       location /spool { deny all; }
       location /templates { deny all; }
       location /LedgerSMB { deny all; }
     }
    
  5. Profit

    This setup has worked very well for me so far. Hit me up in the comments below if you have any improvements :)
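
One thing that is easy to get wrong with a config like the one in step 4 is the order in which nginx picks a location: exact matches win, then regular-expression locations are tried in config order, and only if none of those match is the longest matching prefix location used. The Python sketch below is a simplified model of that selection for this particular server block. It is not how nginx is implemented, and the names and action strings are mine.

```python
import re

# The locations from the server block above, grouped by kind.
EXACT = {"/": "redirect to /login.pl"}
REGEX = [
    (re.compile(r"\.pl$"), "proxy to Plack"),
    (re.compile(r"\.conf$"), "deny"),
]
PREFIX = {
    "/users": "deny",
    "/bin": "deny",
    "/utils": "deny",
    "/spool": "deny",
    "/templates": "deny",
    "/LedgerSMB": "deny",
}


def route(path):
    """Model which location handles a request path (no ^~ modifiers in play)."""
    if path in EXACT:
        return EXACT[path]
    # Regex locations are checked in the order they appear in the config
    # and beat plain prefix locations.
    for pattern, action in REGEX:
        if pattern.search(path):
            return action
    matches = [prefix for prefix in PREFIX if path.startswith(prefix)]
    if matches:
        return PREFIX[max(matches, key=len)]
    return "serve static file"


for path in ("/", "/login.pl", "/ledgersmb.conf", "/users/x", "/bin/lsmb-request.pl"):
    print(path, "->", route(path))
```

Note the last path: under this model a .pl file below /bin is handed to the proxy rather than denied, because the regex location wins over the prefix one. Whether the Plack app then serves it is a separate question, but if that is not what you want, the ^~ modifier on the deny locations is nginx’s documented way to skip the regex check.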

Trouble in Node.js paradise: The mess that is npm

The preface

Let me begin by stating that I love building web apps with Node.js, and I think it’s one of the greatest things that’s happened in the web app space this decade. I have been using Node.js for various small projects the last nine months, so I think I have a reasonable grasp of the subject matter.

The mess

One of the most exciting things about Node.js is the intense vibrancy of the community of developers around it and the strength of tools like npm and GitHub that make it almost effortless to share your Node.js creations with the world.

However, this has also created a huge problem for developers like me – that of picking the right module to use. The number of modules on npm is growing at a staggering speed, and it is becoming increasingly difficult to sift the wheat from the chaff.

The example

A real-world example. I’ve been looking for a tool to help me serve up my app’s CSS and JavaScript in as small a package as possible to reduce load times. That means concatenation and minification, and preferably compression as well.

In my old Django-days, I’d have used django_compressor, and if I was new to the community, I’d have to look no further than this asset manager comparison page to see that it would be the right choice for me.

With Node.js, there are no such comforts. I’ve spent a couple of hours digging around on GitHub and npm (mainly by looking at modules that depend on UglifyJS), and after eliminating modules that were obviously unmaintained/outdated/undocumented, I’ve managed to shorten my list to these options:

  • ams
  • app.js
  • asereje
  • assets-packager
  • auton
  • bastard
  • beans
  • browserify
  • buddy
  • buildr
  • codesurgeon
  • dryice
  • express-asset
  • folio
  • hem
  • inliner
  • masher
  • nap
  • piler
  • polymorph
  • resmin
  • smoosh
  • snockets
  • stitchup
  • vivid-builder
  • wepp

Yes, twenty-six possible modules to evaluate. I don’t have the entire list, but I’d expect it to be around a hundred modules.

The choice

This is an almost impossible choice. I’d have to spend a couple of days trying out each of these modules to figure out which one is the right fit for me. It’ll probably be faster for me to write my own little tool for it than to educate myself on all these options.

And it’s not just this space that suffers from this problem. It’s the same for routing, templating, testing and similar things that are commonly used in Node apps.

The waste

Each of these modules has been crafted and built. Someone took the time to explore the problem space, figure out what his requirements were, build a reusable tool, document it (somewhat), make releases, publish them on npm, fix bugs, etc., etc.

The above list probably represents more than two thousand man-hours of work, most of it wasted.

Of course, some duplication is necessary for the best ideas to float to the top, but contrasting this with my experiences from the Drupal community, this is absurd.

The solution

If you have been listening to NodeUp (which you probably should if you’re at all interested in Node.js), you would know that the npm developers are working to come up with metrics for packages, that will make it easier to gauge the quality of a module. This will definitely help with sorting out the mess, but I think we should also consider what we can do about the waste and duplication of effort.

I think this is mainly a question of culture. What makes the Drupal community so strong is how much collaboration is valued. Duplication of effort is frowned upon, responsible maintainership is encouraged.

There’s no easy way to accomplish this. I won’t suggest we restrict access to posting modules on npm or have a formal application system, like Drupal does, as I do not believe either of those would work with the Node.js community. We would need more formal structure for that and a much more cohesive community, and I do not think either of those is feasible (or desirable) for the Node.js project.

The plea

Instead, I’d just like to plead with my fellow developers. Before you decide to roll your own, please see if there is an existing project you could help improve instead. Consider it a way of repaying for all the open source work you have benefited from. Paving the road for others, like others have paved it for you.

I know it’s fun to do it lone ranger style, to take on the beast and slay it all by yourself – but ask yourself this: Do you care enough about this problem space to maintain a module for the next year? Would you prefer your efforts being 100% of the experience for a dozen people – or 10% of the experience for thousands of people?

That is why I love open source. Knowing that each of the half a million Drupal installations out there has a small piece of my work in it. That the President of the United States relies on my work, in an infinitesimally tiny way.

That is what I’d like to see more of in the Node.js community.

Shave a couple of stubborn DIV-wrappers off your Drupal site

One of the more annoying things about theming Drupal sites is having to wade through the staggering amounts of wrapping <div> elements and containers. Some of these are fairly easy to get rid of. Others require you to override core templates.

I recently found a clean way to get rid of a couple of those. These two were introduced in Drupal 7, and you will probably find them on almost all Drupal 7 sites. In markup, they look like this:

The culprits
<div class="region region-content">
  <div id="block-system-main" class="block block-system">
    <div class="content">
    <!-- Actual page content here -->
    </div>
  </div>
</div>

Now, the last of these wrappers is actually useful. The rest stem from one of the changes in Drupal 7, namely that the main page content is now a block that can be positioned on the page via Drupal’s block system.

Now, that’s a nice concept, but all the sites I’ve seen do business as usual, and get around this inconvenience by creating a block region called “content” and sticking the content block in there as the only thing, leaving the region and block wrappers as more DIV-spam in your site’s markup.

So unless you’re actually doing something different with the content block and/or region, you can just get rid of these extra wrappers by sticking the two following templates in your theme’s template folder:

(region--content.tpl.php) download
<?php
/**
 * @file
 * Render the main content block region.
 *
 * We don't print all kinds of wrapper divs and titles, just the content.
 */
print $content;
(block--system--main.tpl.php) download
<?php
/**
 * @file
 * Render the main content block.
 *
 * We don't print all kinds of wrapper divs and titles, just the content.
 */
print $content;

Short and sweet :)

Using Database-level Foreign Keys in Drupal 7

If you use a good database system, foreign keys are an actual concept on the server, used to enforce data integrity.

With database-level foreign keys, it becomes impossible to break your data by deleting data referenced by other data, without also dealing with the referenced data.

In practice this means that deleting a node from the database could be as simple as DELETE FROM node WHERE nid = 53, provided the foreign keys are declared with ON DELETE CASCADE. Without cascading, the database would refuse to delete the row from the node table as long as it is referenced in node_revision, field_something_something and the umpteen other tables a Drupal site is likely to have that depend on the node table.
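
To see both behaviours in action without touching a real Drupal database, here is a small, self-contained demonstration using SQLite from Python. It is only a stand-in: the two tables are heavily simplified versions of node and node_revision, the helper names are mine, and SQLite needs foreign key enforcement switched on per connection, which other database servers do not.

```python
import sqlite3


def make_db(on_delete):
    """Create an in-memory database with a node and one revision.

    on_delete is the action to declare on the foreign key,
    for example "RESTRICT" or "CASCADE".
    """
    db = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE node_revision ("
        " vid INTEGER PRIMARY KEY,"
        " nid INTEGER NOT NULL REFERENCES node (nid) ON DELETE " + on_delete + ")"
    )
    db.execute("INSERT INTO node (nid, title) VALUES (53, 'Hello')")
    db.execute("INSERT INTO node_revision (vid, nid) VALUES (1, 53)")
    return db


def delete_node(db, nid):
    """Try to delete a node; return False if the database refuses."""
    try:
        db.execute("DELETE FROM node WHERE nid = ?", (nid,))
        return True
    except sqlite3.IntegrityError:
        return False


def revision_count(db):
    return db.execute("SELECT COUNT(*) FROM node_revision").fetchone()[0]


restricting = make_db("RESTRICT")
print("restrict: deleted =", delete_node(restricting, 53),
      "revisions left =", revision_count(restricting))

cascading = make_db("CASCADE")
print("cascade: deleted =", delete_node(cascading, 53),
      "revisions left =", revision_count(cascading))
```

With RESTRICT the delete is refused and the revision stays; with CASCADE the delete goes through and takes the revision with it.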

Currently, it’s nigh-impossible to delete a node or another entity from the database without using Drupal’s API. This is a sticky problem if you ever need to share the database between systems.

In Drupal 7, the syntax to define foreign keys was introduced to Drupal’s schema API. The schema API is used to explain the database structure to Drupal, so it can be understood and utilised by modules like Views. This understanding is also translated into SQL code, when tables are created by Drupal’s install scripts.

However, the foreign key syntax introduced in Drupal 7 does not affect the database structure at all, it is only used inside Drupal for relating one table to another. Perhaps in time, Drupal will also create the foreign keys at the database level, but until that happens, you will need to create them manually. Here’s how to do it.

An example

In the following example, I define two tables, and then use hook_install and hook_uninstall to set up and dismantle the foreign keys.

(2011-10-05-foreign-key-example.php) download
<?php
/**
 * @file
 * Installation and upgrade code for Zavod supplier.
 */

/**
 * Implements hook_schema().
 */
function zavod_supplier_schema() {
  $schema = array();

  $schema['zavod_suppliers'] = array(
    'description' => 'Stock suppliers for Zavod.',
    'fields' => array(
      'supplier_id' => array(
        'description' => 'The primary identifier for a supplier.',
        'type' => 'serial',
        'unsigned' => TRUE,
        'not null' => TRUE,
      ),
      'title' => array(
        'description' => 'The title of this supplier, always treated as non-markup plain text.',
        'type' => 'text',
        'not null' => TRUE,
      ),
    ),
    'primary key' => array('supplier_id'),
  );

  $schema['zavod_supply_orders'] = array(
    'description' => 'Supply orders for Zavod.',
    'fields' => array(
      'order_id' => array(
        'description' => 'The primary identifier for a supply order.',
        'type' => 'serial',
        'unsigned' => TRUE,
        'not null' => TRUE,
      ),
      'supplier_id' => array(
        'description' => '{zavod_suppliers}.supplier_id of the supplier that the order is made to.',
        'type' => 'int',
        'not null' => TRUE,
        'unsigned' => TRUE,
      ),
      'title' => array(
        'description' => 'The title of this order, always treated as non-markup plain text.',
        'type' => 'text',
        'not null' => TRUE,
      ),
    ),
    'foreign keys' => array(
      'zavod_suppliers' => array(
        'table' => 'zavod_suppliers',
        'columns' => array('supplier_id' => 'supplier_id'),
      ),
    ),
    'primary key' => array('order_id'),
  );

  return $schema;
}

/**
 * Implements hook_install().
 */
function zavod_supplier_install() {
  // Make real foreign keys.
  db_query('
    ALTER TABLE {zavod_supply_orders}
    ADD CONSTRAINT {zavod_suppliers}
    FOREIGN KEY (supplier_id) REFERENCES {zavod_suppliers} (supplier_id)
  ');
}

/**
 * Implements hook_uninstall().
 */
function zavod_supplier_uninstall() {
  // Remove the real foreign keys.
  db_query('
    ALTER TABLE {zavod_supply_orders}
    DROP CONSTRAINT IF EXISTS {zavod_suppliers}
  ');
}

It’s pretty self-explanatory if you’re used to Drupal’s schema API. If you’re not, you should definitely learn.
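
If you have more than a couple of foreign keys, writing each ALTER TABLE by hand gets tedious. The sketch below shows the idea of deriving the statements from the schema definition instead. It is written in Python purely as an illustration: the dictionary mirrors only the 'foreign keys' part of the hook_schema() array above, foreign_key_sql is a hypothetical helper rather than anything in Drupal, and it reuses the referenced table name as the constraint name just like the hand-written query does.

```python
# Python mirror of the 'foreign keys' part of the hook_schema() array above.
SCHEMA = {
    "zavod_supply_orders": {
        "foreign keys": {
            "zavod_suppliers": {
                "table": "zavod_suppliers",
                "columns": {"supplier_id": "supplier_id"},
            },
        },
    },
}


def braced(name):
    """Wrap a table name in Drupal's {table} prefix placeholder."""
    return "{" + name + "}"


def foreign_key_sql(schema):
    """Hypothetical helper: one ALTER TABLE statement per declared foreign key."""
    statements = []
    for table, spec in schema.items():
        for name, key in spec.get("foreign keys", {}).items():
            # 'columns' maps local column names to referenced column names.
            local = ", ".join(key["columns"].keys())
            remote = ", ".join(key["columns"].values())
            statements.append(
                "ALTER TABLE " + braced(table)
                + " ADD CONSTRAINT " + braced(name)
                + " FOREIGN KEY (" + local + ")"
                + " REFERENCES " + braced(key["table"]) + " (" + remote + ")"
            )
    return statements


for statement in foreign_key_sql(SCHEMA):
    print(statement)
```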

How to install multicore Apache Solr on FreeBSD with Jetty

If you use Apache Solr with your Drupal site, you have probably come across the need to have more than one Solr instance. You may have multiple sites, or just multiple copies of the same site, production and staging perhaps?

There are two widely published ways to accomplish that. One is to set up completely separate Solr instances with whatever Java-server you’re using. That is somewhat inefficient, and I was unable to get such a setup working properly anyways. So here’s the alternative, “multi core”.

The benefit of using multi core is that you avoid most of the configuration overhead associated with figuring out how to get multiple WebAppContainerDeploymentContextFactoryGeneratorWidgetClass instances (or whatever they’re named in your brand of Java-server) to coexist by using the same WAR-file but not the same configuration. I spent a lot of hours trying to accomplish just that.

Instead, all the interesting stuff happens in the Solr configuration, which is a lot less confusing for a Java novice like me.

Instructions

  1. Install a Java JRE.

    You may want to see my explanation on how to do this.

  2. Install Jetty and Solr.

    If you use portmaster, this can be as simple as running portmaster www/jetty textproc/apache-solr

  3. Create a folder for your Solr multi core instance’s configuration and data files. This could be anywhere, but in this example, I’m going to use /srv/solr.

  4. Create a /srv/solr/solr.xml file for the configuration, setting up the different cores into folders.

    Mine looks like this:

    (solr.xml) download
    
    <?xml version="1.0" encoding="UTF-8" ?>
    <!--
    All (relative) paths are relative to the installation path
      
      persistent: Save changes made via the API to this file
      sharedLib: path to a lib directory that will be shared across all cores
    -->
    <solr persistent="false">
    
      <!--
      adminPath: RequestHandler path to manage cores.  
        If 'null' (or absent), cores will not be manageable via request handler
      -->
      <cores adminPath="/admin/cores" sharedLib="lib">
        <core name="dev" instanceDir="dev" />
        <core name="prod" instanceDir="prod" />
        <core name="stg" instanceDir="stg" />
      </cores>
    </solr>
    

  5. Create all the folders referenced in the config file.

    We specified four folders, so let’s create them via a simple mkdir dev lib prod stg (while standing in the /srv/solr folder).

  6. Make these folders owned by the user that will run our Solr instances. In this example, I’ll use the www account, but it would be more secure to set up a separate account for running Solr if you have other web servers running on the same machine.

    chown www:www dev lib prod stg

  7. For each core you want to set up, copy or symlink the schema and other configuration files you need into the conf folder in each core folder. In this example, I’m copying the example configuration from /usr/local/share/examples/apache-solr/solr/conf/. If you’re working with Drupal, be sure to copy the solr configuration it ships with into each core.

    mkdir prod/conf
    cd prod/conf
    cp -r /usr/local/share/examples/apache-solr/solr/conf/* ./
    cd ../..
    cp -r prod/conf dev/
    cp -r prod/conf stg/

  8. Enable Jetty

    In this example, we’re going to use Jetty to run the Solr service. I am not well versed in the Java lingo for this, but Jetty is a servlet container, so I guess that means Solr is being run as a servlet. I also tried this with Tomcat, but that was a lot harder to configure properly.

    Add jetty_enable="YES" on a new line in /etc/rc.conf.

  9. Copy /usr/local/jetty/etc/jetty.xml to /usr/local/etc.

  10. Symlink solr.war into /usr/local/jetty/webapps

    cd /usr/local/jetty/webapps
    ln -s /usr/local/share/java/classes/apache-solr-3.2.0.war solr.war

  11. Symlink /srv/solr into /usr/local/jetty

    cd /usr/local/jetty
    ln -s /srv/solr

  12. Start Jetty by running service jetty start

Hopefully, after all this work, Solr should be ready once it’s done booting up.

You can check that it’s working by running curl -iL localhost:8080/solr/prod/admin/. This should output the HTML for the admin interface.

If you have problems, try running tail -f /usr/local/jetty/jetty.log in one terminal, and then service jetty restart in another, and watch what goes on as Jetty restarts (there’ll be a lot of messages flying by, but the error should be in there somewhere).
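
If you add and remove cores often, steps 4 and 5 are easy to script. The following Python sketch generates a solr.xml with the same structure as mine from a list of core names, and lists the folders that then need to exist. The function names are made up for this example, and it only covers the attributes used in the file above.

```python
import xml.etree.ElementTree as ET


def build_solr_xml(core_names, shared_lib="lib"):
    """Return solr.xml content declaring one core per name.

    Each core uses a folder named after itself as its instanceDir,
    as in the hand-written file above.
    """
    solr = ET.Element("solr", persistent="false")
    cores = ET.SubElement(solr, "cores", adminPath="/admin/cores", sharedLib=shared_lib)
    for name in sorted(core_names):
        ET.SubElement(cores, "core", name=name, instanceDir=name)
    return ET.tostring(solr, encoding="unicode")


def required_folders(core_names, shared_lib="lib"):
    """Folders that must exist next to solr.xml (step 5)."""
    return sorted(set(core_names) | {shared_lib})


print(build_solr_xml(["dev", "prod", "stg"]))
print("mkdir " + " ".join(required_folders(["dev", "prod", "stg"])))
```

The output has no XML declaration or comments, so if you want those, add them when writing the file to /srv/solr/solr.xml.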

Run VirtualBox virtual machines on boot in Mac OS X

To celebrate the launch of VirtualBox 4.0, I’d like to share a simple trick for making your virtual machines start automatically when your computer boots.

I have a Mac mini server that has a couple of virtual machines running. Until recently, I used VMware Fusion with a wonky setup where a user logs in automatically and has VMware Fusion as a login item, but that has proven to be cumbersome and error-prone, and since both of those virtual machines are servers, there’s really no reason they should be run graphically.

Thus, I switched to VirtualBox and set it up to run automatically on boot. In this example, revealit.dk is my company, virtual is the user account on the server that owns/runs the virtual machines and Chestnut is the name of the virtual machine in question.
The virtual server config resides in /Users/virtual/VirtualBox VMs/chestnut

  1. Set up your virtual machine as you please using the VirtualBox Manager GUI app (or whatever tool you prefer).
  2. Shut down the virtual machine (ie. close the window with its screen content). To be completely safe, turn it off completely.
  3. Create and edit /Library/LaunchDaemons/dk.revealit.ChestnutVirtualbox.plist setting it up something like this:

    (dk.revealit.ChestnutVirtualbox.plist) download
    
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>dk.revealit.ChestnutVirtualbox</string>
      <key>ProgramArguments</key>
      <array>
        <string>/usr/bin/VBoxHeadless</string>
        <string>-s</string>
        <string>chestnut</string>
      </array>
      <key>UserName</key>
      <string>virtual</string>
      <key>WorkingDirectory</key>
      <string>/Users/virtual</string>
      <key>RunAtLoad</key>
      <true/>
    </dict>
    </plist>
    

  4. Edit the details to match your system. Make sure that the label matches the filename.

  5. Run sudo launchctl load -w /Library/LaunchDaemons/dk.revealit.ChestnutVirtualbox.plist to load the launchd service. This will cause launchd to keep the virtual machine running indefinitely.

And that’s it. Be aware that you must now use launchctl stop dk.revealit.ChestnutVirtualbox should you want to stop the virtual machine. Stopping it in any other way will just cause launchd to restart it.
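
If you have several virtual machines, you may prefer to generate these plist files rather than copy and edit them by hand. Here is a rough sketch using Python’s plistlib that produces the same keys as the file in step 3 and derives the filename from the label, so the two cannot drift apart. The function names are my own, and the default VBoxHeadless path is just the one used above.

```python
import plistlib


def launch_daemon_plist(label, vm_name, user, vboxheadless="/usr/bin/VBoxHeadless"):
    """Return the XML plist (as bytes) for a launchd job that starts one VM."""
    job = {
        "Label": label,
        "ProgramArguments": [vboxheadless, "-s", vm_name],
        "UserName": user,
        "WorkingDirectory": f"/Users/{user}",
        "RunAtLoad": True,
    }
    return plistlib.dumps(job)


def plist_path(label):
    """The label has to match the filename, so derive one from the other."""
    return f"/Library/LaunchDaemons/{label}.plist"


label = "dk.revealit.ChestnutVirtualbox"
print(plist_path(label))
print(launch_daemon_plist(label, "chestnut", "virtual").decode("utf-8"))
```

Writing the bytes to the printed path (as root) and running the launchctl load command from step 5 completes the setup.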

del.icio.us – can the Drupal community do better?

As you may know, Yahoo! is in trouble, and has decided to jettison the social bookmarking service del.icio.us (Delicious).

I am not a big delicious user anymore (actually, I deleted my account when Microsoft was trying to purchase Yahoo!), but this recent closing made me wonder if the Drupal community couldn’t do better…

One of the great strengths of Drupal is self-hosting, and a bookmarking service is not a complicated thing.

Thus, I am planning to put in a bit of effort into having my own bookmarks section here on this very site. A bookmark content type should not take long to configure with the Links module, Automatic Nodetitles to make sure the nodes have the same title as the link itself, and a bit of Views work should make it presentable.

Workflow

The main issue here is workflow. Nowadays, I use Google Reader’s sharing interface for “bookmarking” links, since it has a nice interface for tagging, putting notes on there, etc.

So what’s really missing in my workflow is some way of getting that data into Drupal, and since Google provides a very tasty atom feed with lots of metadata for all my shared entries, it should be doable with a bit of add-on code for the Feeds module.

What to do?

The main thing I would like to know here is whether other Drupallers would be interested in such a thing, and whether anyone has input as to how it could best be built. I suppose a Features module could be in order, but I have not seen a whole lot of those published on Drupal.org. Is it wise?

And where to put my Google Reader for Feeds module integration? I suppose I could make a separate project for that on d.o, but that seems a bit like overkill. Any takers?

35% response time improvement from switching to uWSGI/nginx

As part of refreshing the Reveal IT website, I have moved it from mod_wsgi running on Apache HTTPD to uWSGI running on nginx, mainly because my previous setup had both Django and Drupal sites running on the same Apache server.

Due to some of the shortcomings of PHP, the only recommended way to run mod_php on Apache is via the prefork MPM, which carries a high memory usage penalty per process. My Apache processes hover around 100MB of RAM each after serving a few requests. Thus, it is a bit wasteful to use those fat PHP-enabled processes for serving Django requests.

Out of curiosity, I decided to move the site to nginx before the upgrade, just to see how that would affect performance. I expected a modest improvement, but in my case, it cut page loading times by roughly 35% – here’s the chart from Pingdom:

I am not entirely certain what goes on here, but not only is the uWSGI/nginx combo providing better RAM utilisation, but it is also providing much better response times.

Take note: this is the same hardware, the same OS, the same database and the same memcache instance. The only thing that’s changed in the period covered by the graphs is the replacement of Apache. I upgraded the website yesterday, but that did not change the picture much. If anything, it increased the load time ever so slightly.

In fact, the picture becomes even wilder if I use Pingdom’s filters to show only load times measured from Europe (where my site is hosted):

The average load time is essentially cut in half, from around 500 ms to 250 ms.

Спасибо большое, nginx.

Postbox botches upgrade policy, censors customer complaints

Postbox is a commercial e-mail client, based on Mozilla Thunderbird. It is mostly a layer of polish and OS integration on top of a popular open source project.

Postbox 2.0 was recently released, and the company behind it, Postbox Inc., decided to offer a very short upgrade window for customers who had recently purchased a license: just 31 days. The 2.0 version had been announced much earlier, and many, myself included, decided to purchase a license on the promise of the features coming in 2.0.

Additionally, Postbox participated in the MacUpdate Promo just two weeks before the upgrade cutoff. That is, in my eyes, downright devious. Offer a discount, and then make a paid upgrade a few weeks later to get the full payment from all the people who fell for your ruse.

As I was one of those, I was a bit surprised, and decided to post a question about it on their forums.

As it turns out, I was not the only one disappointed by this, and a bunch of other customers chimed in with their comments.

And this is where Postbox, Inc. employee Sherman Dickman steps in, reiterates their policy, gives the usual “we can’t give it all away for free” statement, and finally states that several comments were deleted for “violation of our posting guidelines”.

I have preserved those comments here, and as you can see, it was all just paying customers expressing their disappointment:

Exhibit #1

Joshua Miller said: Yeah, that’s pretty lame - especially since I participated in the Postbox BETA just after buying a 1.1 license. I bought this in July and now, less than 3 months later, have to spend another $20 on the application. If I’d bought it a year ago I could see maybe paying full price, but less than 3 months? Come on, that’s just absurd.

Exhibit #2

Evelyn Creelman said: i too automatically updated - to 2.0- thinking this what what i had actually paid for- i paoid for this project because i felt it a viable alternative to mac mail-something ive been looking for forever- the fact that it was not free was fine- i didnt mind paying - but i am trully blown away by this policy -

When i purchased postbox- i believe i should have been clearly informed before purchase that they would be releasing an upgrade in three weeks-

that i would have to pay for- therefore giving me the choice to wait and , essentially, not feel as though i paid to beta test their software for three weeks.

what a shame.

i so perturbed about this - really disappointing - as it seems very shady - especially when a response form postbox basically said- egged me on over

Exhibit #3

Evelyn Creelman said: sorry the last post got cut off- i was saying i received an email from postbox support over this issue-basically saying i paid the price of a pizza for their software- and basically seemed to be very condescending- regarding my perturbed-ness..over being asked to pay again three weeks later-

Exhibit #4

Evelyn Creelman said: and to just to back myself up ..( or beat a dead horse) found this comment from postboxs’ Sherman Dickman, that contradicts my experience:( found macupdate.com/users/Sherman_Dickman )

“we provided one with as much transparency as possible so people can make informed choices.”

nope, you actually didnt.

Parting words

Summing up, Postbox has now managed to make a number of customers, several of them active beta testers, feel exploited by its attempt to nickel-and-dime us. They have demonstrated their contempt for customer opinions and open debate. All while riding the coattails of a popular open source project.

I hope that they will improve their behaviour, because if they don’t, I think their good and useful product will be ridden to ruin by a greedy and short-sighted business policy.

Reply from Postbox

I got this e-mail from Sherman Dickman, co-founder of Postbox:

Mikkel,

We’re fine with criticism of our product or our policies, and anyone is free disagree with them and/or challenge thme. But we draw the line when the integrity of our company is called into question, particularly when it is done so without all the facts.

We also draw the line on forum abuse, and as such, we’ve disabled your forum account.

We’re issuing you a refund via PayPal at this email address, and we wish you the best of luck in finding a suitable alternative.

Sherman Dickman Postbox, Inc.l

To which I have replied:

Hi Sherman,

I am sorry that it turned out this way. I think this would have been an excellent opportunity for you to eat some humble pie and perhaps reconsider your policies.

You should have known that you would get unfavourable reactions when you decided to make a paid upgrade a mere six weeks after the MUPromo.

And when the criticism came, you decided to be condescending to some of your most loyal users.

It is your company to run, but I think you would be much better off trying to build a good community for Postbox instead of trying to monetise existing users.

I still enjoy Postbox as a product, and as such, I intend to spend the refund you sent me on an upgrade to 2.0.

So please, keep building awesome software, and loose up on the policies. None of the now eight comments you have deleted on that thread could have been as damaging to your reputation and customer relationship as deleting them have.

– Kind regards,

Mikkel Høgh mikkel@hoegh.org

Relaunching my blog on Drupal 7

More than a year ago, I was agitating for a move to Drupal 7 for all the blogging developers. As is rather obvious now, Drupal 7 was not then in a state fit for public websites. There were outstanding security issues, no upgrade path, lots of API changes still to be made, etc.

However, since that has all been resolved now, I figured it was about time I moved my blog over. I have obviously been preparing for this for some time, as I will get back to in later blog posts.

However, I would once again like to encourage fellow developers to experiment with Drupal 7. The process of building a real, public-facing website with Drupal 7 so early in its development has taught me a lot about Drupal’s innards, and helped me uncover a fair number of bugs in Drupal core and contributed modules that have since been fixed.

As you can also see in the 7.0 beta 1 announcement, there is a lot to be excited about, and now that we have entered the beta phase, upgrades will be supported from here on, so the site you create with beta 1 should be upgradable all the way to the final release.

So if you can spare the time to fiddle with it, I highly recommend it. It is an educative challenge – enjoy :)

Protecting your users from phishing with Apache rules and HSTS

HTTP Strict Transport Security, or HSTS, is a new security feature in browsers that enables you to tell the browser “always use SSL when accessing this site”.

Mozilla has a good blog post explaining HSTS, so I won’t try to replicate that here, but I’d just like to make it clear that if you have a site that should always use SSL, be it Drupal or Django or any other system, this is definitely something you should get set up.

Good examples of these are webmail, server administration and monitoring tools, and general admin backends. If you are running a large Drupal site, you should perhaps consider restricting admin access to an SSL-protected subdomain.

Currently, it is only supported in Chrome 4 and above, and Firefox 4 beta 5 and beyond, but hopefully the other browser makers will catch up soon. It’s fully backwards compatible, in that it will have no effect if the browser does not support HSTS.

How to use it

Setting it up is very simple. In your Apache VHost, where you do your SSL config, just add this line:

Header add Strict-Transport-Security "max-age=15768000"

This will tell the browser to remember that this site is SSL/HTTPS only for the next 6 months. During that time it will simply rewrite any and all requests to that site to use HTTPS instead of HTTP without ever communicating insecurely with the server.
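
For reference, the `max-age` value is simply half a year expressed in seconds. A tiny sketch of the arithmetic (the helper name `max_age_seconds` is made up for this example) makes it easy to pick a different lifetime:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_age_seconds(days):
    """Convert a lifetime in days to the number of seconds HSTS expects."""
    return int(days * SECONDS_PER_DAY)


half_year = max_age_seconds(365 / 2)
print("max-age=%d" % half_year)  # prints max-age=15768000
```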

If you use nginx, the syntax is subtly different. Adding this to the server section does the trick:

add_header Strict-Transport-Security max-age=15768000;

Keep your redirects

An important point is that HSTS only works after the user has received the header via HTTPS. So you will still need a redirect from your HTTP site to HTTPS, which also covers browsers that do not yet understand HSTS.

This is easily accomplished using Apache’s mod_rewrite:

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTPS} off
  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>
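
If the site is a Python/WSGI application and you cannot touch the web server configuration, the same two pieces (the redirect and the header) can be approximated in middleware. The following is a minimal sketch, not a hardened implementation: the class name `HttpsOnly` is made up for this example, it assumes the app is mounted at the root of the host, and it trusts `wsgi.url_scheme`, which may need extra care behind a reverse proxy.

```python
class HttpsOnly:
    """WSGI middleware: redirect plain HTTP to HTTPS, add HSTS on HTTPS."""

    def __init__(self, app, max_age=15768000):
        self.app = app
        self.hsts = ("Strict-Transport-Security", "max-age=%d" % max_age)

    def __call__(self, environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            # Same idea as the RewriteRule: keep host and path, switch scheme.
            target = "https://" + environ.get("HTTP_HOST", "")
            target += environ.get("PATH_INFO", "/")
            if environ.get("QUERY_STRING"):
                target += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]

        def with_hsts(status, headers, exc_info=None):
            # Only send the header over HTTPS, where browsers will honour it.
            return start_response(status, list(headers) + [self.hsts], exc_info)

        return self.app(environ, with_hsts)


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = HttpsOnly(demo_app)
```

Doing this in the web server, as shown above, is usually simpler. The middleware variant is mainly useful on hosting where only the application is under your control.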

Thus, with a few lines of configuration, you can make the web a safer place to be for your users. So, what are you waiting for?

A tip for using PostgreSQL with Drupal 6

If you are using PostgreSQL for hosting your Drupal sites, you might have noticed a lot of warnings in your logs like these:

Aug  8 18:41:05 s002 postgres[90076]: [5-1] WARNING:  nonstandard use of \\ in a string literal at character 32
Aug  8 18:41:05 s002 postgres[90076]: [5-2] HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
Aug  8 18:41:05 s002 postgres[90076]: [6-1] WARNING:  nonstandard use of \\ in a string literal at character 122
Aug  8 18:41:05 s002 postgres[90076]: [6-2] HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.

The immediate cause for this is bug #426008 in Drupal core, but the issue stems from the fact that PostgreSQL does not conform exactly to the SQL standard with regard to backslashes in strings. The reasoning behind this, and why it’s going away (as a default setting) in PostgreSQL 9.1, can be read in this excellent blog post by Bruce Momjian.

But how do I fix it?

The good news is that this behavior is configurable. You can set standard_conforming_strings = on in your postgresql.conf and be done with it. This will be the default setting from PostgreSQL 9.1, and hopefully the other applications using your database do not depend on the legacy behavior (if they do, they need fixing).
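
To see what the setting actually changes, here is a deliberately simplified model in Python of how the body of a plain `'...'` literal is read under each mode. It is an illustration only: the function name `interpret_literal` is made up, and the model ignores doubled quotes as well as octal, hex and Unicode escapes.

```python
def interpret_literal(body, standard_conforming_strings):
    """Return the value stored for the text between the quotes of '...'.

    Simplified model: only basic backslash handling is covered.
    """
    if standard_conforming_strings:
        # SQL standard behaviour: a backslash is just another character.
        return body
    # Legacy behaviour: a backslash starts an escape sequence.
    escapes = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes (including \\) yield the escaped character itself.
            out.append(escapes.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


sql_body = r"C:\\temp"  # what the application put between the quotes
for setting in (False, True):
    print(setting, interpret_literal(sql_body, setting))
```

With the legacy setting the doubled backslash collapses into one. With the standard-conforming setting both backslashes are stored, which is why applications that escape backslashes themselves need to be checked before flipping the switch.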

If that’s not suitable for your setup, there are a few other suggestions in this forum thread.

Presenting Django Password Required

Have you ever wanted to password-protect your Django site without requiring user registration? Do you find HTTP Basic Auth too blunt an instrument for protecting sites? Or do you want to do Stack Overflow-style beta testing?

Then Django Password Required is for you. It provides a simple @password_required decorator for your views, and lets you configure a password in your settings.py file. The authentication is stored in the user’s session data, using Django’s own session system. This means that Django Password Required can co-exist with django.contrib.auth, so you can allow users to log in after they’ve provided the password to access the site.

I use it for a little skunkworks project that does not have user logins per se, but since it is not open to the public yet, I need to protect it, at least from webspiders and random visitors. I don’t mind if the password is spread by word-of-mouth, since the site contains nothing sensitive or private.

Initially I used HTTP Basic Auth, but setting that up with Apache is an all-or-nothing deal, requires you to enter the password quite often on iPhone/iPad, and interferes with AJAX requests and API calls. So I created this lightweight app instead. It asks for the password once and then remembers that the user is logged in via a cookie bound to a server-side session with a long lifetime, so you won’t get nagged for the password very often.
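
The general idea can be sketched without Django at all. The snippet below is not the app's actual code: `SITE_PASSWORD`, `SESSION_KEY`, `submit_password` and the fake request object are all stand-ins invented for this illustration, and a real view would return a redirect response rather than a string.

```python
import hmac
from functools import wraps
from types import SimpleNamespace

SITE_PASSWORD = "swordfish"  # hypothetical; the real app reads it from settings.py
SESSION_KEY = "password_ok"  # hypothetical session key


def password_required(view):
    """Only run the view when the session has been marked as authorised."""
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        if request.session.get(SESSION_KEY):
            return view(request, *args, **kwargs)
        return "redirect to password form"
    return wrapper


def submit_password(request, attempt):
    """Mark the session as authorised when the shared password matches."""
    if hmac.compare_digest(attempt.encode("utf-8"), SITE_PASSWORD.encode("utf-8")):
        request.session[SESSION_KEY] = True
    return bool(request.session.get(SESSION_KEY))


@password_required
def front_page(request):
    return "secret front page"


request = SimpleNamespace(session={})  # stand-in for a request with a session
print(front_page(request))
submit_password(request, "swordfish")
print(front_page(request))
```

Because the flag lives in the session rather than in a user account, this kind of check does not interfere with a separate login system layered on top.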

Bug reports, suggestions, documentation and source code all live on GitHub. Enjoy.

Introducing Herd Fire

If you, like me, are an avid Firefox user, you will likely have felt the burden of using the same Firefox profile for a variety of tasks. Having NoScript or ImgLikeOpera installed is handy when surfing, but just annoying when working on developing websites. Having FireBug installed will slow down JavaScript execution on all pages, unless you disable some of its features, regardless of whether you’re using it or not. Every extension you install slows down Firefox ever so slightly.

And that’s just extensions. The same goes for many aspects of Firefox configuration: language, about:config, etc. Would it not be better to have several Firefox profiles, one for each task? If you ask me, it would.

One problem though – even if you find the hidden Firefox profile manager, Firefox will not let you launch multiple instances of it without a bit of coercion. Previously, I resorted to all kinds of commandline trickery to manage my profiles until I found a script somewhere on the web (I’ve been unable to find it again. If you know it, please post a comment – I’d like to give proper attribution) that helped me set up copies of Firefox.app for each profile, but it had its limits. It did not work for Firefox.app itself, only for its named copies. It also renamed the Firefox binary, causing trouble for other scripts. So I’ve rewritten it in Python, improving a few key things:

  1. It modifies Info.plist to use a launching script instead of renaming firefox-bin.
  2. It sets the normal Firefox.app to use the profile named “default”.
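
To give an idea of what those two points involve, here is a rough sketch using Python's `plistlib`. It is not the actual Herd Fire code: the launcher name `herdfire-launcher`, the helper names and the launcher script contents are assumptions made for illustration only.

```python
import plistlib

# Hypothetical launcher script: starts the untouched binary with a fixed profile.
LAUNCHER_TEMPLATE = """#!/bin/sh
exec "$(dirname "$0")/firefox-bin" -P "{profile}" -no-remote "$@"
"""


def profile_for(app_name):
    """Firefox.app -> 'default'; Firefox-example.app -> 'example'."""
    stem = app_name[:-len(".app")] if app_name.endswith(".app") else app_name
    _, _, extra = stem.partition("-")
    return extra or "default"


def point_at_launcher(info_plist_bytes, launcher_name="herdfire-launcher"):
    """Return Info.plist contents whose executable is the launcher script."""
    info = plistlib.loads(info_plist_bytes)
    info["CFBundleExecutable"] = launcher_name
    return plistlib.dumps(info)


# A tiny stand-in for a real Info.plist.
original = plistlib.dumps({"CFBundleExecutable": "firefox-bin", "CFBundleName": "Firefox"})
patched = plistlib.loads(point_at_launcher(original))
print(patched["CFBundleExecutable"])
print(LAUNCHER_TEMPLATE.format(profile=profile_for("Firefox-example.app")))
```

Leaving firefox-bin under its original name is the point of the exercise: other scripts that expect that binary keep working, and only the bundle's entry point changes.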

Instructions for use

  1. Download Herd Fire.
  2. Copy your Firefox.app to create a named copy (I’m using the name “example” here):
    Copying Firefox.app
  3. Run Herd Fire (run it from the folder it’s located in, or stick it in a folder on your path):
    Running herdfire
  4. Launch your new Firefox copy.
    Launching Firefox
  5. If there is not a Firefox profile with the extra name you gave your Firefox.app copy (in this case “example”), the profile manager will appear. In that case, use it to create a new profile with the correct name.
    Profile manager
    Pick a name
  6. Firefox-example.app will now always start with the “example” profile activated. Firefox’s auto-updater might break this. In that case, all you need to do is run herdfire again.
    New Firefox profile

The code is in a GitHub repository, so please don’t hesitate to fork, file bugs, etc.

Attention all Drupal Git-mirror users

A long-standing issue with the Git mirrors of Drupal’s CVS has been fixed thanks to Damien Tournoud.

The problem is that CVS outputs dates in RCS tags in the somewhat nonstandard format 2009/10/19 (ISO 8601 specifies dashes, not slashes, as separators). The git-cvsimport tool used for creating the mirrors, however, uses cvsps, which updates the RCS tags to use the correct format (2009-10-19). Adhering to standards is generally a good thing, but in this case it was causing merge conflicts when trying to merge patches created with Git into Drupal (or vice versa).

Damien found a way to resolve the issue, however:

Adding DateFormat=old to the CVSROOT/config file fixes the problem.

Changing this, however, required a reimport of the entire repository. Due to the way Git works with commit-ids being a cryptographic hash of their contents, changing the contents (even if just the RCS tags) means a rewrite of Git history.
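
A small Python sketch shows why even a date separator matters. Git identifies file contents by the SHA-1 of a short header plus the bytes of the file, so two files that differ only in the RCS tag get different ids, and every tree and commit that refers to them changes as well. The `$Id$` line and the helper names below are made up for illustration.

```python
import hashlib
import re


def git_blob_id(content: bytes) -> str:
    """Compute the id Git assigns to a file with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def dashes_for_slashes(text: bytes) -> bytes:
    """Mimic the date rewrite: 2009/10/19 becomes 2009-10-19."""
    return re.sub(rb"(\d{4})/(\d{2})/(\d{2})", rb"\1-\2-\3", text)


# Hypothetical RCS tag line as CVS would write it.
cvs_style = b"// $Id: example.php,v 1.1 2009/10/19 12:00:00 someone Exp $\n"
rewritten = dashes_for_slashes(cvs_style)

print(git_blob_id(cvs_style))
print(git_blob_id(rewritten))
```

The two ids printed differ although the code is otherwise identical, which is why the re-imported repository cannot share history with the old one.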

So while the new repository contains the same code, you will not be able to merge new changes from it into your current checkouts. Damien will continue both imports for a while, but updates for the old repository with the incompatible date format will be discontinued at a future date.

What is the bottom line then?

The executive summary

  1. The Git mirror at git://github.com/drupal/drupal.git has been rewritten with its RCS tag date format compatible with CVS defaults. Please use this mirror for all your future projects.
  2. The Git mirror at git://github.com/mikl/drupal.git will continue to have the CVS-incompatible format, and will, for a time, continue to be updated, so you will be able to use it for a little while longer.
  3. There is now no excuse for not using Git for your Drupal core development work. Enjoy.

Finally, I’d like to thank Damien for doing all the hard work. I was maintaining the git-cvsimport process myself for a while, and I do not miss it.

New blog, same as the old one…

So, I finally did it. I’ve long wanted to do something about this blog, to try and push a better design on it and generally trim everything.

I wanted to try something new and challenging, so now I’ve rebuilt my blog with Django Mingus.

Building stuff with Django tends to be a lot of fun. I have quite a few ideas that I’d like to try out, so you may see some of my work moving into Mingus.

Rotating Apache httpd logfiles on FreeBSD

With the disk space available on modern servers, you tend to notice some things a lot less. Like the boring fact that without log rotation, an Apache access log can grow to gigabyte size in no time.

FreeBSD’s Apache HTTPD port does not ship with configuration for the FreeBSD log rotation utility, newsyslog, so your logs won’t be rotated by default.

That, however, is fairly easy to fix by tweaking /etc/newsyslog.conf a bit.

Here’s how I did it:

/var/log/httpd-access.log www:www 440 9 * $W1D4 J /var/run/httpd.pid 30
/var/log/httpd-error.log www:www 440 9 * $W1D4 J /var/run/httpd.pid 30

Broken up, this means:

  1. /var/log/httpd-access.log – name of the log file we’re rotating.
  2. www:www – set user and group ownership of the archived logs to www, so sysadmins can read them.
  3. 440 – set the archive files to be read-only for the www user and group, with no access for anyone else.
  4. 9 – keep nine archived log files excluding the current one. This way, we should always have the latest 10 weeks of log data available.
  5. * – don’t rotate based on log file size.
  6. $W1D4 – rotate logs every Monday at 4 in the morning.
  7. J – compress the archived logs with bzip2.
  8. /var/run/httpd.pid – get the process ID for the httpd server here.
  9. 30 – send SIGUSR1 to cause a graceful restart of httpd.
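
The field-by-field breakdown can also be expressed as a small Python parser, which is handy for sanity-checking an entry before putting it on a server. This is a sketch that only understands the full nine-field form and the weekly `$W<day>D<hour>` schedule used here; the names `Entry`, `parse_entry` and `describe_weekly` are made up for this example.

```python
import re
from collections import namedtuple

Entry = namedtuple(
    "Entry", "logfile owner group mode count size when flags pid_file signal"
)

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def parse_entry(line):
    """Split a nine-field newsyslog.conf line into named fields."""
    logfile, owner_group, mode, count, size, when, flags, pid_file, signal = line.split()
    owner, group = owner_group.split(":")
    return Entry(logfile, owner, group, mode, int(count), size, when, flags,
                 pid_file, int(signal))


def describe_weekly(when):
    """Decode the weekly form $W<day>D<hour>, where day 0 is Sunday."""
    match = re.fullmatch(r"\$W([0-6])D(\d{1,2})", when)
    if match is None:
        raise ValueError("not a weekly $W<day>D<hour> schedule: %r" % when)
    return DAYS[int(match.group(1))], int(match.group(2))


entry = parse_entry(
    "/var/log/httpd-access.log www:www 440 9 * $W1D4 J /var/run/httpd.pid 30"
)
print(entry)
print(describe_weekly(entry.when))  # prints ('Monday', 4)
```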

I set this up last week, and it has since done its work, turning my 1GiB /var/log/httpd-access.log into a 46MiB httpd-access.log.0.bz2. Log files are some of the best use cases for compression. Enjoy.

How to get your Disqus API keys

I’m working on importing my comments into the otherwise excellent Disqus commenting system, but getting ahold of your API keys can be rather difficult, so I’ll just document the process here for later reference.

To call the API functions, I’m using the Java-based REST Client – which is free and very handy for this kind of thing.

  1. Log in to Disqus.com with a user that has access to the forum you want API keys for.
  2. Visit http://disqus.com/api/get_my_key/ with your browser to get the user API key (since it uses the active session to give you the API key).
  3. Call http://disqus.com/api/get_forum_list/?user_api_key=_USER_API_KEY_ to get the list of available forums, since you’ll need the numeric identifier for the forum. Look through the JSON response and find the id number for the forum you want an API key for (for my blog, it’s "id": "180233").
  4. Call http://disqus.com/api/get_forum_api_key/?user_api_key=_USER_API_KEY_&forum_id=180233 where 180233 is your forum id. The message field in the JSON response should contain your API key.
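
The URLs for steps 3 and 4 can be assembled with a few lines of Python, which saves some copy-and-paste mistakes. The sketch below only builds the URLs and reads the `message` field mentioned in step 4; it makes no network calls. The helper names, the placeholder key and the sample JSON (including any field other than `message`) are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

API_ROOT = "http://disqus.com/api/"


def api_url(method, **params):
    """Build the URL for one of the API methods used in the steps above."""
    query = "?" + urlencode(params) if params else ""
    return API_ROOT + method + "/" + query


def extract_message(response_text):
    """Return the 'message' field of a JSON response."""
    return json.loads(response_text)["message"]


user_key = "USER_API_KEY"  # placeholder; use the key obtained in step 2
print(api_url("get_forum_list", user_api_key=user_key))
print(api_url("get_forum_api_key", user_api_key=user_key, forum_id="180233"))

# Made-up response, only to show where the forum API key would be found.
sample = '{"succeeded": true, "message": "FORUM_API_KEY"}'
print(extract_message(sample))
```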

That’s quite a bit of manual work, but it does not seem like there’s currently any better method. If you happen to find one, please let me know.