Logo

Pragmatic Performance Tuning in Drupal

2 months ago | Thomas Newton: Newtonlabs

One of the Drupal based applications we manage received user feedback that performance was not meeting their expectations.  Even though my background is primarily Rails and Java, I was excited to take up this challenge.  Before I could really get down to business I needed a couple of tools for the job. This post is about creating those performance tools inspired by the following tips from The Pragmatic Programmer. 

  • Tip 20 Keep Knowledge in Plain Text
  • Tip 21 Use the Power of Command Shells
  • Tip 28 Learn a Text Manipulation Language  

Tip 20 Keep Knowledge in Plain Text

 There is a great module that a lot of Drupal developers use http://drupal.org/project/devel The module is great for development in that it shows the developer how many queries fire, how long they took, and if they are repeated queries.  Drupal seems to be very chatty with the database and keeping this under control can really help with performance.  

For this particular problem I was coming in after the development was complete and hoping to fine tune or refactor the existing code base. I wanted a logfile that could show me the queries fired per page, what the most expensive queries were, the functions that called those queries, and the ability to reproduce them in other environments (development, QA, and Production).  Patrick Shaughnessy (http://patshaughnessy.net/) gave me some quick pointers on the Drupal framework allowing me to build an easy logfile strategy.

Step 1: Build a Switch for logging

I wanted to leave the logging code in place, but provide a mechanism to quickly turn the logging on and off.  I did not want to store this in the variable table as I suspected most of the performance issues were database related.  I resolved to create a simple global in global_settings.php

# Turn on performance tuning in the logfile global $performance_logging; $performance_logging  = 0;

Step 2: Create a debug function

I put mine in index.php, but there is probably a better location for common methods:

function debug($s) {   global $performance_logging;   if ($performance_logging) {     $file = fopen("/tmp/performance.log", "a");     fwrite($file, $s . "\n");     fclose($file);   } }

Step 3: Place instrumentation in the code to log performance

First lets log how long it takes to build the entire page.  This can be done quickly by placing the following code in the index.php before the bootstrap include.

$then = microtime(true);  # Mark the beginning of the page require_once './includes/bootstrap.inc'; drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

Then again at the bottom of the page, with a call to our debug function to record the time.  

$elapsed_time = microtime(true) - $then; if ($performance_logging) {   debug(sprintf("%f ENDPAGE %s",$elapsed_time,$_GET['q'])); }

When this is shown in the logfile it will appear like this:

0.358291 ENDPAGE node/3

The time in seconds, a tag to say the page has ended, and the parameter that was used to build the page.

Step 4: Record the database performance metrics

 In the database.mysql.inc file all database transactions will flow through this method.  We can log how long these database calls took with a couple of pieces of code in the _db_query method around the call to mysql_query.

Add the following code to the _db_query function:

if ($performance_logging) { $then = microtime(true); }   $result = mysql_query($query, $active_db);   if ($performance_logging) {     $elapsed_time = microtime(true) - $then;     $bt = debug_backtrace();     debug(sprintf("%f -- %s -- %s",$elapsed_time, $bt[2]['function'], $query));   } }

This will log the time in seconds, a tag to indicate the function called, and the query that was executed to the database.  

Step 5: The Results

Now when a page is rendered we will get a log file like below:

0.000069 -- community_is_private -- select group_nid from og_ancestry where nid = 13430 0.000076 -- community_is_private -- select selective from og where nid = 13429 0.000599 -- taxonomy_get_vocabularies -- SELECT v.*, n.type FROM vocabulary v LEFT JOIN vocabulary_node_types n ON v.vid = n.vid ORDER BY v.weight, v.name ...  A Lot more lines ... 0.000144 -- get_homepage -- SELECT * FROM og_home WHERE gid = 3 0.000448 -- statistics_exit -- UPDATE node_counter SET daycount = daycount + 1, totalcount = totalcount + 1, timestamp = 1275527799 WHERE nid = 3 0.000246 -- statistics_exit -- INSERT INTO accesslog (title, path, url, hostname, uid, sid, timer, timestamp) values('Spaces', 'node/3', '', '127.0.0.1', 14149, '983da4b0314d4dfca4c919acfcbbd33b', 346, 1275527799) 0.358291 ENDPAGE node/3

Column Breakdown:

  1. The amount of time the database query took to execute
  2. The method the database query was executed from
  3. The actual SQL statement

Tip 21 Use the Power of Command Shells

This simple logfile is incredibly powerful for instrumenting the application to perform performance tuning with just a handful of Unix commands

How do I look at just the queries on one page?

  • Simply truncate the logfile, view the page, and re-open the logfile
  • The logfile can be truncated by the Unix command

cp /dev/null /tmp/performance.log

What are the most expensive queries on a page?

  • Truncate the logfile (above)
  • Look at the page in question
  • Execute the following Unix command to sort the file numerically, in reverse order, and look at the top 5

sort -rn /tmp/performance.log | head -5  

How many queries were executed to build the page?

  • Truncate the logfile (above)
  • Look at the page in question
  • Execute the following Unix command to see the number of lines in the file (which is the number of queires on that page)

wc /tmp/performance.log

Tip 28 Learn a Text Manipulation Language 

Now that there is a powerful flow of information, here are a couple more tools to figure out what is going on from a performance perspective:

Summary of application performance, right from the server!

- Use perl as a text parsing language from the command line and you can ascertain a lot of information.  Perl is almost always installed on the server and is a powerful text manipulation tool. Here is one such use case:

perl -F"--" -lane 'if (/ENDPAGE/) {print "$F[1] $t";$t=0;next;} $t+=$F[0]'  mckwiki_debug.txt

This will return something like this:

0.227044 ENDPAGE node/86 0.020057 2.461857 ENDPAGE node/3 2.128639 0.354404 ENDPAGE node/3 0.1329 0.295097 ENDPAGE og 0.03797

  • The first column is the total time the page took to build (including php time and mysql query time)
  • The second column can be ignored, just a place holder to trigger that a page has ended
  • The third column is the url that was called
  • The fourth column is the total time in database calls

Breaking down the command:

Perl and Ruby both provide the capability to execute scripts directly from the command line.  This can be quite powerful when trying to quickly analyze a problem.  Of course if a repeatable solution is found, then the inline script should be converted into a file based script for reusing later.  This is how this script above works.

perl    # The script interpreter -F"--"  # this option indicates that for every line of text seen, split the line into an array delimited by "--" -lane       # -e = Execute the next string passed as a script, for example perl -e 'print "hello world"'     # -n = Automatically loop through each line of a file (last argument)     # -a = Automatically parse each line of text into an array (using the -F argument above as a delim)     # -l = Automatically put a "\n" at the end of each line of output

I find that I am always using this four options, so I just remember -lane (nice easy phrase)

Create SQL performance scripts from the logfile:

Another handy tool is a list of all the SQL statements in an executable file to help identify performance in other environments:

I normally truncate the log file first before generating this script.  This way I know the page I view has just the SQL statements for that page when I execute them on another server.

perl -F"--" -lane 'print $F[2] if $F[2]'  mckwiki_debug.txt > performance_debug.sql

This will create a file with all the SQL statements for that page.  

 

In the next article we will explore how these tools can be used to fine tune a Drupal application for performance.

 

 

 

 

Permalink | Leave a comment  »

Agile in a Distributed Environment

6 months ago | Thomas Newton: Newtonlabs

Lately I've been thinking about the facets of Agile that are important to me on a distributed team.

Rich Communication

A major hurdle in working with a distributed team is maintaining as rich communication channel as possible.  A couple of points to consider:

  • Your channels must extend past the "standard" agile feedback loops.  Most Agile strategies have several pre-designed feedback mechanisms to keep the process and iteration cycle as tight as possible (i.e. daily standups, sprint reviews, retrospectives, and various other ceremony).  However, when working in a distributed environment your team can miss everything outside of these checkpoints.  Find ways to talk shop, design, and brainstorm.  
    • Occasionally, after the standup... I'll kick the stakeholders off the call and just share ideas and thoughts on the upcoming stories.   Relax the atmosphere and make sure everyone has an opportunity to give input.  This can be especially effective when the team is working in disparate time zones and this is the only opportunity to communicate.  
  • The atmosphere is important, even over the phone.  I believe as Cockburn does, the better the amicability the richer the communication.
  • Remember how much communication you lose by not being face-to-face, and overcome it.  In one of our teams we installed a cheap webcam and just leave it running between two of the distributed offices using Google Chat.  It goes down all the time, we just kick it and fire it back up.  Despite that, it has been extremely valuable in seeing day to day conversations, being part of the team, and seeing the lights go out in the other office as their day ends.  A great summary by Scott Ambler:

 

Think globally by going local

All development should be local to the workstation. Incredible efficiencies can be gained by not worrying about network bandwidth, development servers, outages during "off hours" and so on.  At any time in the development cycle the developer should be able to unplug their workstation and continue uninterrupted.

Conform to best practices, not "best tools"

We are craftsman in the trade, the tools we use are very personal and a certain level of freedom should be maintained.  Of course still share ideas and tricks to learn and grow.  If the code integrates at the CI server, the tests pass, the code is clean, and productivity maintained.. Who really cares how it got there.

Continuous Integration

This is a must in a distributed team.  All code submissions should be integrated and deployed with every submission.  This allows the developer to work on their local copy and still integrate regularly with the product.  It also provides a direct feedback loop to the stakeholder.  This is a no-brain win.

Virtual Walls for Story Boarding

Story boarding is a fantastic tool for release planning and iterative development.  Try to emulate that "face-to-face whiteboard" experience (above) as much as possible.  There are several free tools that can facilitate this, the tool is irrelevant.  What is important is a real time representation of the fluidity of stories that the developers have access too.  I discourage spreadsheets as this is usually centrally managed, hard to maintain, and restricts the developers access.  This is the developer process, let them have access to it, take ownership, pride, and understand impact.

 

 

Permalink | Leave a comment  »