One of the Drupal based applications we manage received user feedback that performance was not meeting their expectations. Even though my background is primarily Rails and Java, I was excited to take up this challenge. Before I could really get down to business I needed a couple of tools for the job. This post is about creating those performance tools inspired by the following tips from The Pragmatic Programmer.
Tip 20 Keep Knowledge in Plain Text
There is a great module that a lot of Drupal developers use http://drupal.org/project/devel The module is great for development in that it shows the developer how many queries fire, how long they took, and if they are repeated queries. Drupal seems to be very chatty with the database and keeping this under control can really help with performance.
For this particular problem I was coming in after the development was complete and hoping to fine tune or refactor the existing code base. I wanted a logfile that could show me the queries fired per page, what the most expensive queries were, the functions that called those queries, and the ability to reproduce them in other environments (development, QA, and Production). Patrick Shaughnessy (http://patshaughnessy.net/) gave me some quick pointers on the Drupal framework allowing me to build an easy logfile strategy.
Step 1: Build a Switch for logging
I wanted to leave the logging code in place, but provide a mechanism to quickly turn the logging on and off. I did not want to store this in the variable table as I suspected most of the performance issues were database related. I resolved to create a simple global in global_settings.php
# Turn on performance tuning in the logfile
global $performance_logging;
$performance_logging = 0;
Step 2: Create a debug function
I put mine in index.php, but there is probably a better location for common methods:
function debug($s) {
global $performance_logging;
if ($performance_logging) {
$file = fopen("/tmp/performance.log", "a");
fwrite($file, $s . "\n");
fclose($file);
}
}
Step 3: Place instrumentation in the code to log performance
First lets log how long it takes to build the entire page. This can be done quickly by placing the following code in the index.php before the bootstrap include.
$then = microtime(true); # Mark the beginning of the page
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
Then again at the bottom of the page, with a call to our debug function to record the time.
$elapsed_time = microtime(true) - $then;
if ($performance_logging) {
debug(sprintf("%f ENDPAGE %s",$elapsed_time,$_GET['q']));
}
When this is shown in the logfile it will appear like this:
0.358291 ENDPAGE node/3
The time in seconds, a tag to say the page has ended, and the parameter that was used to build the page.
Step 4: Record the database performance metrics
In the database.mysql.inc file all database transactions will flow through this method. We can log how long these database calls took with a couple of pieces of code in the _db_query method around the call to mysql_query.
Add the following code to the _db_query function:
if ($performance_logging) { $then = microtime(true); }
$result = mysql_query($query, $active_db);
if ($performance_logging) {
$elapsed_time = microtime(true) - $then;
$bt = debug_backtrace();
debug(sprintf("%f -- %s -- %s",$elapsed_time, $bt[2]['function'], $query));
}
}
This will log the time in seconds, a tag to indicate the function called, and the query that was executed to the database.
Step 5: The Results
Now when a page is rendered we will get a log file like below:
0.000069 -- community_is_private -- select group_nid from og_ancestry where nid = 13430
0.000076 -- community_is_private -- select selective from og where nid = 13429
0.000599 -- taxonomy_get_vocabularies -- SELECT v.*, n.type FROM vocabulary v LEFT JOIN vocabulary_node_types n ON v.vid = n.vid ORDER BY v.weight, v.name
... A Lot more lines ...
0.000144 -- get_homepage -- SELECT * FROM og_home WHERE gid = 3
0.000448 -- statistics_exit -- UPDATE node_counter SET daycount = daycount + 1, totalcount = totalcount + 1, timestamp = 1275527799 WHERE nid = 3
0.000246 -- statistics_exit -- INSERT INTO accesslog (title, path, url, hostname, uid, sid, timer, timestamp) values('Spaces', 'node/3', '', '127.0.0.1', 14149, '983da4b0314d4dfca4c919acfcbbd33b', 346, 1275527799)
0.358291 ENDPAGE node/3
Column Breakdown:
Tip 21 Use the Power of Command Shells
This simple logfile is incredibly powerful for instrumenting the application to perform performance tuning with just a handful of Unix commands
How do I look at just the queries on one page?
cp /dev/null /tmp/performance.log
What are the most expensive queries on a page?
sort -rn /tmp/performance.log | head -5
How many queries were executed to build the page?
wc /tmp/performance.log
Tip 28 Learn a Text Manipulation Language
Now that there is a powerful flow of information, here are a couple more tools to figure out what is going on from a performance perspective:
Summary of application performance, right from the server!
- Use perl as a text parsing language from the command line and you can ascertain a lot of information. Perl is almost always installed on the server and is a powerful text manipulation tool. Here is one such use case:
perl -F"--" -lane 'if (/ENDPAGE/) {print "$F[1] $t";$t=0;next;} $t+=$F[0]' mckwiki_debug.txt
This will return something like this:
0.227044 ENDPAGE node/86 0.020057
2.461857 ENDPAGE node/3 2.128639
0.354404 ENDPAGE node/3 0.1329
0.295097 ENDPAGE og 0.03797
Breaking down the command:
Perl and Ruby both provide the capability to execute scripts directly from the command line. This can be quite powerful when trying to quickly analyze a problem. Of course if a repeatable solution is found, then the inline script should be converted into a file based script for reusing later. This is how this script above works.
perl # The script interpreter
-F"--" # this option indicates that for every line of text seen, split the line into an array delimited by "--"
-lane
# -e = Execute the next string passed as a script, for example perl -e 'print "hello world"'
# -n = Automatically loop through each line of a file (last argument)
# -a = Automatically parse each line of text into an array (using the -F argument above as a delim)
# -l = Automatically put a "\n" at the end of each line of output
I find that I am always using this four options, so I just remember -lane (nice easy phrase)
Create SQL performance scripts from the logfile:
Another handy tool is a list of all the SQL statements in an executable file to help identify performance in other environments:
I normally truncate the log file first before generating this script. This way I know the page I view has just the SQL statements for that page when I execute them on another server.
perl -F"--" -lane 'print $F[2] if $F[2]' mckwiki_debug.txt > performance_debug.sql
This will create a file with all the SQL statements for that page.
In the next article we will explore how these tools can be used to fine tune a Drupal application for performance.
Lately I've been thinking about the facets of Agile that are important to me on a distributed team.
Rich Communication
A major hurdle in working with a distributed team is maintaining as rich communication channel as possible. A couple of points to consider:
Think globally by going local
All development should be local to the workstation. Incredible efficiencies can be gained by not worrying about network bandwidth, development servers, outages during "off hours" and so on. At any time in the development cycle the developer should be able to unplug their workstation and continue uninterrupted.
Conform to best practices, not "best tools"
We are craftsman in the trade, the tools we use are very personal and a certain level of freedom should be maintained. Of course still share ideas and tricks to learn and grow. If the code integrates at the CI server, the tests pass, the code is clean, and productivity maintained.. Who really cares how it got there.
Continuous Integration
This is a must in a distributed team. All code submissions should be integrated and deployed with every submission. This allows the developer to work on their local copy and still integrate regularly with the product. It also provides a direct feedback loop to the stakeholder. This is a no-brain win.
Virtual Walls for Story Boarding
Story boarding is a fantastic tool for release planning and iterative development. Try to emulate that "face-to-face whiteboard" experience (above) as much as possible. There are several free tools that can facilitate this, the tool is irrelevant. What is important is a real time representation of the fluidity of stories that the developers have access too. I discourage spreadsheets as this is usually centrally managed, hard to maintain, and restricts the developers access. This is the developer process, let them have access to it, take ownership, pride, and understand impact.