Alex Elliott

The internet home of a prospective software engineer

This is my personal blog where I discuss projects that I'm currently working on, work I've recently completed, or write about any topic which has caught my interest in the world of Computing from my studies or from my personal research.

Latest Articles

Personal Site Work

November 26th, 2008

So, as predicted there’s already been a pretty big gap, but that’s not to say I’ve been neglecting the blog.  I’ve had a new bespoke design drafted up by a friend I know through a web development community, which I hope to get properly skinned for WP and set up as a personal site on alex-elliott.co.uk soon.  The designer is Adam McPeake, who is linked in the blogroll under wized (wized.net is his portfolio).

Previews

So, while I get to work converting the design into a WP theme and writing a CMS for my personal site, here’s a preview of what it should eventually look like (the main site, and the blog):

[Images: Main Site Preview, Blog Preview]

Hope to have some more updates about this soon, or maybe you’ll be reading this in the lovely new theme. :)

UPDATE: as you may have noticed the new design has been skinned into WP (and I rediscovered precisely why I hate that theme system). Hopefully things still work, but if they don’t please do comment and let me know.

NOTE: yes, the rest of the site isn’t done yet. What’s up is an example of what it should look like when done (at least the index page). I’ll start on getting the main top links at least drafted into markup now, and I should be able to finish the about and contact pages completely within a few days.

Exciting stuff. :-)

Concurrency in PHP

November 13th, 2008

One of the problems you come across when writing real-time applications in PHP is that it is, in most cases, a linear language: it performs the tasks set before it one at a time, and doesn’t offer much in the way of tools for setting up concurrent tracks of evaluation.

As far as I’m aware PHP doesn’t contain anything at all for controlling threads. However, it does include some process control functions, which allow us to use a multi-process model for concurrent programming.

Introducing Process Control

The functions we need come via the Process Control (pcntl) module in PHP.  This is not included by default in the PHP Apache module (because it’s more complicated in such cases), so if you want to use it for a website you will need to run PHP as CGI or compile mod_php with --enable-pcntl.  In this case I’ll be testing using the PHP CLI binary, where it is usually available (strictly it is a compile-time option there too, so check the output of php -m if the functions are missing).

The key function that this module provides for us is pcntl_fork().  This works much like the fork you may have come across in other languages: it creates a clone of the current process, which then follows a separate code path from the parent that called it.  The important thing, of course, is that the clone (or child) process does its computation while the parent carries on with whatever it was doing.  This is key to parallelising our PHP applications, and allows us to create responsive applications which can still do lengthy tasks.

Process Control in Use

So where might this be useful?  A trivial example would be a task that takes a number of seconds, where you want the user to receive output from the program while the result is being computed.  The code below shows this:

<?php
 
// this is a place-holder for a hypothetical function
// which takes a long time to compute, let's say the
// eventual answer would be 5...
function foo()
{
   sleep(10);
   return 5;
}
 
file_put_contents('tmp',0); // we'll also assume we know the answer isn't 0.
 
// Here's the fork, the process ID will be stored in $pid for the main
// program, and it'll be 0 for the child process.
$pid = pcntl_fork();
if($pid == -1)
   die('Fork failed'); // Our fork failed, thus our program did.
elseif($pid == 0)
{
   // here's our forked process
   file_put_contents('tmp',foo());
   echo "Done\n";
   exit;
}
 
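// In the meantime we'll keep our parent process looping.
$timestamp = time();
// The (int) cast treats an empty read (file caught mid-write) as "not done yet".
while((int) file_get_contents('tmp') === 0)
{
   if($timestamp != time())    // once a second...
      echo 'Calculating...',"\n"; // give some output
   $timestamp = time();
   usleep(100000); // pause for 0.1s so the loop doesn't hog a CPU core
}

pcntl_wait($status); // wait for the child to exit so it isn't left as a zombie
echo file_get_contents('tmp')."\n"; // output our answer from the fork.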
 
?>

As you can see, this is simply a test case: there’s a function with a sleep(10); which takes the place of any long-running function, and while it is running we inform the user once a second that the task is underway, until it is complete.  It’s quite a trivial example, but the output shows that the echo statements and the sleep were running in parallel.  Predictably, it looks like this:

bash-3.1$ php -f test.php
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Done
5

Another example might be if you were performing a repetitive task on many targets, such as all the files in a directory.  You could loop across the directory forking off a new process which would handle each file, then back in the main loop you simply wait for all of the processes to finish and then wrap up any loose ends you might have.
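As a rough sketch of that directory idea (not code from a real project): processFile() and the demo files below are made-up placeholders, and the only real API used is the pcntl family plus ordinary file functions.  The parent forks one child per file, remembers the process IDs, and then waits on each of them in turn.

```php
<?php
// Sketch only: processFile() and the demo files are made-up placeholders.

// Stand-in for real per-file work: record each file's size next to it.
function processFile($path)
{
    file_put_contents($path . '.size', (string) filesize($path));
}

// Fork one child per .txt file in $dir, then wait for all of them.
// Returns the number of files that could not be handled.
function processDirectory($dir)
{
    $children = [];
    $failed = 0;

    foreach (glob($dir . '/*.txt') as $path) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            $failed++;          // could not fork for this file
            continue;
        }
        if ($pid === 0) {
            processFile($path); // child: handle one file, then stop
            exit(0);
        }
        $children[] = $pid;     // parent: remember the child and move on
    }

    // Back in the main flow: wait for every child before wrapping up.
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status); // blocks until this child exits
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed++;
        }
    }
    return $failed;
}

// Demo on a throwaway directory.
$dir = sys_get_temp_dir() . '/fork_demo_' . getmypid();
mkdir($dir);
file_put_contents("$dir/a.txt", 'one');
file_put_contents("$dir/b.txt", 'three');

$failed = processDirectory($dir);
echo "Children failed: $failed\n";
```

One process per file is fine for a handful of files, but for a large directory you would probably want to cap the number of children running at once.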

Conclusion

So there you go, process control.  A useful tool if you need to speed up a parallelisable program/algorithm.  Of course, in most cases speed isn’t too necessary in PHP applications, so the practical applications of process control aren’t very numerous.  In cases where speed is not important, it is usually better to keep the application simple, since it means it is more likely to be correct.

However, if you are writing something real-time and don’t want to bog down your main application loop (making your application unresponsive), process control comes into its own and will save the day.

Concurrency in zBot

As I mentioned in the previous post, concurrency is something I’ve been meaning to include in zbot for a while now. zbot is a real-time PHP application, and as an IRC bot, a response time of over a second or two is starting to seem sluggish. Sometimes, however, a response can’t be given in this length of time because it relies on an external site, and so the bot must wait for that site before it can reply. In such cases I would like the main loop to be free to continue watching for more easily serviced queries, so that it can reply to those first while it is waiting.

It also allows me to safely parcel off largish tasks without having to worry that the bot will ping out, as the core will continue looking for PING packets. For example, I would like to implement a news plugin which would download five or more RSS feeds and then parse them to check for new news stories. This would be ideal for process control, as I could fork off another process which would handle the downloads and parsing while the main bot continues to function.

For this purpose I have written a taskScheduler class to which I can send jobs which are then evaluated in a forked process. The class keeps track of all scheduled jobs, and performs a callback when the job is completed. This might prove to be a useful separate class, so I’ll likely both include it in zbot, and separately on my site with some documentation when it’s finished.
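To give an idea of the shape such a class could take, here is a minimal sketch.  It is not the actual taskScheduler from zbot: the method names (schedule, poll, pending) and the use of a temporary file to pass the result back are assumptions made for illustration.  Each job runs in a forked child, and the parent checks for finished children with a non-blocking pcntl_waitpid() call once per loop iteration.

```php
<?php
// Hypothetical sketch, NOT the real zbot taskScheduler class.
class TaskScheduler
{
    private $jobs = []; // pid => ['file' => result file, 'callback' => callable]

    // Run $job in a forked child; $callback gets its return value later.
    public function schedule(callable $job, callable $callback)
    {
        $file = tempnam(sys_get_temp_dir(), 'job');
        $pid = pcntl_fork();
        if ($pid === -1) {
            unlink($file);
            return false;
        }
        if ($pid === 0) {
            // Child: do the slow work and leave the result for the parent.
            file_put_contents($file, serialize($job()));
            exit(0);
        }
        $this->jobs[$pid] = ['file' => $file, 'callback' => $callback];
        return true;
    }

    // Call once per main-loop iteration; WNOHANG means it never blocks.
    public function poll()
    {
        foreach ($this->jobs as $pid => $job) {
            if (pcntl_waitpid($pid, $status, WNOHANG) === 0) {
                continue; // child is still running
            }
            $raw = file_get_contents($job['file']);
            $result = ($raw === '' || $raw === false) ? null : unserialize($raw);
            unlink($job['file']);
            unset($this->jobs[$pid]);
            call_user_func($job['callback'], $result);
        }
    }

    public function pending()
    {
        return count($this->jobs);
    }
}

// Demo: a fake "slow" job, with the main loop free to do other things.
$results = [];
$scheduler = new TaskScheduler();
$scheduler->schedule(
    function () { usleep(200000); return 'feed parsed'; },
    function ($result) use (&$results) { $results[] = $result; }
);

while ($scheduler->pending() > 0) {
    // A bot would read from its IRC socket (and answer PINGs) here.
    $scheduler->poll();
    usleep(50000);
}
echo implode("\n", $results), "\n";
```

A temporary file keeps the sketch short; a pipe or socket pair between parent and child would avoid touching the disk.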

A New zBot

November 13th, 2008

I did say there was news of a project coming up, so here it is.  Version 2.0 of my PHP IRC bot, zbot (on irc.zymic.com).  It’s been out of action for a while, since it was hosted on a machine I don’t have access to at the moment (no SSH, bleh).  But there’s a (better) alternative now, so I can start work on getting it back.

Identifying the problem(s)

Building a new 2.0 version of my IRC bot has been something I’ve wanted to do for a while now, and there are a few main reasons why I’ve decided that it should be a fresh start rather than an update.  These are mostly (fairly) fundamental feature issues, which should be implemented quite a long way down the source tree – and so modifying the source to patch them in would be a tedious and lengthy process which might introduce a fair few bugs into the codebase.

The main things I want to improve in the new version are as follows:

  • Generally cleaner code (I might choose to release this source publicly, and I want it to be nice).
  • Improve the module system to allow the modules to intercommunicate. By doing this I can abstract even some of the “core” functionality into modules, which will allow for patches at a later date without a restart.
  • Ground-up design for a bot rehash feature, in which it reloads all settings from configuration files.
  • Simpler text-file configuration files which will be parsed and read-in.
  • Work out a system for concurrency, perhaps spawning child PHP processes to handle any tasks which might take a long time (such as fetching files online). This would free up the rest of the core loop to handle other input while these tasks are performed.
  • Additional bot features, such as automatic tracking of users in channels, dynamic topic changes (so for example, we could specify a field which would be changed by an external source, allowing us to do things like include the current stable version number of PHP in #php).
  • Web interfaces for some bot data, like current factoids, quotes, and possibly a module repository that people could submit to.
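To make the configuration and rehash points a little more concrete, here is a small sketch under stated assumptions: the "key = value" format with # comments, and the parseConfig() and rehash() function names, are invented for illustration and are not zbot’s real format or API.  The idea is simply that a rehash re-reads the file and swaps in the new settings, keeping the old ones if the file can’t be read.

```php
<?php
// Hypothetical "key = value" config format; not zbot's real format.

// Turn config text into an array of settings.
function parseConfig($text)
{
    $settings = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $parts = explode('=', $line, 2);
        if (count($parts) !== 2) {
            continue; // ignore malformed lines
        }
        $settings[trim($parts[0])] = trim($parts[1]);
    }
    return $settings;
}

// Reload settings from $path, keeping $current if the file is unreadable.
function rehash($path, array $current)
{
    if (!is_readable($path)) {
        return $current;
    }
    return parseConfig(file_get_contents($path));
}

// Demo: load a config, change it on disk, then rehash without restarting.
$path = tempnam(sys_get_temp_dir(), 'zbot');
file_put_contents($path, "# connection\nserver = irc.zymic.com\nnick = zbot\n");
$settings = rehash($path, []);

file_put_contents($path, "server = irc.zymic.com\nnick = zbot2\n");
$settings = rehash($path, $settings);

echo $settings['nick'], "\n";
unlink($path);
```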

Moving forward

I hope to start building the new version soon. I’m sorting out some groundwork at the moment, and I’ll try to keep this updated as work goes on.

Of course, my good buddy Ed (Bread) will probably assist as he did in the last version (he provided the initial module system and a few plugins), and maybe this time we’ll actually write some documentation so people can write their own modules easily! :p

Since I will be hosting this on my shiny new VPS, we have more freedom (having a web interface was not possible before), and it means we can have one central source tree (I’m thinking stable/ and testing/ with a script to update stable when testing is shown not to break things ;) ).