Alex Elliott

The internet home of a prospective software engineer

This is my personal blog where I discuss projects that I'm currently working on, work I've recently completed, or write about any topic which has caught my interest in the world of Computing from my studies or from my personal research.

Latest Articles

zBot No More

December 13th, 2008

The title here is more sensational than it needs to be: I’m not discontinuing the zbot 2.0 project.  Rather, I’ve decided that if I want to release it publicly, I would like a more generic parent name for the software.  The instance of the bot in irc.zymic.com will likely retain the name. :)

So, what’s the new name for the software?  Well, that’s possibly still in flux, but for the moment I’ve decided I might go with polymer: a slightly nerdy nod to the fact that I want to make this release as extensible as it needs to be.

Polymerisation

Of course, with a name like that I really should elaborate on just why it’s going to be more modular and easily extensible than the previous bot.

There was nothing really wrong with the implementation before, and much of it has been kept in the new one.  The most notable changes are to the format used to write modules, which has been simplified somewhat, and the addition of module interaction, which lets a module reuse functionality from another module that is already loaded into the bot.

My current WIP draft for module layout is this:

<?php
 
// module description up here?
//
// that would make sense.
 
class mod_example
{
   // any local variables required
   private $localvar;
   private $othervar;
 
   /// Core and required methods:
   // init, performs any initialisation required
   public function init()
   {
      // initialisation stuff... for example:
      $this->loadConfig(); // <-- (re)loads configuration in {confdir}/{name}.conf
 
      // the triggers and hooks
      registerTrigger('ping','respond');
      registerHook('passive');
   }
 
   // a rehash method which handles a complete config reload
   public function rehash()
   {
      // rejig internals in case our config has changed
      $this->loadConfig(); // <-- like this again perhaps?
   }
 
   /// about() and help() will probably make an appearance though since some are
   /// internal they need not include them.
 
   /// Trigger/Hook implementation
   // Here's a trigger, triggered of course by a !ping command.
   public function respond()
   {
      // note the lack of any arguments, instead the information will automatically
      // be made available through methods/variables contained within the base
      // class.  This simplifies format, and allows us to make triggers and hooks
      // constant.
      $target = $this->state->target;
      $nick   = $this->state->caller;
      // module intercommunication:
      if( is_object($_irc) )
         $_irc->msg($target,$nick.', pong');
   }
 
   // And here's a hook, called on every new packet
   public function passive()
   {
      // do stuff
   }
 
   /// And as always, you can declare internal functions for personal use.
   private function helper($arg1,$arg2)
   {
      // do something helpful
   }
}
 
?>

This format may undergo changes as I work on it, but it’s likely to look something like the above when I’m done. :)
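To give a feel for how a trigger could reach a module without any arguments being passed, here is a rough, self-contained sketch.  It is not the bot's actual core: the class names, the dispatch() method and the shape of the state object are all assumptions made up for illustration, and the reply is returned rather than sent to IRC.

```php
<?php

// Hypothetical sketch: none of these names come from the real bot.
class SketchModuleBase
{
    public $state;                 // details of the message being handled
    private $triggers = array();   // command name => method name

    protected function registerTrigger($command, $method)
    {
        $this->triggers[$command] = $method;
    }

    // Called by the (imaginary) core for each "!command" it sees.
    public function dispatch($command, $target, $caller)
    {
        if (!isset($this->triggers[$command])) {
            return null; // this module doesn't handle the command
        }
        // Fill in the state first, so the trigger method needs no arguments.
        $this->state = (object) array('target' => $target, 'caller' => $caller);
        return call_user_func(array($this, $this->triggers[$command]));
    }
}

class SketchPing extends SketchModuleBase
{
    public function init()
    {
        $this->registerTrigger('ping', 'respond');
    }

    public function respond()
    {
        // A real module would send this to IRC; here we just return it.
        return $this->state->caller . ', pong';
    }
}

$module = new SketchPing();
$module->init();
echo $module->dispatch('ping', '#zymic', 'alex'), "\n"; // prints "alex, pong"
```

The point of the design is that the trigger method stays the same shape no matter what information the core has about the message; it just reads what it needs from the state.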

Any suggestions/comments are very welcome, since this is going to be the interface anyone who wants to write modules will be using, so it’s important that it’s suitably intuitive.

Setting an Agenda

November 30th, 2008

If you’ve checked up on the site since my last blog entry, you’ve probably noticed I have indeed started on the pages for the rest of the site.  The about and contact pages are finished, and the footer now automatically displays the four most recent blog entries.  Now comes the main bit of the work, writing a CMS to manage my current and completed projects.

The Main Site

So, what exactly is going to be on the main site?  If you’ve seen the front page you can probably mostly guess.  There will be project pages for each project I’m currently working on or have completed – and there will be one of each set as “featured” works displayed on the front page and in the blog footer (the other spot will be filled by the most recent project).

The project pages themselves will be written descriptions of the project: what it’s about, what it’s aiming to produce, what I want to learn from it, what’s being used to implement it.  It will also have a section for relevant blog articles, which will be automatically fetched from here by selecting all the articles with a given tag (so for the zbot2 project, I will look for a “zbot” tag on blog articles).

This should provide a good page to refer people to when they have questions about a project, and it can serve as a home for any projects I decide I would like to release publicly.

What About zbot?

I mentioned I was hoping to start zbot v2.0 soon; I will hopefully be starting that almost directly after finishing the main site here.  In the meantime it’s time to start some design work on the structure of the program and its source files.  When the project does get started I’ll make a note here, and hopefully there’ll be a working core available before too long.

Other Work

This site and zbot aren’t the only things I’m doing, however.  There’s another project I would like to write which I have not really begun looking at yet, and which requires a fair bit of research before I can start.  I’ll probably look into a few of the topics I’ll need for it while I’m working on the other projects (this site and zbot), and will hopefully write a few articles on them to help cement my understanding and to share what I’ve found out.

So, hopefully we’ll see the rest of the main site taking shape over the next week or two.  If I don’t have time to blog for a little while, don’t think I’ve stopped working; hopefully it means quite the opposite, but we’ll have to see. ;)

Concurrency in PHP

November 13th, 2008

One of the problems you come across when writing real-time applications in PHP is that it is, in most cases, a linear language: it performs the tasks set before it one at a time and doesn’t contain much in the way of tools to help set up concurrent tracks of evaluation.

As far as I’m aware PHP doesn’t contain anything at all for controlling threads.  However, it does include some process control functions, which allow us to use a multi-process model for concurrent programming.

Introducing Process Control

The functions we need come via the Process Control (PCNTL) module in PHP.  This is not included by default in the PHP Apache module (because it’s more complicated in such cases), so if you want to use it for a website you will need to use PHP as a CGI module or compile mod_php with --enable-pcntl.  In this case I’ll be testing using the PHP CLI binary, where it was already available on my install; if your build doesn’t have it, the CLI needs compiling with --enable-pcntl as well.

The key function that this module provides for us is pcntl_fork().  This works much like the fork you may have come across in other languages: it creates a clone of the current process, which then follows a separate code path from the parent that called it.  The important thing is that the clone (or child) process does its computation while the parent carries on with whatever it was doing.  This is key to parallelising our PHP applications, and allows us to create responsive applications which can still do lengthy tasks.

Process Control in Use

So where might this be useful?  A trivial example would be when you want to perform a task that takes a number of seconds, and you want the user to receive output from the program while it’s being computed.  A simple example of this could be the code below:

<?php
 
// this is a place-holder for a hypothetical function
// which takes a long time to compute, let's say the
// eventual answer would be 5...
function foo()
{
   sleep(10);
   return 5;
}
 
file_put_contents('tmp',0); // we'll also assume we know the answer isn't 0.
 
// Here's the fork, the process ID will be stored in $pid for the main
// program, and it'll be 0 for the child process.
$pid = pcntl_fork();
if($pid == -1)
   die('Fork failed'); // Our fork failed, thus our program did.
elseif($pid == 0)
{
   // here's our forked process
   file_put_contents('tmp',foo());
   echo "Done\n";
   exit;
}
 
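// In the meantime we'll keep our parent process looping.
// The (int) cast treats an empty or half-written file as "no answer yet".
$timestamp = time();
while((int) file_get_contents('tmp') === 0)
{
   if($timestamp != time())    // once a second...
      echo 'Calculating...',"\n"; // give some output
   $timestamp = time();
   usleep(50000); // short nap so the loop doesn't hog the CPU
}
 
pcntl_waitpid($pid, $status); // reap the child so it doesn't linger as a zombie
 
echo file_get_contents('tmp')."\n"; // output our answer from the fork.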
 
?>

As you can see, this is simply a test case.  The function with a sleep(10); takes the place of any long-running function, and while it is running we inform the user once a second that the task is underway, until it is complete.  It’s quite a trivial example, but the output shows that the echo statements and the sleep were running in parallel.  Predictably, it is as follows:

bash-3.1$ php -f test.php
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Calculating...
Done
5

Another example might be if you were performing a repetitive task on many targets, such as all the files in a directory.  You could loop across the directory forking off a new process which would handle each file, then back in the main loop you simply wait for all of the processes to finish and then wrap up any loose ends you might have.
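A rough sketch of that directory idea follows.  The file names are placeholders and the per-file "work" is just an echo, so treat it as an outline of the fork-then-wait pattern rather than a finished tool; pcntl_wait() is the standard PCNTL call that blocks until any one child has exited.

```php
<?php

// Sketch only: the file list and the per-file "work" are placeholders.
$files    = array('a.txt', 'b.txt', 'c.txt');
$children = array(); // child pid => file it is handling

foreach ($files as $file) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('Fork failed');
    } elseif ($pid == 0) {
        // Child: handle exactly one file, then exit so it doesn't
        // carry on round the parent's loop and fork children of its own.
        echo "Processing $file\n";
        exit(0);
    }
    $children[$pid] = $file; // parent: remember which child has which file
}

// Back in the parent, wait for every child to finish.
$finished = 0;
while (count($children) > 0) {
    $pid = pcntl_wait($status); // blocks until some child exits
    if ($pid == -1) {
        break; // no children left to wait for
    }
    unset($children[$pid]);
    $finished++;
}

echo "All $finished workers finished\n";
```

The order of the "Processing" lines is not guaranteed, since the children run independently.  For a large directory you would also want to cap how many children run at once rather than forking one per file.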

Conclusion

So there you go, process control.  A useful tool if you need to speed up a parallelisable program/algorithm.  Of course, in most cases speed isn’t too necessary in PHP applications, so the practical applications of process control aren’t very numerous.  In cases where speed is not important, it is usually better to keep the application simple, since it means it is more likely to be correct.

However, if you are writing something real-time and don’t want to bog down your main application loop and make your application unresponsive, process control comes into its own and will save the day.

Concurrency in zBot

As I mentioned in the previous post, concurrency is something I’ve been meaning to include in zbot for a while now. zbot is a real-time PHP application, and as an IRC bot, a response time of over a second or two is starting to seem sluggish. Sometimes, however, a response can’t be given in this length of time because it relies on an external site, and so the bot must wait for that site before it can reply. In such cases I would like the main loop to be free to continue watching for more easily serviced queries, so that it can reply to those first while it is waiting.

It also allows me to safely parcel off largish tasks without having to worry that the bot will ping out, as the core will continue looking for PING packets. For example, I would like to implement a news plugin which would download five or more RSS feeds and then parse them to check for new news stories. This would be ideal for process control, as I could fork off another process which would handle the downloads and parsing while the main bot continues to function.

For this purpose I have written a taskScheduler class to which I can send jobs which are then evaluated in a forked process. The class keeps track of all scheduled jobs, and performs a callback when the job is completed. This might prove to be a useful separate class, so I’ll likely both include it in zbot, and separately on my site with some documentation when it’s finished.
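To illustrate the general idea only, a scheduler along these lines could be sketched as below.  This is not the real taskScheduler class: the class and method names, and the use of a temporary file to hand the result back to the parent, are assumptions chosen to keep the example short.

```php
<?php

// Hypothetical sketch, not the real taskScheduler class.
class SketchScheduler
{
    private $jobs = array(); // child pid => array(callback, result file)

    // Run $job in a forked child; $callback later receives its result.
    public function schedule($job, $callback)
    {
        $file = tempnam(sys_get_temp_dir(), 'job');
        $pid  = pcntl_fork();
        if ($pid == -1) {
            unlink($file);
            return false; // fork failed, nothing was scheduled
        }
        if ($pid == 0) {
            // Child: do the slow work, hand the result back via the file.
            file_put_contents($file, serialize(call_user_func($job)));
            exit(0);
        }
        $this->jobs[$pid] = array($callback, $file);
        return true;
    }

    // Call this from the main loop; WNOHANG means it never blocks.
    public function poll()
    {
        while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
            if (!isset($this->jobs[$pid])) {
                continue; // a child we didn't schedule
            }
            list($callback, $file) = $this->jobs[$pid];
            unset($this->jobs[$pid]);
            $result = unserialize(file_get_contents($file));
            unlink($file);
            call_user_func($callback, $result);
        }
    }

    public function pending()
    {
        return count($this->jobs);
    }
}

// Stand-in for a slow task such as fetching an RSS feed.
function slowJob()
{
    sleep(1);
    return 5;
}

$results   = array();
$scheduler = new SketchScheduler();
$scheduler->schedule('slowJob', function ($value) use (&$results) {
    $results[] = $value;
});

// Stand-in for the bot's main loop, which stays free to do other work.
while ($scheduler->pending() > 0) {
    $scheduler->poll();
    usleep(100000);
}

echo $results[0], "\n"; // prints 5
```

The main loop only ever calls poll(), which returns immediately when nothing has finished, so the bot could keep answering PING packets between calls.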

A New zBot

November 13th, 2008

I did say there was news of a project coming up, so here it is.  Version 2.0 of my PHP IRC bot, zbot (on irc.zymic.com).  It’s been out of action for a while, since it was hosted on a machine I don’t have access to at the moment (no SSH, bleh).  But there’s a (better) alternative now, so I can start work on getting it back.

Identifying the problem(s)

Building a new 2.0 version of my IRC bot has been something I’ve wanted to do for a while now, and there are a few main reasons why I’ve decided that it should be a fresh start rather than an update.  These are mostly (fairly) fundamental feature issues, which should be implemented quite a long way down the source tree – and so modifying the source to patch them in would be a tedious and lengthy process which might introduce a fair few bugs into the codebase.

The main things I want to improve in the new version are as follows:

  • Generally cleaner code (I might choose to release this source publicly, and I want it to be nice).
  • Improve the module system to allow the modules to intercommunicate.  By doing this I can abstract even some of the “core” functionality into modules, which will allow for patches at a later date without a restart.
  • Ground-up design for a bot rehash feature, in which it reloads all settings from configuration files.
  • Simpler text-file configuration files which will be parsed and read-in.
  • Work out a system for concurrency, perhaps spawning child PHP processes to handle any tasks which might take a long time (such as fetching files online).  This would free up the rest of the core loop to handle other input while these tasks are performed.
  • Additional bot features, such as automatic tracking of users in channels, dynamic topic changes (so for example, we could specify a field which would be changed by an external source, allowing us to do things like include the current stable version number of PHP in #php).
  • Web interfaces for some bot data, like current factoids, quotes, and possibly a module repository that people could submit to.
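As a small illustration of the configuration and rehash items in the list above, a text-file config reader could be as simple as the sketch below.  The "key = value" format, the "#" comment marker and the function name are all assumptions; the actual format hasn't been decided.

```php
<?php

// Hypothetical sketch: the "key = value" format and all names are assumptions.
function sketchParseConfig($text)
{
    $settings = array();
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $parts = explode('=', $line, 2); // split on the first "=" only
        if (count($parts) == 2) {
            $settings[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $settings;
}

$config = sketchParseConfig("# example\nnick = zbot\nserver = irc.zymic.com\n");
echo $config['nick'], ' on ', $config['server'], "\n"; // prints "zbot on irc.zymic.com"
```

With something like this in place, a rehash amounts to re-reading each file and running it through the parser again, which is much easier if it is designed in from the start.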

Moving forward

I hope to look at building a new version soon.  I’m sorting out some groundwork at the moment, and I’ll try to keep this updated as work goes on.

Of course, my good buddy Ed (Bread) will probably assist as he did in the last version (he provided the initial module system and a few plugins), and maybe this time we’ll actually write some documentation so people can write their own modules easily! :p

Since I will be hosting this on my shiny new VPS, we have more freedom (having a web interface was not possible before), and it means we can have one central source tree (I’m thinking stable/ and testing/ with a script to update stable when testing is shown not to break things ;) ).