Alex Elliott

The internet home of a prospective software engineer

This is my personal blog where I discuss projects that I'm currently working on, work I've recently completed, or write about any topic which has caught my interest in the world of Computing from my studies or from my personal research.

Latest Articles

Qt on Android with Necessitas

February 9th, 2012

One of the best things about the Qt toolkit is its portability, and that has only been improved by Project Lighthouse, which provides a way to deploy Qt applications to all sorts of platforms. This has been leveraged by BogDan Vatra to produce an alpha port to Android of not only Qt but also the Qt SDK (including Qt Creator integration).

Considering I have a couple of Android devices around (a phone and a tablet), I thought it was worth a look to see if this port (Necessitas) would allow me to run Expression Editor on the Android platform. Starting with the basic set-up, I installed Necessitas with the Android SDK/NDK and Apache Ant, and then opened the project file for Expression Editor in the Necessitas IDE. At this point it adds a set of files required to run the application, and it is necessary to edit android/res/values/libs.xml to note that Expression Editor links to QtXml as well as QtCore and QtGui.

I initially made an attempt to set up the existing CMake build system with Necessitas, but that proved to be quite vexing, so for the purposes of a proof-of-concept I reverted to qmake and removed the optional components (PCRE, ICU, POSIX, C++11) from the build system. Since Necessitas has primarily been built around qmake this worked quite well, and with only a few modifications to the source code (it didn’t like qMax(qreal, double), but a simple explicit static_cast was enough there) it built an .apk, which then failed to deploy.
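The qMax(qreal, double) complaint is most likely down to template argument deduction: on ARM builds of Qt 4, qreal is normally a typedef for float, so the two arguments no longer agree on a single type. Here is a minimal sketch of the problem and the fix, using std::max as a stand-in for qMax so that it compiles without Qt. The `real` alias and the `clampWidth` function are hypothetical names for illustration, not code from Expression Editor.

```cpp
#include <algorithm>

// Stand-in for qreal on an ARM build of Qt 4, where it is usually float.
// (Hypothetical alias for illustration only.)
using real = float;

// Returns width, but never less than a fixed minimum of 10.5 units.
real clampWidth(real width)
{
    const double minimum = 10.5;

    // std::max(width, minimum) would not compile here: the single template
    // parameter would be deduced as float from one argument and as double
    // from the other. Casting one side makes both arguments the same type.
    return std::max(width, static_cast<real>(minimum));
}
```

On desktop platforms, where qreal is double, the uncast call compiles without complaint, which would explain why this only showed up when building for Android.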

The cause was that I hadn’t ensured that the Android emulator had Ministro installed (it is used to provide the Qt libraries on Android, and is available from the Market). After installing that via adb, the application deployed successfully to an Android 3.1 based emulator.

Expression Editor Running in Android Emulator

I’ve uploaded the produced .apk as a download on GitHub, and the source used is available under a separate branch. It does require Ministro to run. However, treat it with a lot of caution: I have not widely tested it, as it is mostly just a proof-of-concept. It does run correctly on my Galaxy Tab 10.1, but I can’t vouch for it elsewhere.

In any case, the fact that this was possible in under one day (and it runs very well considering I have made no real alterations to the source) really does show how portable Qt is today. It’s a very impressively complete and polished port for its early alpha stage and I am very thankful that people like BogDan Vatra are doing this work, and very thankful that Qt continues to be developed as one of (if not the) best ways of writing cross-platform applications available today.

New Lexing Parsers and What They Mean for Expression Editor

February 2nd, 2012

Back in September (prior to some uni work that rather soaked up my free time) I pushed a fairly major restructuring of Expression Editor to the repository. Functionally it has meant a bit of a step back, but the new structure does mean some significant improvements to how it works now, and provides scope for some things which would not have been possible before. I thought that since this has been neglected for a while, I might as well give some details on it while I still have a window of opportunity.

What has changed?

There is one major difference in how it operates now compared to how it worked before. Previously the parsing of regular expressions was rather ad-hoc: it was a process that grew as I built in support for more syntax elements. As a result, in the long term it would have become increasingly difficult to ensure things were being parsed correctly. It also did not react well to being passed an invalid expression: it would tend to just fail, and so the display did not update while the expression was invalid.

To improve on that ad-hoc approach, I replaced it with a single consistent method – a lexing parser. The system defines a range of tokens (over 100 different ones) from simple literals (T_LITERAL), through common syntax elements like ^ (T_STARTING_POSITION), through to less commonly used syntax elements like “(?<!” (which is of course T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN). The tokens available span across regular expression formats and provide a unified representation within Expression Editor to work with.

Of course there need to be backend-specific parsers which convert their respective regular expression formats into a sequence of unified tokens. This is handled via a polymorphic set of parser classes derived from the Parser base class (parser.cpp/parser.hpp), with the general logic of the parser living in the base class. It uses a map from tokens onto regular expressions describing their syntax in the given format, and finds the longest match it can (where several matches are of the same length, the first match is used). The result is then passed to handleToken(), which handles the logic that should apply whenever a token of that type is found. For example, when a T_GROUPING_OPEN is found, it should continue to consume tokens until a T_GROUPING_CLOSE is found; if it reaches the end of the sequence without finding one, that T_GROUPING_OPEN should be reassigned the type T_ERROR, as it is not balanced.
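To make the longest-match idea more concrete, here is a cut-down sketch of that kind of tokeniser. It is not the code from parser.cpp: it uses std::regex rather than Qt, the `tokenise` function and `Token` struct are names made up for this example, and only a handful of the token types are included. The balancing pass at the end is a simplified stand-in for what handleToken() does, and treating a stray closing bracket as an error is my own assumption.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

enum TokenType {
    T_LITERAL,
    T_STARTING_POSITION,
    T_GROUPING_OPEN,
    T_GROUPING_CLOSE,
    T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN,
    T_ERROR
};

struct Token {
    TokenType type;
    std::string text;
};

std::vector<Token> tokenise(const std::string &input)
{
    // Each rule maps a token type onto a regular expression describing it.
    static const std::vector<std::pair<TokenType, std::regex>> rules = {
        {T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN, std::regex(R"re(\(\?<!)re")},
        {T_GROUPING_OPEN, std::regex(R"re(\()re")},
        {T_GROUPING_CLOSE, std::regex(R"re(\))re")},
        {T_STARTING_POSITION, std::regex(R"re(\^)re")},
        {T_LITERAL, std::regex(R"re([^()^])re")},
    };

    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        TokenType bestType = T_ERROR;
        std::size_t bestLength = 0;
        for (const auto &rule : rules) {
            std::smatch match;
            // match_continuous anchors the match at the current position.
            if (!std::regex_search(input.begin() + static_cast<std::ptrdiff_t>(pos),
                                   input.end(), match, rule.second,
                                   std::regex_constants::match_continuous))
                continue;
            const auto length = static_cast<std::size_t>(match.length(0));
            // Strictly longer only, so the first rule wins any tie.
            if (length > bestLength) {
                bestType = rule.first;
                bestLength = length;
            }
        }
        if (bestLength == 0)
            bestLength = 1; // nothing matched: consume one character as T_ERROR
        tokens.push_back({bestType, input.substr(pos, bestLength)});
        pos += bestLength;
    }

    // Simplified stand-in for handleToken(): pair up group delimiters and
    // mark any opener that never finds its close as T_ERROR.
    std::vector<std::size_t> open;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const TokenType type = tokens[i].type;
        if (type == T_GROUPING_OPEN || type == T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN) {
            open.push_back(i);
        } else if (type == T_GROUPING_CLOSE) {
            if (open.empty())
                tokens[i].type = T_ERROR; // assumption: stray closes are errors too
            else
                open.pop_back();
        }
    }
    for (std::size_t index : open)
        tokens[index].type = T_ERROR;
    return tokens;
}
```

Calling tokenise("^(a") would give a T_STARTING_POSITION, then the unbalanced bracket as a T_ERROR, then a T_LITERAL. For "(?<!a)b" the longest-match rule picks the four-character lookbehind opener over the plain one-character T_GROUPING_OPEN.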

This structure – as mentioned in the last example given – is capable of handling invalid syntax correctly, and it’s fairly simple to implement new parsing backends as the logic is very consistent. The shared format is also very useful as it means that any new backends which use a different format will be easier to integrate.

What impact does this change have?

Well, initially it means a lot of things are no longer working. That’s unfortunate, but on balance I think it was something that had to be done. At the time of writing two of the three testing widgets are no longer functional, the save/load/common/recent files functionality is no longer in place, and the visual editing which was there before is gone.

Not everything is bad news though: there are already some new features made easier by this system. The visualisation is now capable of updating even while there are errors in the expression. The parser will simply mark them as T_ERROR, and they will appear as red blocks until they are fixed, which means you can now spot errors via the visualisation. The amount of configuration possible has also increased. The settings dialog now provides a set of options for how the visualisation is presented, and the aim is to keep adding further options as visual elements are added to the system, allowing the appearance to be tailored entirely to your specifications.

There are also more possibilities for the future than there were previously: converting the regular expression to a shared format opens up scope for regular expression optimisers and translators. It would be nice if you could take an expression you had already written for PCRE and have the application convert it into, say, POSIX ERE for a system where PCRE is not supported or available. That would be a lot easier with this unified token sequence, as it would simply be a case of translating in the opposite direction to the original parser. Some elements may not be possible due to limitations of the format, but nevertheless it would be a useful feature to have.

What work is going on at the moment?

Not as much as I would like, really. I’ve got other stuff on via university – but I do want to try and bring back the features that have gone missing since the restructuring, and then I can get back to bringing in new functionality which would make this a better piece of software than before the restructuring. In the meantime the previous application is tagged in the repository, so the functionality is still available. Hopefully though it won’t need to be around too much longer.

Oh, a final passing point: another thing that has changed since the last time I wrote is that the CMake build system is now capable of producing an NSIS executable installer for Expression Editor. There is one available via GitHub which contains the new build of Expression Editor, and I’ll try to keep it as up to date as possible. I’d be interested to talk with people who’ve deployed Qt/C++/CMake to OSX before, to get a suitable build system working there as well. It shouldn’t be too hard to do, since CPack does have package targets for various OSX installers; it’s just not something I’ve done before.

Back to Work

July 29th, 2010

Well, it’s been a while since the last blog post I wrote, and a similar stretch of time since I really got much work done on ExpressionEditor. But as you might have guessed since the blog is winding back into life, so too is my work on EE.

Changes So Far

I’m still working my way slowly through the todo list for EE (updated now on the wiki.github home page for EE). Major differences that I’ve already managed to cross off include adding support for the ICU regular expression library, and migrating the build process over to CMake. The migration is still partly ongoing, so the more people build it on as many platforms as possible the better. Please do report any issues; in IRC would be favourite, as I can work through the issues directly.

ICU (International Components for Unicode) provides a regular expression engine that seems to be a popular choice, particularly for Mac-based programming, and appears to be fairly full-featured. I hope that ICU support will be useful to people, and that it gets good use.

As to the migration to CMake, this is for several reasons. It should make it easier to distribute: at the moment it supports a very basic “make install” target, and I will be expanding that to bring the common files into /etc (at some point I should improve the content of those common files, but that’s something I can do later), and also to take the README, CREDITS, and similar files into documentation.

It should also now correctly auto-detect which of the optional dependencies (all of the regular expression libraries bar that included in Qt4.6+) are present on the system, and build a version of EE which supports the ones which are available. This hopefully will make it less of a pain for those who don’t have PCRE or ICU (or, in the case of Win32, POSIX’s regex.h).

And talking of making it easier for Win32, after a fairly long period where no EE was compiled on Win32 it was recently built, at the least proving that the code is still very happy to compile there when it’s not missing optional deps – which is very good news to get. :-)

Looking Forward

Of course, there’s still quite a lot to do to reach the plans I’ve set out, and since I’m working full-time I don’t have as much time as I’d like to work on EE (which I hope you’ll agree is an interesting and quite fun little tool, and it’s just as nice to work on :-) ).  But I expect to make some progress nonetheless, and if I do, I’ll keep all one maybe two people who have this in their RSS readers up to speed. ;-)

Expression Editor Update (2)

January 24th, 2010

Since I’ve had some more time to work on Expression Editor recently I thought it was about time I wrote another update for the progress of the project, and some related news that affects it.

Expression Editor on Mac OSX

A recent screenshot of Expression Editor on Mac OSX

From Last Time…

In the previous post I noted a few areas in progress and some that I wanted to look at in the future.  So to catch up there, Drag&Drop is generally a bit more reliable and produces slightly neater results but is otherwise unchanged so far, and the new testing widget is still waiting.  A significant change has been made in the area of supported regular expression formats however.

The application now has backends for Qt4, PCRE and POSIX ERE formats (though the visualisation could still mess up some PCRE/POSIX elements, so let me know if anything breaks). You can select the format you wish to work in from the menu bar, and it will be displayed in the bottom right of the screen so you know which mode it is currently in. The save format has also been slightly extended to save your preference for each particular expression.

The default mode has also been changed to PCRE, since it is probably the most powerful backend available. Another minor UI change is an expression status indicator to the right of the text input: a green tick while the expression is valid, and a red exclamation mark when it is invalid. In addition, if you mouseover the invalid indicator, the tooltip shows the error returned from the active regular expression backend.

In Related News

As you probably saw above, the screenshot used is from Mac OSX. In order to improve my capacity to test Expression Editor I’ve gotten myself a Mac Mini to go alongside my Slackware Linux laptop. Set up with Synergy+, this means I can simultaneously develop the application in Linux and test it in OSX. One behavioural difference between the two operating systems has already been resolved, so hopefully the application should start behaving much more reliably on OSX as well as Linux from now on.

Expression Editor Update

December 24th, 2009

A fair bit of progress has been made since my last blog entry so I thought I’d note a few things that have landed in the repository and a few things that I intend to add at a later date.

Drag and Drop

Initial support for drag and drop editing has been added.  You can now re-order the elements of the expression by dragging an element in the visualisation to one of the valid drop zones (which are automatically highlighted as you can see in this screenshot).  With this in place it becomes significantly easier to add the other bits of drag/drop editing I want the editor to support.  Eventually as well as reordering (plus the double-click edit dialogs which are also currently included for several elements) I aim to include:

  • Drag/drop adding of new elements from the toolbar to the left of the visualisation.  This should probably spawn a dialog/wizard and then insert the resulting regular expression element into the current expression.
  • Reordering needs more support in the alternatives item: currently there are only valid drop zones to place items inside current alternation branches, and there should be a drop zone allowing the user to drop an element in as a new alternative.
  • Possibly a simple “trash” element, which simply accepts the drop, and results in the item being deleted from the scene.

Regexp Formats

As stated in a few places in the application, before the initial release I hope to support PCRE, POSIX Extended and Qt format regular expressions. This means supporting a range of different regexp syntaxes, and intelligently warning when switching between formats if some of the expression cannot be used directly in the new format. It should also offer to try to translate the expression if such a problem exists.

For example, if we’re currently in PCRE mode and we have an expression containing “\w” and we switch to POSIX Extended, this should trigger a warning and then offer to translate, turning “\w” into a bracket expression such as “[[:alnum:]_]” (POSIX itself does not define a “word” character class).
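As a rough idea of what that translation step could look like, here is a small sketch that rewrites a few PCRE shorthand classes into POSIX bracket expressions and reports whether anything was changed, so the caller knows to show the warning. The `pcreToPosix` name is invented for this example and is not part of Expression Editor. Since standard POSIX has no “word” class, the sketch uses “[[:alnum:]_]” for “\w”. It also assumes the shorthand appears outside a bracket expression; something like “[\w-]” would need extra handling.

```cpp
#include <cstddef>
#include <string>

// Translate a few PCRE shorthand classes into POSIX bracket expressions.
// Sets `changed` when a rewrite happened, so a caller could warn the user.
// Assumption: shorthands inside [...] are not handled by this sketch.
std::string pcreToPosix(const std::string &pattern, bool &changed)
{
    std::string out;
    changed = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        // Copy anything that is not the start of an escape sequence.
        if (pattern[i] != '\\' || i + 1 == pattern.size()) {
            out += pattern[i];
            continue;
        }
        const char next = pattern[++i]; // the character being escaped
        switch (next) {
        case 'w': out += "[[:alnum:]_]"; changed = true; break;
        case 'd': out += "[[:digit:]]";  changed = true; break;
        case 's': out += "[[:space:]]";  changed = true; break;
        default:  out += '\\'; out += next; break; // leave other escapes alone
        }
    }
    return out;
}
```

Because each escape is consumed as a pair, an escaped backslash followed by a “w” is left untouched rather than being mistaken for “\w”.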

At the moment, the application only supports Qt’s internal format, and I think correctly represents much of what it supports internally.  The format is very much  like a slightly restricted PCRE format, so Qt/PCRE conversion should be fairly straightforward.

Expression Testing

The editor currently has an element at the bottom of the layout which allows you to test the regular expression for given short strings.  This is good for most cases, since it allows you to have a few regexp “unit tests” of sorts, where you test fringe cases and observe if it matches, partially matches, and whether the capture groups work as expected.

In addition to this it would be useful to have a few other methods of testing included. The testing widget should eventually be a tabbed widget with the currently available tester as an option, plus at least two additional panes: a “bulk text” pane which takes paragraph or longer inputs of text and highlights all sections of that text which are matched by the regular expression, and a “replacement” pane which allows you to input a similar length of text to “bulk text” and apply the regular expression with a given replacement string (which could also be a regular expression).

Anyway, that’s what I’ve been working on and some of what I want to include later.  Work goes on. :)

A New Project

December 7th, 2009

I have recently been working on a new project under the working title of “Expression editor”, an application which allows for easy editing of Regular Expressions (regexps) in a similar way to KDE3.x’s KRegExpEditor.  I used to love KRegExpEditor for the incredibly useful functionality it provided, and in particular the visualisation of the regular expression as you edited it.  Being able to see graphically what the regexp was doing made dealing with long cryptic regexps much easier, and I felt it was a shame that it was not (as far as I know) ported to Qt4 and KDE SC 4.x.

A Screenshot of KRegExpEditor in Use

Since I felt it was a very useful application and one that I felt deserved to be ported to Qt4, I have started my own replacement (I decided to replace it rather than port mostly as a learning experience) written from the ground up in Qt4.  If it reaches a good level of stability I may consider porting it to be a KDE SC 4.x application, but for now I’m just focusing on building a working replacement.  After working on this for two weeks (start date: 23rd November 2009) I’ve reached a state where things are starting to come together.  If anyone’s interested the app is licensed under the GPLv3 and is available from GitHub.  Any bugs or feature requests are welcome at the project’s Issues page.  As of fairly recently it looks like this:

Expression Editor With an Email Matching Regexp Open

At the moment it includes some Oxygen icons, but due to the license on those they will be replaced before I release an actual stable version of the application.

Remember, if you do try it, it’s nowhere near stable yet – and a fair bit is yet to be implemented (like the drag and drop / GUI editing of expressions).

Musings on Syntax Highlighting for Websites

February 6th, 2009

Syntax highlighting can be very important to some websites, particularly those featuring articles on programming practice/theory or pastebin/nopaste websites for collaborative debugging.  However, most highlighting packages tend to use pattern matching to attempt to correctly highlight a given document rather than a more accurate but more complex lexical parser and do not have the capacity to use multiple highlighting schemes for a given document internally.  If you choose to highlight something as PHP, then only the PHP segments will be syntax highlighted, when it’s plausible that there will also be HTML, XML, Javascript, CSS, etc in the same document.

These are two features lacking from most existing syntax highlighting packages today, and ones that I think would be extremely useful to have in publicly available free software tools.  The question is simply whether it is feasible to include them, or whether what we’ve got currently is as good as it’s likely to get.

Pattern Matching versus Lexical Parsing

These are the two main ways of taking a source document and producing a highlighting for it. Pattern matching uses regular expressions to attempt to catch recognisable patterns in the given language, which is simpler to produce but does not guarantee good results. Lexical parsing, on the other hand, is much more complex to implement but, when done well, is a more flexible method for producing a highlighting of some input.

Lexical parsing involves going through the input from start to finish, breaking the input up into “tokens”, which are small segments of the input with some associated meta-data. In essence it breaks the code provided down into its components: strings, keywords, variables, etc. The power of this model is that while the parser is working it can use its state information on things like scope and context to provide more accurate and more informative details. In fact, a full lexical parser would be able to identify syntax errors and highlight them automatically.

As to providing more information, with tokenised input it would be fairly trivial to note which braces/brackets match one another, and unlike a pattern matching system you can include information from other parts of the program – take the simple example of a C++ typedef, something simple like “typedef vector<string>::iterator vec_iter”, which provides a new shorthand type “vec_iter” as a vector<string> iterator.  While a pattern matching model could probably work out that vec_iter was a type, it would not know what it represented, or if it was valid.  A lexical parser would be able to add a note saying “this is a vector<string> iterator” provided the typedef was in the provided sample.
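As a small illustration of the bracket-matching point, here is a sketch that walks the input once, keeps a stack of unmatched openers, and skips over string literals so that a bracket inside quotes does not confuse it. That last part is exactly the kind of context a pure pattern matcher struggles with. The `matchBrackets` function is a made-up name for this example, and it only understands double-quoted strings with backslash escapes, so it is far from a full lexer.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns (open, close) index pairs for matching brackets in `source`,
// ignoring anything inside double-quoted string literals.
std::vector<std::pair<std::size_t, std::size_t>> matchBrackets(const std::string &source)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    std::vector<std::size_t> open; // indices of openers not yet matched
    bool inString = false;

    for (std::size_t i = 0; i < source.size(); ++i) {
        const char c = source[i];
        if (inString) {
            if (c == '\\')
                ++i;              // skip the escaped character
            else if (c == '"')
                inString = false; // end of the string literal
            continue;
        }
        if (c == '"') {
            inString = true;
        } else if (c == '(' || c == '[' || c == '{') {
            open.push_back(i);
        } else if (c == ')' || c == ']' || c == '}') {
            const char expected = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (!open.empty() && source[open.back()] == expected) {
                pairs.emplace_back(open.back(), i);
                open.pop_back();
            }
            // A mismatched or stray closer is simply left unpaired here; a
            // real highlighter would flag it as a syntax error instead.
        }
    }
    return pairs;
}
```

Pairs come out in the order their closing brackets are seen, so inner pairs are reported before the brackets that enclose them.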

Of course, while it is probably a superior method from a functional standpoint, it is significantly more complicated to implement, which raises the question of whether the benefits are worth the extra outlay of effort required to produce the highlighter. My personal view is that pattern matching is, for the moment, the better option for things like articles, where we are confident that the input is a valid piece of code and thus should be fine in a normal highlighter. For uses like pastebin/nopaste sites though, it would be beneficial to have this kind of extra information, since they are often used for collaborative debugging. There, highlighting of syntax errors, and of other possible errors like a definition of a used type not being available, would be valuable (the latter might not be a true error, as the definition may be in a file not provided to the highlighter, but it could still be worth noting, and it would definitely be useful for self-contained testcases).

Language Nesting in Code Samples

The other limitation in many existing syntax highlighters is that they are not able to apply several different language highlighting schemes to one piece of provided input.  This can be annoying when you’re highlighting things like web pages, which can easily contain HTML, with nested CSS (in <style></style>), nested JS (in <script></script>) and perhaps server-side languages like some PHP (in <?php ?>).

For the most part, just selecting one to highlight works, since it’s unlikely that more than one requires significant attention at once, but there are situations where it would be useful to have each block highlighted separately. This would either require the user to select ranges of code to highlight in different language engines, or it would require the highlighter to attempt to automatically determine what language each segment of code is in. The first is tedious for the end-user, and would likely lead to the product not being used, and the latter adds significant complication to the highlighter.

These things are something I would like to see included in the functionality of pastebin/nopaste websites, but due to the complexity I can’t expect them to just turn up one day. So, given that, I figure I might give it a go, simply writing a fairly cut-down proof-of-concept to maybe appear with pastesite one day (in C++, not PHP, since I don’t expect PHP’s performance to be satisfactory for this). As to whether I’ll ever finish it, that remains to be seen, but I do think such a product would be beneficial to the programming community as a whole, and I hope if I don’t do it maybe someone else will.

Arbutus TC v1

January 7th, 2009

So, what have I been doing with my time recently? Bits and pieces of personal project tinkering, and also one small paid project. This project was a website for the Systems Engineering consulting company Arbutus Technical Consulting. It was recognised that they needed an effective web presence to help bring in business for the company, and I was hired to build that website to the specification provided.

The website was specified to be a very simple, mostly static collection of pages including an easy to use blog system for comments the company’s primary consultant had on issues related to Systems Engineering.  I designed a simple interface and translated that into a working website which Arbutus can use to advertise themselves to potential clients.

If you’re interested then have a look at what I came up with for Arbutus Technical Consulting.

zBot No More

December 13th, 2008

The title here is more sensational than it needs to be: I’m not discontinuing the zbot 2.0 project. Rather, I’ve just decided that if I want to release it publicly, then I would like a more generic parent name for the software. The instance of the bot in irc.zymic.com will likely retain the name. :)

So, what’s the new name for the software?  Well, that’s possibly still in flux, but for the moment I’ve decided I might go with polymer.  A slightly nerdy nod to the fact I want to make this release inherently extensible, as much as is needed.

Polymerisation

Of course, with a name like that I really should elaborate on just why it’s going to be more modular and easily extensible than the previous bot.

There was nothing really wrong with the implementation before, and much of it has been kept constant in the new implementation.  The most notable changes are in the format used to write modules, which has been simplified somewhat – and the fact that module interaction will be made possible allowing modules to reuse functionality included in a module that is already loaded into the bot.

My current WIP draft for module layout is this:

<?php
 
// module description up here?
//
// that would make sense.
 
class mod_example
{
   // any local variables required
   private $localvar;
   private $othervar;
 
   /// Core and required methods:
   // init, performs any initialisation required
   public function init()
   {
      // initialisation stuff... for example:
      $this->loadConfig(); // <-- (re)loads configuration in {confdir}/{name}.conf
 
      // the triggers and hooks
      registerTrigger('ping','respond');
      registerHook('passive');
   }
 
   // a rehash method which handles a complete config reload
   public function rehash()
   {
      // rejig internals in case our config has changed
      $this->loadConfig(); // <-- like this again perhaps?
   }
 
   /// about() and help() will probably make an appearance though since some are
   /// internal they need not include them.
 
   /// Trigger/Hook implementation
   // Here's a trigger, triggered of course by a !ping command.
   public function respond()
   {
      // note the lack of any arguments, instead the information will automatically
      // be made available through methods/variables contained within the base
      // class.  This simplifies format, and allows us to make triggers and hooks
      // constant.
      $target = $this->state->target;
      $nick   = $this->state->caller;
      // module intercommunication:
      if( is_object($_irc) )
         $_irc->msg($target,$nick.', pong');
   }
 
   // And here's a hook, called on every new packet
   public function passive()
   {
      // do stuff
   }
 
   /// And as always, you can declare internal functions for personal use.
   private function helper($arg1,$arg2)
   {
      // do something helpful
   }
}
 
?>

This format may undergo changes as I work on it, but it’s likely to look something like the above when I’m done. :)

Any suggestions/comments are very welcome, since this is going to be the interface anyone who wants to write modules will be using, so it’s important that it’s suitably intuitive.

Setting an Agenda

November 30th, 2008

If you’ve checked up on the site since my last blog entry, you’ve probably noticed I have indeed started on the pages for the rest of the site.  The about and contact pages are finished, and the footer now automatically displays the four most recent blog entries.  Now comes the main bit of the work, writing a CMS to manage my current and completed projects.

The Main Site

So, what exactly is going to be on the main site? If you’ve seen the front page you can probably mostly guess. There will be project pages for each project I’m currently working on or have completed, and there will be one of each set as “featured” works displayed on the front page and in the blog footer (the other spot will be filled by the most recent project).

The project pages themselves will be written descriptions of the project: what it’s about, what it’s aiming to produce, what I want to learn from it, what’s being used to implement it.  It will also have a section for relevant blog articles, which will be automatically fetched from here by selecting all the articles with a given tag (so for the zbot2 project, I will look for a “zbot” tag on blog articles).

This should provide a good source page to point people towards when they have questions about the project, and can serve as a home for any projects I decide I would like to release publicly.

What About zbot?

I mentioned I was hoping to start zbot v2.0 soon, and I will hopefully be starting that almost directly after finishing the main site on here. In the meantime it’s time to start some design for the structure of the program and its source files. When the project does get started I’ll make note here, and hopefully there’ll be a working core available before too long.

Other Work

This site and zbot aren’t the only things I’m doing, however. There’s another project I would like to write which I have not really begun looking at yet, and which requires a fair bit of research before I can start. I’ll probably look into a few of the topics I’ll need for it while I’m working on the other projects (this site and zbot), and will hopefully write a few articles on them to help cement my understanding and to share what I’ve found out.

So, hopefully we’ll see the rest of the main site taking shape over the next week or two.  But if I don’t have time to blog for a little while, don’t think I’ve stopped working, hopefully it means quite the opposite, but we’ll have to see. ;)