<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog - Alex-Elliott.co.uk &#187; projects</title>
	<atom:link href="http://alex-elliott.co.uk/blog/category/projects/feed" rel="self" type="application/rss+xml" />
	<link>http://alex-elliott.co.uk/blog</link>
	<description>The internet home of a prospective software engineer</description>
	<lastBuildDate>Thu, 02 Feb 2012 19:05:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
			<title>Blog - Alex-Elliott.co.uk</title>
			<url>http://alex-elliott.co.uk/favicon.png</url>
			<link>http://alex-elliott.co.uk/blog</link>
			<width>16</width>
			<height>16</height>
			<description>The internet home of a prospective software engineer</description>
		</image>		<item>
		<title>New Lexing Parsers and What They Mean for Expression Editor</title>
		<link>http://alex-elliott.co.uk/blog/2012/02/projects/expressioneditor/new-lexing-parsers-and-what-they-mean-for-expression-editor</link>
		<comments>http://alex-elliott.co.uk/blog/2012/02/projects/expressioneditor/new-lexing-parsers-and-what-they-mean-for-expression-editor#comments</comments>
		<pubDate>Thu, 02 Feb 2012 19:05:33 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[expressioneditor]]></category>
		<category><![CDATA[qt]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=87</guid>
		<description><![CDATA[Back in September (prior to some uni work that rather soaked up my free time) I pushed a fairly major restructuring of Expression Editor to the repository. Functionally it has meant a bit of a step back, but the new structure does mean some significant improvements to how it works now, and provides scope for [...]]]></description>
			<content:encoded><![CDATA[<p>Back in September (prior to some uni work that rather soaked up my free time) I pushed a fairly major restructuring of Expression Editor to the repository. Functionally it has meant a bit of a step back, but the new structure does mean some significant improvements to how it works now, and provides scope for some things which would not have been possible before. I thought that since this has been neglected for a while, I might as well give some details on it while I still have a window of opportunity.</p>
<h2>What has changed?</h2>
<p>There is one major difference in how it operates now compared to how it worked before. Previously the parsing of regular expressions was rather ad hoc: it was a process that grew as I built in support for more syntax elements. As a result, it would have become increasingly difficult in the long term to ensure things were being parsed correctly. It also did not react well to being passed an invalid expression &#8211; it would tend to just fail, so the display did not update while the expression was invalid.</p>
<p>To improve on that ad-hoc approach, I replaced it with a single consistent method &#8211; a lexing parser. The system defines a range of <a title="Tokens.hpp on GitHub" href="http://github.com/aelliott/expressioneditor/blob/master/RegexModules/tokens.hpp">tokens</a> (over 100 different ones) from simple literals (T_LITERAL), through common syntax elements like ^ (T_STARTING_POSITION), through to less commonly used syntax elements like &#8220;(?&lt;!&#8221; (which is of course T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN). The tokens available span across regular expression formats and provide a unified representation within Expression Editor to work with.</p>
<p>Of course there need to be regular expression backend specific parsers which convert their respective formats into a sequence of unified tokens. This is handled via a polymorphic set of parser classes based on the Parser base class (<a title="parser.cpp on GitHub" href="http://github.com/aelliott/expressioneditor/blob/master/RegexModules/parser.cpp">parser.cpp</a>/<a title="parser.hpp on GitHub" href="http://github.com/aelliott/expressioneditor/blob/master/RegexModules/parser.hpp">parser.hpp</a>), with the general logic of the parser living in the base class. It uses a map from each token in the regular expression format onto a regular expression describing that token&#8217;s syntax, and finds the longest match it can (where several matches are of the same length, the first match is used). The result is then passed to handleToken(), which handles the logic that should apply whenever a token of that type is found. For example, when a T_GROUPING_OPEN is found, it should continue to consume tokens until a T_GROUPING_CLOSE is found; if it reaches the end of the sequence without finding one, that T_GROUPING_OPEN should be reassigned the type T_ERROR as it is not balanced.</p>
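<p>As a rough illustration of the longest-match idea described above, here is a minimal sketch in Python rather than the C++/Qt of the real parser. The token names are taken from this post, but the patterns, the table layout and the function names are assumptions made for the example, not the contents of tokens.hpp or parser.cpp.</p>

```python
import re

# Hypothetical token table: token name and a pattern describing its syntax.
# Order matters, because ties on match length go to the earlier entry.
TOKEN_PATTERNS = [
    ("T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN", r"\(\?<!"),
    ("T_GROUPING_OPEN", r"\("),
    ("T_GROUPING_CLOSE", r"\)"),
    ("T_STARTING_POSITION", r"\^"),
    ("T_LITERAL", r"\\.|[^()^]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_PATTERNS]


def next_token(expression, position):
    """Return (name, text) for the longest match starting at position."""
    best = None
    for name, regex in COMPILED:
        match = regex.match(expression, position)
        # Only a strictly longer match replaces the current best,
        # so the first of several equal-length matches wins.
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


def tokenize(expression):
    """Split an expression into a list of (name, text) tokens."""
    tokens = []
    position = 0
    while position < len(expression):
        token = next_token(expression, position)
        if token is None:
            # Defensive fallback: treat anything unrecognised as an error.
            token = ("T_ERROR", expression[position])
        tokens.append(token)
        position += len(token[1])
    return tokens
```

<p>Calling tokenize() on an expression beginning with &#8220;(?&lt;!&#8221; yields the lookbehind token rather than a plain T_GROUPING_OPEN, because the four-character match is longer than the one-character match.</p>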
<p>This structure &#8211; as mentioned in the last example given &#8211; is capable of handling invalid syntax correctly, and it&#8217;s fairly simple to implement new parsing backends as the logic is very consistent. The shared format is also very useful as it means that any new backends which use a different format will be easier to integrate.</p>
<h2>What impact does this change have?</h2>
<p>Well, initially it means a lot of things are no longer working. That&#8217;s unfortunate, but on balance I think it is something that had to be done. At the time of writing two of the three testing widgets are no longer functional, the save/load/common/recent files functionality is no longer in place, and the visual editing which was there before has gone.</p>
<p>Not everything is bad news though: there are already some new features made easier by this system. The visualisation is now capable of updating even while there are errors in the expression. The parser simply marks them as T_ERROR, and they appear as red blocks until they are fixed &#8211; this means you can now spot errors via the visualisation. The amount of configuration possible has also increased. The settings dialog now provides a set of options for how the visualisation is presented, and the aim is to keep adding further options as visual elements are added to the system, allowing the appearance to be tailored entirely to your specifications.</p>
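<p>To give an idea of how errors can be marked without abandoning the rest of the expression, here is a small Python sketch. It assumes tokens are (name, text) pairs; the function name is made up for the example, and retyping a stray T_GROUPING_CLOSE is my own assumption, since only the unbalanced T_GROUPING_OPEN case is described above.</p>

```python
def mark_unbalanced(tokens):
    """Return a copy of tokens with unmatched group delimiters retyped as T_ERROR."""
    result = list(tokens)
    open_indexes = []  # positions of group opens still waiting for a close
    for index, (name, text) in enumerate(tokens):
        if name == "T_GROUPING_OPEN":
            open_indexes.append(index)
        elif name == "T_GROUPING_CLOSE":
            if open_indexes:
                open_indexes.pop()
            else:
                # Assumption: a close with no matching open is also an error.
                result[index] = ("T_ERROR", text)
    # Any open left on the stack never found its close.
    for index in open_indexes:
        result[index] = ("T_ERROR", tokens[index][1])
    return result
```

<p>Every other token keeps its type, so a visualisation can still draw the valid parts and show only the offending tokens in red.</p>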
<p>There are also more possibilities for the future than there were previously: converting the regular expression to a shared format opens up scope for regular expression optimisers and translators. It would be nice if you could take an expression you had already written for PCRE and have the application convert it into, say, POSIX ERE for a system where PCRE is not supported or available. That would be a lot easier with this unified token sequence, as it would simply be a case of translating in the opposite direction to the original parser. Some elements may not be translatable due to limitations of the target format, but nevertheless it would be a useful feature to have.</p>
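<p>A translator along those lines might look something like the following Python sketch. It assumes tokens are (name, text) pairs; T_WORD_CHARACTER is a hypothetical token name used for illustration (I have not checked it against tokens.hpp), and the emitter table is invented for the example.</p>

```python
# Hypothetical emitters: how each unified token is written in POSIX ERE.
# None marks a token that has no POSIX ERE equivalent.
POSIX_ERE_EMITTERS = {
    "T_LITERAL": lambda text: text,
    "T_STARTING_POSITION": lambda text: "^",
    "T_GROUPING_OPEN": lambda text: "(",
    "T_GROUPING_CLOSE": lambda text: ")",
    "T_WORD_CHARACTER": lambda text: "[[:alnum:]_]",
    "T_NEGATIVE_LOOKBEHIND_ASSERTION_OPEN": None,
}


def to_posix_ere(tokens):
    """Return (expression, unsupported) for a unified token sequence."""
    pieces = []
    unsupported = []
    for name, text in tokens:
        emitter = POSIX_ERE_EMITTERS.get(name)
        if emitter is None:
            # Collect what cannot be translated instead of failing outright.
            unsupported.append((name, text))
        else:
            pieces.append(emitter(text))
    return "".join(pieces), unsupported
```

<p>Returning the untranslatable tokens separately means the caller can decide whether to warn, refuse, or show a partial result.</p>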
<h2>What work is going on at the moment?</h2>
<p>Not as much as I would like, really. I&#8217;ve got other stuff on via university &#8211; but I do want to try and bring back the features that have gone missing since the restructuring, and then I can get back to bringing in new functionality which would make this a better piece of software than before the restructuring. In the meantime the previous application is tagged in the repository, so the functionality is still available. Hopefully though it won&#8217;t need to be around too much longer.</p>
<p>Oh, a final passing point &#8211; another thing that has changed since the last time I wrote is that the CMake build system is now capable of producing an NSIS executable installer for Expression Editor. There is one available via GitHub which contains the new build of Expression Editor, and I&#8217;ll try to keep it as up to date as possible. I&#8217;d be interested to talk with people who&#8217;ve deployed Qt/C++/CMake to OSX before to get a suitable build system working there as well. It shouldn&#8217;t be too hard to do, since CPack does have package targets for various OSX installers; it&#8217;s just not something I&#8217;ve done before.</p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2012/02/projects/expressioneditor/new-lexing-parsers-and-what-they-mean-for-expression-editor/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Back to Work</title>
		<link>http://alex-elliott.co.uk/blog/2010/07/projects/expressioneditor/back-to-work</link>
		<comments>http://alex-elliott.co.uk/blog/2010/07/projects/expressioneditor/back-to-work#comments</comments>
		<pubDate>Thu, 29 Jul 2010 18:00:24 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[expressioneditor]]></category>
		<category><![CDATA[qt]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=84</guid>
		<description><![CDATA[Well, been a while since the last blog post I&#8217;ve written, and a similar stretch of time since I really got much work done on ExpressionEditor. But as you might have guessed since the blog is winding back into life, so too is my work on EE. Changes So Far I&#8217;m still working my way [...]]]></description>
			<content:encoded><![CDATA[<p>Well, been a while since the last blog post I&#8217;ve written, and a similar stretch of time since I really got much work done on ExpressionEditor.  But as you might have guessed since the blog is winding back into life, so too is my work on EE.</p>
<h2>Changes So Far</h2>
<p>I&#8217;m still working my way slowly through the todo list for EE (updated now on the wiki.github home page for EE). Major items that I&#8217;ve already managed to cross off include adding support for the ICU regular expression library, and migrating the build process over to CMake. The migration is still partly ongoing, so the more people who build it on as many platforms as possible the better &#8211; please do report any issues; IRC would be favourite, as I can work through the issues directly.</p>
<p>ICU (International Components for Unicode) supports a regular expression engine that seems to be a popular choice particularly for Mac-based programming, and appears to be fairly full-featured.  I hope that ICU support will be useful to people, and that it gets good use.</p>
<p>As to the migration to CMake, this is for several reasons. It should make EE easier to distribute: at the moment it supports a very basic &#8220;make install&#8221; target, and I will be expanding that to bring the common files into /etc (and at some point I should improve the content of those common files, but that&#8217;s something I can do later) and to install the README, CREDITS, and similar files as documentation.</p>
<p>It should also now correctly auto-detect which of the optional dependencies (all of the regular expression libraries bar that included in Qt4.6+) are present on the system, and build a version of EE which supports the ones which are available.  This hopefully will make it less of a pain for those who don&#8217;t have PCRE or ICU (or in the case of Win32 POSIX&#8217;s regex.h).</p>
<p>And talking of making it easier for Win32, after a fairly long period where no EE was compiled on Win32 it was recently built, at the least proving that the code is still very happy to compile there when it&#8217;s not missing optional deps &#8211; which is very good news to get. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h2>Looking Forward</h2>
<p>Of course, there&#8217;s still quite a lot to do to reach the plans I&#8217;ve set out, and since I&#8217;m working full-time I don&#8217;t have as much time as I&#8217;d like to work on EE (which I hope you&#8217;ll agree is an interesting and quite fun little tool, and it&#8217;s just as nice to work on <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> ).  But I expect to make some progress nonetheless, and if I do, I&#8217;ll keep all one maybe two people who have this in their RSS readers up to speed. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2010/07/projects/expressioneditor/back-to-work/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Expression Editor Update (2)</title>
		<link>http://alex-elliott.co.uk/blog/2010/01/projects/expressioneditor/expression-editor-update-2</link>
		<comments>http://alex-elliott.co.uk/blog/2010/01/projects/expressioneditor/expression-editor-update-2#comments</comments>
		<pubDate>Sun, 24 Jan 2010 14:56:07 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[expressioneditor]]></category>
		<category><![CDATA[qt]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=77</guid>
		<description><![CDATA[Since I&#8217;ve had some more time to work on Expression Editor recently I thought it was about time I wrote another update for the progress of the project, and some related news that affects it. From Last Time&#8230; In the previous post I noted a few areas in progress and some that I wanted to [...]]]></description>
			<content:encoded><![CDATA[<p>Since I&#8217;ve had some more time to work on Expression Editor recently I thought it was about time I wrote another update for the progress of the project, and some related news that affects it.</p>
<p style="text-align: center;">
<div class="wp-caption aligncenter" style="width: 693px"><a href="http://files.pastesite.com/qt/expressioneditor_osx.png" rel="lightbox[77]"><img class=" " title="Expression Editor on Mac OSX" src="http://files.pastesite.com/qt/expressioneditor_osx.png" alt="Expression Editor on Mac OSX" width="683" height="500" /></a><p class="wp-caption-text">A recent screenshot of Expression Editor on Mac OSX</p></div>
<h2>From Last Time&#8230;</h2>
<p>In the <a href="http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/expression-editor-update">previous post</a> I noted a few areas in progress and some that I wanted to look at in the future.  So to catch up there, Drag&amp;Drop is generally a bit more reliable and produces slightly neater results but is otherwise unchanged so far, and the new testing widget is still waiting.  A significant change has been made in the area of supported regular expression formats however.</p>
<p>The application now has backends for Qt4, PCRE and POSIX ERE formats (though the visualisation could still mess up some PCRE/POSIX elements, so let me know if anything breaks).  You can select the format you wish to work in from the menu bar. The active format is displayed in the bottom right of the screen so you know which mode the application is currently in, and the save format has been slightly extended to save your preference for each particular expression.</p>
<p>The default mode has also been changed to PCRE, since it is probably the most powerful backend available.  Another minor UI change is an expression status indicator to the right of the text input: a green tick while the expression is valid, and a red exclamation mark when it is invalid. In addition, if you mouse over the invalid indicator, the tooltip shows the error returned from the active regular expression backend.</p>
<h2>In Related News</h2>
<p>As you probably saw above the screenshot used is from Mac OSX.  In order to improve my capacity to test Expression Editor I&#8217;ve gotten myself a Mac Mini as well as my Slackware Linux laptop.  Set up with Synergy+ this means I can simultaneously develop the application in Linux and test it in OSX.  One behavioural difference between the two operating systems has already been resolved, so hopefully the application should start behaving much more reliably on OSX as well as Linux from now on.</p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2010/01/projects/expressioneditor/expression-editor-update-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Expression Editor Update</title>
		<link>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/expression-editor-update</link>
		<comments>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/expression-editor-update#comments</comments>
		<pubDate>Thu, 24 Dec 2009 15:09:39 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[expressioneditor]]></category>
		<category><![CDATA[qt]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=74</guid>
		<description><![CDATA[A fair bit of progress has been made since my last blog entry so I thought I&#8217;d note a few things that have landed in the repository and a few things that I intend to add at a later date. Drag and Drop Initial support for drag and drop editing has been added.  You can [...]]]></description>
			<content:encoded><![CDATA[<p>A fair bit of progress has been made since my last blog entry so I thought I&#8217;d note a few things that have landed in the repository and a few things that I intend to add at a later date.</p>
<h2>Drag and Drop</h2>
<p>Initial support for drag and drop editing has been added.  You can now re-order the elements of the expression by dragging an element in the visualisation to one of the valid drop zones (which are automatically highlighted as you can see in <a title="Drag &amp; Drop Screenshot" href="http://files.pastesite.com/qt/expressioneditor_dragdrop.png" rel="lightbox[74]">this screenshot</a>).  With this in place it becomes significantly easier to add the other bits of drag/drop editing I want the editor to support.  Eventually as well as reordering (plus the double-click edit dialogs which are also currently included for several elements) I aim to include:</p>
<ul>
<li>Drag/drop adding of new elements from the toolbar to the left of the visualisation.  This should probably spawn a dialog/wizard and then insert the resulting regular expression element into the current expression.</li>
<li>Reordering needs more support in the alternatives item. Currently there are only valid drop zones to place items inside existing alternation branches, and there should be a drop zone allowing the user to drop an element in as a new alternative.</li>
<li>Possibly a simple &#8220;trash&#8221; element, which simply accepts the drop, and results in the item being deleted from the scene.</li>
</ul>
<h2>Regexp Formats</h2>
<p>As stated in a few places in the application, before the initial release I hope to support PCRE, POSIX Extended and Qt format regular expressions.  This means supporting a range of different regexp syntaxes, and intelligently warning when switching between formats if some of the expression cannot be used directly in the new format. It should also offer to try to translate the expression if such a problem exists.</p>
<p>For example, if we&#8217;re currently in PCRE mode and we have an expression containing &#8220;\w&#8221; and we switch to POSIX Extended, this should trigger a warning and then offer to translate, turning &#8220;\w&#8221; into &#8220;[[:alnum:]_]&#8221;.</p>
<p>At the moment, the application only supports Qt&#8217;s internal format, and I think correctly represents much of what it supports internally.  The format is very much  like a slightly restricted PCRE format, so Qt/PCRE conversion should be fairly straightforward.</p>
<h2>Expression Testing</h2>
<p>The editor currently has an element at the bottom of the layout which allows you to test the regular expression for given short strings.  This is good for most cases, since it allows you to have a few regexp &#8220;unit tests&#8221; of sorts, where you test fringe cases and observe if it matches, partially matches, and whether the capture groups work as expected.</p>
<p>In addition to this it would be useful to have a few other methods of testing included.  The testing widget should eventually be a tabbed widget with the currently available tester as one option, plus at least two additional panes: a &#8220;bulk text&#8221; pane, which takes a paragraph or more of text and highlights every part of it matched by the regular expression, and a &#8220;replacement&#8221; pane, which takes input of a similar length and applies the regular expression with a given replacement string (which could also be a regular expression).</p>
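<p>The core of those two panes is small. The sketch below uses Python&#8217;s built-in re module purely for illustration; the application itself would go through its Qt, PCRE or POSIX backends, and both function names are made up for the example.</p>

```python
import re


def match_spans(pattern, text):
    """Return (start, end) offsets of every non-empty match in text."""
    return [match.span() for match in re.finditer(pattern, text) if match.group()]


def replace_all(pattern, replacement, text):
    """Replace every match; the replacement may refer to capture groups."""
    return re.sub(pattern, replacement, text)
```

<p>A &#8220;bulk text&#8221; pane would highlight each range returned by match_spans(), while a &#8220;replacement&#8221; pane would display the result of replace_all().</p>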
<p>Anyway, that&#8217;s what I&#8217;ve been working on and some of what I want to include later.  Work goes on. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/expression-editor-update/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A New Project</title>
		<link>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/a-new-project</link>
		<comments>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/a-new-project#comments</comments>
		<pubDate>Mon, 07 Dec 2009 18:05:36 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[expressioneditor]]></category>
		<category><![CDATA[qt]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=70</guid>
		<description><![CDATA[I have recently been working on a new project under the working title of &#8220;Expression editor&#8221;, an application which allows for easy editing of Regular Expressions (regexps) in a similar way to KDE3.x&#8217;s KRegExpEditor.  I used to love KRegExpEditor for the incredibly useful functionality it provided, and in particular the visualisation of the regular expression [...]]]></description>
			<content:encoded><![CDATA[<p>I have recently been working on a new project under the working title of &#8220;Expression editor&#8221;, an application which allows for easy editing of Regular Expressions (regexps) in a similar way to KDE3.x&#8217;s KRegExpEditor.  I used to love KRegExpEditor for the incredibly useful functionality it provided, and in particular the visualisation of the regular expression as you edited it.  Being able to see graphically what the regexp was doing made dealing with long cryptic regexps much easier, and I felt it was a shame that it was not (as far as I know) ported to Qt4 and KDE SC 4.x.</p>
<div class="wp-caption aligncenter" style="width: 632px"><a href="http://www.blackie.dk/KDE/KRegExpEditor/"><img title="Screenshot of KRegExpEditor" src="http://www.blackie.dk/KDE/KRegExpEditor/html-regexp.jpg" alt="A Screenshot of KRegExpEditor in Use" width="622" height="394" /></a><p class="wp-caption-text">A Screenshot of KRegExpEditor in Use</p></div>
<p>Since I felt it was a very useful application and one that I felt deserved to be ported to Qt4, I have started my own replacement (I decided to replace it rather than port mostly as a learning experience) written from the ground up in Qt4.  If it reaches a good level of stability I may consider porting it to be a KDE SC 4.x application, but for now I&#8217;m just focusing on building a working replacement.  After working on this for two weeks (start date: 23rd November 2009) I&#8217;ve reached a state where things are starting to come together.  If anyone&#8217;s interested the app is licensed under the GPLv3 and is available from <a title="View Expression Editor's Project Page" href="http://github.com/aelliott/expressioneditor" target="_blank">GitHub</a>.  Any bugs or feature requests are welcome at the project&#8217;s <a title="Log a Request or Bug Report" href="http://github.com/aelliott/expressioneditor/issues" target="_blank">Issues page</a>.  As of fairly recently it looks like this:</p>
<div class="wp-caption aligncenter" style="width: 658px"><a href="http://files.pastesite.com/qt/expressioneditor.png" rel="lightbox[70]"><img class="     " title="Screenshot of Expression Editor" src="http://files.pastesite.com/qt/expressioneditor.png" alt="Expression Editor With an Email Matching Regexp Open" width="648" height="405" /></a><p class="wp-caption-text">Expression Editor With an Email Matching Regexp Open</p></div>
<p>At the moment it includes some Oxygen icons, but due to the license on those they will be replaced before I release an actual stable version of the application.</p>
<p>Remember, if you do try it, it&#8217;s nowhere near stable yet &#8211; and a fair bit is yet to be implemented (like the drag and drop / GUI editing of expressions).</p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2009/12/projects/expressioneditor/a-new-project/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Musings on Syntax Highlighting for Websites</title>
		<link>http://alex-elliott.co.uk/blog/2009/02/general/musings-on-syntax-highlighting-for-websites</link>
		<comments>http://alex-elliott.co.uk/blog/2009/02/general/musings-on-syntax-highlighting-for-websites#comments</comments>
		<pubDate>Fri, 06 Feb 2009 22:36:45 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[musing]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=57</guid>
		<description><![CDATA[Syntax highlighting can be very important to some websites, particularly those featuring articles on programming practice/theory or pastebin/nopaste websites for collaborative debugging.  However, most highlighting packages tend to use pattern matching to attempt to correctly highlight a given document rather than a more accurate but more complex lexical parser and do not have the capacity [...]]]></description>
			<content:encoded><![CDATA[<p>Syntax highlighting can be very important to some websites, particularly those featuring articles on programming practice/theory or pastebin/nopaste websites for collaborative debugging.  However, most highlighting packages tend to use pattern matching to attempt to correctly highlight a given document rather than a more accurate but more complex lexical parser and do not have the capacity to use multiple highlighting schemes for a given document internally.  If you choose to highlight something as PHP, then only the PHP segments will be syntax highlighted, when it&#8217;s plausible that there will also be HTML, XML, Javascript, CSS, etc in the same document.</p>
<p>These are two features lacking from most existing syntax highlighting packages today, and ones that I think would be extremely useful to have in publicly available free software tools.  The question is simply whether it is feasible to include them, or whether what we&#8217;ve got currently is as good as it&#8217;s likely to get.</p>
<h2>Pattern Matching versus Lexical Parsing</h2>
<p>These are the two main ways of taking a source document and producing a highlighting for it.  Pattern matching uses regular expressions to attempt to catch recognisable patterns in the given language. It is simpler to produce, but does not guarantee good results.  Lexical parsing, on the other hand, is much more complex to implement but, once implemented, is a more flexible method for producing a highlighting of some input.</p>
<p>Lexical parsing involves going through the input from start to finish breaking the input up into &#8220;tokens&#8221;, which are small segments of the input with some associated meta-data.  In essence it breaks the code provided down into its components: strings, keywords, variables, etc.  The power of this model is that while the parser is working it can use its state information on things like scope and context to provide more accurate and more informative details.  In fact, a full lexical parser would be able to identify syntax errors and highlight them automatically.</p>
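<p>As a rough sketch of the tokenising step, the Python below walks the input from start to finish and classifies each piece. The token kinds and the tiny keyword set are assumptions chosen for the example; a real highlighter would need a far fuller description of the language. Regular expressions are still used to describe the shape of individual tokens, but the input is consumed in order, which is what lets later stages track state.</p>

```python
import re

# Illustrative keyword set only; a real language has many more.
KEYWORDS = {"if", "else", "return", "while"}

TOKEN_RE = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<symbol>.)
""", re.VERBOSE)


def tokenize(source):
    """Break source into (kind, text) tokens, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "space":
            continue
        if kind == "word":
            # Words are keywords or identifiers depending on the keyword set.
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```

<p>Because a quoted string is consumed as a single token, a keyword or bracket appearing inside it is never mistaken for code.</p>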
<p>As to providing more information, with tokenised input it would be fairly trivial to note which braces/brackets match one another, and unlike a pattern matching system you can include information from other parts of the program &#8211; take the simple example of a C++ typedef, something simple like &#8220;typedef vector&lt;string&gt;::iterator vec_iter&#8221;, which provides a new shorthand type &#8220;vec_iter&#8221; as a vector&lt;string&gt; iterator.  While a pattern matching model could probably work out that vec_iter was a type, it would not know what it represented, or if it was valid.  A lexical parser would be able to add a note saying &#8220;this is a vector&lt;string&gt; iterator&#8221; provided the typedef was in the provided sample.</p>
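<p>To make the typedef example concrete, here is a minimal Python sketch that records what each alias stands for. It assumes the input has already been tokenised into (kind, text) pairs and only handles the simple &#8220;typedef type alias;&#8221; form; the function names are invented for the example.</p>

```python
def spell(texts):
    """Join token texts, putting a space only between two adjacent words."""
    out = ""
    for text in texts:
        prev_is_word = bool(out) and (out[-1].isalnum() or out[-1] == "_")
        next_is_word = text[0].isalnum() or text[0] == "_"
        if prev_is_word and next_is_word:
            out += " "
        out += text
    return out


def collect_typedefs(tokens):
    """Map each typedef alias to the spelling of the type it names."""
    aliases = {}
    index = 0
    while index < len(tokens):
        if tokens[index][1] == "typedef":
            # Scan forward to the semicolon that ends the declaration.
            end = index + 1
            while end < len(tokens) and tokens[end][1] != ";":
                end += 1
            body = [text for _, text in tokens[index + 1:end]]
            # The last name before the semicolon is the alias; the rest is the type.
            if end < len(tokens) and len(body) >= 2:
                aliases[body[-1]] = spell(body[:-1])
            index = end
        index += 1
    return aliases
```

<p>With that table available, a highlighter that meets vec_iter later in the sample can attach a note giving the underlying type.</p>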
<p>Of course, while it is probably a superior method from a functional standpoint, it is significantly more complicated to implement, which raises the question of whether the benefits are worth the extra outlay of effort required to produce the highlighter.  My personal view is that pattern matching for the moment is the better option for things like articles, where we are confident that the input is a valid piece of code &#8211; and thus should be fine in a normal highlighter.  For uses like pastebin/nopaste sites though, it would be beneficial to have this kind of extra information since they are often used for collaborative debugging, where highlighting of syntax errors, and of other possible errors like a definition of a used type not being available, would help (the latter might not be a <em>true</em> error as the definition may be in a file not provided to the highlighter, but it could still be worth noting &#8211; and it would definitely be useful for self-contained testcases).</p>
<h2>Language Nesting in Code Samples</h2>
<p>The other limitation in many existing syntax highlighters is that they are not able to apply several different language highlighting schemes to one piece of provided input.  This can be annoying when you&#8217;re highlighting things like web pages, which can easily contain HTML, with nested CSS (in &lt;style&gt;&lt;/style&gt;), nested JS (in &lt;script&gt;&lt;/script&gt;) and perhaps server-side languages like some PHP (in &lt;?php ?&gt;).</p>
<p>For the most part, just selecting one to highlight works, since it&#8217;s unlikely that more than one requires significant attention at once, but there are situations where it would be useful to have each block highlighted separately.  This would either require the user to select ranges of code to highlight in different language engines, or require the highlighter to attempt to automatically determine what language each segment of code is in.  The former is tedious for the end-user, and would likely lead to the product not being used, and the latter adds significant complication to the highlighter.</p>
<p>These things are something I would like to see included in the functionality of pastebin/nopaste websites, but due to the complexity I can&#8217;t expect them to just turn up one day.  So, given that, I figure I might give it a go, simply writing a fairly cut-down proof-of-concept to maybe appear with pastesite one day (in C++, not PHP, since I don&#8217;t expect PHP to perform well enough for this).  As to whether I&#8217;ll ever finish it, that remains to be seen, but I do think such a product would be beneficial to the programming community as a whole, and I hope if I don&#8217;t do it maybe someone else will.</p>
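<p>For the automatic route, the simplest possible starting point is to split on the delimiters mentioned above. The Python sketch below is only that: it is an assumption-laden illustration with made-up names, it ignores harder cases such as PHP embedded inside a script block, and it is not how any existing highlighter is known to work.</p>

```python
import re

# Delimiters that switch language inside an HTML page.
EMBEDDED = re.compile(
    r"(?P<php><\?php.*?\?>)"
    r"|<style[^>]*>(?P<css>.*?)</style>"
    r"|<script[^>]*>(?P<js>.*?)</script>",
    re.DOTALL | re.IGNORECASE,
)


def split_languages(document):
    """Return (language, text) segments covering the whole document in order."""
    segments = []
    position = 0
    for match in EMBEDDED.finditer(document):
        language = match.lastgroup
        start, end = match.span(language)
        # Everything before the embedded block, including its opening tag, is HTML.
        if start > position:
            segments.append(("html", document[position:start]))
        if end > start:
            segments.append((language, document[start:end]))
        position = end
    if position < len(document):
        segments.append(("html", document[position:]))
    return segments
```

<p>Each segment could then be handed to the highlighter for its own language, and concatenating the segment texts gives back the original document.</p>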
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2009/02/general/musings-on-syntax-highlighting-for-websites/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Arbutus TC v1</title>
		<link>http://alex-elliott.co.uk/blog/2009/01/php/arbutus-tc-v1</link>
		<comments>http://alex-elliott.co.uk/blog/2009/01/php/arbutus-tc-v1#comments</comments>
		<pubDate>Wed, 07 Jan 2009 21:39:24 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[arbutus]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=52</guid>
		<description><![CDATA[So, what have I been doing with my time recently? Bits and pieces of personal project tinkering, and also one small paid project. This project was a website for the Systems Engineering consulting company Arbutus Technical Consulting. It was recognised that they needed an effective web presence to help bring in business for the company, [...]]]></description>
			<content:encoded><![CDATA[<p>So, what have I been doing with my time recently?  Bits and pieces of personal project tinkering, and also one small paid project.  This project was a website for the Systems Engineering consulting company Arbutus Technical Consulting.  It was recognised that they needed an effective web presence to help bring in business for the company, and I was hired to build that website to the specification provided.</p>
<p>The website was specified to be a very simple, mostly static collection of pages, including an easy-to-use blog system for comments the company&#8217;s primary consultant had on issues related to Systems Engineering.  I designed a simple interface and translated that into a working website which Arbutus can use to advertise themselves to potential clients.</p>
<p>If you&#8217;re interested then have a look at what I came up with for <a title="Visit Arbutus Technical Consulting" href="http://arbutus-tc.co.uk/">Arbutus Technical Consulting</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2009/01/php/arbutus-tc-v1/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>zBot No More</title>
		<link>http://alex-elliott.co.uk/blog/2008/12/php/zbot-no-more</link>
		<comments>http://alex-elliott.co.uk/blog/2008/12/php/zbot-no-more#comments</comments>
		<pubDate>Sat, 13 Dec 2008 16:44:39 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[php]]></category>
		<category><![CDATA[polymer]]></category>
		<category><![CDATA[projects]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=43</guid>
		<description><![CDATA[The title here is more sensational than it needs to be, I&#8217;m not discontinuing the zbot 2.0 project &#8211; rather, I&#8217;ve just decided that if I want to release it publicly, then I would like a more generic parent name for the software.  The instance of the bot in irc.zymic.com will likely retain the name. [...]]]></description>
			<content:encoded><![CDATA[<p>The title here is more sensational than it needs to be, I&#8217;m not discontinuing the zbot 2.0 project &#8211; rather, I&#8217;ve just decided that if I want to release it publicly, then I would like a more generic parent name for the software.  The instance of the bot in irc.zymic.com will likely retain the name. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So, what&#8217;s the new name for the software?  Well, that&#8217;s possibly still in flux, but for the moment I&#8217;ve decided I might go with <strong>polymer</strong>, a slightly nerdy nod to the fact that I want to make this release as inherently extensible as is needed.</p>
<h2>Polymerisation</h2>
<p>Of course, with a name like that I really should elaborate on just why it&#8217;s going to be more modular and easily extensible than the previous bot.</p>
<p>There was nothing really wrong with the implementation before, and much of it has been kept in the new implementation.  The most notable changes are the format used to write modules, which has been simplified somewhat, and the introduction of module interaction, which allows a module to reuse functionality from another module that is already loaded into the bot.</p>
<p>My current <acronym title="Work in Progress">WIP</acronym> draft for module layout is this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// module description up here?</span>
<span style="color: #666666; font-style: italic;">//</span>
<span style="color: #666666; font-style: italic;">// that would make sense.</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> mod_example
<span style="color: #009900;">&#123;</span>
   <span style="color: #666666; font-style: italic;">// any local variables required</span>
   <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000088;">$localvar</span><span style="color: #339933;">;</span>
   <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000088;">$othervar</span><span style="color: #339933;">;</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">/// Core and required methods:</span>
   <span style="color: #666666; font-style: italic;">// init, performs any initialisation required</span>
   <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> init<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// initialisation stuff... for example:</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">loadConfig</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// &lt;-- (re)loads configuration in {confdir}/{name}.conf</span>
&nbsp;
      <span style="color: #666666; font-style: italic;">// the triggers and hooks</span>
      registerTrigger<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ping'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'respond'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      registerHook<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'passive'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
   <span style="color: #009900;">&#125;</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">// a rehash method which handles a complete config reload</span>
   <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> rehash<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// rejig internals in case our config has changed</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">loadConfig</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// &lt;-- like this again perhaps?</span>
   <span style="color: #009900;">&#125;</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">/// about() and help() will probably make an appearance too, though since some</span>
   <span style="color: #666666; font-style: italic;">/// modules are internal, they need not include them.</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">/// Trigger/Hook implementation</span>
   <span style="color: #666666; font-style: italic;">// Here's a trigger, triggered of course by a !ping command.</span>
   <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> respond<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// note the lack of any arguments, instead the information will automatically</span>
      <span style="color: #666666; font-style: italic;">// be made available through methods/variables contained within the base</span>
      <span style="color: #666666; font-style: italic;">// class.  This simplifies format, and allows us to make triggers and hooks</span>
      <span style="color: #666666; font-style: italic;">// constant.</span>
      <span style="color: #000088;">$target</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">state</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">target</span><span style="color: #339933;">;</span>
      <span style="color: #000088;">$nick</span>   <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">state</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">caller</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">// module intercommunication:</span>
      <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #990000;">is_object</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_irc</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
         <span style="color: #000088;">$_irc</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">msg</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$target</span><span style="color: #339933;">,</span><span style="color: #000088;">$nick</span><span style="color: #339933;">.</span><span style="color: #0000ff;">', pong'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
   <span style="color: #009900;">&#125;</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">// And here's a hook, called on every new packet</span>
   <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> passive<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// do stuff</span>
   <span style="color: #009900;">&#125;</span>
&nbsp;
   <span style="color: #666666; font-style: italic;">/// And as always, you can declare internal functions for personal use.</span>
   <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">function</span> helper<span style="color: #009900;">&#40;</span><span style="color: #000088;">$arg1</span><span style="color: #339933;">,</span><span style="color: #000088;">$arg2</span><span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// do something helpful</span>
   <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></td></tr></table></div>

<p>This format may undergo changes as I work on it, but it&#8217;s likely to look something like the above when I&#8217;m done. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Any suggestions/comments are very welcome: this is going to be the interface used by anyone who wants to write modules, so it&#8217;s important that it&#8217;s suitably intuitive.</p>
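<p>To make the &#8220;no arguments&#8221; idea in the draft a little more concrete, here is a minimal sketch of how a base class could hold the trigger and hook registrations and expose the current packet through <code>$this-&gt;state</code>.  Everything in it is an assumption for illustration only: the names <code>ModuleBase</code>, <code>dispatch()</code> and <code>mod_ping</code>, the registration calls being methods rather than plain functions, and the <code>$sent</code> array standing in for a real IRC connection.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">&lt;?php

// Hypothetical base class; not the actual polymer implementation.
abstract class ModuleBase
{
    // details of the packet currently being handled (target, caller, ...)
    protected $state;
    private $triggers = array();   // command name =&gt; method name
    private $hooks    = array();   // methods called on every packet

    protected function registerTrigger($command, $method)
    {
        $this-&gt;triggers[$command] = $method;
    }

    protected function registerHook($method)
    {
        $this-&gt;hooks[] = $method;
    }

    // Called by the bot core for each packet: store the state, run every
    // hook, then run the trigger registered for the command (if any).
    public function dispatch($state, $command = null)
    {
        $this-&gt;state = $state;
        foreach ($this-&gt;hooks as $method) {
            $this-&gt;$method();
        }
        if ($command !== null &amp;&amp; isset($this-&gt;triggers[$command])) {
            $method = $this-&gt;triggers[$command];
            $this-&gt;$method();
        }
    }
}

class mod_ping extends ModuleBase
{
    public $sent = array();   // stands in for the IRC connection in this sketch

    public function init()
    {
        $this-&gt;registerTrigger('ping', 'respond');
    }

    // No arguments: everything needed is read from $this-&gt;state.
    public function respond()
    {
        $this-&gt;sent[] = array($this-&gt;state-&gt;target, $this-&gt;state-&gt;caller . ', pong');
    }
}

// Simulate the core handing a !ping from "alex" in #channel to the module.
$state = new stdClass();
$state-&gt;target = '#channel';
$state-&gt;caller = 'alex';

$module = new mod_ping();
$module-&gt;init();
$module-&gt;dispatch($state, 'ping');</pre></td></tr></table></div>

<p>After the last line, <code>$module-&gt;sent</code> holds a single reply addressed to <code>#channel</code>.  Keeping the registrations inside the base class means trigger and hook methods can all share the same empty signature, which is the property the draft is aiming for.</p>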
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2008/12/php/zbot-no-more/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Setting an Agenda</title>
		<link>http://alex-elliott.co.uk/blog/2008/11/general/setting-an-agenda</link>
		<comments>http://alex-elliott.co.uk/blog/2008/11/general/setting-an-agenda#comments</comments>
		<pubDate>Sun, 30 Nov 2008 20:00:53 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[zbot]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[site]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=39</guid>
		<description><![CDATA[If you&#8217;ve checked up on the site since my last blog entry, you&#8217;ve probably noticed I have indeed started on the pages for the rest of the site.  The about and contact pages are finished, and the footer now automatically displays the four most recent blog entries.  Now comes the main bit of the work, [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve checked up on the site since my last blog entry, you&#8217;ve probably noticed I have indeed started on the pages for the rest of the site.  The about and contact pages are finished, and the footer now automatically displays the four most recent blog entries.  Now comes the main bit of the work, writing a <acronym title="Content Management System">CMS</acronym> to manage my current and completed projects.</p>
<h2>The Main Site</h2>
<p>So, what exactly is going to be on the main site?  If you&#8217;ve seen the front page you can probably mostly guess.  There will be project pages for each project I&#8217;m currently working on or have completed, and there will be one of each set as &#8220;featured&#8221; works displayed on the front page and in the blog footer (the other spot will be filled by the most recent project).</p>
<p>The project pages themselves will be written descriptions of the project: what it&#8217;s about, what it&#8217;s aiming to produce, what I want to learn from it, what&#8217;s being used to implement it.  It will also have a section for relevant blog articles, which will be automatically fetched from here by selecting all the articles with a given tag (so for the zbot2 project, I will look for a &#8220;zbot&#8221; tag on blog articles).</p>
<p>This should provide a good source page to point people to when they have questions about the project, and can serve as a home for any projects I decide I would like to release publicly.</p>
<h2>What About zbot?</h2>
<p>I mentioned I was hoping to start zbot v2.0 soon; I will hopefully be starting that almost directly after finishing the main site on here.  In the meantime it&#8217;s time to start some design for the structure of the program and its source files.  When the project does get started I&#8217;ll make a note here, and hopefully there&#8217;ll be a working core available before too long.</p>
<h2>Other Work</h2>
<p>This site and zbot aren&#8217;t the only things I&#8217;m doing, however.  There&#8217;s another project I would like to write which I have not really begun looking at yet, and which requires a fair bit of research before I can start.  I&#8217;ll probably look into a few of the topics I&#8217;ll need for it while I&#8217;m working on the other projects (this site and zbot), and will hopefully write a few articles on them to help cement my understanding and to share what I&#8217;ve found out.</p>
<p>So, hopefully we&#8217;ll see the rest of the main site taking shape over the next week or two.  But if I don&#8217;t have time to blog for a little while, don&#8217;t think I&#8217;ve stopped working, hopefully it means quite the opposite, but we&#8217;ll have to see. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2008/11/general/setting-an-agenda/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Personal Site Work</title>
		<link>http://alex-elliott.co.uk/blog/2008/11/general/personal-site-work</link>
		<comments>http://alex-elliott.co.uk/blog/2008/11/general/personal-site-work#comments</comments>
		<pubDate>Wed, 26 Nov 2008 20:02:29 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[personal]]></category>
		<category><![CDATA[site]]></category>

		<guid isPermaLink="false">http://alex-elliott.co.uk/blog/?p=24</guid>
		<description><![CDATA[So, as predicted there&#8217;s already been a pretty big gap, but that&#8217;s not to say I&#8217;ve been neglecting the blog.  I&#8217;ve had a new bespoke design drafted up by a friend I know through a web development community, which I hope to get properly skinned for WP and set up as a personal site on [...]]]></description>
			<content:encoded><![CDATA[<p>So, as predicted there&#8217;s already been a pretty big gap, but that&#8217;s not to say I&#8217;ve been neglecting the blog.  I&#8217;ve had a new bespoke design drafted up by a friend I know through a web development community, which I hope to get properly skinned for WP and set up as a personal site on alex-elliott.co.uk soon.  The designer is Adam McPeake, who is linked in the blogroll under wized (<a href="http://wized.net/">wized.net</a> is his portfolio).</p>
<h2>Previews</h2>
<p>So, while I get to work converting the design into a WP theme and writing a CMS for my personal site, here&#8217;s a preview of what it should eventually look like (the main site, and the blog):</p>
<p><a rel="lightbox" href="http://labs.wized.net/looks/alex/preview2.png"><img src="/images/thumbs/main_site.png" alt="Main Site Preview"></a> <a rel="lightbox" href="http://labs.wized.net/looks/alex/previewblog.png"><img src="/images/thumbs/blog.png" alt="Blog Preview"></a></p>
<p>Hope to have some more updates about this soon, or maybe you&#8217;ll be reading this in the lovely new theme. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>UPDATE:</strong> as you may have noticed the new design has been skinned into WP (and I rediscovered precisely why I hate that theme system).  Hopefully things still work, but if they don&#8217;t please do comment and let me know.</p>
<p><strong>NOTE:</strong> yes, the rest of the site isn&#8217;t done yet; what&#8217;s up is an example of what it should look like when done (at least the index page).  I&#8217;ll start on getting the main top links at least drafted into markup now, and I should be able to finish the about and contact pages completely within a few days.</p>
<p>Exciting stuff. <img src='http://alex-elliott.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://alex-elliott.co.uk/blog/2008/11/general/personal-site-work/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

