Kore Nordmann - PHP / Projects / Politics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Kore Nordmann
:Date: Wed, 07 Nov 2007 15:05:49 +0100
:Revision: 1
:Copyright: CC by-sa

==============================
Parse with regular expressions
==============================

:Description:
    With recursive patterns in PCRE you can actually match recursive
    structures, even though you should not try this. A regular expression
    to validate BBCode documents is included in this blog post.

As `mentioned earlier`__, you cannot parse recursive structures using
regular expressions, because they lack essential features like a stack,
recursion, or something similar.

__ /blog/do_NOT_parse_using_regexp.html

But in that blog post I also noted that PCRE implements a superset of
regular expressions, so you are able to "parse" more languages than just
the regular ones. If someone could send me a proof of which classes of
languages can be matched using PCRE, I would be happy. Maybe someone could
just implement a Turing machine with PCRE regular expressions?

Parse recursive structures
==========================

During the `talk about regular expressions I gave today`__, I showed an
example regular expression which is able to validate a BBCode document.
Valid means that the document only contains matching tags, which form a
correct tree structure:

::

    (
        (
            [^\[\]]*
            \[([a-z]+)(?:=([^\]]+))?\]
                (?# The actual recursion )
                (?>[^\[\]]* | (?R) )
            \[/\2\]
            [^\[\]]*
        )
    )ix

If you don't understand this intuitively, you should probably attend one of
my talks on regular expressions, or help yourself by reading the
`manual section on PCRE`__.

__ /blog/ipc_07_published_talks.html
__ http://php.net/pcre

So what happens when we pass some example strings to this regular
expression?

::

    'Some [b] longer [i]text[/i][/b].'
    => bool( true )

    'Some [b] longer [i]text[/b][/i].'
    => bool( false )

    'Some [b] longer [i]te [u] xt[/i][/b].'
    => bool( false )

As you can see, at least simple validations pass.
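
Note that ``preg_match()`` is not anchored, so it reports a match as soon as any substring matches. As far as I can tell, the first example passes for exactly that reason: the substring ``[i]text[/i]`` matches on its own. The recursion is never reached either, because the first alternative of the atomic group ``(?>[^\[\]]* | (?R) )`` always succeeds (it may match the empty string), and an atomic group never backtracks into its remaining alternatives. A sketch of an anchored variant in PHP follows, under a few assumptions: tag names consist of ASCII letters, an optional ``=value`` attribute may follow the name, and every opened tag must be closed. The function name ``is_valid_bbcode()`` is made up for this example.

```php
<?php
// Sketch of an anchored BBCode validator (hypothetical helper, not a
// library function). Assumes ASCII tag names, an optional "=value"
// attribute, and that every opened tag is closed again.
function is_valid_bbcode(string $document): bool
{
    $pattern = '~
        ^
        (?<content>                         # group 1: text and nested tags
            [^\[\]]*                        # plain text before a tag
            (?:
                \[ ([a-z]+) (?:=[^\]]+)? \] # opening tag, group 2 = name
                (?&content)                 # recurse into the tag content
                \[/ \2 \]                   # closing tag with the same name
                [^\[\]]*                    # plain text after the tag
            )*
        )
        $
    ~ix';

    return preg_match($pattern, $document) === 1;
}

$examples = [
    'Some [b] longer [i]text[/i][/b].',      // correctly nested
    'Some [b] longer [i]text[/b][/i].',      // crossed tags
    'Some [b] longer [i]te [u] xt[/i][/b].', // [u] is never closed
    '[b]unclosed [i]text[/i]',               // [b] is never closed
];

foreach ($examples as $example) {
    var_dump(is_valid_bbcode($example));
}
```

Calling the named group with ``(?&content)`` instead of ``(?R)`` keeps the anchors ``^`` and ``$`` outside of the recursion. Groups captured inside a recursive call are restored when the call returns, so ``\2`` still holds the name of the tag opened on the current level when the closing tag is checked.
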
I did not test this extensively, so you may find some problems when
validating more complex documents.

Parse the braces language
=========================

In my earlier blog post about this, I mentioned the trivial recursive
example of a language which consists of any number of opening braces,
followed by the same number of closing braces. This simple language can
also be matched, and the manual even shows an example for the closely
related language of balanced, possibly nested parentheses:

::

    (\((((?>[^()]+)|(?R))*)\))

Do not use this
===============

Why, you ask? The fact that you do not get what these regular expressions
do at first glance should be answer enough. This stuff is neither
readable, nor maintainable, nor debuggable. And even if you do get it, the
next person reading your code won't, or would need hours to figure it out.

Trackbacks
==========

Comments
========

- open source at Wed, 05 Dec 2007 09:33:19 +0100

  This regular expression looks so complicated to me ;)