As mentioned earlier, you can not parse any recursive structures using regular expressions, because they miss some essential features, like a stack, recursion, or similar.

But in this blog post I also told, that PCRE implements a superset of regular expressions, so that you are able to "parse" more languages, then just regular ones. If someone could send me some proof, which language types can be matched using regular expresions, I would be happy - maybe just implement a turing machine with PCRE regular axpressions?

During my talk, I gave to day about regular expressions, I showed a example regular expressions which is able to validate if some BBCode document is valid, which means, it only has matching tags, and a correct tree structure of tags:

```
(
(
[^\[\]]*
\[([a-z]+)(?:=([^\]]+))?\]
(?# The actual recursion )
(?>[^\[\]]* | (?R) )
\[/\2\]
[^\[\]]*
)
)ix
```

If you don't understand this intuitively, you should probably attend on of my talks on regular expressions, or just try to help yourself reading the manual section on PCRE.

So what happens, passing some example strings to this regular expressions?

```
'Some [b] longer [i]text[/i][/b].'
=> bool( true )
'Some [b] longer [i]text[/b][/i].'
=> bool( false )
'Some [b] longer [i]te [u] xt[/i][/b].'
=> bool( false )
```

As you can see, at least simple validations pass. I did not test this extensively, so you may find some problems when validating more complex documents.

In my ealier blog post about this, I mentiond the trivial recursive example, of a language, which consists of any number of opening braces, followed by the same number of clsing braces. This simple language can also be validated, and the manual even shows this as an example.

`(\((((?>[^()]+)|(?R))*)\))`

Why, you ask?

That you are not get what the regular expressions do at the first glance should be answer enough. This stuff is neither readable, nor maintainable or debugable. And even, if you do so, the next one reading your code won't be able to do so - or would need hours to find out.

**Comments are closed.** This blog only exists so that all articles can still be referenced. There is no relevant activity any more on this blog. Since spammers still also find this blog comments are shut down entirely.

open sourceat Wed, 05 Dec 2007 09:33:19 +0100This regulal expresion looks so complicated for me ;)

Link to comment