in reply to Re: A few random questions from Learning Perl 3
in thread A few random questions from Learning Perl 3
It might be useful to read up a bit on the theory of formal languages. You'll see that there's a whole family of language classes, each described by a certain mathematical formalism. Regular languages are one such class, and as you can guess, they're the ones described by regular expressions. Unfortunately, HTML is not a regular language, and hence it cannot be described by regular expressions: they're just not powerful enough.
By way of example, consider <em>hello beautiful HTML <em>world</em></em>. It's easy to write a regular expression to get the inner "world", isn't it? Now consider <em>hello <em>beautiful<em>HTML world</em></em></em>. If you want to match something here, you can again write a regular expression... as long as you know the maximum number of times the <em>...</em> tags will be nested.
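To make the depth problem concrete, here is a minimal sketch. It is written in Python purely for illustration; the patterns use no Perl-specific extensions, so the same reasoning applies to a plain Perl regex. The names `innermost_text` and `is_two_level` are hypothetical helpers, not anything from the thread.

```python
import re

# Innermost <em>...</em>: a pair with no other tag between open and close.
INNERMOST = re.compile(r"<em>([^<]*)</em>")

# A pattern hard-wired for exactly two levels of <em> nesting.
TWO_LEVELS = re.compile(r"^<em>[^<]*<em>[^<]*</em>[^<]*</em>$")


def innermost_text(html):
    """Return the text of the first innermost <em> element, or None."""
    m = INNERMOST.search(html)
    return m.group(1) if m else None


def is_two_level(html):
    """True only if html is an <em> containing exactly one nested <em>."""
    return TWO_LEVELS.match(html) is not None


two = "<em>hello beautiful HTML <em>world</em></em>"
three = "<em>hello <em>beautiful<em>HTML world</em></em></em>"

print(innermost_text(two))    # world
print(is_two_level(two))      # True
print(is_two_level(three))    # False: one level deeper than the pattern allows
```

You could write a third pattern for three levels, a fourth for four, and so on, but each one only covers the depth it was written for.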
HTML allows unbounded nesting of tags, which means that you can't write a general regular expression that describes every possible nesting situation. Regular expressions are simply not powerful enough. You need at least a context-free grammar, and hence a tool such as HTML::Parser or, for the general case, something like Parse::RecDescent.
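What a parser adds over a regular expression is memory of how deep it currently is. A rough sketch of that idea, again in Python for illustration: it uses the standard library's event-driven `html.parser.HTMLParser`, which I'm assuming plays a role comparable to the start/end/text handlers of HTML::Parser. The class `EmDepth` and the function `deepest_em` are made-up names for this example.

```python
from html.parser import HTMLParser


class EmDepth(HTMLParser):
    """Track <em> nesting with a counter and collect the most deeply nested text."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current <em> nesting level
        self.max_depth = 0      # deepest level seen so far
        self.deepest_text = ""  # text found at that deepest level

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.depth += 1
            if self.depth > self.max_depth:
                # New record depth: forget text gathered at shallower levels.
                self.max_depth = self.depth
                self.deepest_text = ""

    def handle_endtag(self, tag):
        if tag == "em" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text from several elements at the same record depth is concatenated.
        if self.depth > 0 and self.depth == self.max_depth:
            self.deepest_text += data


def deepest_em(html):
    """Return (maximum <em> depth, text at that depth) for any nesting depth."""
    parser = EmDepth()
    parser.feed(html)
    parser.close()
    return parser.max_depth, parser.deepest_text


print(deepest_em("<em>hello beautiful HTML <em>world</em></em>"))
print(deepest_em("<em>hello <em>beautiful<em>HTML world</em></em></em>"))
print(deepest_em("<em>" * 50 + "x" + "</em>" * 50))
```

The counter is the whole point: it can grow without bound, which is exactly what a finite automaton (and so a true regular expression) cannot do.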
Now you can argue:
Given all this, your claim that one can deal with all HTML problems by using regular expressions shows some unwarranted optimism on your part. Obviously there's no reason to take my word for it, so I'll suggest a number of references on the subject:
Just my 2 cents, -gjb-
Update: Thanks, TheHobbit, for reiterating the points I actually mention in my text, if you bother to read it carefully. (?{...}) and /e are called code embedding.