Re: Regular Expression
by waswas-fng (Curate) on Jun 28, 2005 at 19:06 UTC
Re: Regular Expression
by Transient (Hermit) on Jun 28, 2005 at 19:01 UTC
Re: Regular Expression
by davidrw (Prior) on Jun 28, 2005 at 19:23 UTC
|
See this node: Regular Expressions for almost the exact same question.
Why can't you use modules? The most robust way will be something like HTML::Parser -- look specifically at the examples section for extracting the <title> tag.
For a one-time quick & dirty solution, use a regex (this assumes, of course, that there isn't a > in the onLoad JavaScript):
if( $html =~ /<body (.*?)>/si ){
my $body_attributes = $1;
}
Maybe something like this will help guard against javascript screwing up the match, but assumes proper quoting of the attributes:
/<body((?:\s+(?:\w+=".*?"))*)>/si
Update: added strike and bold after reading/noting ikegami's response
|
I was directed to the second regexp in this post as a solution that fixes problems in another post, but it's no better.
but assumes proper quoting of the attributes:
The HTML spec allows for single quotes, and even allows for the quotes to be omitted in some circumstances, so no, it doesn't assume proper quoting.
Also, it doesn't handle > inside of quotes (where it doesn't need to be escaped).
Finally, it could locate <body> inside of a comment or inside of another attribute.
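For what it's worth, here is a minimal sketch of a pattern that addresses the first two points above: it accepts double-quoted, single-quoted, and unquoted attribute values, so a > inside quotes does not end the match. The helper name find_body_tag is made up for illustration, and the pattern does nothing about <body> appearing inside a comment or inside another attribute, so treat it as an approximation rather than a parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first opening
# body tag found in $html, or undef if there is none.
sub find_body_tag {
    my ($html) = @_;
    my $tag_re = qr{
        <body\b                      # tag name only, not e.g. bodyguard
        (?:
            \s+ [^\s=>/]+            # attribute name
            (?: \s* = \s*
                (?: "[^"]*"          # double-quoted value, may contain >
                  | '[^']*'          # single-quoted value, may contain >
                  | [^\s"'>]+        # unquoted value
                )
            )?
        )*
        \s* /? >
    }xis;
    return $html =~ /($tag_re)/ ? $1 : undef;
}

my $sample = q{<html><body onload="if (a > b) { go() }" class='x' id=main><p>hi</p></body></html>};
my $found  = find_body_tag($sample);
print defined $found ? "$found\n" : "no body tag found\n";
```

The sample string and its go() call are invented test data. A real HTML parser remains the safer choice whenever modules are allowed.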
Re: Regular Expression
by fmerges (Chaplain) on Jun 28, 2005 at 21:08 UTC
|
$html =~ s{(<body.*?)>}{$1 onLoad="window.print()">}is;
Hope this is helpful.
Regards,
:-)
Re: Regular Expression
by kwaping (Priest) on Jun 28, 2005 at 19:18 UTC
|
my $body = $the_html_string;
$body =~ s/(<body.*?>)/$1/sgi;
|
What will happen when a <body> tag is included in comments? Your regex will break. I'm almost sure that the one who gave the instruction to find the <body> of the HTML code with a simple regex did not think about this.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
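One regex-only way to reduce the comment problem is to remove comments from a copy of the string before searching. This is a rough sketch under simplifying assumptions: the helper name body_tag_outside_comments is hypothetical, the comment removal is naive (a "-->" inside a script string or attribute value would confuse it), and the tag pattern itself still stops at the first >.

```perl
use strict;
use warnings;

# Hypothetical helper: drop <!-- ... --> comments from a copy of the
# input, then return the first opening body tag that remains (or undef).
sub body_tag_outside_comments {
    my ($html) = @_;
    (my $stripped = $html) =~ s/<!--.*?-->//gs;   # simplified comment removal
    return $stripped =~ /(<body\b[^>]*>)/i ? $1 : undef;
}

my $page = '<!-- <body class="old"> --><body class="new"><p>hi</p></body>';
my $tag  = body_tag_outside_comments($page);
print defined $tag ? "$tag\n" : "no body tag found\n";
```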
|
Interesting observation - do you often see body tags enclosed in comments? You are assuming the poster doesn't want body tags enclosed in comments. ;) In any case, I think this pattern is better (added ^):
my $body = $the_html_string;
$body =~ s/^.*?(<body.*?>)/$1/sgi;
Re: Regular Expression
by l.frankline (Hermit) on Jun 29, 2005 at 14:55 UTC
|
You can try something like this:
$_ =~ /<body[^>]*>/;
* Frank *
Re: Regular Expression
by Anonymous Monk on Jun 28, 2005 at 19:14 UTC
|
Unfortunately, for my purpose I cannot use that module :-) I need to use a strict regex.
|
Homework? Or some fundamental religious rule which forbids you to handle HTML other than with a regex?
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
|
$html =~ /(<body[^>]*>)/s;
That should get you the opening tag.
Update: Changed + to * as pointed out by kwaping
|
That'll fail for <body onload="if (a > b) { ... }">. It can also fail if <body> is found in comments.
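To make the first failure concrete, here is a small demonstration that runs the pattern from the parent post against that exact tag. Only the variable names are mine.

```perl
use strict;
use warnings;

my $html = '<body onload="if (a > b) { ... }">';

# Same pattern as in the parent post: [^>]* stops at the first >,
# which here is the one inside the quoted onload value.
my ($tag) = $html =~ /(<body[^>]*>)/s;

print "$tag\n";   # prints: <body onload="if (a >
```

The capture ends in the middle of the attribute value, so anything built on $1 sees a truncated tag.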
|
And how do you find the closing tag?
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law