Re: Regular Expression

See the node "Regular Expressions" for almost exactly the same question.
Why can't you use modules? The most robust approach is something like HTML::Parser -- look specifically at the examples section of its documentation for extracting the <title> tag.
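
To make the module suggestion concrete, here is a minimal sketch of pulling the <body> attributes with HTML::Parser's version 3 event API. The helper name body_attributes and the sample markup are made up for illustration, and it assumes HTML::Parser is installed from CPAN (it is not a core module):

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return a hash ref holding the attributes of the
# first <body> start tag, or undef if the document has none.
sub body_attributes {
    my ($html) = @_;
    my $attrs;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                # Keep only the first <body> seen; copy the hash.
                $attrs = { %$attr } if $tagname eq 'body' && !$attrs;
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return $attrs;
}

# Made-up sample: a commented-out body tag, single quotes, a '>' inside
# an attribute value, and an unquoted value.
my $html = q{<html><!-- <body class="old"> --><BODY onLoad='if (a > b) go()' bgcolor=white></html>};

my $attrs = body_attributes($html);
print "$_ = $attrs->{$_}\n" for sort keys %$attrs;
```

With its default settings the parser lower-cases tag and attribute names, so the handler compares against 'body' and the hash keys come back as onload and bgcolor.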

For a one-time quick and dirty job, use a regex (this assumes, of course, that there isn't a > in the onLoad JavaScript):

  if( $html =~ /<body (.*?)>/si ){
    my $body_attributes = $1;
  }
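
To see why the caveat about > matters, here is a small sketch that runs the same pattern against a made-up <body> tag whose onLoad handler contains a >. The sample markup is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample input: the onLoad handler contains a '>'.
my $html = q{<body onLoad="if (a > b) go()" bgcolor="white">};

my $body_attributes;
if ( $html =~ /<body (.*?)>/si ) {
    $body_attributes = $1;
}

# The non-greedy (.*?) stops at the first '>', which is the one inside
# the handler, so the capture is cut short.
print "captured: [$body_attributes]\n";
```

The capture ends in the middle of the onLoad value and the bgcolor attribute is lost entirely.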

Maybe something like this will help guard against javascript screwing up the match, but assumes ~~proper quoting of the attributes~~:

  /<body((?:\s+(?:\w+=".*?"))*)>/si

Update: added strike and bold after reading/noting ikegami's response


Replies are listed 'Best First'.
Re^2: Regular Expression
by ikegami (Patriarch) on Jun 28, 2005 at 21:38 UTC

I was directed to the second regexp in this post as a solution that fixes problems in another post, but it's no better.

"but assumes proper quoting of the attributes:"

The HTML spec allows for single quotes, and even allows for the quotes to be omitted in some circumstances, so no, it doesn't assume proper quoting.

Also, it doesn't handle `>` inside of quotes (where it doesn't need to be escaped).

Finally, it could locate `<body>` inside of a comment or inside of another attribute.
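
Some of these objections are easy to check by running the second pattern over a few inputs. The sketch below does that for the quoting, comment, and attribute cases. The sample strings and the helper name match_body are made up for illustration, and only core Perl is used:

```perl
use strict;
use warnings;

# The second pattern from the post above, unchanged.
my $re = qr/<body((?:\s+(?:\w+=".*?"))*)>/si;

# Hypothetical helper: return the captured attribute string, or undef
# when the pattern does not match at all.
sub match_body {
    my ($html) = @_;
    return $html =~ $re ? $1 : undef;
}

# Made-up sample inputs, one per case.
my @samples = (
    [ 'double quotes'     => q{<body class="main" id="top">} ],
    [ 'single quotes'     => q{<body class='main'>} ],
    [ 'unquoted value'    => q{<body bgcolor=white>} ],
    [ 'inside a comment'  => q{<!-- <body class="old"> --><body class="new">} ],
    [ 'inside attribute'  => q{<div title="<body>"><body class="x">} ],
);

for my $sample (@samples) {
    my ($label, $html) = @$sample;
    my $got = match_body($html);
    printf "%-18s %s\n", $label,
        defined $got ? "captured [$got]" : 'no match';
}
```

Only the double-quoted tag is captured as intended. The single-quoted and unquoted tags do not match, the comment case captures the commented-out class="old", and the last case matches the `<body>` text inside the title attribute with an empty capture.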
