Corry has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to grep a piece of code from an HTML source file, somewhere in the middle of it. The wanted snippet is situated between comment tags. Can anyone help me get started? (I ordered the book "regular expressions" from O'Reilly. Until it arrives, I hope you guys can help.) Thanks in advance, Corry.

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: How would I find a piece of html from a sourcefile.html?
by dsb (Chaplain) on Jan 24, 2001 at 02:54 UTC
    This regular expression will grab the first tag in $data, brackets and all.
    $data =~ m/(<[^>]+>)/; print $1, "\n"; # print the tag
    This regex will leave out the brackets and print only the string inside them:
    $data =~ m/<([^>]+)>/; print $1, "\n"; # print the tag's contents
    Use modifiers or loops as you need to. -kel
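As a rough sketch of the "modifiers or loops" advice: with the /g modifier in list context, the same two patterns return every tag instead of only the first. The sample string in $data is made up for illustration; in practice it would hold the contents of the HTML file.

```perl
use strict;
use warnings;

# Hypothetical sample input; normally $data would be read from the file.
my $data = '<html><body><p class="x">Hello</p></body></html>';

# In list context, /g returns every capture rather than only the first match.
my @tags     = $data =~ m/(<[^>]+>)/g;   # each tag, brackets included
my @contents = $data =~ m/<([^>]+)>/g;   # only the text inside the brackets

print "$_\n" for @tags;
```

When matching without /g, it is safer to test that the match succeeded before reading $1, since a failed match leaves $1 holding whatever an earlier successful match captured.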
Re: How would I find a piece of html from a sourcefile.html?
by athomason (Curate) on May 26, 2000 at 14:27 UTC
    Comments make HTML extraction even more difficult than it usually is. However, if you're dealing with fairly standard HTML you could use
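    $page =~ /<!--\s*(.*?)\s*-->/s; $commented = $1;
    This will grab the string inside the first comment; the /s modifier lets . match across newlines, and the non-greedy .*? stops at the first closing -->. Add appropriate modifiers to the regexp as necessary (or run another regexp on $commented) if you only want to pick out certain parts.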
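If the wanted snippet sits between two marker comments, as the question describes, the same idea can be stretched to capture what lies between them. This is only a sketch: the marker text "begin snippet" / "end snippet" and the sample page are assumptions, so substitute whatever comments the real file uses.

```perl
use strict;
use warnings;

# Hypothetical page; the marker names below are assumptions for illustration.
my $page = <<'HTML';
<html><body>
<h1>Title</h1>
<!-- begin snippet -->
<p>Wanted text</p>
<!-- end snippet -->
<p>Footer</p>
</body></html>
HTML

my $snippet;
# /s lets . match newlines; .*? is non-greedy, so the capture ends
# at the first end marker instead of running on to a later one.
if ($page =~ /<!--\s*begin snippet\s*-->\s*(.*?)\s*<!--\s*end snippet\s*-->/s) {
    $snippet = $1;
    print "$snippet\n";
}
else {
    warn "markers not found\n";
}
```

Checking the result of the match before using $1 avoids picking up a stale capture when the markers are missing.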