Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have the string "tenpixblack">TEXT To GET.</td>" (without the outer quotes). How can I extract the text "TEXT TO GET." from this string using a regex?

Replies are listed 'Best First'.
Re: regex help please!
by ww (Archbishop) on Jun 15, 2005 at 21:43 UTC
    Assuming you mean that within a <td> ... </td> pair you have
    (something=")tenpixblack">TEXT To GET.

    m!
        .*(?=tenpixblack">)    # anything, up to the position just before 'tenpixblack">'
        tenpixblack">          # now match 'tenpixblack">' itself
        (TEXT\ To\ GET\.)      # capture 'TEXT To GET.' (spaces escaped because of /x)
    !gx;                       # global, extended syntax
    $wanted = $1;              # assign captured text ($1) to $wanted...

    ...and do as you wish.

    Warning: incomplete code, addressing only the regex, in part because your question is incomplete.

    Warning 2: You've indicated two different capitalization styles for TEXT ( To | TO ) GET. The snippet above only deals with the one with the lower-case "o"; if you need both, add an 'i' flag at the end of the regex... but only if that won't cause unwanted matches on text which is fully capitalized.
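To make the capitalization point concrete, here is a minimal sketch (not part of the original reply) of how the 'i' flag changes the result. The helper name `extract` and the second sample string are assumptions made purely for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text, or undef if nothing matches.
# $ignore_case selects between the case-sensitive regex and the /i version.
sub extract {
    my ($string, $ignore_case) = @_;
    my $re = $ignore_case
        ? qr/tenpixblack">(TEXT To GET\.)/i
        : qr/tenpixblack">(TEXT To GET\.)/;
    return $string =~ $re ? $1 : undef;
}

my $mixed = 'tenpixblack">TEXT To GET.</td>';   # lower-case "o", as in the sample string
my $upper = 'tenpixblack">TEXT TO GET.</td>';   # fully capitalized (assumed variant)

for my $s ($mixed, $upper) {
    for my $flag (0, 1) {
        my $got = extract($s, $flag);
        printf "%-32s /i=%d => %s\n", $s, $flag, defined $got ? $got : '(no match)';
    }
}
```

Note that /i applies to the whole pattern, so "tenpixblack" is matched case-insensitively as well; that is one way the unwanted matches mentioned above could creep in.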

    And, most important, please read about asking questions: How do I post a question effectively?, Writeup Formatting Tips, and even Perl Monks Approved HTML tags ... and show us what you've tried, as that inspires the inclination to answer.

Re: regex help please!
by GrandFather (Saint) on Jun 15, 2005 at 21:57 UTC

    I'd guess what you are really looking for is something like:

    <td ..."tenpixblack">text to get</td>

    so a regex like /<td\b.*?"tenpixblack">(.*?)<\/td>/g (giving the captured text in $1) is likely what you want. However, if you are really banging on HTML, then you should use HTML::TreeBuilder or HTML::Parser.
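As a rough sketch of the regex route, assuming the markup looks like the guess above: the sample HTML and its class attributes are invented for illustration, since the real page was never shown. In list context with /g, the match returns the captured text of every matching cell rather than just $1.

```perl
use strict;
use warnings;

# Assumed sample markup; only the "tenpixblack" cells are wanted.
my $html = join '',
    '<tr><td class="tenpixblack">TEXT To GET.</td>',
    '<td class="other">skip me</td>',
    '<td class="tenpixblack">More text</td></tr>';

# In list context, /g returns the capture group's contents for every match.
my @cells = $html =~ /<td\b.*?"tenpixblack">(.*?)<\/td>/g;

print "$_\n" for @cells;
```

On this sample the second match actually starts at the "other" cell, because .*? is free to run across tag boundaries until it finds "tenpixblack">. The captured text is still right here, but that looseness is part of why a real parser is the safer choice for arbitrary HTML.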


    Perl is Huffman encoded by design.
Re: regex help please!
by bart (Canon) on Jun 15, 2005 at 21:47 UTC
    I think you want
    $_ = "tenpixblack\">TEXT To GET.</td>";
    ($text) = />(.*?)</;
Re: regex help please!
by ercparker (Hermit) on Jun 16, 2005 at 05:38 UTC
    Here is another regex that works for the example you provided. I hope it helps.
    my $wanted;
    my $text = 'tenpixblack">TEXT To GET.</td>';
    ($wanted) = $text =~ /(?<=">)([^<]*?)</;
Re: regex help please!
by TedPride (Priest) on Jun 16, 2005 at 10:29 UTC
    use strict;
    use warnings;

    $_ = 'tenpixblack">TEXT To GET.</td>';
    $_ = m/tenpixblack">(.*?)<\/td>/ ? $1 : '';
    print;