Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Having a little trouble matching on a simple regex. My guess is some of the " or > or such are special characters. I always wondered if there's a list that contains all the special characters to watch out for in a regex?

The HTML I am trying to match is below; I want to capture the number in the value attribute.

<input type="hidden" name="VERIFIER" value="076-62">
$page =~ m/name=\"VERIFIER\" value=\"(\d+\-\d+)\">/x; print $1;

Re: simple regex
by Paladin (Vicar) on Mar 03, 2006 at 17:51 UTC
    Your RE is failing because of the /x modifier, which makes the RE engine ignore all unescaped whitespace in your RE, like the space between name=\"VERIFIER\" and value=. Check out perlre for more info.
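    A minimal sketch of that effect, using the sample HTML from the question. The sub names are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Under /x the bare space in the pattern is ignored, so this effectively
# looks for name="VERIFIER"value="..." and fails on the sample HTML.
sub match_unescaped_space {
    my ($html) = @_;
    return $html =~ m/name="VERIFIER" value="(\d+-\d+)">/x ? $1 : undef;
}

# A backslash-escaped space stays significant under /x.
sub match_escaped_space {
    my ($html) = @_;
    return $html =~ m/name="VERIFIER"\ value="(\d+-\d+)">/x ? $1 : undef;
}

# Without /x, a space in the pattern matches a literal space.
sub match_without_x {
    my ($html) = @_;
    return $html =~ m/name="VERIFIER" value="(\d+-\d+)">/ ? $1 : undef;
}

my $page = '<input type="hidden" name="VERIFIER" value="076-62">';
for my $try (\&match_unescaped_space, \&match_escaped_space, \&match_without_x) {
    my $got = $try->($page);
    print defined $got ? "matched: $got\n" : "no match\n";
}
```

    Only the first variant should fail to match; dropping /x or escaping the space are both reasonable fixes.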
Re: simple regex
by McDarren (Abbot) on Mar 03, 2006 at 17:54 UTC
    "always wondered if there's a list that contains all the special characters to watch out for"

    This is pretty much covered in perldoc perlre
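    As a rough illustration (not a substitute for perlre): the characters ", = and > are not regex metacharacters, and - is only special inside a bracketed character class. When in doubt about a literal string, quotemeta() or \Q...\E will escape it for you. The helper name contains_literal below is hypothetical.

```perl
use strict;
use warnings;

# quotemeta() backslash-escapes every ASCII character that is not a
# letter, digit, or underscore, whether or not it is actually special.
my $literal = 'value="076-62">';
print quotemeta($literal), "\n";

# \Q...\E applies the same escaping inline, so $needle is matched
# as literal text rather than interpreted as a pattern.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

print contains_literal('<input name="VERIFIER" value="076-62">', $literal), "\n";
```

    Over-escaping is harmless for non-word characters, which is why the backslashes in the original post were not the problem.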

Re: simple regex
by wfsp (Abbot) on Mar 03, 2006 at 18:29 UTC
    I find that "simple regex" and HTML very rarely fit nicely together!

    You can never be sure what the HTML will actually look like as ptum mentions above. If your HTML is part of a file it would be well worth considering a parser.

    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $html      = '<input type="hidden" name="VERIFIER" value="076-62">';
    my $attribute = 'value';
    my $value;

    my $tp = HTML::TokeParser::Simple->new(\$html)
        or die "Couldn't parse string: $!";

    while (my $t = $tp->get_token) {
        if ( $t->is_start_tag('input') and $value = $t->get_attr($attribute) ) {
            print "*$value*\n";
        }
    }

Re: simple regex
by explorer (Chaplain) on Mar 03, 2006 at 17:42 UTC
    Try:
    $page =~ m/name="VERIFIER" value="([-0-9]+?)"/;

      You may also wish to use the m/regex/si flags in case the HTML spans multiple lines or the VERIFIER name attribute is not upper-cased.
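      One caveat worth sketching: /s only changes what . matches (it lets . match a newline), so on a pattern with no . it has no visible effect. If the attributes might be separated by a line break, \s+ in place of the literal space is one way to cope, combined with /i for case. The helper name find_verifier is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tolerate any run of whitespace (including a
# newline) between the attributes, and ignore case via /i.
sub find_verifier {
    my ($html) = @_;
    # A match in list context returns its captures; no match leaves $value undef.
    my ($value) = $html =~ m/name="verifier"\s+value="([-0-9]+)"/i;
    return $value;
}

my $page = qq{<input type="hidden" name="Verifier"\n    value="076-62">};
my $found = find_verifier($page);
print defined $found ? "$found\n" : "no match\n";
```

      This still assumes the attribute order and double quotes seen in the sample, which is the usual argument for a real parser.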


      No good deed goes unpunished. -- (attributed to) Oscar Wilde
      Who says 'Fun with regexes' is not a contact sport?
      $text .= 'blah<input type="hidden" name="FOO" value="123-01">blah';
      $text .= "\nblah\n";
      $text .= 'blah<input type="hidden" name="VERIFIER" value="076-62">blah';
      $text .= "\nblah\n";
      $page = (($text =~ m/name="VERIFIER"\ value="([-0-9]+?)">/xg),$1)[0];
      print "$page";
      So, $page will end up as "076-62" in your example... and to hell with $1.




      If you wanted to make it more interesting, you could replace the appropriate line with:
      $page = (($text =~ m/<input\ type="hidden"\ name="VERIFIER"\ value="(\d+)\-(\d+)">/xg),$1.'-'.$2)[0];
      or
      $page = (($text =~ m/<input\ type="hidden"\ name="VERIFIER"\ value="(\d+)\-(\d+)">/xg),$1.'-'.$2)[1];
      if you only wanted a portion of the value. (The '076' or '62' in your example.)
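      The list slices work because a match in list context returns its captures. A perhaps more readable sketch of the same idea assigns those captures straight to named variables; the sub names here are made up for illustration.

```perl
use strict;
use warnings;

# Return the whole value, e.g. "076-62", or undef if nothing matches.
sub capture_verifier {
    my ($html) = @_;
    my ($value) = $html =~ m/name="VERIFIER" value="([-0-9]+?)">/;
    return $value;
}

# Return the two numeric parts as a list, or an empty list on failure.
sub capture_parts {
    my ($html) = @_;
    my @parts = $html =~ m/name="VERIFIER" value="(\d+)-(\d+)">/;
    return @parts;
}

my $text = join "\n",
    'blah<input type="hidden" name="FOO" value="123-01">blah',
    'blah',
    'blah<input type="hidden" name="VERIFIER" value="076-62">blah',
    'blah';

my $whole = capture_verifier($text);
my ($first, $second) = capture_parts($text);
print "$whole / $first / $second\n" if defined $whole and defined $second;
```

      Checking the captured variables for definedness also avoids reading a stale $1 left over from an earlier successful match.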




      FYI, the 'special character' that's goofing with you (at least as written in your example) is the space between "VERIFIER" and value. The " characters do not have to be backwhack-escaped, but because of the /x modifier that non-escaped space in the middle of your regex is ignored, which causes the match to fail.

      Well, that's my $.02 worth. No, for refunds you'll have to check our customer service department.
Re: simple regex
by gube (Parson) on Mar 03, 2006 at 17:53 UTC

    Hi, try this.

    That was a good attempt; it's not a special-characters problem.

    $page =~ m/name="VERIFIER" value="(\d+-\d+)"/g; print $1;