Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am trying to write a regex that looks through $source ($source = get($url);) for an option tag whose value starts with 0_memberOf_ (that is, <option value="0_memberOf_...). If one exists, I want to copy the whole of that option tag's value.

Can someone help me with this one?

  • Comment on Quick regex for finding something in an OPTION tag

Replies are listed 'Best First'.
Re: Quick regex for finding something in an OPTION tag
by ig (Vicar) on Oct 28, 2008 at 21:19 UTC

    Parsing HTML correctly is non-trivial. You might find it better to use one of the existing modules for parsing the HTML. HTML::Parser and HTML::TreeBuilder come to mind.
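
    As a rough illustration of the HTML::TreeBuilder route, here is a minimal sketch. The helper name option_values and the sample markup are made up for this example; in the original question the HTML would come from get($url). It relies on look_down accepting a regex as an attribute criterion.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the value attribute of every <option>
# element whose value starts with $prefix.
sub option_values {
    my ($html, $prefix) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down in list context returns all elements matching every criterion.
    my @values =
        map { $_->attr('value') }
        $tree->look_down(_tag => 'option', value => qr/^\Q$prefix\E/);

    $tree->delete;    # release the parse tree explicitly
    return @values;
}

# Assumed sample input, standing in for the page fetched with get($url).
my $source = q{<html><body><select name="groups">}
           . q{<option value="0_memberOf_x">Ya want this?</option>}
           . q{<option value="other">Not this</option>}
           . q{</select></body></html>};

print "$_\n" for option_values($source, '0_memberOf_');
```

    The parser deals with attribute order, quoting style and stray whitespace, so the only regex left is the prefix test on the attribute value.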

Re: Quick regex for finding something in an OPTION tag
by GrandFather (Saint) on Oct 28, 2008 at 23:24 UTC

    The regex you want is trivial: /^0_memberOf_/. Figuring out where to use it is more interesting. Using HTML::Parser you could:

    use warnings;
    use strict;
    use HTML::Parser;

    my $html = <<HTML;
    <html>
    <body>
    <option value="0_memberOf_x">Ya want this?</option>
    </body>
    </html>
    HTML

    my $h = HTML::Parser->new ();
    my @stack;    # one [tag, value, text...] entry per currently open element

    $h->handler (start => sub {return startOption (\@stack, @_);}, 'tag, attr');
    $h->handler (text => sub {return optionText (\@stack, @_);}, 'text');
    $h->handler (end => sub {return endOption (\@stack, @_);}, '');
    $h->parse ($html);

    # Record each opening tag together with its value attribute (if any).
    sub startOption {
        my ($stack, $tag, $attr) = @_;
        push @$stack, [$tag, $attr->{value}];
    }

    # Attach text to the innermost open element.
    sub optionText {
        my ($stack, $text) = @_;
        return unless @$stack;
        push @{$stack->[-1]}, $text;
    }

    # On a closing tag, report the element if it is a matching option.
    sub endOption {
        my ($stack) = @_;
        my ($tag, $match, $text) = @{pop @$stack};
        return unless $tag eq 'option' && defined ($match) && $match =~ /^0_memberOf_/;
        print "$match: $text\n";
    }

    Prints:

    0_memberOf_x: Ya want this?
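
    For comparison, if the markup is known to be regular, a plain regex over $source can pull out the values directly. This is only a sketch under stated assumptions: the value is double-quoted, no attribute contains a literal ">", and the sample string stands in for the page fetched with get($url).

```perl
use strict;
use warnings;

# Assumed sample input; the original question fills $source via get($url).
my $source = '<select><option value="0_memberOf_x">Ya want this?</option>'
           . '<option value="other">Not this</option></select>';

# In list context with /g, the match returns every captured value attribute
# that starts with 0_memberOf_.
my @values = $source =~ /<option\b[^>]*?\bvalue\s*=\s*"(0_memberOf_[^"]*)"/gi;

print "$_\n" for @values;
```

    This stops working as soon as the page uses single quotes or unquoted values, or wraps the options in comments, which is where the parser-based version earns its extra lines.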

    Perl reduces RSI - it saves typing
Re: Quick regex for finding something in an OPTION tag
by aquarium (Curate) on Oct 29, 2008 at 03:16 UTC
    I'd normally use the CGI module and $myvar = param('option').
    perldoc CGI shows some good examples.
    the hardest line to type correctly is: stty erase ^H

      Can you post some code which shows how you parse HTML with CGI? I'm curious!