bilbozilla has asked for the wisdom of the Perl Monks concerning the following question:

I have the following data and I'd like to pull the first two complete decimal numbers from the text, while ignoring the rest. How would I do that and put two &nbsp; entities between them?
<span class="style">28.56</span><span class="style">-1.22</span><span class="style">-4.1</span><span class="style">04/02</span>

Re: A regex question with 2 pieces of data
by Abigail-II (Bishop) on Apr 08, 2003 at 00:02 UTC
    use strict;
    use warnings;
    use Regexp::Common;

    $_ = '<span class="style">28.56</span><span class="style">-1.22</span>' .
         '<span class="style">-4.1</span><span class="style">04/02</span>';

    my ($f, $s) = /($RE{num}{decimal}).*?($RE{num}{decimal})/;
    print "$f&nbsp;&nbsp;$s\n";
    __END__
    28.56&nbsp;&nbsp;-1.22

    Abigail

      The first two numbers will vary. How do I pull those? I should have been clearer. Could you please help?
        Just use Abigail-II's regex (courtesy of Regexp::Common) to pull them out of your data. If $data contains your data you want to get the numbers from, then something like
        my ($one, $two) = ($data =~ /($RE{num}{decimal}).*?($RE{num}{decimal})/);
        will put the first two numbers, regardless of what they are, into $one and $two. (Make sure to use Regexp::Common; in your script though before using the regex above!)
Re: A regex question with 2 pieces of data
by graff (Chancellor) on Apr 08, 2003 at 02:16 UTC
    While the module suggested initially is really cool and useful, the alternative, without using a special module, would be:
    my $string = <<EOT;
    <span class="style">28.56</span><span class="style">-1.22</span><span class="style">-4.1</span><span class="style">04/02</span>
    EOT
    my $decimal = qr/[-+]?(?:\d+\.?\d*|\d*\.\d+)/;
    my ( $frst, $scnd ) = ( $string =~ /($decimal).*?($decimal)/ );
    print "$frst, $scnd\n";
    Here, the "qr" operator is used to save a regex pattern in the scalar "$decimal" (see the perlop man page for "qr"); then, that regex variable is used twice on the target string, and the match is done in a list context (assigning to two scalars), so the two parenthesized matches are assigned to "$frst" and "$scnd".
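
    A minimal sketch of the list-context match described above, wrapped in a helper so the failure case is visible. The name first_two_decimals and the $html sample variable are hypothetical and not part of the original reply; the join with two &nbsp; entities follows the original question.

```perl
use strict;
use warnings;

# Same pattern as in the reply above.
my $decimal = qr/[-+]?(?:\d+\.?\d*|\d*\.\d+)/;

# Hypothetical helper: returns the two captured numbers joined by two
# &nbsp; entities, or undef when the pattern does not match at all.
sub first_two_decimals {
    my ($text) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, so the condition is false.
    if ( my ( $first, $second ) = $text =~ /($decimal).*?($decimal)/ ) {
        return "$first&nbsp;&nbsp;$second";
    }
    return undef;
}

my $html = '<span class="style">28.56</span><span class="style">-1.22</span>'
         . '<span class="style">-4.1</span><span class="style">04/02</span>';

my $joined = first_two_decimals($html);
print defined $joined ? "$joined\n" : "no match\n";
```

    One caveat worth checking against your own data: if the input holds only a single multi-digit number, backtracking can split it to satisfy both groups (for example "12.5" can come back as "12." and "5"), so this is only safe when at least two numbers are known to be present.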

    As for the $decimal regex itself, it's looking for a pattern where there may or may not be an initial hyphen or plus sign, then either one or more digits (with optional period and zero or more digits) or else a period with one or more digits (see the perlre man page regarding the "(?:...)" syntax and related tricks).
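
    The same pattern can be spelled out with the /x modifier so each part carries its own comment. This is only an illustrative rewrite of the regex above; the list of candidate strings is made up for the demonstration, and the \A and \z anchors are added here to test whole strings rather than to search inside them.

```perl
use strict;
use warnings;

# The same pattern written with /x so each part can carry a comment.
my $decimal = qr/
    [-+]?            # optional leading sign
    (?:              # group without capturing
        \d+ \.? \d*  # digits, optional period, optional digits: 28, 28., 28.56
      |              # or
        \d* \. \d+   # optional digits, period, digits: .5
    )
/x;

# Anchor the pattern to check whether a whole string is a decimal.
for my $candidate ( '28.56', '-1.22', '+7', '.5', '3.', 'abc', '.' ) {
    my $verdict = $candidate =~ /\A$decimal\z/ ? 'decimal' : 'not a decimal';
    print "$candidate: $verdict\n";
}
```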

    Note that this will not handle variants like "2.4e7", and other possible "rare" forms -- although it would certainly be possible to add the necessary conditions. But that is why we like to use modules for this sort of thing, because the module will normally cover all that without requiring us to make our own coding more complicated.
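
    As a rough sketch of what "adding the necessary conditions" could look like for the exponent case only, an optional exponent part can be appended to the pattern. The name $with_exp and the sample text are assumptions made for this example; other rare forms (hex, thousands separators and so on) are still not covered.

```perl
use strict;
use warnings;

# The original pattern plus an optional exponent such as e7 or E-3.
my $decimal  = qr/[-+]?(?:\d+\.?\d*|\d*\.\d+)/;
my $with_exp = qr/$decimal(?:[eE][-+]?\d+)?/;

# With /g in list context, the match returns every captured number.
my $text    = 'readings: 2.4e7 then -1.5E-3 then 42';
my @numbers = $text =~ /($with_exp)/g;
print join( ', ', @numbers ), "\n";    # prints: 2.4e7, -1.5E-3, 42
```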

Re: A regex question with 2 pieces of data
by pg (Canon) on Apr 08, 2003 at 01:51 UTC
    As this is XML, it also makes sense to use some sort of XML parser, which is more flexible. For example: (code is tested)
    use XML::Simple;
    use Data::Dumper;
    use strict;

    my $parser = XML::Simple->new();
    my $ref = $parser->XMLin("<xml>" . '<span class="style">28.56</span><span class="style">-1.22</span><span class="style">-4.1</span><span class="style">04/02</span>' . "</xml>");
    print $ref->{span}[0]{content}, "\n";
    print $ref->{span}[1]{content};
    If your string could be long, use a streaming parser that does not slurp the whole document into memory.