in reply to Pulling HTML off another site problem

UPDATED: added sample code instead of just pointing to CPAN.

First pointer: use strict! Then you'll see that you really want:

my $data = get ...
$data =~ ...
instead of assigning the output of get() to $_.

Here, try this:

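#!/usr/bin/perl -w
use strict;
use LWP::Simple;

# get() returns undef if the request fails
my $data = get("http://www.bloomberg.com/energy/index.html");
die "Couldn't fetch the page\n" unless defined $data;

# non-greedy (.*?) lets the trailing \s* trim whitespace before <map>
my ($wanted) = $data =~ /<!-+PETROLEUM-+>\s*(.*?)\s*<map\s+name="BbgELogin2">/s;
die "Markers not found\n" unless defined $wanted;

open(my $fh, '>', 'file.txt') or die "Can't write file.txt: $!"; # '>' creates a new file, '>>' appends
print $fh $wanted;
close $fh; # not really necessary in this simple script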
$wanted should now hold what you want. Use \s instead of a literal space: \s matches newlines and tabs as well. Also, you need the /s modifier instead of /m: /s lets . match a newline, while /m only changes what ^ and $ match.
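A small, self-contained sketch of the /s versus /m difference. The HTML string (including the price) is made up for illustration and grab() is a hypothetical helper; it uses the non-greedy (.*?) so the trailing \s* can actually trim whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up markup standing in for the downloaded page; the section
# spans several lines, just like the real thing.
my $page = qq{<!--PETROLEUM-->\n<td>Crude Oil</td>\n<td>64.10</td>\n<map name="BbgELogin2">};

# Hypothetical helper: returns the text between the markers, or undef.
# With /s, . also matches "\n"; with /m it does not, so the match fails.
sub grab {
    my ($html, $dotall) = @_;
    my ($wanted) = $dotall
        ? $html =~ /<!-+PETROLEUM-+>\s*(.*?)\s*<map\s+name="BbgELogin2">/s
        : $html =~ /<!-+PETROLEUM-+>\s*(.*?)\s*<map\s+name="BbgELogin2">/m;
    return $wanted;
}

my $with_s = grab($page, 1);
print defined $with_s ? "/s: $with_s\n" : "/s: no match\n";
print defined grab($page, 0) ? "/m: matched\n" : "/m: no match\n";  # prints "/m: no match"
```

The list assignment my ($wanted) = ... is what makes the match hand back the captured text rather than a true/false flag.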

I recommend you use a parser, such as HTML::Parser, or possibly HTML::TokeParser. It takes a little time to learn the interface to these modules, but that time is well invested, as you will ultimately save more time and hair.
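A minimal sketch of the parser route with HTML::TokeParser (from the HTML-Parser distribution). It assumes, like the regex above, that the section starts at a comment containing PETROLEUM and ends at <map name="BbgELogin2">; the sample markup and the petroleum_text() helper are hypothetical, and it collects only the visible text rather than the raw HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the non-blank text tokens found after a
# comment containing PETROLEUM and before <map name="BbgELogin2">.
sub petroleum_text {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't create parser";
    my ($inside, @text) = (0);
    while (my $tok = $p->get_token) {
        my $type = $tok->[0];
        if ($type eq 'C') {                                  # ["C", $text]
            $inside = 1 if $tok->[1] =~ /PETROLEUM/;
        }
        elsif ($type eq 'S' && $tok->[1] eq 'map') {         # ["S", $tag, \%attr, ...]
            my $name = $tok->[2]{name};
            last if $inside && defined $name && $name eq 'BbgELogin2';
        }
        elsif ($inside && $type eq 'T' && $tok->[1] =~ /\S/) { # ["T", $text, $is_data]
            (my $t = $tok->[1]) =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
            push @text, $t;
        }
    }
    return @text;
}

# Made-up sample page for illustration only.
my $html = <<'HTML';
<p>Before</p>
<!--PETROLEUM-->
<table><tr><td>Crude Oil</td><td>64.10</td></tr></table>
<map name="BbgELogin2"></map>
HTML

print join(' | ', petroleum_text($html)), "\n";
```

Because the parser hands over comments, tags and text as separate tokens, it doesn't care about line breaks, attribute quoting, or extra whitespace the way a single regex does.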

Jeff

R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--

Re: (jeffa) Re: Pulling HTML off another site problem
by ChemBoy (Priest) on Jun 23, 2001 at 23:58 UTC

    Actually, $wanted contains 1 or 0, depending on whether it matched... but adding parentheses thusly

    my ($wanted) = $data =~ /(<!-+PETROLEUM-+>\s*(.*)\s*<map\s+name="BbgELogin2">)/s;
    will fix that.

    Update: ChemBoy stupid. ChemBoy not have coffee. Bad ChemBoy! (thanks, cLive ;-); sorry, jeffa!)



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders

      Errr,

      Jeff did include parentheses, in the middle.

      cLive ;-)
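To make the context point concrete: a match in list context returns its captured groups, while in scalar context it returns only true or false. The one-line $data below is made up for illustration; the last line shows that extra outer parentheses make the first capture the whole marked-up span rather than the inner text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the fetched page.
my $data = 'before <!--PETROLEUM--> crude 64.10 <map name="BbgELogin2"> after';

# List assignment: the match returns the list of captured groups.
my ($in_list) = $data =~ /<!-+PETROLEUM-+>\s*(.*?)\s*<map/s;

# Scalar assignment: the match returns only true (1) or false ('').
my $in_scalar = $data =~ /<!-+PETROLEUM-+>\s*(.*?)\s*<map/s;

# Outer parentheses: the first capture is now the whole span, markers included.
my ($outer) = $data =~ /(<!-+PETROLEUM-+>\s*(.*?)\s*<map)/s;

print "list:   $in_list\n";    # list:   crude 64.10
print "scalar: $in_scalar\n";  # scalar: 1
print "outer:  $outer\n";      # outer:  <!--PETROLEUM--> crude 64.10 <map
```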