reclusivemonkey has asked for the wisdom of the Perl Monks concerning the following question:

I have created the following script to download data from a website. It works fine but there are large gaps in the data (I want a nice neat table to go into Geektool on my desktop). I want to add s/\t||\r||\n||\f//g which worked fine when I used a mixture of curl and perl but wherever I try it in the script it either doesn't work or breaks things. I've been googling and reading perl tuts for about a week now but I just can't get a hook in anywhere. I would be very grateful if anyone could explain where I need to put the substitution. Thanks in advance.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use WWW::Mechanize;

my $url = "http://www.example.com";
my $mech = WWW::Mechanize->new();
$mech->agent_alias( 'Mac Safari' );
$mech->get( $url );

my $te = HTML::TableExtract->new( headers => [qw(Company Salary)] );
$te->parse($mech->content);

foreach my $row ($te->rows) {
    print join(' - ', @$row);
}
I tried posting this to perl.beginners on usenet but I guess it must be too basic as the mods don't seem to be adding it :-S I know I am trying to learn to run before I can walk but nothing I have read leads me any closer to understanding what I need to do here.

Replies are listed 'Best First'.
Re: Where to add substitution?
by moritz (Cardinal) on May 05, 2009 at 19:59 UTC
    s/\t||\r||\n||\f//g is not a very good regex, because it only matches a tab (\t) or the empty string (corresponding to the empty string between two vertical bars in the regex).

    You probably meant something like s/[\t\r\n\f]//g

    Or, if you want to replace any run of consecutive whitespace characters with a single blank: s/\s+/ /g

    See perlretut for a gentle introduction.
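
A minimal sketch of the difference, for illustration only (the helper names and the sample string are made up here, not taken from the original script). It shows that a single match attempt with the original pattern succeeds with the empty string before \r is ever tried, and what the two suggested replacements do to the same input:

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the first (non-/g) match of the
# original pattern captures, or undef if the pattern does not match.
sub first_match {
    my ($text) = @_;
    return $text =~ /(\t||\r||\n||\f)/ ? $1 : undef;
}

# Removes every tab, carriage return, newline and form feed.
sub strip_class {
    my ($text) = @_;
    $text =~ s/[\t\r\n\f]//g;
    return $text;
}

# Collapses each run of whitespace into a single blank.
sub squeeze_space {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

# Made-up sample standing in for a messy table cell.
my $sample = "Acme\t Corp\r\n  42\f000";

print length( first_match("\r") ), "\n";   # 0: the empty alternative matched first
print strip_class($sample), "\n";          # Acme Corp  42000
print squeeze_space($sample), "\n";        # Acme Corp 42 000
```

Note that the character class leaves ordinary spaces alone, so the double space after "Corp" survives, while \s+ squeezes it and also turns the form feed into a blank.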

    I would be very grateful if anyone could explain where I need to put the substitution

    You need to apply it to the variable that holds the text from which you want to strip the whitespace. Since you wrote that script (at least you didn't attribute it to anybody else) you should know which one it is.

    (Update: fixed character class, johngg++)

Re: Where to add substitution?
by linuxer (Curate) on May 05, 2009 at 20:04 UTC

    Seeing your regex s/\t||\r||\n||\f//g I think you should check the documentation of perlretut, perlrequick, and perlre.

    I assume you wanted to remove any occurrence of \t, \r, \n or \f.

    You could use a character class in regex for this

    $text =~ s/[\t\r\n\f]//g

    or you could use tr///d (see tr/SEARCHLIST/REPLACEMENTLIST/cds)

    $text =~ tr/\t\r\n\f//d
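
A small sketch comparing the two forms side by side (the sub names and sample text are invented for this example). Both remove the same four characters; tr///d additionally returns how many characters it deleted, which can be handy for checking whether anything was stripped at all:

```perl
use strict;
use warnings;

# Hypothetical helper: strip with a regex character class.
sub strip_with_s {
    my ($text) = @_;
    $text =~ s/[\t\r\n\f]//g;
    return $text;
}

# Hypothetical helper: strip with tr///d.
# Returns the cleaned text and the number of characters deleted.
sub strip_with_tr {
    my ($text) = @_;
    my $removed = ( $text =~ tr/\t\r\n\f//d );
    return ( $text, $removed );
}

my $sample = "Acme\tCorp\r\n";    # made-up input

my ( $clean, $removed ) = strip_with_tr($sample);
print "$clean ($removed removed)\n";    # AcmeCorp (3 removed)
print strip_with_s($sample), "\n";      # AcmeCorp
```

For a fixed set of single characters like this, tr///d avoids the regex engine entirely; the character class is the better fit once the pattern needs anything beyond a plain list of characters.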
      Thanks for all the replies monks, they have been most helpful.
Re: Where to add substitution?
by kennethk (Abbot) on May 05, 2009 at 19:57 UTC
    Assuming you are trying to modify the rows output from the parsed table, the substitution should be in your foreach loop. Note that since each row returned by the rows method is an array reference, you cannot simply perform the substitution on the loop variable. In this context, I would probably create an intermediate variable between your join and your print, a la:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;
    use WWW::Mechanize;

    my $url = "http://www.example.com";
    my $mech = WWW::Mechanize->new();
    $mech->agent_alias( 'Mac Safari' );
    $mech->get( $url );

    my $te = HTML::TableExtract->new( headers => [qw(Company Salary)] );
    $te->parse($mech->content);

    foreach my $row ($te->rows) {
        my $output = join(' - ', @$row);
        $output =~ s/\t||\r||\n||\f//g;
        print $output;
    }
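
An alternative sketch, offered as one possible variation rather than the approach from the post above: clean each cell before joining, using the character class suggested elsewhere in this thread. To keep it runnable without a network connection, the @rows array below is a made-up stand-in for what $te->rows would return, and format_row is a hypothetical helper name. The guard against undefined cells is an assumption about the data, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Stand-in for $te->rows: each row is a reference to an array of cells.
my @rows = (
    [ "Acme Corp\r\n", "\n\t42000\n" ],
    [ "Initech\f",     undef ],    # assumed: an empty cell may be undefined
);

# Hypothetical helper: strip control whitespace from a copy of every
# cell, then join the cells into one output line.
sub format_row {
    my ($row) = @_;
    my @cells = map {
        my $cell = defined $_ ? $_ : '';    # avoid "uninitialized" warnings
        $cell =~ s/[\t\r\n\f]//g;
        $cell;
    } @$row;
    return join( ' - ', @cells );
}

# Once the newlines are stripped, each row needs its own line ending.
print format_row($_), "\n" for @rows;
```

In the real script the loop would read `print format_row($_), "\n" for $te->rows;`. Cleaning per cell keeps the ' - ' separator safe from the substitution, and working on a copy leaves the parsed rows untouched.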
Re: Where to add substitution?
by trwww (Priest) on May 05, 2009 at 22:34 UTC

    I tried posting this to perl.beginners on usenet but I guess it must be too basic as the mods don't seem to be adding it

    Try signing up for the perl "beginners" mailing list with the email address you are using in your nntp client. Then you should be able to post.

    The nntp server and the mailing lists are connected by Ask's colobus setup.

    Of course you're welcome to post here as often as you like, too :0)

    Hope this helps,