Dizzley has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to do some simple screenscraping. The example I have does this:

my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" . $circleID . "/t/";
use strict;
use LWP::Simple;

#Request the URL
my $content = get($url);
die "Could not retrieve $url" unless $content;

my $circle = (join '', $content);
while ($circle =~ m!<title>(.*?)</title>!mgis) {
    print $1 . "\n\n";
}

I don't see what the line my $circle = (join '', $content); is doing. Can someone explain, please?

In a general wisdom-seeking mood,
Diz

Re: Newb wrestles with join
by davido (Cardinal) on Oct 01, 2005 at 06:39 UTC

    More than likely, it's an artifact of code that's been modified by someone who didn't notice that he no longer had a list to join together.

    Maybe the original code was slurping a file into an array, as in

    @content = <FILE>

    ...at which point

    my $circle = join '', @content;

    ...would make sense. As the code evolved, its author grabbed the webpage from the website with LWP::Simple instead of slurping in a file, and the code got modified, but not quite enough to eliminate the useless use of join. ;)
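
To make the guess above concrete, here is a minimal sketch of the file-slurping pattern where the join does real work. The in-memory filehandle and the sample text are assumptions made up for illustration; the original code presumably read a real file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file on disk: an in-memory filehandle.
my $text = "line one\nline two\nline three\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# In list context the readline operator returns one element per line.
my @content = <$fh>;
close $fh;

# Now there is a real list, so join glues the lines back into one string.
my $circle = join '', @content;

print scalar(@content), " lines read\n";
print "joined string matches the original text\n" if $circle eq $text;
```

Once the data arrives as a single string, as it does from LWP::Simple's get, this gluing step has nothing left to do.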


    Dave

      Ah!

      That makes most sense. What a pity the example I found was a poor example and had not been cleaned up. I'm not confident enough yet to declare any Perl code I find as nonsense.

      Thx, Diz.

Re: Newb wrestles with join
by Zaxo (Archbishop) on Oct 01, 2005 at 06:35 UTC

    The join function takes a string and a list, producing a string with the first argument interpolated between all the list elements. If the list has one element there is no effect.

    my @words = qw/Just another Perl hacker,/;
    print join ' ..ubbada.. ', @words, $/;
    __END__
    Just ..ubbada.. another ..ubbada.. Perl ..ubbada.. hacker, ..ubbada..
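
As a small illustration of the one-element case, the sketch below compares join on lists of three, one, and zero elements. The separator and the element values are arbitrary choices for the example.

```perl
use strict;
use warnings;

my $sep = ', ';

my $many = join $sep, 'a', 'b', 'c';   # separator goes between the elements
my $one  = join $sep, 'a';             # one element: nothing to separate
my $none = join $sep, ();              # empty list: result is the empty string

print "many: '$many'\n";
print "one:  '$one'\n";
print "none: '$none'\n";
```

The separator only ever appears between elements, which is why a one-element list comes back unchanged.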

    After Compline,
    Zaxo

Re: Newb wrestles with join
by holli (Abbot) on Oct 01, 2005 at 06:46 UTC
    <rant>
    It's better not to try to parse HTML (and XML) with regular expressions. I know it's tempting, and it often works well enough for quick and dirty scripts. But if the program has to scale, you will soon find yourself in a big mess.
    </rant>

    So it's better to use an appropriate module like HTML::Parser or HTML::TokeParser.

    use strict;
    use LWP::Simple;
    use HTML::TokeParser;
    use Data::Dumper;

    # $circleID must be declared and set before this point
    my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" . $circleID . "/t/";

    #Request the URL
    my $content = get($url);
    die "Could not retrieve $url" unless $content;

    # "my" is needed here, otherwise the script does not compile under strict
    my $p = HTML::TokeParser->new(\$content) || die "Can't open: $!";
    while (my $token = $p->get_tag( 'title' )) {
        print Dumper ($token);
        #...
    }
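
The snippet above dumps the raw token for each title tag. As a rough sketch of one way to get at the title text itself, the example below pairs get_tag with get_trimmed_text. It parses a made-up HTML string rather than fetching a page, so the sample document and variable names are assumptions; it needs the HTML::Parser distribution installed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page standing in for the result of get($url).
my $content = '<html><head><title>Example Circle</title></head>'
            . '<body><p>Hello</p></body></html>';

# Passing a reference to a scalar makes the parser read from that string.
my $p = HTML::TokeParser->new(\$content)
    or die "Could not create parser";

my @titles;
while ($p->get_tag('title')) {
    # Collect the text up to the closing </title> tag, whitespace-trimmed.
    push @titles, $p->get_trimmed_text('/title');
}

print "$_\n\n" for @titles;
```

Unlike the regular expression, this does not depend on how the tag is cased or on where the line breaks fall in the page source.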


    holli, /regexed monk/
      Perfect.

      I'm heading down the HTML::TokeParser road right now.

      Perl Monks do it again.

      Thanks very much, Diz.

Re: Newb wrestles with join
by chromatic (Archbishop) on Oct 01, 2005 at 06:27 UTC

    It assigns the contents of $content to $circle. There's only one element to join, so the join does nothing.
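
A minimal sketch of the same loop with the join dropped, run against a made-up sample page instead of a live fetch (the sample HTML is an assumption for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample page standing in for the result of get($url).
my $content = "<html><head><TITLE>Example\nCircle</TITLE></head><body></body></html>";

# Plain assignment does the same job as join '', $content.
my $circle = $content;

my @titles;
# /i ignores the case of the tag, /s lets . match the newline, /g finds every match.
while ($circle =~ m!<title>(.*?)</title>!gis) {
    push @titles, $1;
}

print "$_\n\n" for @titles;
```

The copy into $circle is not needed either; matching directly against $content would behave the same way.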