Newb wrestles with join

Dizzley has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to do some simple screenscraping. The example I have does this:

my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" .
          $circleID . "/t/";

use strict;
use LWP::Simple;

#Request the URL
my $content = get($url);
die "Could not retrieve $url" unless $content;

my $circle = (join '', $content);

while ($circle =~ m!<title>(.*?)</title>!mgis) {
    print $1 . "\n\n";
}

I don't see what the line `my $circle = (join '', $content);` is doing. Can someone explain, please?

In a general wisdom-seeking mood,
Diz


Replies are listed 'Best First'.
Re: Newb wrestles with join by davido (Cardinal) on Oct 01, 2005 at 06:39 UTC
More than likely, it's an artifact of code that's been modified by someone who didn't notice that he no longer had a list to join together. Maybe the original code was slurping a file into an array, as in `@content = <FILE>` ...at which point `my $circle = join '', @content;` ...would make sense. As the code evolved, its author grabbed the webpage from the website with LWP::Simple instead of slurping in a file, and the code got modified, but not quite enough to eliminate the useless use of join. ;) Dave
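To make the guess above concrete, here is a minimal sketch of the two situations. It assumes a hypothetical `$page` string read through an in-memory filehandle in place of a real file, so the names and contents are illustrative only and not taken from the original example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a saved web page; an in-memory filehandle
# lets the sketch run without a real file on disk.
my $page = "<html>\n<title>Example</title>\n</html>\n";
open my $fh, '<', \$page or die "Cannot open in-memory file: $!";

# List context: one array element per line. This is where join is useful.
my @content = <$fh>;
close $fh;
my $from_list = join '', @content;

# Scalar case, as in the posted code: a one-element list,
# so join just hands back a copy of the string.
my $content     = $page;
my $from_scalar = join '', $content;

print scalar(@content), " lines read\n";
print "join on the list rebuilds the page\n" if $from_list   eq $page;
print "join on the scalar is a plain copy\n" if $from_scalar eq $content;
```

In the second case a plain `my $circle = $content;` would behave the same way.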
Re^2: Newb wrestles with join by Dizzley (Novice) on Oct 01, 2005 at 06:55 UTC
Ah! That makes most sense. What a pity the example I found was a poor example and had not been cleaned up. I'm not confident enough yet to declare any Perl code I find as nonsense. Thx, Diz.
Re: Newb wrestles with join by Zaxo (Archbishop) on Oct 01, 2005 at 06:35 UTC
The join function takes a string and a list, producing a string with the first argument inserted between all the list elements. If the list has only one element, there is no effect.

my @words = qw/Just another Perl hacker,/;
print join ' ..ubbada.. ', @words, $/;
__END__
Just ..ubbada.. another ..ubbada.. Perl ..ubbada.. hacker, ..ubbada..

After Compline, Zaxo
Re: Newb wrestles with join by holli (Abbot) on Oct 01, 2005 at 06:46 UTC
<rant> It's better not to try to parse HTML (and XML) with regular expressions. I know it's tempting, and it often works well enough for quick and dirty scripts. But if the program has to scale, you will soon find yourself in a big mess. </rant> So it's better to use an appropriate module like HTML::Parser or HTML::TokeParser.

use strict;
use LWP::Simple;

# Assumption: the circle ID is passed on the command line
my $circleID = shift @ARGV;
my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" .
          $circleID . "/t/";

#Request the URL
my $content = get($url);
die "Could not retrieve $url" unless $content;

use HTML::TokeParser;
use Data::Dumper;

# Parse from the in-memory string; "my" is required under strict
my $p = HTML::TokeParser->new(\$content) || die "Can't open: $!";
while (my $token = $p->get_tag('title')) {
    print Dumper($token);
    #...
}

holli, /regexed monk/
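As a possible next step from dumping the token, the sketch below pulls out the title text itself, which is what the regex in the question was printing. It assumes a hypothetical inline `$content` string in place of the fetched page, and it uses `get_trimmed_text` from HTML::TokeParser to read up to the closing tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical page content standing in for what get($url) would return.
my $content = '<html><head><title> Example Circle </title></head>'
            . '<body><p>Hi</p></body></html>';

my $p = HTML::TokeParser->new(\$content)
    or die "Could not create parser";

my @titles;
while (my $tag = $p->get_tag('title')) {
    # get_trimmed_text collects the text up to the named end tag
    # and strips leading and trailing whitespace.
    push @titles, $p->get_trimmed_text('/title');
}

print "$_\n\n" for @titles;
```

Collecting into `@titles` before printing is just a convenience for the sketch; printing inside the loop would work equally well.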
Re^2: Newb wrestles with join by Dizzley (Novice) on Oct 01, 2005 at 06:58 UTC
Perfect. I'm heading down the HTML::TokeParser road right now. Perl Monks do it again. Thanks very much, Diz.
Re: Newb wrestles with join by chromatic (Archbishop) on Oct 01, 2005 at 06:27 UTC
It assigns the contents of `$content` to `$circle`. There's only one element to join, so the `join` does nothing.