WWW::Mechanize::Firefox - almost there - only a little regex error left

Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Well, to me Perl sometimes looks a bit like abracadabra.

I need thumbnails of some websites. I tried to use wget, but that does not work for me because I need the pages rendered. What is needed: I have a list of 2,500 URLs, one per line, saved in a file. I want a script (see below) that opens the file, reads a line, retrieves the website, and saves the image as a small thumbnail.

Since I have a bunch of websites (2,500), I have to make up my mind about the naming of the results.

 
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch

So far so good. I think I will try something like this.
We also have to close a filehandle when we do not need it anymore. Besides this, we can use 'or die' on open.
By the way, we need good file names: since I have a huge list of URLs, I get a huge list of output files. Can we reflect those needs in the program?

The script does not start at all ....

 

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = new WWW::Mechanize::Firefox();

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
        chomp;
        next if $_ =~ m/http/i;
        print "$_\n";
        $mech->get($_);
        my $png = $mech->content_as_png();
        my $name = "$_";
        $name =~s#http://##is;
        $name =~s#/##gis;$name =~s#\s+\z##is;$name =~s#\A\s+##is;
        $name =~s/^www\.//;
        $name .= ".png";
        open(my $out, ">",$name) or die $!;
        binmode($out);
        print $out $png;
        close($out);
        sleep (5);
}

Well, I think that something is not correct with the regex and the sanitizing....

Any guess? Sometimes it looks like abracadabra.
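As a rough illustration of the file-naming concern, here is a minimal sketch of a helper that turns a URL into a file name. The name `url_to_filename` and the `%seen` counter are hypothetical and not part of the script above. It also assumes that replacing slashes with underscores (rather than deleting them, as the script does) and numbering repeated URLs is acceptable; the URL list contains http://www.schwyz.phz.ch twice, so without something like this the second thumbnail would overwrite the first.

```perl
use strict;
use warnings;

my %seen;   # counts how often each base name has been produced in this run

# Hypothetical helper (not from the original script): turn a URL into a
# file name that is safe to create and unique within this run.
sub url_to_filename {
    my ($url) = @_;
    my $name = $url;
    $name =~ s/\A\s+|\s+\z//g;          # trim surrounding whitespace
    $name =~ s{\Ahttps?://}{}i;         # drop the scheme
    $name =~ s{\Awww\.}{}i;             # drop a leading "www."
    $name =~ s{/+\z}{};                 # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;   # assumption: "_" for anything else
    my $n = $seen{$name}++;
    $name .= "-$n" if $n;               # repeated URLs get "-1", "-2", ...
    return "$name.png";
}

print url_to_filename($_), "\n"
    for 'http://www.unifr.ch/sfm',
        'http://www.hepfr.ch/',
        'http://www.schwyz.phz.ch',
        'http://www.schwyz.phz.ch';
```

In the loop of the script, `my $name = url_to_filename($_);` could then stand in for the chain of substitutions.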


Replies are listed 'Best First'.
Re: WWW::Mechanize::Firefox - almost there - only a little regex error left by Corion (Patriarch) on Mar 27, 2012 at 20:00 UTC
How does WWW::Mechanize::Firefox come into play? Can you remove it from the code while still having the problem?
Re^2: WWW::Mechanize::Firefox - almost there - only a little regex error left by Perlbeginner1 (Scribe) on Mar 27, 2012 at 20:07 UTC
Hello dear Corion, I will try it out and then come back and report everything. By the way, I always thought that I needed Mechanize .... duhhh, aaarg, there are some errors with Mechanize....

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = new WWW::Mechanize::Firefox();

open my $urls, '<', 'urls.txt' or die $!;

while (<$urls>) {
    chomp;
    next unless /^http/i;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png;
    my $name = $_;
    $name =~ s#^http://##i;
    $name =~ s#/##g;
    $name =~ s/\s+\z//;
    $name =~ s/\A\s+//;
    $name =~ s/^www\.//;
    $name .= ".png";
    open my $out, ">", $name or die $!;
    binmode $out;
    print $out $png;
    close $out;
    sleep 5;
}
```

I guess it is better now.... Minor things are left to solve...

```
linux-wyee:/home/martin/perl # perl test_7.pl
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
Use of uninitialized value $png in print at test_7.pl line 25, <$urls> line 10.
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
http://www.ph-solothurn.ch
http://www.pfh-gr.ch
Got status code 500 at test_7.pl line 14
linux-wyee:/home/martin/perl #
```

What do you think...
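The two remaining problems in that output (an undefined `$png`, and a run that stops at the first status code 500) could be handled by guarding each fetch. The following is only a sketch under assumptions: `fetch_png_safely` is a made-up name, and the fetchers are stand-ins so the example runs without Firefox. That `get` dies on an HTTP error is inferred from the output above, not checked against the module's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: $fetch is any code ref that returns PNG bytes for a
# URL, may return undef, and may die (as the 500 error above suggests).
sub fetch_png_safely {
    my ($fetch, $url) = @_;
    my $png = eval { $fetch->($url) };
    if (my $err = $@) {
        warn "skipping $url: $err";
        return;
    }
    unless (defined $png && length $png) {
        warn "no image data for $url\n";
        return;
    }
    return $png;
}

# Stand-in fetchers for demonstration only. In the script above the code
# ref could be: sub { $mech->get($_[0]); $mech->content_as_png }
my %fake = (
    'http://ok.example'     => sub { "fake-png-bytes" },
    'http://broken.example' => sub { die "Got status code 500\n" },
    'http://empty.example'  => sub { undef },
);

for my $url (sort keys %fake) {
    my $png = fetch_png_safely($fake{$url}, $url) or next;
    print "would save ", length($png), " bytes for $url\n";
}
```

With a guard like this, one bad site produces a warning and the loop moves on to the next URL instead of ending the whole run.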
Re: WWW::Mechanize::Firefox - almost there - only a little regex error left by ww (Archbishop) on Mar 27, 2012 at 20:25 UTC
Well, any sufficiently advanced technology is, as remarked elsewhere, "indistinguishable from magic." So the only way to make your code work is to understand the technology: to think about what you're trying to do and to understand the tools. As has been famously observed, "You can't just make shit up, and expect the computer to understand."

First, it seems to me highly unlikely that "The script does not start at all ...." A far more plausible interpretation is that the script, as written, doesn't produce any output. The reason is that your line `next if $_ =~ m/http/i;` says: if the current value in `$_` contains "http", skip that value and try the next.

So, yes, there's something wrong with one of the regexen. Try it with a negated match, `next if $_ !~ m/http/;` or `next unless /http/;` (none of your data is capitalized, so the trailing "i", which makes the regex case-insensitive, is not needed).
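To make the difference between the two tests concrete, here is a small sketch that runs both over the same input. The `@lines` array is invented sample data standing in for urls.txt, and the anchored `/^http/` form is one possible variant of the negated match.

```perl
use strict;
use warnings;

# Made-up sample input: two URLs plus two lines that are not URLs.
my @lines = ('', 'http://www.zug.phz.ch', '# not a URL', 'http://www.phbern.ch');

my (@original, @negated);

for (@lines) {
    next if $_ =~ m/http/i;    # as posted: skips every line containing "http"
    push @original, $_;
}

for (@lines) {
    next unless /^http/;       # negated: skips lines not starting with "http"
    push @negated, $_;
}

print "original test processes ", scalar(@original), " line(s), none of them URLs\n";
print "negated test processes: @negated\n";
```

The posted test lets only the non-URL lines through, which is why the script appeared to do nothing.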