Alexander75 has asked for the wisdom of the Perl Monks concerning the following question:

I need to get the content of the "p" tags, which hold the seven paragraphs of my text, and, separately, the content of the "recording-dates" div (h4 and h3), which holds the title of the text. Both the "p" tags and "recording-dates" belong to "release-height". The problem is that they are on the same level, so I don't know how to get them separately. I need two processes: one for the artist's name and the title, and another for all of the paragraphs, but I really don't know how to proceed.
use URI;
use Web::Scraper;
use Encode;
use Data::Dumper;

open (OUT, '>LM_Article.txt');

my $resultat = scraper {
    process '//body[@id="artists"]', 'entree[]' => scraper {
        process '//div[@class="header-bar-inner"]/h2', artiste => 'TEXT';
        process '//div[@class="release-height"]/div[@class="recording-dates"]', titre => 'TEXT';
    };

my $resultat2 = scraper {
    process '//div[@class="release-height"]', 'entree[]' => scraper {
        process '//div[@class="release-height"]/p', texte => 'TEXT';
    };
}

my $res = $resultat.$resultat2->scrape( URI->new("http://www.bluenote.com/artists/lee-morgan") );

for my $val (@{$res->{entree}}) {
    print OUT Encode::encode("utf8", $val->{artiste} . "\n" . $val->{titre} . "\n" . $val->{texte} . "\n");
}
close (OUT);

Replies are listed 'Best First'.
Re: Web Scraper : 2 process !!
by jeffa (Bishop) on Apr 17, 2015 at 16:47 UTC

    You are very close, but you are restricting your "rules" a bit too much. Try "relaxing" what you tell Web::Scraper to expect. To avoid hitting the site too often, I saved the page into a file named lee-morgan.html. I've taken care of scraping out the release date and the bio paragraphs; let's see if you can obtain the rest. :)

    use strict;
    use warnings;
    use Data::Dumper;
    use Web::Scraper;

    # Read the saved page in one go (slurp mode).
    open FH, 'lee-morgan.html' or die $!;
    my $data = do { local $/; <FH> };
    close FH;

    my $artists = scraper {
        # CSS selectors; '>' matches direct children only.
        process '.release-height > p',   'bio[]' => 'TEXT';
        process '.recording-dates > h3', date    => 'TEXT';
    };

    my $res = $artists->scrape( $data );
    print Dumper $res;

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
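
A minimal, self-contained sketch of the "relaxed" selectors from the reply above. The HTML here is hypothetical: it is reduced to the class names mentioned in this thread and is an assumption about the page layout, not a copy of the real page. It shows that sibling elements on the same level can be picked up by separate process rules inside one scraper.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup (assumed structure, placeholder text).
my $html = <<'HTML';
<html><body id="artists">
<div class="header-bar-inner"><h2>Lee Morgan</h2></div>
<div class="release-height">
  <div class="recording-dates"><h4>Placeholder label</h4><h3>Placeholder title</h3></div>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
</body></html>
HTML

my $scraper = scraper {
    # One value: the h3 directly inside .recording-dates.
    process '.recording-dates > h3', date    => 'TEXT';
    # A list ('[]'): every p directly inside .release-height.
    process '.release-height > p',   'bio[]' => 'TEXT';
};

# scrape() accepts an HTML string as well as a URI object.
my $res = $scraper->scrape($html);

print "$res->{date}\n";
print "- $_\n" for @{ $res->{bio} };
```

Because each rule has its own key, the title and the paragraphs end up in separate slots of the result hash even though they share a parent.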
    
Re: Web Scraper : 2 process !!
by Alexander75 (Novice) on Apr 18, 2015 at 20:21 UTC

    Thank you so much for your help!! You got me out of a damn big s...t!! After modifying a few things, I finally got exactly what I needed, thanks to you. Here it is:

    #!/usr/bin/perl
    use URI;
    use Web::Scraper;
    use Encode;
    use Data::Dumper;

    open (OUT, ">resultat.txt") or die $!;

    # Read the saved page in one go (slurp mode).
    open FH, 'lee-morgan.html' or die $!;
    my $data = do { local $/; <FH> };
    close FH;

    my $artists = scraper {
        process '.header-bar-inner > h2', artist  => 'TEXT';
        process '.recording-dates > h3',  date    => 'TEXT';
        process '.release-height > p',    'bio[]' => 'TEXT';
    };

    my $res = $artists->scrape( $data );
    #print OUT Dumper $res;

    # Artist and title first, then one bio paragraph per line.
    print OUT Encode::encode("utf8", $res->{artist} . "\n" . $res->{date} . "\n");
    for my $val (@{$res->{bio}}) {
        print OUT $val . "\n";
    }
    close (OUT);
    And here is an extract from the result: https://www.youtube.com/watch?v=ynZDm50EgBY Haha, just kidding! Thanks.
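
For readers who want the 'entree[]' shape from the original question (one record per block), here is a hedged sketch using a nested scraper. The markup is again hypothetical and assumes that a page could contain several "release-height" blocks; the thread does not confirm this for the real site. Inside the nested scraper, the selectors are evaluated against each matched block.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup with two blocks (assumed structure, placeholder text).
my $html = <<'HTML';
<html><body>
<div class="release-height">
  <div class="recording-dates"><h3>Title A</h3></div>
  <p>A one.</p>
  <p>A two.</p>
</div>
<div class="release-height">
  <div class="recording-dates"><h3>Title B</h3></div>
  <p>B one.</p>
</div>
</body></html>
HTML

my $scraper = scraper {
    # One hash per .release-height block, collected into a list.
    process '.release-height', 'entree[]' => scraper {
        process '.recording-dates > h3', titre     => 'TEXT';
        process 'p',                     'texte[]' => 'TEXT';
    };
};

my $res = $scraper->scrape($html);

for my $entry (@{ $res->{entree} }) {
    print "$entry->{titre}\n";
    print "  $_\n" for @{ $entry->{texte} || [] };
}
```

The flat version posted above is simpler when the page has a single block; the nested form only pays off when each title has to stay attached to its own paragraphs.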