I didn't use TableExtract on this site, but it does come up frequently. Had I gone with the runner-up: http://www.brainyquote.com/quotes/keywords/gratitude.html, then we would be heading down that road. I'm curious how similar a task it is to get the content from this link as opposed to the one in the original post. Now that I see http://www.brainyquote.com/quotes_of_the_day.html, I wonder if it may not have been my best option.

As it stands, I'm using HTML::TreeBuilder, and I'm getting nothing:

C:\cygwin64\home\Fred\pages2\list>perl scraper1.pl
nix!
C:\cygwin64\home\Fred\pages2\list>type scraper1.pl
#! /usr/bin/perl
use warnings;
use strict;
use 5.010;
use HTML::TreeBuilder 5 -weak;

my $site = 'http://www.fourmilab.ch/yoursky/cities.html';
my $tree = HTML::TreeBuilder->new_from_url($site);

foreach my $e ($tree->look_down(_tag => 'div')) {
    foreach my $f ($e->look_down(_tag => 'p')) {
        say $f->as_text;
    }
}
say "nix!";
C:\cygwin64\home\Fred\pages2\list>

Re^3: regex to get random quote
by Athanasius (Archbishop) on May 16, 2016 at 07:32 UTC

    Hello Datz_cozee75,

    > I'm getting nothing

    Not surprising, really, since the (new!) target website contains no <div> tags. :-)

    Removing the outer foreach and calling look_down(_tag => 'p') on $tree, I get:

    17:18 >perl 1631_SoPW.pl
    Wide character in say at 1631_SoPW.pl line 54.
    You can view the sky as seen from various cities around the globe by clicking on the name of a city below. If you don't know the latitude and longitude of your observing site, click on the closest city in the table—unless you're far away from that city, the sky map will be reasonably accurate.
    nix!
    17:19 >

    Which seems about right: a search through the source for that website finds only this one <p>...</p>-delimited paragraph.
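The difference between the two searches can be reproduced without the network. The following is a minimal sketch, assuming HTML::TreeBuilder 5 is installed; the markup in `$html` is a made-up stand-in for a page that has a `<p>` but no `<div>`, not the real page source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Hypothetical stand-in for a page with a <p> but no <div>.
my $html = '<html><body><p>Only paragraph.</p></body></html>';
my $tree = HTML::TreeBuilder->new_from_content($html);

# Nested search: there are no <div> elements to iterate over,
# so the inner search for <p> never runs.
my @nested = map { $_->look_down(_tag => 'p') }
             $tree->look_down(_tag => 'div');

# Direct search from the root of the tree finds the paragraph.
my @direct = $tree->look_down(_tag => 'p');

printf "nested: %d, direct: %d\n", scalar @nested, scalar @direct;
print $_->as_text, "\n" for @direct;
```

In list context `look_down` returns every matching element, so an empty result from the outer call means the inner loop body is simply skipped.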

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thx, I didn't mean to switch the value of $site in this instance. I'm able to look at output with this:

      #! /usr/bin/perl
      use warnings;
      use strict;
      use 5.010;
      use open ':std', OUT => ':utf8';
      use HTML::TreeBuilder 5 -weak;

      my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
      my $tree = HTML::TreeBuilder->new_from_url($site);

      foreach my $e ($tree->look_down(_tag => 'p')) {
          say $e->as_text;
      }

        Great! Now, with a little filtering and a bit of cleanup:

        #! perl
        use strict;
        use warnings;
        use open ':std', OUT => ':utf8';
        use HTML::TreeBuilder 5 -weak;

        my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
        my $tree = HTML::TreeBuilder->new_from_url($site);
        my @quotes;

        for ($tree->look_down(_tag => 'p'))
        {
            if ((my $t = $_->as_text) =~ m{ ^ \d+ \. \s+ }x)
            {
                $t =~ s{ \x{2019} }{'}gx;
                $t =~ s{ \xA0 }{ }gx;
                $t =~ s{ \x{2013} }{--}gx;
                push @quotes, $t;
            }
        }

        print "$_\n" for @quotes;

        you’ve got 50 motivational quotes.
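Since the thread is about getting a random quote, here is a minimal sketch of the filter and cleanup steps applied to in-memory strings, followed by a random pick. It uses core Perl only. The helper name `clean_quote` and the sample paragraphs are hypothetical and are not taken from the real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a cleaned-up copy of $text if it looks
# like a numbered quote ("12. ..."); otherwise return nothing.
sub clean_quote {
    my ($text) = @_;
    return unless $text =~ m{ ^ \d+ \. \s+ }x;
    $text =~ s{ \x{2019} }{'}gx;    # right single quotation mark -> apostrophe
    $text =~ s{ \xA0 }{ }gx;        # no-break space -> plain space
    $text =~ s{ \x{2013} }{--}gx;   # en dash -> two hyphens
    return $text;
}

# Made-up paragraph texts standing in for the output of as_text.
my @paragraphs = (
    "Intro paragraph without a number.",
    "1. Don\x{2019}t stop. \x{2013}\xA0Somebody",
    "2. Keep going.",
);

# Keep only the numbered paragraphs, cleaned up.
my @quotes = grep { defined } map { clean_quote($_) } @paragraphs;

# rand @quotes yields a fractional index in [0, scalar @quotes),
# which Perl truncates to an integer when used as an array subscript.
my $random = $quotes[ rand @quotes ];
print "$random\n";
```

Because the cleanup replaces the non-ASCII characters that appear in these samples, the printed line is plain ASCII and does not trigger a "Wide character" warning.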

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^3: regex to get random quote
by choroba (Cardinal) on May 16, 2016 at 07:46 UTC
    > Had I gone with the runner-up

    Pagination would make it a bit more complex, but to get just the first page's quotes is similarly easy:

    open :F html :r http://www.brainyquote.com/quotes/keywords/gratitude.html ;
    my $i = 0 ;
    for //a[@title='view quote']
        echo :s {++$i} '. ' (.) ;
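For comparison with the HTML::TreeBuilder scripts elsewhere in the thread, the same selection can be sketched in Perl by passing an attribute criterion to `look_down`. This is only an illustration: the markup in `$html` is invented to imitate links carrying `title="view quote"`, and the real page's structure may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Hypothetical markup imitating the links the XPath above selects.
my $html = join '',
    '<html><body>',
    '<a title="view quote" href="/q/1">First sample quote</a>',
    '<a title="view author" href="/a/1">Some Author</a>',
    '<a title="view quote" href="/q/2">Second sample quote</a>',
    '</body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Match <a> elements whose title attribute is exactly 'view quote',
# numbering them as the XSH loop does.
my @lines;
my $i = 0;
for my $link ($tree->look_down(_tag => 'a', title => 'view quote')) {
    push @lines, ++$i . '. ' . $link->as_text;
}

print "$_\n" for @lines;
```

Against a live page, `new_from_content` would be swapped for `new_from_url`, as in the scripts earlier in the thread.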

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,