
Thx, I didn't mean to switch the value of $site in this instance. I'm able to look at output with this:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;
use open ':std', OUT => ':utf8';
use HTML::TreeBuilder 5 -weak;

my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
my $tree = HTML::TreeBuilder->new_from_url($site);

foreach my $e ($tree->look_down(_tag => 'p')) {
    say $e->as_text;
}

Re^5: regex to get random quote
by Athanasius (Archbishop) on May 17, 2016 at 12:34 UTC

    Great! Now, with a little filtering and a bit of cleanup:

    #! perl
    use strict;
    use warnings;
    use open ':std', OUT => ':utf8';
    use HTML::TreeBuilder 5 -weak;

    my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
    my $tree = HTML::TreeBuilder->new_from_url($site);
    my @quotes;

    for ($tree->look_down(_tag => 'p')) {
        if ((my $t = $_->as_text) =~ m{ ^ \d+ \. \s+ }x) {
            $t =~ s{ \x{2019} }{'}gx;
            $t =~ s{ \xA0 }{ }gx;
            $t =~ s{ \x{2013} }{--}gx;
            push @quotes, $t;
        }
    }

    print "$_\n" for @quotes;

    you’ve got 50 motivational quotes.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks Athanasius, this completes the task set forth in this thread. I have to confess, however, that I'm having trouble seeing why my attempt to trim it further does not alter the ultimate string:

      C:\cygwin64\home\Fred\pages2\list>perl scraper4.pl
      38. Don't live your fears, live your dreams.
      38. Don't live your fears, live your dreams.

      C:\cygwin64\home\Fred\pages2\list>type scraper4.pl
      #! perl
      use strict;
      use warnings;
      use open ':std', OUT => ':utf8';
      use HTML::TreeBuilder 5 -weak;

      my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
      my $tree = HTML::TreeBuilder->new_from_url($site);
      my @quotes;

      for ($tree->look_down(_tag => 'p')) {
          if ((my $t = $_->as_text) =~ m{ ^ \d+ \. \s+ }x) {
              $t =~ s{ \x{2019} }{'}gx;
              $t =~ s{ \xA0 }{ }gx;
              $t =~ s{ \x{2013} }{--}gx;
              push @quotes, $t;
          }
      }

      my $randomelement = $quotes[rand @quotes];
      print "$randomelement\n";
      $randomelement =~ s/^ \d+ \. \s+//;
      print "$randomelement\n";

      Otherwise, I'm really pleased.

        Hello Datz_cozee75,

        Glad it’s working for you!

        I'm having trouble seeing why my attempt to trim it further does not alter the ultimate string

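        Without an /x modifier, whitespace in a regex is literal: the target string must contain whitespace wherever the regex does, or there will be no match. Your pattern therefore requires a space before the digits, and the quote does not start with one. You can fix the substitution either by removing the whitespace: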

        $randomelement =~ s/^\d+\.\s+//;

        or by adding an /x modifier:

        $randomelement =~ s/^ \d+ \. \s+//x;

        See perlre#Modifiers.
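
        The difference can be checked without fetching the page. The following is only a minimal sketch: it assumes a sample line copied from the output shown earlier in the thread, and the variable names ($quote, $no_x, $with_x, $compact) are illustrative, not taken from the original script.

        ```perl
        #! perl
        use strict;
        use warnings;

        # Sample line in the shape the scraper produces (copied from the output above).
        my $quote = "38. Don't live your fears, live your dreams.";

        # No /x: the spaces in the pattern are literal, so the string would
        # have to start with a space. It does not, so nothing is replaced.
        (my $no_x = $quote) =~ s/^ \d+ \. \s+//;

        # With /x: whitespace in the pattern is ignored, so the prefix is removed.
        (my $with_x = $quote) =~ s/^ \d+ \. \s+//x;

        # The same pattern written without any whitespace also removes the prefix.
        (my $compact = $quote) =~ s/^\d+\.\s+//;

        print "no /x  : $no_x\n";
        print "with /x: $with_x\n";
        print "compact: $compact\n";
        ```

        The first line printed should still carry the "38. " prefix, while the other two should start at "Don't".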

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,