in reply to regex to get random quote

Regex isn't the right tool here. It's better to use some kind of a parser (you've already used HTML::TableExtract, so why don't you use it?).

Nevertheless, a regex might later be used to keep only the paragraphs that start with a number and a dot. Proof of concept using XML::XSH2:

open :r :F html http://motivationgrid.com/50-inspirational-quotes-to-live-by/ ;
for //p[xsh:match(., '^[0-9]+\.')]
    echo (.) | shuf | head -n1 ;
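
For anyone without XSH2, the filter-and-pick step can be sketched in plain Perl. This is only a rough illustration: @paragraphs is a hypothetical list standing in for the text of each <p> element (whatever parser you use would supply it), and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the text of each <p> element.
my @paragraphs = (
    'Welcome to the list of quotes.',
    '1. First sample quote.',
    '2. Second sample quote.',
    'Share this page with your friends.',
    '10. Tenth sample quote.',
);

# Keep only the paragraphs that start with a number and a dot.
my @quotes = grep { /^[0-9]+\./ } @paragraphs;

# Pick one at random: rand @quotes yields a value in [0, number of quotes),
# and the array index truncates it to an integer.
my $random_quote = $quotes[ rand @quotes ];

print "$random_quote\n";
```

The grep plays the role of the xsh:match() predicate, and the random index replaces the shuf | head -n1 pipeline.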

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Re^2: regex to get random quote
by Aldebaran (Curate) on May 16, 2016 at 06:53 UTC

    I didn't use TableExtract on this site, but it does come up frequently. Had I gone with the runner-up: http://www.brainyquote.com/quotes/keywords/gratitude.html, then we would be heading down that road. I'm curious how similar a task it is to get the content from this link as opposed to the one in the original post. Now that I see http://www.brainyquote.com/quotes_of_the_day.html, I wonder if it may not have been my best option.

    As it stands, I'm using HTML::TreeBuilder, and I'm getting nothing:

    C:\cygwin64\home\Fred\pages2\list>perl scraper1.pl
    nix!
    C:\cygwin64\home\Fred\pages2\list>type scraper1.pl
    #! /usr/bin/perl
    use warnings;
    use strict;
    use 5.010;
    use HTML::TreeBuilder 5 -weak;

    my $site = 'http://www.fourmilab.ch/yoursky/cities.html';
    my $tree = HTML::TreeBuilder->new_from_url($site);

    foreach my $e ($tree->look_down(_tag => 'div')) {
        foreach my $f ($e->look_down(_tag => 'p')) {
            say $f->as_text;
        }
    }
    say "nix!";
    C:\cygwin64\home\Fred\pages2\list>

      Hello Datz_cozee75,

      > I'm getting nothing

      Not surprising, really, since the (new!) target website contains no <div> tags. :-)

      Removing the outer foreach and calling look_down(_tag => 'p') on $tree, I get:

      17:18 >perl 1631_SoPW.pl
      Wide character in say at 1631_SoPW.pl line 54.
      You can view the sky as seen from various cities around the globe by clicking on the name of a city below. If you don't know the latitude and longitude of your observing site, click on the closest city in the table—unless you're far away from that city, the sky map will be reasonably accurate.
      nix!
      17:19 >

      Which seems about right: a search through the source for that website finds only this one <p>...</p>-delimited paragraph.
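
      The difference between the two searches can be reproduced offline. The following is a minimal sketch, assuming HTML::TreeBuilder is installed; the $html string is a made-up stand-in for a page with one <p> and no <div>, not the real site's markup.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use HTML::TreeBuilder;

      # Hypothetical stand-in for a page that has a <p> but no <div>.
      my $html = '<html><body><h1>Cities</h1><p>Only paragraph.</p></body></html>';
      my $tree = HTML::TreeBuilder->new_from_content($html);

      # Nested search: no <div> exists, so the loop body never runs.
      my @nested;
      for my $div ($tree->look_down(_tag => 'div')) {
          push @nested, map { $_->as_text } $div->look_down(_tag => 'p');
      }

      # Direct search from the root finds the paragraph.
      my @direct = map { $_->as_text } $tree->look_down(_tag => 'p');

      print "nested: ", scalar(@nested), "\n";
      print "direct: ", scalar(@direct), " (@direct)\n";

      # Free the tree explicitly, since -weak is not requested here.
      $tree->delete;
      ```

      The nested version only ever sees paragraphs that sit inside a <div>, so on a page without any it silently prints nothing.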

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Thx, I didn't mean to switch the value of $site in this instance. I'm able to look at output with this:

        #! /usr/bin/perl
        use warnings;
        use strict;
        use 5.010;
        use open ':std', OUT => ':utf8';
        use HTML::TreeBuilder 5 -weak;

        my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
        my $tree = HTML::TreeBuilder->new_from_url($site);

        foreach my $e ($tree->look_down(_tag => 'p')) {
            say $e->as_text;
        }

      > Had I gone with the runner-up

      Pagination would make it a bit more complex, but to get just the first page's quotes is similarly easy:

      open :F html :r http://www.brainyquote.com/quotes/keywords/gratitude.html ;
      my $i = 0 ;
      for //a[@title='view quote']
          echo :s {++$i} '. ' (.) ;
