Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I feel like I had an effective day, at least in part because of a physical list of priorities that I printed with a Perl script. I'd like to add some bells and whistles to it and maybe write it up as a meditation, but the list itself still lacks a meditation.

I started shopping around on the net for quotes and affirmations, and it turns out that generating an affirmation with Perl is harder than I thought. Most of the sites seem not to know the lexical meanings of the words they employ. I found one that didn't offend my sensibilities, so I submit it as a guinea pig. The idea is that each person creates his or her own list for the little bit of reading that many of us like to do right after getting up, to start the day with mindfulness and gratitude.

The site is http://motivationgrid.com/50-inspirational-quotes-to-live-by/ and what I would like this utility to do is fetch that page and give me a random quote. I've got a pretty good start, but I'm really stuck on how to write the regex that cuts off what *isn't* the list. I've been reading the treatment of regexes in _Intermediate Perl_ and hope to mirror its syntax and style in various ways in this list-building endeavor. Here's the starter code I use when I want to fetch something from the internet:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;
use WWW::Mechanize::GZip;
use HTML::TableExtract qw(tree);
use open ':std', OUT => ':utf8';
use Prompt::Timeout;
use constant TIMEOUT  => 3;
use constant MAXTRIES => 25;

my $site = 'http://motivationgrid.com/50-inspirational-quotes-to-live-by/';
my $mech = 'WWW::Mechanize::GZip'->new;
$mech->get($site);    # the page's HTML is now available via $mech->content
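A minimal core-Perl sketch of keeping what you want instead of cutting away what you don't, assuming the page text has already been split into paragraphs (the sample strings and the sub name `pick_numbered_quote` are hypothetical). A regex keeps only entries that look like list items (`12. ...`), strips the number, and `$list[rand @list]` picks one at random.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only entries that start with "<number>.",
# strip that prefix, and return one of them chosen at random.
sub pick_numbered_quote {
    my @paragraphs = @_;
    my @quotes = map { my $q = $_; $q =~ s/^\s*[0-9]+\.\s*//; $q }
                 grep { /^\s*[0-9]+\./ } @paragraphs;
    return unless @quotes;              # nothing looked like a list item
    return $quotes[ rand @quotes ];     # rand @array yields an index in 0 .. $#array
}

my @sample = (
    'Here are 50 quotes to live by:',
    '1. First sample quote.',
    '2. Second sample quote.',
    'Share this post!',
);
print pick_numbered_quote(@sample), "\n";
```

Filtering on what a list item looks like means the intro text, sharing widgets, and footer never have to be described by the regex at all.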

Thanks for your comment,

Re: regex to get random quote
by choroba (Cardinal) on May 15, 2016 at 07:36 UTC
    Regex isn't the right tool here. It's better to use some kind of parser (you're already loading HTML::TableExtract, so why not use it?).

    Nevertheless, a regex might later be used to keep only the paragraphs that start with a number and a dot. Proof of concept using XML::XSH2:

    open :r :F html http://motivationgrid.com/50-inspirational-quotes-to-live-by/ ;
    for //p[xsh:match(., '^[0-9]+\.')] echo (.) | shuf | head -n1 ;
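    For comparison, here is roughly the same filter in plain Perl with HTML::TreeBuilder, as a sketch: the inline HTML stands in for the real page, and `numbered_paragraphs` is a hypothetical name. `look_down` accepts a code reference as a criterion, which plays the role of the `xsh:match` predicate; for the live page you could build the tree with `HTML::TreeBuilder->new_from_url($site)` instead.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;   # -weak: the tree is freed without calling ->delete

# Return the text of every <p> whose text starts with "<number>.".
sub numbered_paragraphs {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    return map { $_->as_text }
           $tree->look_down(
               _tag => 'p',
               sub { $_[0]->as_text =~ /^\s*[0-9]+\./ },   # like xsh:match(., '^[0-9]+\.')
           );
}

my $html = <<'HTML';
<html><body>
<p>Intro paragraph.</p>
<p>1. First sample quote.</p>
<p>2. Second sample quote.</p>
</body></html>
HTML

my @quotes = numbered_paragraphs($html);
print $quotes[ rand @quotes ], "\n" if @quotes;   # the "shuf | head -n1" step
```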

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      I didn't use TableExtract on this site, but it does come up frequently. Had I gone with the runner-up, http://www.brainyquote.com/quotes/keywords/gratitude.html, we would be heading down that road. I'm curious how similar a task it is to get the content from that link compared with the one in the original post. Now that I see http://www.brainyquote.com/quotes_of_the_day.html, I wonder whether that might have been my best option.

      As it stands, I'm using HTML::TreeBuilder, and I'm getting nothing:

      C:\cygwin64\home\Fred\pages2\list>perl scraper1.pl
      nix!

      C:\cygwin64\home\Fred\pages2\list>type scraper1.pl
      #! /usr/bin/perl
      use warnings;
      use strict;
      use 5.010;
      use HTML::TreeBuilder 5 -weak;

      my $site = 'http://www.fourmilab.ch/yoursky/cities.html';
      my $tree = HTML::TreeBuilder->new_from_url($site);
      foreach my $e ($tree->look_down(_tag => 'div')) {
          foreach my $f ($e->look_down(_tag => 'p')) {
              say $f->as_text;
          }
      }
      say "nix!";

      C:\cygwin64\home\Fred\pages2\list>

        Hello Datz_cozee75,

        I'm getting nothing

        Not surprising, really, since the (new!) target website contains no <div> tags. :-)

        Removing the outer foreach and calling look_down(_tag => 'p') on $tree, I get:

        17:18 >perl 1631_SoPW.pl
        Wide character in say at 1631_SoPW.pl line 54.
        You can view the sky as seen from various cities around the globe by clicking on the name of a city below. If you don't know the latitude and longitude of your observing site, click on the closest city in the table—unless you're far away from that city, the sky map will be reasonably accurate.
        nix!

        17:19 >

        Which seems about right: a search through the source for that website finds only this one <p>...</p>-delimited paragraph.
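        A sketch of the change described above: call `look_down(_tag => 'p')` on the tree root instead of inside `<div>` elements the page doesn't have. The `binmode` line is an addition not taken from the thread, aimed at the "Wide character in say" warning and assuming a UTF-8 terminal; the inline HTML and the sub name `all_paragraphs` are hypothetical stand-ins for the fourmilab page.

```perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder 5 -weak;

binmode STDOUT, ':encoding(UTF-8)';   # encode output, avoiding "Wide character in say"

sub all_paragraphs {
    my ($tree) = @_;
    # Search from the root: no enclosing <div> is required.
    return map { $_->as_text } $tree->look_down(_tag => 'p');
}

# \x{2014} is a non-ASCII character like the one on the real page.
my $html = "<html><body><p>Click on a city in the table\x{2014}the map is accurate.</p>"
         . '<table><tr><td>Aberdeen</td></tr></table></body></html>';

# For the live page: my $tree = HTML::TreeBuilder->new_from_url($site);
my $tree = HTML::TreeBuilder->new_from_content($html);
say for all_paragraphs($tree);
say "nix!";
```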

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        > Had I gone with the runner-up

        Pagination would make it a bit more complex, but getting just the first page's quotes is similarly easy:

        open :F html :r http://www.brainyquote.com/quotes/keywords/gratitude.html ;
        my $i = 0 ;
        for //a[@title='view quote'] echo :s {++$i} '. ' (.) ;
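        Roughly the same query in Perl, as a sketch: `look_down` with an attribute criterion (`title => 'view quote'`) stands in for the XPath `//a[@title='view quote']`. The markup below is an assumed, simplified version of the quote links, and `view_quote_links` is a hypothetical name; the real page's structure may differ.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Return the link text of every <a title="view quote"> element.
sub view_quote_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    return map { $_->as_text } $tree->look_down(_tag => 'a', title => 'view quote');
}

my $html = <<'HTML';
<div>
<a title="view quote" href="/q1">First sample quote.</a>
<a title="view author" href="/a1">Sample Author</a>
<a title="view quote" href="/q2">Second sample quote.</a>
</div>
HTML

my @quotes = view_quote_links($html);
my $i = 0;
printf "%d. %s\n", ++$i, $_ for @quotes;   # numbered like the xsh loop
```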

Re: regex to get random quote
by Corion (Patriarch) on May 15, 2016 at 07:54 UTC

    Instead of cutting away what you don't want, consider keeping what you want, and also consider using an HTML parser. Personally, I somewhat like HTML::TreeBuilder::XPath, together with HTML::Selector::XPath.

    Looking at the HTML source, I think the following CSS selector should give you the list plus some other stuff that you can then filter away:

    .content-main > p
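    A sketch of how that selector could be used with the two modules mentioned above: `selector_to_xpath` from HTML::Selector::XPath turns the CSS selector into XPath, and HTML::TreeBuilder::XPath's `findnodes` runs it. The inline HTML is an assumed stand-in for the real page, and `content_paragraphs` is a hypothetical name; the numbered-paragraph `grep` does the "filter away" step.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

# Translate the CSS selector into XPath once.
my $xpath = selector_to_xpath('.content-main > p');

sub content_paragraphs {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @texts = map { $_->as_text } $tree->findnodes($xpath);
    $tree->delete;                      # free the tree explicitly
    return @texts;
}

my $html = <<'HTML';
<html><body>
<div class="content-main">
<p>Here are 50 quotes to live by.</p>
<p>1. First sample quote.</p>
<p>2. Second sample quote.</p>
</div>
<div class="sidebar"><p>Subscribe!</p></div>
</body></html>
HTML

# The selector returns the list plus some other stuff; keep only numbered items.
my @quotes = grep { /^\s*[0-9]+\./ } content_paragraphs($html);
print "$_\n" for @quotes;
```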
Re: regex to get random quote
by callmevamsi (Initiate) on May 25, 2016 at 09:32 UTC
    Well, I want to upgrade the Perl script. This time, I want to pick a motivational video, quote, or picture from this page: http://www.videoinspiration.net/blog/best-quotes-liveby/ Is there some kind of parser that can parse the YouTube video embed code and show the full URL of the YouTube video? Can someone help me with that, please?
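    A hedged starting point: once a parser has pulled out the embed code's `src` attribute (for example with HTML::TreeBuilder's `look_down(_tag => 'iframe')`), turning it into a full watch URL is a small string rewrite. This sketch assumes the common `youtube.com/embed/VIDEO_ID` form; `embed_to_watch_url` is a hypothetical name, and other embed formats would need more cases.

```perl
use strict;
use warnings;

# Map an embed URL such as //www.youtube.com/embed/ID?rel=0 to a watch URL.
# Returns undef (or an empty list) for anything that isn't a YouTube embed.
sub embed_to_watch_url {
    my ($src) = @_;
    return unless defined $src;
    if ($src =~ m{youtube(?:-nocookie)?\.com/embed/([A-Za-z0-9_-]+)}) {
        return "https://www.youtube.com/watch?v=$1";
    }
    return;
}

for my $src ('//www.youtube.com/embed/abc123XYZ_-?rel=0',
             'https://player.vimeo.com/video/12345') {
    my $url = embed_to_watch_url($src);
    print defined $url ? "$url\n" : "no YouTube video in $src\n";
}
```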