http://qs1969.pair.com?node_id=51712

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

Today

I made my first attempt at using LWP. It actually started a few weeks ago, when I took 10 seconds to whip up a commandline that returns a printout of the vanity block on my home node.
% clear; lynx -dump -nolist http://www.perlmonks.org/index.pl?node=mkmcconn | grep -A 6 "User since:"

That's pretty simple, and the output is just what you would expect:

   User since:       Mon Dec 4 at 20:46
   Last here:        Sat Jan 13 at 23:50 (54 minutes ago)
   Experience:       131
   Level:            scribe (4)
   Writeups:         11
   Location:         Portland, Oregon
   User's localtime: Sat Jan 13 at 18:14

Works swell. So I stored it in a script, and then began to give some thought to doing this in Perl.

#!/usr/bin/perl -w
# vainmonk
use strict;
print `lynx -dump -nolist http://www.perlmonks.org/index.pl?node$ARGV[0]`;

Hardly a Perl script at that point, but from the commandline it was invoked like this:

% vainmonk =mkmcconn | egrep -A 6 "User since:"

Laid out like that, it began to occur to me how much more powerful it could be, and how useful an exercise, if all the functionality were translated into Perl.
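As a first step in that direction, the grep -A 6 part of the pipeline could be done in Perl itself. What follows is only a rough sketch: the helper name lines_after_match is made up for illustration, the sample text stands in for the real lynx dump, and it only handles the first match rather than every match the way grep would.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first line matching $pattern plus up to
# $after lines that follow it (roughly `grep -A $after` for one match).
sub lines_after_match {
    my ($text, $pattern, $after) = @_;
    my @lines = split /\n/, $text;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ $pattern;
        my $last = $i + $after;
        $last = $#lines if $last > $#lines;   # do not run past the end
        return @lines[$i .. $last];
    }
    return;   # no match: empty list
}

# Sample text standing in for the lynx dump (an assumption, not live data).
my $dump = join "\n",
    'Some header text',
    'User since:       Mon Dec 4 at 20:46',
    'Experience:       131',
    'Level:            scribe (4)',
    'Footer';

print "$_\n" for lines_after_match($dump, qr/User since:/, 2);
```

In the vainmonk script, the string passed in would be the backtick output of lynx instead of the sample, and the external egrep would no longer be needed.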

My first attempt

I was very surprised by the simplicity of LWP for simple tasks. Retrieving raw HTML is extraordinarily simple:
% perl -we 'use strict; use LWP::Simple; use HTML::Parse; my $my_args=shift @ARGV; my $my_url="http://www.perlmonks.org/index.pl?node"."$my_args"; getprint($my_url); ' =mkmcconn
But I don't want raw HTML. Seemingly not a problem. LWP includes the format method from HTML::FormatText.
% perl -we 'use strict; use LWP::Simple; use HTML::Parse; my $my_args=shift @ARGV; my $my_url="http://www.perlmonks.org/index.pl?node"."$my_args"; print parse_html(get($my_url))->format; ' =mkmcconn

This is where I encountered my obstacle. The output of the script above (as most of you already know) is this:

    TABLE NOT SHOWNTABLE NOT SHOWNTABLE NOT SHOWN

   This page brought to you by the crazy folks at The Everything
   Development Company and maintained by Tim Vroom
   Interested in Advertising? Contact our ad-meister, Robo

HTML::FormatText does not handle the contents of tables.
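One possible way around the TABLE NOT SHOWN output is to skip the formatter and walk the parse tree directly, pulling the text out of each table row. The sketch below assumes HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) is installed; it is not one of the modules tried above. The helper name table_rows_as_text and the sample HTML are made up for illustration, and the real page may need more careful handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # assumption: installed from CPAN (HTML-Tree)

# Hypothetical helper: one line of text per table row, cells joined by a space.
sub table_rows_as_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @lines;
    for my $row ($tree->look_down(_tag => 'tr')) {
        # Skip rows that only wrap a nested table; the inner rows are visited on their own.
        next if $row->look_down(_tag => 'table');
        my @cells;
        for my $cell (grep { ref $_ && $_->tag =~ /^t[dh]$/ } $row->content_list) {
            my $text = $cell->as_text;
            $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
            push @cells, $text if length $text;
        }
        push @lines, join ' ', @cells if @cells;
    }
    $tree->delete;    # free the parse tree explicitly
    return @lines;
}

# Made-up sample standing in for the real page.
my $html = <<'HTML';
<table>
  <tr><td>Experience:</td><td>131</td></tr>
  <tr><td>Level:</td><td>scribe (4)</td></tr>
</table>
HTML

print "$_\n" for table_rows_as_text($html);
```

Combined with get() from LWP::Simple, the string passed in would be the fetched page rather than the sample.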

So that's as far as I got today. When I pick up the project again on Monday afternoon, I'll look more closely at the docs for LWP::Simple, HTML::Parse, and HTML::FormatText. There are numerous articles here on Perl Monks. I also plan to look at Parse::RecDescent to see if it's relevant to the task.

The Perl Journal #17 is mentioned several times in Perl Monks articles, but that site is very broken right now.

Will the Contemplative Order of Perl Monks honor me with your wisdom, by which this lowly scribe might be brought more suddenly into the light? Brethren, if I ascend the slopes of lofty Mt. CPAN, will I find my answer among the archives of the scriptures there? Or is this an exercise in private meditation?

wordily yours: mkmcconn
eagerly awaiting your insights
I hope these musings are not impertinent.
Although admittedly, they are not phrased importunately.