Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I find a natural intersection between skywatching and programming, because I'm always curious about when beautiful events happen *exactly*. So it is that I've been watching Venus and Jupiter close the distance on each other, awaiting their confluence some time in early July. I'd like to use perl to tell me where and when this "occultation" begins and ends. The site I'll draw my data from is given as a lexical variable in this script:

#!/usr/bin/perl -w
use strict;
use 5.010;
use HTML::TreeBuilder;
#use HTML::TreeBuilder 5 -weak; # Ensure weak references in use

my $site = "https://www.fourmilab.ch/yoursky/";

# new_from_url fetches and parses the page, so no separate parse_file call is needed
my $tree = HTML::TreeBuilder->new_from_url($site);

print "Hey, here's a dump of the parse tree of $site:\n";
$tree->dump; # a method we inherit from HTML::Element

#print "And here it is, bizarrely rerendered as HTML:\n", $tree->as_HTML, "\n";

# Now that we're done with it, we must destroy it.
$tree = $tree->delete; # Not required with weak references
__END__

There are two things that I need to make happen in order to bring some numbers to bear on this endeavor: 1) I need to follow the link for "nearby city" and set it to "Portland, OR", and 2) I need to capture the ephemeris, which comes across as a table.

Another question at the get-go: what makes a weak reference different and why might I want that over what I have?

Thanks for your comment,

Replies are listed 'Best First'.
Re: Using HTML::Treebuilder effectively to capture data
by choroba (Cardinal) on Jun 16, 2015 at 10:10 UTC
    HTML::TreeBuilder is too low-level. Use HTML::TableExtract.

    I also used WWW::Mechanize::GZip to handle the download.

    #! /usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use Syntax::Construct qw{ // };
    use open ':std', OUT => ':utf8';

    use WWW::Mechanize::GZip;
    use HTML::TableExtract;

    my $site = 'http://www.fourmilab.ch/yoursky/cities.html';

    my $mech = 'WWW::Mechanize::GZip'->new;
    $mech->get($site);
    $mech->follow_link( text => 'Portland OR' );

    my $te = 'HTML::TableExtract'->new;
    $te->parse($mech->content);
    my $table = ($te->tables)[3];
    for my $row ($table->rows) {
        say join "\t", map $_ // q(), @$row;
    }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      You may want to look into Web::Query, which allows you to scrape an HTML page with jQuery-like constructs.

      Thank you choroba for your concise yet effective routine. I had to learn quite a bit of syntax just to catch up with it, and I have remaining questions. Maybe I should get those out of the way before moving on. I assume this statement handles exotic characters but by what means is it connected with the output?

      use open ':std', OUT => ':utf8';

      Also, I couldn't pick my way through all of the join..map syntax here, nor could I herd it into a lexical variable that a bumbling scribe like me can deal with. I littered it with say statements that told me very little about what was going on. The join with the tabs makes it all nicely columnar. The // is this supercool defined $a ? $a : $b syntax, and the q() is a literal quote, but I can't put the whole thing together.

      for my $row ($table->rows) { say join "\t", map $_ // q(), @$row; }

      Based on this script I was able to move in closer on the things I'm trying to zero in on:

      #! /usr/bin/perl
      use warnings;
      use strict;
      use feature qw{ say };
      use Syntax::Construct qw{ // };
      use open ':std', OUT => ':utf8';

      use WWW::Mechanize::GZip;
      use HTML::TableExtract qw(tree);

      my $site = 'http://www.fourmilab.ch/yoursky/cities.html';

      my $mech = 'WWW::Mechanize::GZip'->new;
      $mech->get($site);
      $mech->follow_link( text => 'Portland OR' );

      my $te = 'HTML::TableExtract'->new;
      $te->parse($mech->content);
      my $table = ($te->tables)[3];
      my $table_tree = $table->tree;
      my $table_text = $table_tree->as_text;
      say "table text is $table_text";
      my $venus = $table_tree->cell(4,1)->as_text;
      say "say venus is $venus";
      my $jupiter = $table_tree->cell(7,1)->as_text;
      say "say jupiter is $jupiter";
      my $lub = 2457204.63659; #least upper bound
      my $glb = 2457207.63659; #greatest lower bound
      __END__
      $ perl tree4.pl
      table text is  RightAscensionDeclinationDistance(AU)From 45°31'5"N 122°40'33"W:AltitudeAzimuthSun5h 45m 15s+23° 23.5'1.0161.776122.364UpMercury4h 19m 31s+17° 16.1'0.711−14.916135.207SetVenus8h 57m 45s+19° 5.3'0.61730.89886.249UpMoon7h 6m 7s+17° 35.0'61.4 ER10.488104.477UpMars5h 40m 59s+24° 0.7'2.5731.626123.509UpJupiter9h 27m 56s+15° 51.6'5.92533.91177.612UpSaturn15h 52m 22s−18° 0.9'9.06617.428−38.534UpUranus1h 14m 31s+7° 12.0'20.373−37.278−179.032SetNeptune22h 46m 36s−8° 36.9'29.655−40.890−126.794SetPluto19h 1m 59s−20° 38.6'31.925−11.937−72.601Set
      say venus is 8h 57m 45s
      say jupiter is 9h 27m 56s

      I could get the right ascension for venus and jupiter by using a regex on the table text or by just using the two values in the cells. The latter might be more concise. What I want to do now is to enter different julian dates to see when this confluence occurs precisely. I have defined a least upper bound time of July 1, as Jupiter has a higher right ascension then. Likewise, I have defined July 4th as a greatest lower bound, as the reverse is the case at this julian date. From here I intend to write a control that will contract these values until they sandwich the event itself.
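
      The contracting control described above is essentially a bisection search. A minimal sketch, under loud assumptions: ra_to_hours and bracket_crossing are hypothetical helper names, and $fake_diff is a made-up linear stand-in (not real ephemeris data) for a routine that would fetch the page for a given Julian date and return the Venus right ascension minus the Jupiter one. It also ignores the wrap-around of right ascension at 24h.

      ```perl
      use strict;
      use warnings;

      # Convert a right ascension such as "8h 57m 45s" to decimal hours.
      sub ra_to_hours {
          my ($ra) = @_;
          my ($h, $m, $s) = $ra =~ /(\d+)h\s*(\d+)m\s*(\d+(?:\.\d+)?)s/
              or die "Unrecognised right ascension: $ra";
          return $h + $m / 60 + $s / 3600;
      }

      # Halve the interval [$lo, $hi] until it is narrower than $tol days.
      # $diff is a code ref returning RA(Venus) - RA(Jupiter) at a Julian date;
      # it must have opposite signs at the two ends of the interval.
      sub bracket_crossing {
          my ($diff, $lo, $hi, $tol) = @_;
          my $d_lo = $diff->($lo);
          die "No sign change between $lo and $hi" if $d_lo * $diff->($hi) > 0;
          while ($hi - $lo > $tol) {
              my $mid   = ($lo + $hi) / 2;
              my $d_mid = $diff->($mid);
              if ($d_lo * $d_mid <= 0) {
                  $hi = $mid;                    # crossing lies in the lower half
              }
              else {
                  ($lo, $d_lo) = ($mid, $d_mid); # crossing lies in the upper half
              }
          }
          return ($lo, $hi);
      }

      # HYPOTHETICAL stand-in for the real lookup: an invented linear model
      # that changes sign at JD 2457206.0. Replace it with a sub that scrapes
      # the two table cells for the given date.
      my $fake_diff = sub { my ($jd) = @_; return 0.01 * ($jd - 2457206.0) };

      my ($lo, $hi) = bracket_crossing($fake_diff, 2457204.63659, 2457207.63659, 1e-5);
      printf "crossing between JD %.5f and %.5f\n", $lo, $hi;
      ```

      Each pass costs one page fetch in real use, and a tolerance of 1e-5 days is a little under one second.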

      I tried to work out the WWW::Mechanize part: getting the relevant radio button selected and the corresponding jd value supplied. The relevant HTML from the site is here:

      <input type="radio" name="date" onclick="0" value="2" />
      <a href="/yoursky/help/controls.html#Julian">Julian day:</a>
      </td>
      <td>
      <input type="text" name="jd" value="2457189.88345" size="20" onchange="document.request.date[2].checked=true;" />

      What needs to happen here (I think) is to have onclick go to 1 on the first control, and then to provide the same values on the second one, except that 'value' should equal a lexical variable of my choice, say $guess.

      Alright, well I hope I'm making sense here, and I certainly appreciate the help. Thanks again, choroba.

        handles exotic characters but by what means is it connected with the output
        See the documentation for the open pragma.
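
        A minimal sketch of the connection, assuming only core Perl: the pragma sets :utf8 as the default layer for output handles opened in its lexical scope, and the ':std' flag extends that to STDOUT and STDERR. The sample string is invented for illustration; it uses the minus sign (U+2212) and degree sign that appear in the ephemeris table.

        ```perl
        use strict;
        use warnings;
        use open ':std', OUT => ':utf8';   # STDOUT and STDERR now encode output as UTF-8

        # U+2212 is above 255, so without an encoding layer on STDOUT
        # Perl would warn "Wide character in print".
        my $altitude = "\x{2212}14.916\x{B0}";
        print "$altitude\n";

        # List the layers currently on STDOUT; 'utf8' should be among them.
        print join(' ', PerlIO::get_layers(STDOUT)), "\n";
        ```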

        say join "\t", map $_ // q(), @$row;

        Read it from the right: take $row and dereference it as an array (@$row). map then takes each of its elements and replaces the undefined ones with an empty string. The resulting elements are joined by a tab.
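
        The same steps can be spelled out with one lexical variable per stage. This is only a sketch: the sample row is invented, and the explicit defined test stands in for the // operator so that it also runs on older perls.

        ```perl
        use strict;
        use warnings;

        # Invented sample row; HTML::TableExtract hands back rows as array refs
        # in which empty cells may be undef.
        my $row = [ 'Venus', undef, '8h 57m 45s' ];

        my @cells  = @$row;                                # dereference the array ref
        my @filled = map { defined $_ ? $_ : '' } @cells;  # same effect as $_ // q()
        my $line   = join "\t", @filled;                   # glue the cells with tabs

        print "$line\n";
        ```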

        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Using HTML::Treebuilder effectively to capture data
by pme (Monsignor) on Jun 16, 2015 at 09:51 UTC
    Hi Datz_cozee75

    This URL downloads the data for Portland, OR. The output contains the ephemeris too.

    wget https://www.fourmilab.ch/cgi-bin/Yoursky?z=1\&lat=45.5183\&ns=North\&lon=122.676\&ew=West
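
    A URL of this shape could also be assembled in Perl, which makes it easy to vary the date. The sketch below is untested against the site and rests on an assumption: that the CGI accepts date=2 and jd=... as query parameters, which is only a guess from the form field names in the HTML quoted earlier in the thread. yoursky_url is a hypothetical helper name.

    ```perl
    use strict;
    use warnings;

    # Build a Yoursky query URL from key/value pairs, keeping their order.
    # The values used here need no URL escaping; arbitrary input would.
    sub yoursky_url {
        my @kv = @_;
        my @pairs;
        while (my ($key, $value) = splice @kv, 0, 2) {
            push @pairs, "$key=$value";
        }
        return 'https://www.fourmilab.ch/cgi-bin/Yoursky?' . join '&', @pairs;
    }

    my $guess = '2457204.63659';   # Julian date to try
    my $url = yoursky_url(
        z    => 1,
        lat  => '45.5183',
        ns   => 'North',
        lon  => '122.676',
        ew   => 'West',
        date => 2,        # ASSUMPTION: selects the "Julian day" radio button
        jd   => $guess,   # ASSUMPTION: named after the text input in the form
    );
    print "$url\n";
    ```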