I'm an utter newbie with Perl, so please excuse the ugly code at the bottom. I'm trying to automate pulling census data with an approach like this:

  • Submit the search area from a local cgi form
  • Discard everything but the links to the data
  • Modify those links to actually pull the data instead of displaying a data selection form
  • Show the modified links

The code below gets me halfway there. For now I'm just saving the results to a file, because I haven't yet worked out how to display them in a browser without writing them to disk first. The problems I'm having a hard time figuring out are:

    Given this Census area selection page, how could I modify the code below to also display the area names, e.g., "Tulsa, OK (city) STF3A" instead of just "STF3A"?

    How do I make a self-contained CGI script that generates these pages dynamically instead of saving them to disk? I realize that's a big question, but mebbe someone could show a little example? Say, type an address into a form and it retrieves and displays the page, changing every occurrence of "the" to "tha", etc.?
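A minimal sketch of that kind of script, under some assumptions: CGI.pm, LWP::Simple and HTML::TokeParser are installed, the form field is named `url` (a name made up for this example), and `rewrite_text` is a hypothetical helper. It walks the page token by token so that only visible text is changed, not tag names, attributes, or script contents. It is not a finished proxy, just an illustration of fetch, modify, print.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use LWP::Simple qw(get);
use HTML::TokeParser;

# Change "the" to "tha" in visible text only; tags, comments and
# <script>/<style> contents are copied through unchanged.
sub rewrite_text {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse document";
    my $out = '';
    while (my $t = $p->get_token) {
        # Text tokens are ["T", $text, $is_data]; every other token
        # type carries its original source text as the last element.
        my $text = $t->[0] eq 'T' ? $t->[1] : $t->[-1];
        $text =~ s/\bthe\b/tha/g if $t->[0] eq 'T' && !$t->[2];
        $out .= $text;
    }
    return $out;
}

sub main {
    my $q      = CGI->new;
    my $target = $q->param('url');    # hypothetical form field name
    print $q->header('text/html');

    # No address yet: show the form (it submits back to this script).
    unless (defined $target && length $target) {
        print qq{<form method="get">Address: <input name="url" size="60">},
              qq{<input type="submit" value="Fetch"></form>\n};
        return;
    }
    # Refuse file:// and other schemes; this is still an open proxy,
    # so don't leave it on a public server.
    unless ($target =~ m{^https?://}i) {
        print "<p>Only http and https addresses are accepted.</p>\n";
        return;
    }
    my $page = get($target);
    unless (defined $page) {
        print "<p>Couldn't fetch ", $q->escapeHTML($target), "</p>\n";
        return;
    }
    print rewrite_text($page);
}

# Only act when run by a web server, so the file can be syntax-checked
# or loaded from the command line without side effects.
main() if $ENV{GATEWAY_INTERFACE};
```

Nothing is written to disk: whatever the script prints after the header is what the browser receives. Relative links and images in the fetched page will point at the wrong server unless they are rewritten as well.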

    use strict;
    use URI::URL;
    use LWP::Simple;
    use HTML::TokeParser;

    # Query the gazetteer for the search area
    my $url = url('http://www.census.gov/cgi-bin/gazetteer');
    $url->query_form( city => "Tulsa", state => "OK" );
    my $document = get( $url );
    die "Couldn't fetch $url\n" unless defined $document;

    my $p = HTML::TokeParser->new(\$document);

    open( OUTPUT, ">output.html" ) || die "Couldn't open 'output.html': $!\n";

    # Keep only the STF1A/STF3A links, rewritten to pull the data directly
    while (my $token = $p->get_tag("a")) {
        my $href = $token->[1]{href};
        $href =~ s/CMD=TABLES/CMD=RET/;
        my $text = $p->get_trimmed_text("/a");
        if ($text eq "STF1A" || $text eq "STF3A") {
            print OUTPUT "<a href=\"$href/FMT=HTML/T=P1\">$text</a><BR>\n";
        }
    }

    close( OUTPUT ) || die "Can't close 'output.html': $!";
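For the area names, one possible approach is sketched below. It rests on an assumption about the markup that is not confirmed here: that each place on the results page is its own `<li>`, with the name on the first line (everything up to the first `<br>`). If the real page is laid out differently, the two `get_trimmed_text` stop tags need adjusting. `labelled_links` is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return [area name, rewritten href, link text] for each STF1A/STF3A link.
# ASSUMPTION: one <li> per place, name = text before the first <br>.
sub labelled_links {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse document";
    my ($label, @found) = ('');
    while (my $tag = $p->get_tag('li', 'a')) {
        if ($tag->[0] eq 'li') {
            # New place: remember its name for the links that follow.
            $label = $p->get_trimmed_text('br', '/li');
            next;
        }
        my $href = $tag->[1]{href};
        my $text = $p->get_trimmed_text('/a');
        next unless defined $href && $text =~ /^STF[13]A$/;
        $href =~ s/CMD=TABLES/CMD=RET/;
        push @found, [$label, $href, $text];
    }
    return @found;
}
```

In the script above, the loop body would then become something like `print OUTPUT qq{$_->[0] <a href="$_->[1]/FMT=HTML/T=P1">$_->[2]</a><BR>\n} for labelled_links($document);`.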

    In reply to Retreive, modify, & display webpage by Sang
