I deal with a website that publishes documents I need to retrieve and archive. The site is operated by a third party, so I have no control over the formatting of the page. The page used to simply have a list of links to the individual documents, and I used WWW::Mechanize's find_all_links with a regex to pull the documents I need every evening.

Unfortunately, the web designer decided to get "clever" and moved the documents into a dropdown list that calls a bit of JavaScript to jump to the selected file.

The source of the dropdown is like this:
<select name="jumpMenu2" id="jumpMenu2" onchange="MM_jumpMenu('parent',this,0)">
  <option selected="selected">Choose One...</option>
  <option value="docs/foreclosure/2018/June/Lots 17 &amp; 18 Dyer Addition Rockdale June 5, 2018.pdf">Lots 17 &amp; 18 Dyer Addition Rockdale June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/Lot 8 Blk 4 Revised Dyer Addition 6-5-2018.pdf">Lot 8 Blk 4 Revised Dyer Addition Rockdale June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/0.21 acre tract, Daniel Monroe Survey June 5, 2018.pdf">0.21 acre tract, Daniel Monroe Survey June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/Lot 8 Blk 3 Westwood Additiion Rockdale June 5, 2018.pdf">Lot 8 Blk 3 Westwood Additiion Rockdale June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/25 acre tract June 5, 2018.pdf">25 acre tract June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/Lot 1 Blk 121 Rockdale 6-5-2018.pdf">Lot 1 Blk 121 Rockdale June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/Lot 2 &amp; West half Lot 4, Bluebird Heights, Sec 1, Rockdale.pdf">Lot 2 &amp; West half Lot 4, Bluebird Heights, Sec 1, Rockdale June 5, 2018</option>
  <option value="docs/foreclosure/2018/June/6,300 square ft tract Daniel Monroe Survey 6-5-2018.pdf">6,300 square ft tract Daniel Monroe Survey June 5, 2018</option>
  <option value="docs/foreclosure/2018/July/105 N Johnson.pdf">105 N Johnson St. T'dale July 3, 2018</option>
</select>
The JavaScript it calls looks like this:
<script type="text/javascript">
function MM_jumpMenu(targ,selObj,restore){ //v3.0
  eval(targ+".location='"+selObj.options[selObj.selectedIndex].value+"'");
  if (restore) selObj.selectedIndex=0;
}
</script>
So, for example, I can open the first document in the list by going to http://www.website.com/docs/foreclosure/2018/June/Lots 17 & 18 Dyer Addition Rockdale June 5, 2018.pdf in my browser.

What I need is a way to pull all of the option values from the <select> so I can prepend the website address and download the files. What is the best way to do this? My Google-fu is failing me. All I seem to be able to find is info on building list boxes, not reading them.
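To make the goal concrete, here is a minimal sketch of the kind of thing I am after, not a tested solution against the real site. It assumes HTML::TokeParser and URI are installed (both normally come along with WWW::Mechanize), and the helper name option_values is my own invention for illustration. The sample HTML is a trimmed copy of the dropdown above, and http://www.website.com/ is only the placeholder host used in this post.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Hypothetical helper: return the value attribute of every <option> inside
# the <select> whose name matches $select_name. Options without a value
# attribute (such as the "Choose One..." placeholder) are skipped.
sub option_values {
    my ($html, $select_name) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    my @values;
    my $inside = 0;
    while (my $tag = $parser->get_tag('select', '/select', 'option')) {
        my ($name, $attr) = @$tag;
        if ($name eq 'select') {
            # Only collect options from the dropdown we care about.
            $inside = ($attr->{name} // '') eq $select_name;
        }
        elsif ($name eq '/select') {
            $inside = 0;
        }
        elsif ($inside && defined $attr->{value}) {
            # Attribute values arrive entity-decoded, so &amp; is already &.
            push @values, $attr->{value};
        }
    }
    return @values;
}

# Trimmed sample of the dropdown shown above.
my $html = <<'HTML';
<select name="jumpMenu2" id="jumpMenu2">
<option selected="selected">Choose One...</option>
<option value="docs/foreclosure/2018/June/25 acre tract June 5, 2018.pdf">25 acre tract June 5, 2018</option>
<option value="docs/foreclosure/2018/July/105 N Johnson.pdf">105 N Johnson St. T'dale July 3, 2018</option>
</select>
HTML

my $base = 'http://www.website.com/';   # placeholder host, not the real site
for my $value (option_values($html, 'jumpMenu2')) {
    # URI->new_abs resolves the relative path against $base and
    # percent-encodes the spaces in the file name.
    print URI->new_abs($value, $base), "\n";
}
```

In the nightly script, $html would presumably come from $mech->content after fetching the page, and each resulting URL could then be handed back to $mech->get for download.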

In reply to Scrape Select Options from Web Page by BrentD
