Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:
I have to extract the index-number (here: 26666932002) and append it to the short URL http://www.the_search_site.org/, which gives the link that appears in each result: <div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div> How should I proceed to gather these results? Below is the shortened HTML of one result:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort= -->
<title>result-title: MyName, New York </title>
<img src=""Contryname" title="Contryname" />
<div style="width: 40em;">
<div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div>
<div class="fm_linkeSpalte"><h2>My name</h2>
<span class="schulart_text">School-type: Type one</span>
<p class="einzel_text">Adress: 20000 New York, Broadway 16 <br />
Telefon: 053333052-9899-0, Fax: 053333052-9899-55 <br />
index-number: 26666932002 <br />
Webmaster: <a href="mailto: webmaster@the-site.com" class="p1">Linus Thorwald</a><br /></p>
</div>
<div> <p class="ta_left einzel_text"> </p></div>
<br /><div><p class="ta_left einzel_text">registered at: 08.03.2010</p></div>
</div> </div> </div> </div>
<!-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort=-->
</html>
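One possible way to approach the extraction described above, as a minimal sketch only: it assumes every result page contains the literal text "index-number:" followed by digits, as in the sample, and it uses a plain regex rather than an HTML parser. The sub names `extract_index_number` and `short_url` and the `$sample` fragment are hypothetical, introduced here for illustration; this is not necessarily what the replies in this thread suggest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base of the short URL, taken from the question.
my $BASE_URL = 'http://www.the_search_site.org/';

# Return the digits that follow "index-number:", or undef if none are found.
# (Hypothetical helper; assumes the label appears exactly as in the sample.)
sub extract_index_number {
    my ($html) = @_;
    return $html =~ /index-number:\s*(\d+)/ ? $1 : undef;
}

# Append the index-number to the base URL.
sub short_url {
    my ($index_number) = @_;
    return $BASE_URL . $index_number;
}

# Hypothetical sample: a fragment of the result page shown above.
my $sample = <<'HTML';
<p class="einzel_text">Adress: 20000 New York, Broadway 16 <br />
Telefon: 053333052-9899-0, Fax: 053333052-9899-55 <br />
index-number: 26666932002 <br />
HTML

my $id = extract_index_number($sample);
print defined $id ? short_url($id) . "\n" : "no index-number found\n";
```

For the sample fragment this prints http://www.the_search_site.org/26666932002. To process many saved pages, the same two subs could be called in a loop that reads each file into a string first.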
Replies are listed 'Best First'.
Re: Data-Parsing: parsing a huge number of files
by graff (Chancellor) on Sep 22, 2010 at 23:50 UTC
by Perlbeginner1 (Scribe) on Sep 23, 2010 at 10:50 UTC