Hello, I want to parse a bunch of HTML files that are stored on my computer and extract a certain set of data from each one; a task for Perl. This is the information I want to gather:

country: countryname
name: myname
School-type: Type one
Adress: 20000 New York, Broadway 16
Telefon: 053333052-9899-0, Fax: 053333052-9899-55
index-number: 26666932002
Webmaster: Linus Thorwald
site registered at: 08.03.2010
Website:

I can also rebuild a URL from the index-number; see this part of the HTML (the full sample is further below):
<div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div>
So I have to extract the index-number (here: 26666932002) and append it to the short URL http://www.the_search_site.org/. How should I proceed to gather the results listed above? Below is the shortened HTML of one result:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort= >
<title>result-title: MyName, New York </title>
<img src=""Contryname" title="Contryname" />
<div style="width: 40em;">
<div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div>
<div class="fm_linkeSpalte"><h2>My name</h2>
<span class="schulart_text">School-type: Type one</span>
<p class="einzel_text">Adress: 20000 New York, Broadway 16 <br />
Telefon: 053333052-9899-0, Fax: 053333052-9899-55 <br />
index-number: 26666932002 <br />
Webmaster: <a href="mailto: webmaster@the-site.com" class="p1">Linus Thorwald</a><br /></p>
</div>
<div> <p class="ta_left einzel_text"> </p></div>
<br /><div><p class="ta_left einzel_text">registered at: 08.03.2010</p></div>
</div>
</div>
</div>
</div>
<d-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort=-->
</html>
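
One possible approach, sketched here only as a starting point: read each file into a string, pull out the fields with regular expressions, and rebuild the short URL from the index-number. The helper name parse_result, the hash keys, and the regexes themselves are assumptions tailored to the sample above; they are not taken from any existing module. The shortened sample is not well-formed HTML, which is why this sketch uses plain regexes and core Perl only. On the real pages, a proper parser from CPAN such as HTML::TreeBuilder or HTML::TokeParser would probably be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the wanted fields from the HTML of one result page.
# Every regex below is an assumption based on the sample markup in this post.
sub parse_result {
    my ($html) = @_;
    my %rec;

    # Collapse runs of whitespace (including line breaks) into single spaces.
    $html =~ s/\s+/ /g;

    # A failed match in list context leaves the field undefined.
    ($rec{country})    = $html =~ m{<img\b[^>]*\btitle="([^"]*)"};
    ($rec{name})       = $html =~ m{<h2>\s*(.*?)\s*</h2>};
    ($rec{type})       = $html =~ m{class="schulart_text">\s*School-type:\s*(.*?)\s*</span>};
    ($rec{address})    = $html =~ m{Adress:\s*(.*?)\s*<br};
    ($rec{phone})      = $html =~ m{Telefon:\s*([\d-]+)};
    ($rec{fax})        = $html =~ m{Fax:\s*([\d-]+)};
    ($rec{index})      = $html =~ m{index-number:\s*(\d+)};
    ($rec{webmaster})  = $html =~ m{Webmaster:\s*<a\b[^>]*>\s*(.*?)\s*</a>};
    ($rec{registered}) = $html =~ m{registered at:\s*([\d.]+)};

    # Rebuild the short URL from the index-number, if one was found.
    $rec{url} = "http://www.the_search_site.org/$rec{index}"
        if defined $rec{index};

    return \%rec;
}

# Process every file named on the command line and print one
# tab-separated line per file.
my @fields = qw(index name country type address phone fax
                webmaster registered url);

for my $file (@ARGV) {
    open my $fh, '<', $file
        or do { warn "Cannot open $file: $!\n"; next };
    local $/;                 # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;

    my $rec = parse_result($html // '');
    print join("\t", map { $rec->{$_} // '' } @fields), "\n";
}
```

Saved under a hypothetical name such as parse_results.pl, it could be run as: perl parse_results.pl /path/to/pages/*.html > results.tsv. Fields that are not found in a page are printed as empty columns, so pages that deviate from the sample are easy to spot.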

In reply to Data-Parsing: parsing a huge number of files by Perlbeginner1
