Hello, I want to parse a bunch of HTML files that are stored on my computer and extract a certain set of data from each one. It looks like a task for Perl. This is the information I want to gather:

country: countryname
name: myname
School-type: Type one
Adress: 20000 New York, Broadway 16
Telefon: 053333052-9899-0, Fax: 053333052-9899-55
index-number: 26666932002
Webmaster: Linus Thorwald
site registered at: 08.03.2010
Website:

For the Website, I can rebuild a URL from the index-number; see this part of the HTML (the complete page is further below):

<div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div>
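One way to get the index-number out of a link like the one above is a regex on the href. This is only a minimal sketch, written in Python for illustration (the same pattern should carry over to a Perl regex). The names `extract_index_number` and `build_short_url` are made up for this example, and it assumes each page contains a link of exactly this form.

```python
import re

BASE_URL = "http://www.the_search_site.org/"

# Captures the run of digits that follows the site name inside the href.
HREF_RE = re.compile(r'href="http://www\.the_search_site\.org/(\d+)"')


def extract_index_number(html):
    """Return the index-number from the logo link, or None if no link matches."""
    match = HREF_RE.search(html)
    return match.group(1) if match else None


def build_short_url(index_number):
    """Append the index-number to the base URL."""
    return BASE_URL + index_number


snippet = (
    '<div class="logo_homepage"><a class="img_inl" '
    'href="http://www.the_search_site.org/26666932002"></a></div>'
)

index_number = extract_index_number(snippet)
print(index_number)
print(build_short_url(index_number))
```

Returning `None` when nothing matches makes it easy to log and skip files that do not have the expected link.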

I have to extract the index-number (here: 26666932002) and append it to the short URL http://www.the_search_site.org/. How should I proceed to gather the results mentioned above? Below is the shortened HTML of one result:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">


<!-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort= -->
<title>result-title: MyName, New York </title>
<img src=""Contryname" title="Contryname" />
<div style="width: 40em;">
<div style="display: inline;"><div class="logo_homepage"><a class="img_inl" href="http://www.the_search_site.org/26666932002"></a></div>
<div class="fm_linkeSpalte"><h2>My name</h2>
<span class="schulart_text">School-type:  Type one</span>
<p class="einzel_text">Adress: 20000 New York, Broadway 16
<br />
   Telefon: 053333052-9899-0, Fax: 053333052-9899-55
   <br />
  index-number:  26666932002   <br />
  Webmaster:  <a href="mailto: webmaster@the-site.com" class="p1">Linus Thorwald</a><br /></p>                  </div>
        <div>
        <p class="ta_left einzel_text">
                </p></div>
<br /><div><p class="ta_left einzel_text">registered at: 08.03.2010</p></div>
    </div>
</div>
</div>
</div>

<!-- einzelergebnis.html?Id=26666932002&treffer=2139&auswahl_1=0&auswahl_2=0&auswahl_3=0&suchtext=&kategorie=&region=de&trefferzahlauswahl=alle&trefferzahl=10517&list_anfang=0&sort=-->
</html>
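For the labelled fields in the page above, one possible approach is to strip the markup and then match on the labels. This is a rough sketch in Python for illustration, not a definitive solution. The names `html_to_text`, `FIELD_PATTERNS` and `extract_fields` are hypothetical, and it assumes every file uses the same labels in the same order as this sample. A real HTML parser would probably be more robust than regexes if the pages vary.

```python
import re
from html import unescape


def html_to_text(html):
    """Remove comments and tags, then collapse all whitespace to single spaces."""
    html = re.sub(r"<!--.*?-->", " ", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


# One pattern per field; each relies on the label that follows it in the sample page.
FIELD_PATTERNS = {
    "school_type": r"School-type:\s*(.+?)\s+Adress:",
    "address": r"Adress:\s*(.+?)\s+Telefon:",
    "telefon": r"Telefon:\s*([\d-]+)",
    "fax": r"Fax:\s*([\d-]+)",
    "index_number": r"index-number:\s*(\d+)",
    "webmaster": r"Webmaster:\s*(.+?)\s+registered at:",
    "registered_at": r"registered at:\s*(\d{2}\.\d{2}\.\d{4})",
}


def extract_fields(html):
    """Return a dict of field name to value; missing fields map to None."""
    text = html_to_text(html)
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        record[name] = match.group(1) if match else None
    return record


# Cut-down copy of the sample page, used only to exercise the patterns.
sample = """
<div class="fm_linkeSpalte"><h2>My name</h2>
<span class="schulart_text">School-type:  Type one</span>
<p class="einzel_text">Adress: 20000 New York, Broadway 16
<br />
   Telefon: 053333052-9899-0, Fax: 053333052-9899-55
   <br />
  index-number:  26666932002   <br />
  Webmaster:  <a href="mailto: webmaster@the-site.com" class="p1">Linus Thorwald</a><br /></p></div>
<br /><div><p class="ta_left einzel_text">registered at: 08.03.2010</p></div>
"""

for field, value in extract_fields(sample).items():
    print(field, "=", value)
```

To process a whole directory, the same function could be called once per file and the resulting dicts written out as rows, for example to a CSV file.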

In reply to Data-Parsing: parsing a huge number of files by Perlbeginner1
