Re: HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files..

Look forward to any and all ideas.

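I'd use perl -ne"print if $. == 999; close ARGV if eof" FILES > all999lines.txt to put all the lines in one file. FILES stands for your list of input files; closing ARGV at each end-of-file resets $. so the line count restarts for every file.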
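Two things to watch with a one-liner like this: $. keeps counting across files read through <> unless ARGV is closed at each end-of-file, and cmd.exe does not expand wildcards for you. A minimal script sketch that sidesteps both follows. The helper name nth_line, the script name and the *.html pattern are my assumptions, not anything from the thread.

```perl
#! perl
use strict;
use warnings;

# Return line number $n (1-based) of $path, or undef if the file is shorter.
# nth_line is a hypothetical helper name used only for this sketch.
sub nth_line {
    my( $path, $n ) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while( my $line = <$fh> ) {
        return $line if $. == $n;    # $. counts lines on the handle just read
    }
    return;                          # fewer than $n lines
}

# Expand wildcards ourselves, since cmd.exe passes them through untouched.
# Possible invocation (names assumed): perl line999.pl *.html > all999lines.txt
for my $path ( map { glob($_) } @ARGV ) {
    my $line = nth_line( $path, 999 );
    print $line if defined $line;
}
```

Opening each file on its own handle means the line count always starts fresh, so there is nothing to reset between files.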

Then something like:

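#! perl -slw
use strict;
use Data::Dump qw[pp];

# Each input line holds one whole record (line 999 of one file).
while( <> ) {
    # In list context, m//g returns every ( label, value ) pair,
    # which fills the hash directly.
    my %record = m[
        <strong>([^<]+?):</strong>.+?
        >\s*([^<]+?)\s*</(?:a|td)>
    ]xg;
    pp \%record;
}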

Output:

c:\test>junk72
{
  "E-Mail"     => "Keine Angabe",
  Fax          => "0000736111/680040",
  Internet     => "www.mysite.es",
  adresse_two  => "no_value",
  aresss       => "Friedrichstr. 70,&nbsp;73430&nbsp;Madrid",
  country      => "contryname",
  employees    => 259,
  name         => "myname one",
  officer      => "no_value",
  offices      => 8,
  telefone     => "0000736111/680040",
  "the office" => "mysite_two",
  type         => "type_one  (04313488)",
  worker       => "no_value",
}
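If it isn't obvious how the match fills the hash, here is a small self-contained sketch that runs the same pattern over a made-up line. The sample markup and the sub name parse_record are assumptions for illustration; the real pages may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical input line; the real page markup may differ.
my $line = '<tr><td><strong>name:</strong></td><td>myname one</td></tr>'
         . '<tr><td><strong>Internet:</strong></td>'
         . '<td><a href="http://www.mysite.es">www.mysite.es</a></td></tr>';

# Same pattern as in the script, wrapped in a sub (parse_record is a
# hypothetical name). In list context, m//g returns the captures of every
# match as one flat list: label, value, label, value, ...
sub parse_record {
    my( $html ) = @_;
    my %record = $html =~ m[
        <strong>([^<]+?):</strong>.+?
        >\s*([^<]+?)\s*</(?:a|td)>
    ]xg;
    return \%record;
}

my $rec = parse_record( $line );
print "$_ => $rec->{$_}\n" for sort keys %$rec;
```

The label is whatever sits between <strong> and the colon; the value is the first non-empty text that ends at a closing </a> or </td>.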

Once you have the record in a hash, pushing into the db shouldn't be a problem.
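As a rough sketch of that last step, the hash can be turned into a parameterised INSERT. Everything named here is an assumption: the sub insert_sql, the table name offices, and the idea of using the hash keys as column names. Keys like "E-Mail" or "the office" are not plain identifiers, so the sketch backtick-quotes them MySQL style.

```perl
use strict;
use warnings;

# Build a parameterised INSERT for one record.
# insert_sql is a hypothetical helper; the table layout is assumed, not
# taken from the thread. Returns the SQL string followed by the bind values.
sub insert_sql {
    my( $table, $record ) = @_;
    my @cols  = sort keys %$record;
    # Quote each column name with backticks, doubling any embedded backtick.
    my $names = join ', ', map { my $c = $_; $c =~ s/`/``/g; "`$c`" } @cols;
    my $marks = join ', ', ('?') x @cols;    # one placeholder per column
    return ( "INSERT INTO `$table` ($names) VALUES ($marks)", @{$record}{@cols} );
}

my( $sql, @bind ) = insert_sql(
    'offices',
    { name => 'myname one', Fax => '0000736111/680040' },
);
print "$sql\n";

# With DBI and DBD::mysql installed, the statement could then be run like
# this (database name and credentials are placeholders):
#   my $dbh = DBI->connect( 'dbi:mysql:database=test', $user, $pass,
#                           { RaiseError => 1 } );
#   $dbh->do( $sql, undef, @bind );
```

Using placeholders keeps values such as "Friedrichstr. 70,&nbsp;73430&nbsp;Madrid" out of the SQL text, so no manual quoting of the data is needed.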

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP an inspiration; A true Folk's Guy


Replies are listed 'Best First'.
Re^2: HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files.. by Perlbeginner1 (Scribe) on Oct 16, 2010 at 09:36 UTC
Hello BrowserUk, many thanks for the posting! That looks very impressive; I am happy.

To verify things, here is the task described in more detail. I decided to use Perl since it is very powerful, and I am trying to nail down the issues while using it. See one of the example pages: http://www.kultusportal-bw.de/servlet/PB/menu/1188427/index.html?COMPLETEHREF=http://www.kultus-bw.de/did_abfrage/detail.php?id=04313488 In the grey shaded block you see the wanted information: 17 lines. Note that I have 5000 different HTML files, all structured in exactly the same way.

BrowserUk, you gave very useful hints. Thanks for everything! I am very happy to have a template I can run with, and storing the results in the MySQL database would be great!

Regards,
Perl-Beginner
