in reply to HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files..

Look forward to any and all ideas.

I'd use perl -ne"$. == 999 and print" > all999lines.txt to put all the lines in one file.
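If you would rather have this as a script, here is a minimal sketch of the same idea. It opens each file separately, so the line counter `$.` starts over for every file (with a single `perl -n` run over several files, `$.` keeps counting across them unless ARGV is closed at each eof). The helper name `nth_line` is my own invention for illustration, not something from a module.

```perl
use strict;
use warnings;

# nth_line is a hypothetical helper: return line $n of the file at $path,
# or undef if the file has fewer than $n lines.
sub nth_line {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        return $line if $. == $n;   # $. counts lines read from this handle
    }
    return;                         # file is shorter than $n lines
}

# Print line 999 of every file named on the command line.
for my $path (@ARGV) {
    my $line = nth_line($path, 999);
    if (defined $line) {
        print $line;
    }
    else {
        warn "$path has fewer than 999 lines\n";
    }
}
```

Run it with the file names as arguments and redirect the output to all999lines.txt. On Windows, cmd.exe does not expand wildcards for you, so the names may need to be passed explicitly or expanded inside the script.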

Then something like:

    #! perl -slw
    use strict;
    use Data::Dump qw[pp];

    # Each input line is one record: capture every label/value pair into a hash.
    while( <> ) {
        my %record = m[
            <strong>([^<]+?):</strong>
            .+?
            >\s*([^<]+?)\s*</(?:a|td)>
        ]xg;
        pp \%record;
    }

Output:

    c:\test>junk72
    {
      "E-Mail" => "Keine Angabe",
      Fax => "0000736111/680040",
      Internet => "www.mysite.es",
      adresse_two => "no_value",
      aresss => "Friedrichstr. 70,&nbsp;73430&nbsp;Madrid",
      country => "contryname",
      employees => 259,
      name => "myname one",
      officer => "no_value",
      offices => 8,
      telefone => "0000736111/680040",
      "the office" => "mysite_two",
      type => "type_one (04313488)",
      worker => "no_value",
    }

Once you have the record in a hash, pushing into the db shouldn't be a problem.
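As a rough sketch of that last step, the hash can be turned into an INSERT statement with placeholders. Everything here is an assumption for illustration: `build_insert` is a hypothetical helper, `records` is a made-up table name, and the table is assumed to have one column per hash key.

```perl
use strict;
use warnings;

# build_insert is a hypothetical helper: turn a record hash into an INSERT
# statement with placeholders, plus the matching list of bind values.
sub build_insert {
    my ($table, $record) = @_;
    my @keys  = sort keys %$record;                # fixed order so columns and values line up
    my $cols  = join ', ', map { "`$_`" } @keys;   # backticks: MySQL quoting for names like E-Mail
    my $marks = join ', ', ('?') x @keys;          # one placeholder per column
    return ("INSERT INTO `$table` ($cols) VALUES ($marks)", @{$record}{@keys});
}

# A cut-down record using keys from the output above; 'records' is an assumed table name.
my %record = ( name => 'myname one', offices => 8, 'E-Mail' => 'Keine Angabe' );
my ($sql, @bind) = build_insert('records', \%record);
print "$sql\n";
print join(' | ', @bind), "\n";
```

With a connected DBI handle, the pair could then be passed along as `$dbh->do($sql, undef, @bind)`. The values go through placeholders, but the column names are interpolated directly into the statement, so this only makes sense while the keys come from pages you trust.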


Re^2: HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files..
by Perlbeginner1 (Scribe) on Oct 16, 2010 at 09:36 UTC
    Hello BrowserUk,

    Many thanks for the post! That looks very impressive; I am happy!

    By the way, to make things clearer, here is the task described in more detail. I decided to use Perl because it is very powerful, and I am trying to work through these issues in Perl.

    See one of the example pages: http://www.kultusportal-bw.de/servlet/PB/menu/1188427/index.html?COMPLETEHREF=http://www.kultus-bw.de/did_abfrage/detail.php?id=04313488
    The grey-shaded block contains the information I want: 17 lines. Note that I have 5000 different HTML files, all structured in exactly the same way.

    BrowserUk, you gave very useful hints. Thanks for everything!

    I would be very happy to have a template that I can run and whose results can be stored in the MySQL database.

    That would be great!

    Regards, Perl-Beginner