parsing long text file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: parsing long text file by talexb (Chancellor) on Jun 18, 2002 at 15:11 UTC
While your implementation is a little inefficient, I don't see any problems with it. Because you are using the diamond operator `(<>)` to read data, that means you're running it from the command line with something like `perl -w foo.pl <DataFile.txt`. Is that right? Because if you're just using `perl -w foo.pl`, then I'm not surprised that it appears to be hanging -- it's actually waiting for data to arrive on STDIN so it can start to parse.
--t. alex "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!" --Michael Flanders and Donald Swann
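To make the point about `<>` concrete, here is a minimal sketch of how the diamond operator picks its input. The sub name `slurp_via_diamond` and the sample data are made up for illustration; the only behaviour relied on is that `<>` reads the files named in `@ARGV` and falls back to STDIN when `@ARGV` is empty.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read every line from the named files via the diamond operator.
# With an empty @ARGV, <> would instead block waiting on STDIN,
# which looks exactly like a hang.
sub slurp_via_diamond {
    my @files = @_;
    local @ARGV = @files;    # <> takes its file names from @ARGV
    my @lines;
    while ( my $line = <> ) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Hypothetical sample data, written to a temporary file for the demo.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "first line\nsecond line\n";
close $fh;

my @lines = slurp_via_diamond($filename);
print scalar(@lines), " lines read from $filename\n";
```

If the script is started with no file arguments and nothing redirected to STDIN, the same loop simply waits for input.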
Re: Re: parsing long text file by Anonymous Monk on Jun 18, 2002 at 15:21 UTC
Sorry, I am not running it from the command line. The <> operator is there, but it can be replaced. I am looking for a better way to parse the file; here is what I have on top of the code I sent. It has to display to the browser. my $start = $getdata; my $end = $getdata2; my $dir = '../weblog/'; @ARGV = (); die "start must be less than end" if $start >= $end; die "no dir $dir here" unless -d $dir; find sub { my $numb = (fileparse($_,'.txt'))[0]; return unless $numb =~ /^\d+$/; push @ARGV, $File::Find::name if $numb >= $start and $numb <= $end; }, $dir; die "no .txt files found in $dir" unless @ARGV;
Re: Re: Re: parsing long text file by talexb (Chancellor) on Jun 18, 2002 at 15:27 UTC
Please put CODE tags around your code so that we can read it .. that means `<CODE>` at the beginning and `</CODE>` at the end. Also, see if you can follow some kind of regular indentation: the closing brace for your subroutine (if that's what `find sub` is) is hidden near the end of the second line. Please post code that compiles cleanly and looks a little better than that, then we can check it out.
--t. alex "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!" --Michael Flanders and Donald Swann
Re: Re: Re: Re: parsing long text file by Anonymous Monk on Jun 18, 2002 at 17:53 UTC
Re: (5) parsing long text file by talexb (Chancellor) on Jun 19, 2002 at 19:12 UTC
Re: parsing long text file by DamnDirtyApe (Curate) on Jun 18, 2002 at 16:20 UTC
If it were me, I'd build a search-and-replace hash, then go against that for each row.

    my %replace = (
        iapw_p1 => "P1 = (inquiry, launch page)\n",
        iapw_p2 => "P2 = (car coverages, endorsements, operators)\n",
        iapw_p3 => "P3 = (notepad, scratch)\n"
    );

    while ( <> ) {
        foreach my $key ( keys %replace ) {
            s/$key/$replace{$key}/g;
        }
        print;
    }

_______________
D a m n D i r t y A p e
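A possible variation on the hash idea, offered as a sketch rather than a tested fix for the original problem: join the keys into one alternation so each line is scanned once instead of once per key. The names `$alternation`, `$pattern` and `expand_codes` are invented for this example; the table is the one from the post.

```perl
use strict;
use warnings;

# Replacement table taken from the post above.
my %replace = (
    iapw_p1 => "P1 = (inquiry, launch page)\n",
    iapw_p2 => "P2 = (car coverages, endorsements, operators)\n",
    iapw_p3 => "P3 = (notepad, scratch)\n",
);

# Build one alternation from the keys, longest first so that a key which is
# a prefix of another cannot shadow it; quotemeta guards special characters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %replace;
my $pattern = qr/($alternation)/;

# Substitute every known key in a single pass over the line.
sub expand_codes {
    my ($line) = @_;
    $line =~ s/$pattern/$replace{$1}/g;
    return $line;
}

print expand_codes("GET /iapw_p2 200\n");
```

Whether this is noticeably faster on a 300KB log is an open question; it mainly keeps the table and the matching logic in one place.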
Re: Re: parsing long text file by Anonymous Monk on Jun 18, 2002 at 17:58 UTC
It seems like a better idea, do you think it will parse faster? Here is the code again with a better display. Where would you improve it? Thanks a lot again!

    my $start = $getdata;
    my $end = $getdata2;
    my $dir = '../weblog/';
    @ARGV = ();
    die "start must be less than end" if $start >= $end;
    die "no dir $dir here" unless -d $dir;
    find sub {
        my $numb = (fileparse($_,'.txt'))[0];
        return unless $numb =~ /^\d+$/;
        push @ARGV, $File::Find::name if $numb >= $start and $numb <= $end;
    }, $dir;
    die "no .txt files found in $dir" unless @ARGV;

    while ( <> ) {
        $p1="P1 = (inquiry, launch page)\n";
        $p2="P2 = (car coverages, endorsements, operators)\n";
        $p3="P3 = (notepad, scratch)\n";
        $b1="B1 = (my inquiry: auto, home and location)";
        $p0="P0 = (local search)\n";
        $c0="C0 = (dist search)\n";
        $c1="C1 = (state inquiry)\n";
        $c3="C3 = (test notepad)\n";
        $h1="H1 = (owners and coop search and history)\n";
        if (m/iapw_p1/g){s/iapwp1/$p1/g; print;}
        if (m/iapw_p2/g){s/iapwp2/$p2/g; print;}
        if (m/iapw_p3/g){s/iapwp3/$p3/g; print;}
        if (m/iapw_b1/g){s/iapwb1/$b1/g; print;}
        if (m/iapw_p0/g){s/iapwp0/$p0/g; print;}
        if (m/iapw_c0/g){s/iapwc0/$c0/g; print;}
        if (m/iapw_c1/g){s/iapwc1/$c1/g; print;}
        if (m/iapw_c2/g){s/iapwc3/$c3/g; print;}
        if (m/iapw_c3/g){s/iapwh1/$h1/g; print;}
    }
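One detail in the loop above that may be worth checking: each `if` tests for a spelling with an underscore (`iapw_p1`) but substitutes a spelling without one (`iapwp1`), and the last two lines pair `c2` with `$c3` and `c3` with `$h1`. Unless the log lines really contain both spellings, the substitution would leave the line unchanged. The sketch below illustrates only that effect; the sub names and the sample line are hypothetical, and which spelling is intended is not stated in the thread.

```perl
use strict;
use warnings;

my $p1 = "P1 = (inquiry, launch page)\n";

# Mirrors the posted loop body: test for one spelling, substitute another.
sub rewrite_as_posted {
    my ($line) = @_;
    if ( $line =~ /iapw_p1/ ) {
        $line =~ s/iapwp1/$p1/g;    # no underscore, so nothing is replaced
    }
    return $line;
}

# Variant that substitutes the same text it would have tested for.
sub rewrite_consistent {
    my ($line) = @_;
    $line =~ s/iapw_p1/$p1/g;
    return $line;
}

my $sample = "hit iapw_p1 here";    # hypothetical log line
print "as posted:  ", rewrite_as_posted($sample),  "\n";
print "consistent: ", rewrite_consistent($sample), "\n";
```

This does not explain a hang, but it would explain output that looks unprocessed.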
Re: Re: parsing long text file by Anonymous Monk on Jun 18, 2002 at 18:27 UTC
Just to let you know that I tried to parse it using your idea and it still hangs. I don't know what to do with it.
Re: Re: Re: parsing long text file by DamnDirtyApe (Curate) on Jun 18, 2002 at 21:48 UTC
You may have already considered this, but are you sure that the program is hanging, and not just taking a long time to process 300KB? If you add `$|++;` near the beginning of your program, you can prevent the program from buffering the output. I don't know if this will help you any, but it's worth a try. :-)
_______________
D a m n D i r t y A p e
Re: parsing long text file by Anonymous Monk on Jun 18, 2002 at 15:19 UTC
For starters, you need not include all the variables inside your loop, as you are reinitializing everything every single time you go through the loop. Also, you can probably replace all the lines of: if (m/<whatever>/g){...} with just (m/<whatever>/) or just (/whatever/), as it seems you are just checking to see if something exists in a line, and there is no need to check further after that point.
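There may be a second reason to drop the `/g` on the tests. In scalar context, a successful `m//g` leaves `pos()` set on the string, and the next `m//g` against the same string resumes from there instead of from the start. A small sketch, with hypothetical sub names and a made-up line holding two codes:

```perl
use strict;
use warnings;

# With /g in scalar context, a successful match leaves pos() set, and the
# next m//g against the same string resumes searching from that offset.
sub found_with_g {
    my ($line) = @_;
    my @hits;
    push @hits, 'p2' if $line =~ m/iapw_p2/g;
    push @hits, 'p1' if $line =~ m/iapw_p1/g;    # starts after the p2 match
    return @hits;
}

# Without /g every test scans the line from the beginning.
sub found_plain {
    my ($line) = @_;
    my @hits;
    push @hits, 'p2' if $line =~ /iapw_p2/;
    push @hits, 'p1' if $line =~ /iapw_p1/;
    return @hits;
}

my $line = "iapw_p1 then iapw_p2";    # hypothetical line with two codes
print "with /g: @{[ found_with_g($line) ]}\n";
print "plain:   @{[ found_plain($line) ]}\n";
```

Here the `/g` version misses `iapw_p1` because it sits before the point where the earlier match ended, while the plain version finds both.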
Re: parsing long text file by ides (Deacon) on Jun 18, 2002 at 15:24 UTC
You might also try adding the 'o' option to the replacements. This tells the Perl interpreter to 'compile' the regular expression only once, rather than each time through the loop. Example: `if( m/iapw_p1/ ) { s/iapwp1/$p1/og; print; }`
-----------------------------------
Frank Wiles <frank@wiles.org> http://frank.wiles.org
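A note of caution on `/o`, as a sketch and not a benchmark: the flag only affects patterns that interpolate a variable, so on a literal pattern such as `iapwp1` it should make no difference, and the replacement side of `s///` is not a regular expression at all. Where a variable is interpolated, `/o` freezes the first value it sees. The sub names below are invented for the example; `qr//` is shown as one way to precompile a pattern explicitly.

```perl
use strict;
use warnings;

# /o freezes the interpolated pattern after its first use, so later
# changes to $code are silently ignored by this match.
sub matches_once_compiled {
    my ( $code, $line ) = @_;
    return $line =~ /$code/o ? 1 : 0;
}

# qr// precompiles a pattern into a value that can be stored and reused.
sub make_matcher {
    my ($code) = @_;
    my $re = qr/\Q$code\E/;
    return sub { return $_[0] =~ $re ? 1 : 0 };
}

print matches_once_compiled( 'iapw_p1', 'saw iapw_p1' ), "\n";
print matches_once_compiled( 'iapw_p2', 'saw iapw_p2' ), "\n";    # still tests iapw_p1

my $is_p2 = make_matcher('iapw_p2');
print $is_p2->('saw iapw_p2'), "\n";
```

The second call reports no match even though the line contains `iapw_p2`, because the pattern was fixed on the first call; the `qr//` matcher has no such surprise.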