Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a log file. Its size is about 300KB, and when I find some of the items I want, I do a substitution, but it hangs forever. Below is the code I am using. Is there a better and faster way to do it? Thanks!

while ( <> ) {
$p1="P1 = (inquiry, launch page)\n";
$p2="P2 = (car coverages, endorsements, operators)\n";
$p3="P3 = (notepad, scratch)\n";
$b1="B1 = (my inquiry: auto, home and location)";
$p0="P0 = (local search)\n";
$c0="C0 = (dist search)\n";
$c1="C1 = (state inquiry)\n";
$c3="C3 = (test notepad)\n";
$h1="H1 = (owners and coop search and history)\n";
if (m/iapw_p1/g){s/iapwp1/$p1/g; print;}
if (m/iapw_p2/g){s/iapwp2/$p2/g; print;}
if (m/iapw_p3/g){s/iapwp3/$p3/g; print;}
if (m/iapw_b1/g){s/iapwb1/$b1/g; print;}
if (m/iapw_p0/g){s/iapwp0/$p0/g; print;}
if (m/iapw_c0/g){s/iapwc0/$c0/g; print;}
if (m/iapw_c1/g){s/iapwc1/$c1/g; print;}
if (m/iapw_c2/g){s/iapwc3/$c3/g; print;}
if (m/iapw_c3/g){s/iapwh1/$h1/g; print;}
}

Replies are listed 'Best First'.
Re: parsing long text file
by talexb (Chancellor) on Jun 18, 2002 at 15:11 UTC
    While your implementation is a little inefficient, I don't see any problems with it. Because you are using the diamond operator (<>) to read data, that means you're running it from the command line with something like
      # perl -w foo.pl <DataFile.txt
    Is that right? Because if you're just using
      # perl -w foo.pl

    Then I'm not surprised that it appears to be hanging -- it's actually waiting for data to arrive on STDIN so it can start to parse.
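To make the point about STDIN concrete, here is a minimal sketch (not part of the original reply) of a guard that warns when `<>` would end up reading from an interactive terminal. The helper name `would_wait_on_terminal` is hypothetical; `-t STDIN` is Perl's standard test for whether a handle is attached to a tty.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no file names were supplied and STDIN is
# an interactive terminal, i.e. the case where <> sits waiting for typing.
sub would_wait_on_terminal {
    my ($args, $stdin_is_tty) = @_;
    return ( !@$args && $stdin_is_tty ) ? 1 : 0;
}

my $tty = ( -t STDIN ) ? 1 : 0;
if ( would_wait_on_terminal( \@ARGV, $tty ) ) {
    warn "No input files given; reading from the terminal (end with Ctrl-D)\n";
}
```

With a guard like this near the top of the script, a run without arguments at least says why nothing appears to happen.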

    --t. alex

    "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!" --Michael Flanders and Donald Swann

      Sorry, I am not running it from the command line. The <> operator is there, but it can be replaced. I am looking for a better way to parse the file; here is what I have above the code I sent. It has to display to the browser.

      my $start = $getdata;
      my $end = $getdata2;

      my $dir = '../weblog/';
      @ARGV = ();

      die "start must be less than end" if $start >= $end;
      die "no dir $dir here" unless -d $dir;

      find sub {
      my $numb = (fileparse($_,'.txt'))[0];
      return unless $numb =~ /^\d+$/;
      push @ARGV, $File::Find::name if $numb >= $start and $numb <= $end; }, $dir;
      die "no .txt files found in $dir" unless @ARGV;
        Please put CODE tags around your code so that we can read it .. that means <CODE> at the beginning and </CODE> at the end. Also, see if you can follow some kind of regular indentation: the closing brace for your subroutine (if that's what find sub is) is hidden near the end of the second line.

        Please post code that compiles cleanly and looks a little better than that, then we can check it out.

        --t. alex

        "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!" --Michael Flanders and Donald Swann

Re: parsing long text file
by DamnDirtyApe (Curate) on Jun 18, 2002 at 16:20 UTC

    If it were me, I'd build a search-and-replace hash, then go against that for each row.

    my %replace = (
        iapw_p1 => "P1 = (inquiry, launch page)\n",
        iapw_p2 => "P2 = (car coverages, endorsements, operators)\n",
        iapw_p3 => "P3 = (notepad, scratch)\n"
    ) ;

    while ( <> ) {
        foreach my $key ( keys %replace ) {
            s/$key/$replace{$key}/g ;
        }
        print ;
    }
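One possible refinement of the hash idea, offered only as a sketch: join the keys into a single precompiled alternation so each line is scanned once rather than once per key. The keys and replacement strings are the ones from the post above; the helper name `replace_tokens` is hypothetical.

```perl
use strict;
use warnings;

my %replace = (
    iapw_p1 => "P1 = (inquiry, launch page)\n",
    iapw_p2 => "P2 = (car coverages, endorsements, operators)\n",
    iapw_p3 => "P3 = (notepad, scratch)\n",
);

# Longest keys first, so a longer key is never shadowed by a shorter prefix;
# quotemeta guards against regex metacharacters in the keys.
my $pattern = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %replace;
my $re = qr/($pattern)/;

# Hypothetical helper: returns a copy of the line with every known token
# replaced by its description from %replace.
sub replace_tokens {
    my ($line) = @_;
    $line =~ s/$re/$replace{$1}/g;
    return $line;
}

print replace_tokens("GET /iapw_p2 200\n");
```

In the original loop this would be used as `print replace_tokens($_) while <>;`. Whether it is measurably faster on a 300KB file is untested here.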

    _______________
    D a m n D i r t y A p e
      It seems like a better idea. Do you think it will parse faster? Here is the code again with a better display. Where would you improve it? Thanks a lot again!

      my $start = $getdata;
      my $end = $getdata2;

      my $dir = '../weblog/';
      @ARGV = ();

      die "start must be less than end" if $start >= $end;
      die "no dir $dir here" unless -d $dir;

      find sub {
          my $numb = (fileparse($_,'.txt'))[0];
          return unless $numb =~ /^\d+$/;
          push @ARGV, $File::Find::name if $numb >= $start and $numb <= $end;
      }, $dir;
      die "no .txt files found in $dir" unless @ARGV;

      while ( <> ) {
          $p1="P1 = (inquiry, launch page)\n";
          $p2="P2 = (car coverages, endorsements, operators)\n";
          $p3="P3 = (notepad, scratch)\n";
          $b1="B1 = (my inquiry: auto, home and location)";
          $p0="P0 = (local search)\n";
          $c0="C0 = (dist search)\n";
          $c1="C1 = (state inquiry)\n";
          $c3="C3 = (test notepad)\n";
          $h1="H1 = (owners and coop search and history)\n";
          if (m/iapw_p1/g){s/iapwp1/$p1/g; print;}
          if (m/iapw_p2/g){s/iapwp2/$p2/g; print;}
          if (m/iapw_p3/g){s/iapwp3/$p3/g; print;}
          if (m/iapw_b1/g){s/iapwb1/$b1/g; print;}
          if (m/iapw_p0/g){s/iapwp0/$p0/g; print;}
          if (m/iapw_c0/g){s/iapwc0/$c0/g; print;}
          if (m/iapw_c1/g){s/iapwc1/$c1/g; print;}
          if (m/iapw_c2/g){s/iapwc3/$c3/g; print;}
          if (m/iapw_c3/g){s/iapwh1/$h1/g; print;}
      }

      Just to let you know, I tried to parse it using your idea and it still hangs. I don't know what to do with it.

        You may have already considered this, but are you sure that the program is hanging, and not just taking a long time to process 300KB? If you add $|++; near the beginning of your program, you can prevent the program from buffering the output. I don't know if this will help you any, but it's worth a try. :-)
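As a rough sketch of how to tell "hanging" from "slow": unbuffer the output and emit a progress note every so often. The helper `progress_note` and the interval of 1000 lines are illustrative assumptions, not something from the thread.

```perl
use strict;
use warnings;

$| = 1;    # autoflush the selected output handle (STDOUT by default)

# Hypothetical helper: returns a short status string every $every lines
# and an empty string otherwise.
sub progress_note {
    my ($line_no, $every) = @_;
    return $line_no % $every == 0 ? "processed $line_no lines\n" : '';
}

# Stand-in for the real read loop; in the actual script $. would supply
# the current input line number.
for my $n ( 1 .. 3000 ) {
    print STDERR progress_note( $n, 1000 );
}
```

If the notes keep arriving, the program is working through the input rather than hanging.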


        _______________
        D a m n D i r t y A p e
Re: parsing long text file
by Anonymous Monk on Jun 18, 2002 at 15:19 UTC
    For starters, you need not define all the variables inside your loop, as you are reinitializing everything every single time you go through the loop. Also, you can probably replace all the lines of the form if (m/<whatever>/g){...} with just (m/<whatever>/) or just (/whatever/), as it seems you are only checking whether something exists in a line, and there is no need to check further after that point.
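A small sketch of both suggestions for one of the tokens, assuming the tokens in the log are spelled with an underscore as in the match patterns of the original post. The helper name `annotate` is hypothetical. Because `s///` returns the number of substitutions it made, the separate `m//` test can be dropped altogether.

```perl
use strict;
use warnings;

# Defined once, outside any loop.
my $p1 = "P1 = (inquiry, launch page)\n";

# Hypothetical helper: returns the rewritten line, or undef when the token
# does not occur in the line.
sub annotate {
    my ($line) = @_;
    return $line =~ s/iapw_p1/$p1/g ? $line : undef;
}

for my $line ( "hit iapw_p1 here\n", "no token\n" ) {
    my $out = annotate($line);
    print $out if defined $out;
}
```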
Re: parsing long text file
by ides (Deacon) on Jun 18, 2002 at 15:24 UTC
    You might also try adding the 'o' option to the replacements. This tells the Perl interpreter to 'compile' the regular expression only once, rather than each time through the loop. Example:
    if( m/iapw_p1/ ) { s/iapwp1/$p1/og; print; }
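A note on the example above: /o only changes anything when the pattern itself interpolates a variable. In s/iapwp1/$p1/og the pattern is a literal, and $p1 sits in the replacement, which is interpolated on every substitution regardless. When a pattern really does come from a variable, qr// is a common way to compile it once. The sketch below assumes the token is held in a variable; `$token` and `count_hits` are hypothetical names.

```perl
use strict;
use warnings;

my $token = 'iapw_p1';           # hypothetical: token kept in a variable
my $re    = qr/\Q$token\E/;      # compiled once; \Q...\E escapes metacharacters

# Hypothetical helper: counts how many of the given lines contain the token.
sub count_hits {
    my ( $regex, @lines ) = @_;
    return scalar grep { /$regex/ } @lines;
}

print count_hits( $re, "a iapw_p1\n", "b\n", "iapw_p1 iapw_p1\n" ), "\n";
```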

    -----------------------------------
    Frank Wiles <frank@wiles.org>
    http://frank.wiles.org