http://qs1969.pair.com?node_id=289760


in reply to Improvement on script needed.

Is there a reason that you have to split the lines in the first place? That is, if the last four fields are significant and always lumped together, you shouldn't waste time splitting and joining those fields. Also, you shouldn't loop over <DATA> more than once unless absolutely necessary; that can be expensive if the file is large.
my %seen;
while (<DATA>) {
    next unless /^([^\|]+)\|(.*)/;    # split at the first "|"; skip lines without one
    my ($key, $value) = ($2,$1);
    next if $seen{$key};
    print "$key: $value\n";
    $seen{$key}++;
}
__DATA__
34|4|45|56|45
45|34|45|00|23
45|34|45|00|27
34|4|456|56|03
36|4|456|56|03
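For anyone who wants to run the same seen-hash idea under `use strict`, here is a minimal self-contained sketch. It reads from an in-memory filehandle rather than `__DATA__` (an assumption made only so the sample data can live in a string), and `dedupe_by_tail` is a hypothetical helper name, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each distinct
# tail (everything after the first "|"), returning "tail: head" strings.
sub dedupe_by_tail {
    my ($fh) = @_;
    my (%seen, @out);
    while (my $line = <$fh>) {
        chomp $line;
        # Split at the first "|"; skip lines that contain no pipe at all.
        my ($head, $tail) = $line =~ /^([^|]+)\|(.*)/ or next;
        next if $seen{$tail}++;    # true for every repeat after the first
        push @out, "$tail: $head";
    }
    return @out;
}

my $sample = join '', map { "$_\n" }
    qw(34|4|45|56|45 45|34|45|00|23 45|34|45|00|27 34|4|456|56|03 36|4|456|56|03);
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for dedupe_by_tail($in);
close $in;
```

With the sample data only the last line is dropped, because its last four fields repeat those of the line before it.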

Re^2: Improvement on script needed.
by jeffa (Bishop) on Sep 08, 2003 at 13:56 UTC
    Good use of a "seen" hash, asarih, but I have to say that I would rather use this:
    my ($value,$key) = $_ =~ /^([^\|]+)\|(.*)/;
    than explicitly "spell out" $1 and $2. But if all you want to do is split at the first pipe, just use split:
    chomp;    # drop the newline so it does not end up inside $value
    my ($key,$value) = split(/\|/,$_,2);
    next if $seen{$value};
    print "$key: $value\n";
    $seen{$value}++;
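The third argument to `split` is what makes this work: with a limit of 2, splitting stops after the first separator, so the remaining pipes stay inside the second field. A small sketch contrasting the two forms, using one line of the sample data:

```perl
use strict;
use warnings;

my $line = "45|34|45|00|23";

# With a limit of 2, split yields at most two fields:
# the text before the first "|" and everything after it.
my ($key, $value) = split /\|/, $line, 2;

# Without a limit, every "|" acts as a separator.
my @all = split /\|/, $line;

print "key=$key value=$value\n";                  # key=45 value=34|45|00|23
print scalar(@all), " fields without a limit\n";  # 5 fields without a limit
```

Note that the pipe has to be escaped: an unescaped `/|/` is an alternation of two empty patterns, which matches everywhere and splits the line into single characters.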
    I don't have time for some benchmarking right now, but substr and index are fast:
    my $index = index($_,'|');
    my $key   = substr($_,0,$index);
    my $value = substr($_,$index + 1);    # + 1 skips the "|" itself
    Just some more ways to do it. ;)
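Since the speed claim is left untested, here is a rough sketch of how one might check it with the core `Benchmark` module. The three subs (hypothetical names) mirror the regex, `split`, and `index`/`substr` snippets; the `+ 1` in the `substr` version skips the pipe so all three return the same pair. Timings depend on the machine and Perl version, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "45|34|45|00|23";

# Each sub returns (text before the first "|", text after it),
# or an empty list when the line has no "|".
sub by_regex { return $_[0] =~ /^([^|]+)\|(.*)/ }
sub by_split { return split /\|/, $_[0], 2 }
sub by_index {
    my ($s) = @_;
    my $i = index $s, '|';
    return if $i < 0;    # no pipe: nothing to split
    return (substr($s, 0, $i), substr($s, $i + 1));
}

# Sanity check: the approaches must agree before their speed is compared.
for my $f (\&by_split, \&by_index) {
    my @want = by_regex($line);
    my @got  = $f->($line);
    die "mismatch: @got vs @want\n" unless "@got" eq "@want";
}

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'regex'  => sub { my @r = by_regex($line) },
    'split'  => sub { my @r = by_split($line) },
    'substr' => sub { my @r = by_index($line) },
});
```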

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      I tried as suggested but can't get your new script to work on my text file. Please advise what I am doing wrong. Thanks.
      my $db = 'C:\Inetpub\wwwroot\cgi-bin\test4.txt';
      open(DATA, "$db") || die "Can not open: $!\n";
      my @dat = (<DATA>);
      close(DATA);
      open(DATA, "$db") || die "NO GO: $!\n";
      my %seen;
      while (<DATA>){
          my ($value,$key) = $_ =~ /^([^\|]+)\|(.*)/;
          my ($key,$value) = split(/\|/,$_,2);
          next if $seen{$value};
          print "$key: $value\n";
          push(@files,$key);
          my @files = (<DATA>);
          print DATA @files;
          $seen{$value}++;
      }
      close(DATA);
        You shouldn't read DATA inside the while loop. In the first iteration of the while loop,
        my @files = (<DATA>);
        sucks up what's left in DATA and the loop terminates.
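A small sketch that makes this visible: it counts how many times the loop body runs when the body also reads the same handle in list context. The in-memory filehandle and the sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "a\nb\nc\nd\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $iterations = 0;
my @rest;
while (my $line = <$fh>) {
    $iterations++;
    # List-context read: slurps every remaining line ("b", "c", "d"),
    # so the next <$fh> in the while condition hits end-of-file.
    @rest = <$fh>;
}
close $fh;

print "loop ran $iterations time(s); slurped ", scalar(@rest), " line(s)\n";
# loop ran 1 time(s); slurped 3 line(s)
```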
Re: Re: Improvement on script needed.
by Anonymous Monk on Sep 08, 2003 at 17:06 UTC
    Still can't get it to write to the file:
    my $db = 'C:\Inetpub\wwwroot\cgi-bin\test4.txt';

    open(DATA, "$db") || die "Can not open: $!\n";
    my @dat = (<DATA>);
    close(DATA);

    open(DATA, "$db") || die "NO GO: $!\n";
    my %seen;
    while (<DATA>) {
        /^([^\|]+)\|(.*)/; # split at the first "|"
        my ($key, $value) = ($2,$1);
        next if $seen{$key};
        print "$key: $value\n";
        $seen{$key}++;
        push(@files,$key);
        pop @files;
        open(DATA,"> test4.txt") or die $!;
        print DATA @files;
    }
    close(DATA);

      There are all sorts of problems with this code. You seem to have copied snippets from various answers to your question into your code without understanding any of them. I suggest you spend some considerable time studying the resources listed here.
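The thread never spells out exactly what should end up in the output file, so the following is only a sketch under one assumption: that the goal is to write every line whose last four fields have not been seen before into a separate file. The file names and the temporary directory are hypothetical, chosen so the example runs anywhere. It opens each file once, gives each its own lexical filehandle, and never writes to the file it is reading.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical paths inside a throwaway directory.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = File::Spec->catfile($dir, 'test4.txt');
my $out_path = File::Spec->catfile($dir, 'test4_unique.txt');

# Create sample input standing in for the real data file.
open my $setup, '>', $in_path or die "Cannot write $in_path: $!";
print {$setup} "34|4|45|56|45\n45|34|45|00|23\n34|4|456|56|03\n36|4|456|56|03\n";
close $setup or die "Cannot close $in_path: $!";

# Open each file exactly once, each with its own lexical filehandle.
open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
open my $out, '>', $out_path or die "Cannot open $out_path: $!";

my %seen;
while (my $line = <$in>) {
    chomp $line;
    my (undef, $tail) = split /\|/, $line, 2;
    next unless defined $tail;    # ignore lines without a "|"
    next if $seen{$tail}++;       # keep only the first line per tail
    print {$out} "$line\n";
}
close $in;
close $out or die "Cannot close $out_path: $!";
```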

      However, to return to your immediate problems, let's look, for example, at the number of times you use open in your code. Lines 3-7 are as follows:

      open(DATA, "$db") || die "Can not open: $!\n";
      my @dat = (<DATA>);
      close(DATA);

      open(DATA, "$db") || die "NO GO: $!\n";

      Why on earth do you want to reopen (for reading) the same file that you've just closed, once you've already read it into an array?
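Once the lines are in `@dat`, the second pass can simply loop over the array; the file does not need to be touched again. A minimal sketch, with an in-memory filehandle standing in for the real file (an assumption so the example runs anywhere):

```perl
use strict;
use warnings;

my $file_contents = "34|4|45|56|45\n36|4|45|56|45\n";
open my $in, '<', \$file_contents or die "Can not open: $!";
my @dat = <$in>;    # read every line exactly once
close $in;

# Second pass: loop over the array already in memory; no reopen needed.
my %seen;
my @unique;
for my $line (@dat) {
    chomp(my $copy = $line);
    my (undef, $rest) = split /\|/, $copy, 2;
    next unless defined $rest;    # skip lines with no "|"
    push @unique, $copy unless $seen{$rest}++;
}
print "$_\n" for @unique;    # 34|4|45|56|45
```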

      As perl is forgiving, you can needlessly open (and/or close) the same file as many times as you want without it complaining, but...

      More importantly, the third time you use open is inside a while loop:

      while (<DATA>) {
          # ...
          open(DATA,"> test4.txt") or die $!;
          # ...
      }

      Apart from the fact that, at least for clarity's sake, you shouldn't use the same filehandle for two completely different files (and that in any case DATA is not a particularly good filehandle to choose, since Perl already uses it for reading whatever follows __DATA__ or __END__ in a script), try to envision what the above is doing. Because the open line is inside the loop, every iteration reopens 'test4.txt' for writing, truncating whatever the previous iteration wrote. If you don't understand that, try running this:

      my $i = 0;
      while ($i < 5) {
          open (OUT, '>oops.txt') || die "NO GO: $!\n";
          print OUT $i;
          $i++;
      }

      as compared with this:

      open (OUT, '>oops.txt') || die "NO GO: $!\n";
      my $i = 0;
      while ($i < 5) {
          print OUT $i;
          $i++;
      }
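To see the difference without inspecting oops.txt by hand, here is a sketch that runs both variants and reads the file back after each. It uses lexical filehandles and a temporary directory; `slurp` is a hypothetical helper introduced only for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'oops.txt');

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($p) = @_;
    open my $fh, '<', $p or die "Cannot read $p: $!";
    local $/;    # undefined record separator: read everything at once
    my $s = <$fh>;
    close $fh;
    return $s;
}

# open inside the loop: each '>' open truncates the file again.
for my $i (0 .. 4) {
    open my $out, '>', $path or die "NO GO: $!";
    print {$out} $i;
    close $out;
}
my $inside = slurp($path);

# open once before the loop: every write is kept.
open my $out, '>', $path or die "NO GO: $!";
print {$out} $_ for 0 .. 4;
close $out;
my $outside = slurp($path);

print "open inside loop:  $inside\n";    # open inside loop:  4
print "open before loop: $outside\n";    # open before loop: 01234
```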

      dave