Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Hi there
I have a file that looks something like this
Z 5 89
Z 92 102
Z 103 123
Z 126 150
In a nutshell - the length of the 'Z item' corresponds to the numbers following it. For example, the first Z item starts at element 5 and ends at element 89, making it 84 elements in length.
I want to find out 1) how many Z items there are in a given file and 2) the length of each of these Z items. Which would be remarkably easy if it wasn't for the following:
In some cases, a particular Z item has been mistakenly broken up into two. These mistakes are instantly recognisable because the 'stop element' of one incorrectly assigned Z item is followed immediately by another Z item whose start element is exactly one more than the previous stop element.
For example
Z 92 102
Z 103 123
Is in reality only one Z item, yet in the file it's recognised as two.
In my Perl script I want to count such incorrectly assigned Z items as a single item wherever necessary, but I am at a bit of a loss as to how to do that.
I guess I'm looking for a way of 'rewinding' the file so I can compare the stop element of one Z item with the start element of the next one. My present code is as follows.
open(FILE, "$input") || die "OOPS! Can't find file!\n";
while (<FILE>) {
    @file = split(/\s+/, $_);
    $zitem = $file[0];
    if ("$zitem" eq "Z") {
        $start_element = $file[1];
        $stop_element = $file[2];
    }
}
Any pointers in the right direction much appreciated.

Replies are listed 'Best First'.
Re: 'rewinding' file to get value from previous line in file?
by sauoq (Abbot) on Dec 07, 2006 at 17:43 UTC

    This should get you where you want to go:

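    #!/usr/bin/perl
    use strict;
    use Data::Dumper;    # needed for the Dumper() call below

    my @Z;
    my $last_stop;
    while (<DATA>) {
        chomp;
        my @token = split ' ';
        if ( $token[0] eq 'Z' ) {
            if ($token[1] == $last_stop + 1) {
                # Start follows directly on the previous stop: extend the last range.
                $Z[-1]->[1] = $token[2];
            }
            else {
                # Otherwise this is a new Z item.
                push @Z, [ @token[1,2] ];
            }
            $last_stop = $token[2];
        }
    }
    print Dumper(\@Z);
    __DATA__
    Z 5 89
    Z 92 102
    Z 103 123
    Z 126 150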
    -sauoq
    "My two cents aren't worth a dime.";
      • $last_stop gives an undefined warning.
      • $last_stop can be replaced with $Z[-1][1].
      • chomp is redundant.
      • The if is fine as it is, but the common $Z[-1][1] = $token[2] can be factored out.
      my @Z;
      while (<DATA>) {
          my @token = split ' ';
          if ( $token[0] eq 'Z' ) {
              push @Z, [ $token[1] ]
                  if !$Z[-1] || $token[1] != $Z[-1][1] + 1;
              $Z[-1][1] = $token[2];
          }
      }
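Neither version above prints the two things the original question asked for: the number of Z items and the length of each. A minimal sketch of that last step, assuming @Z holds [start, stop] pairs as built by the loop above and that lengths are counted inclusively (stop - start + 1). The helper name range_lengths is made up for illustration and is not part of either post:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [start, stop] pairs (already merged)
# and returns the inclusive length of each pair.
sub range_lengths {
    return map { $_->[1] - $_->[0] + 1 } @_;
}

# Merged ranges as the loop above would leave them for the sample data:
# "Z 92 102" and "Z 103 123" have been joined into [92, 123].
my @Z = ( [ 5, 89 ], [ 92, 123 ], [ 126, 150 ] );

my @lengths = range_lengths(@Z);
print scalar(@Z), " Z items\n";
print "$_\n" for @lengths;
```

If the 84-style count from the question is really wanted, dropping the "+ 1" in range_lengths is the only change needed.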
Re: 'rewinding' file to get value from previous line in file?
by Not_a_Number (Prior) on Dec 07, 2006 at 21:22 UTC

    First and foremost, a quibble:

    ...the first Z item starts at element 5 and ends at element 89, making it 84 elements in length

    By my calculations, that makes it 85 elements long (if it started at 1 and ended at 3, it would be 3 elements long, no? Google for "fencepost error", or, if you really want an answer of 84, adjust my code below accordingly).
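To make the off-by-one concrete: when both endpoints are counted, a range from start to stop contains stop - start + 1 elements. A quick check, where the sub name inclusive_length is purely illustrative:

```perl
use strict;
use warnings;

# Illustrative helper, not from the original post:
# number of elements from $start to $stop, counting both ends.
sub inclusive_length {
    my ( $start, $stop ) = @_;
    return $stop - $start + 1;
}

# Prints one line per range: "1..3 => 3 elements", then "5..89 => 85 elements".
printf "%d..%d => %d elements\n", $_->[0], $_->[1], inclusive_length(@$_)
    for [ 1, 3 ], [ 5, 89 ];
```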

    That said, here is another approach to your problem (TIMTOWTDI) that should also deal with cases where the Z item has been 'mistakenly broken up' into more than two different lines (you don't say that this can happen, but are you sure?).

    use strict;
    use warnings;

    my ( $length, @lengths );
    my $last = 0;

    while ( <DATA> ) {
        next unless /Z\s+(\d+)\s+(\d+)/ and $2 >= $1;
        if ( $1 > $last + 1 ) {
            push @lengths, $length if $length;
            $length = $2 - $1 + 1;
        }
        else {
            $length += $2 - $1 + 1;
        }
        $last = $2;
    }
    push @lengths, $length;

    print scalar @lengths, " Z items found:\n";
    print join "\n", @lengths;

    __DATA__
    Z 1 1
    Z 5 89
    ~#à^''$%!@ => line noise
    Z 91 102
    Z 103 123
    Z 124 150
    Z 151 191
    Z 500 504
    Z 505 509