note
Discipulus
Hello [melissa_randel] and welcome to the monastery and to the wonderful world of Perl<BR><BR>
As a tip for your next posts, I suggest including some code you have tried: it shows more effort, and the help can be better targeted at your level of wisdom. In fact you got good and very good replies to your question, but how many of them do you understand completely?<br><br>
I too do not understand the smart [id://1148203|Anonymous's almost one-liner]: I would need to fill it with a lot of print statements before understanding it.<br><br>
Because of this I think the best approach is what the wise [choroba] presented to you in the first [id://1148177|reply]: think about your problem in words and then translate it into Perl. I started learning Perl with no programming or scientific background, and after a decade of Perl I am starting to think that the compiler is happier with plain, basic code. Nowadays I too tend to write 'smart' code, but I think it is often more a matter of self-esteem than a matter of quality.<br>
So the code I present to you is simple and commented for full understanding.
<CODE>
# always use strict and warnings (until you know when it is safe to disable them)
use strict;
use warnings;
# we use an array to hold the DATA lines: an array preserves order,
# in case order is needed in the output
my @arr;
# <> is something like an iterator:
# $next_line = <DATA> retrieves the next line
# while (<DATA>) processes all lines, one at a time
# we chomp each line to remove the \n at the end, then push it onto @arr
# (two plain statements: 'chomp $_ and push ...' would skip a last line
# lacking its newline, because chomp returns the number of chars removed)
while (<DATA>){
    chomp;
    push @arr, $_;
}
# a hash provides uniqueness of keys, and we need uniqueness because...
my %adj;
# ...in the loop from 0 to the last index of @arr
# (pay attention when using $#arr: @arr in scalar context returns the number
# of elements, while $#arr is the last index of the array, counting from 0,
# so scalar @arr == $#arr + 1)
# we process two values at a time (a sliding window), checking if the
# numerical part is adjacent to the next element's numerical part
for (0..$#arr){
    # exit conditions go at the beginning of loops, EVERY TIME
    # so we exit the loop at the last element (already processed as 'next')
    last if $_ == $#arr;
    # grab the numerical part of interest
    # a match in list context returns what the capturing parentheses () caught
    # (the same value as $1), or an empty list if the match fails.
    # Never write 'my $x = $1 if ...': perlsyn says its behaviour is undefined
    my ($cur_num)  = $arr[$_]     =~ /\d*[A-Z]_(\d+)$/;
    my ($next_num) = $arr[$_ + 1] =~ /\d*[A-Z]_(\d+)$/;
    # skip pairs containing a line that does not look like our data
    next unless defined $cur_num and defined $next_num;
    # if current is adjacent to next
    if ($cur_num == $next_num - 1){
        # we populate the hash with don't-care values
        $adj{$arr[$_]} = undef;
        $adj{$arr[$_ + 1]} = undef;
        # if we had used $adj{$arr[$_]}++ and $adj{$arr[$_ + 1]}++ (autoincrement)
        # you would notice X_203 with a value of 2
        # because it is inserted twice: as next_num while processing X_202
        # and as cur_num while processing X_203
    }
}
# if the order of the data must be preserved we still have the array.
# if alphabetical order were enough it would be simpler (and the array useless):
# as simple as print "$_\n" for sort keys %adj
foreach (@arr){
    print "$_\n" if exists $adj{ $_ };
}
__DATA__
2L_33
2L_34
3L_45
3L_87
X_202
X_203
X_204
</CODE>
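The remark in the comments about autoincrement is easier to believe when you see it. Here is a minimal sketch of it; the sub name <c>count_adjacent</c> is my own invention for this illustration, and it reuses the same regex and the same sample data as above.
<c>
use strict;
use warnings;

# hypothetical helper, only for illustration: counts how many times
# each line takes part in an adjacent pair
sub count_adjacent {
    my @lines = @_;
    my %seen;
    for my $i (0 .. $#lines - 1) {
        # list-context match: the capture, or nothing if there is no match
        my ($cur)  = $lines[$i]     =~ /\d*[A-Z]_(\d+)$/;
        my ($next) = $lines[$i + 1] =~ /\d*[A-Z]_(\d+)$/;
        next unless defined $cur and defined $next;
        if ($cur == $next - 1) {
            # autoincrement instead of undef: the value becomes a counter
            $seen{ $lines[$i] }++;
            $seen{ $lines[$i + 1] }++;
        }
    }
    return %seen;
}

my %count = count_adjacent(qw(2L_33 2L_34 3L_45 3L_87 X_202 X_203 X_204));
print "$_ => $count{$_}\n" for sort keys %count;
</c>
With the sample data every printed line has a count of 1 except X_203, which has 2, because it belongs to two pairs. This is why a hash with don't-care values is enough when you only need to know <i>whether</i> a line is part of a pair.<br>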
Obviously concise code is a good thing. But someone here at PerlMonks once said: <i>Don't code at your best. Debugging being twice as difficult as writing code, you will not be able to debug it, by definition.</i><br>
So, in the above code:
<c>
my ($cur_num)  = $arr[$_]     =~ /\d*[A-Z]_(\d+)$/;
my ($next_num) = $arr[$_ + 1] =~ /\d*[A-Z]_(\d+)$/;
</c>
can be shortened (imagine a long list to process) into
<c>
my ($cur_num,$next_num) = map { /\d*[A-Z]_(\d+)$/ ? $1 : undef } $arr[$_],$arr[$_+1];
</c>
But I suspect it is neither faster nor more efficient: it is just more concise and harder to debug. The plain, kid version is the easiest to debug (because you will get the exact line number of the statement producing the error!):
<c>
my $cur_num;
if ( $arr[$_] =~/\d*[A-Z]_(\d+)$/ ){
    $cur_num = $1;
}
</c><BR><BR>
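If you want to convince yourself that the concise and the plain styles really do the same job, a small side-by-side sketch like the following may help. The two-element array and the variable names ending in <c>_a</c> and <c>_b</c> are made up just for this comparison.
<c>
use strict;
use warnings;

my @arr = qw(X_202 X_203);
my $re  = qr/\d*[A-Z]_(\d+)$/;

# concise version: one map over both elements
# (the ternary gives undef, not an empty string, when a line does not match)
my ($cur_a, $next_a) = map { /$re/ ? $1 : undef } $arr[0], $arr[1];

# plain version: one statement per step, each on its own line
my ($cur_b, $next_b);
if ( $arr[0] =~ $re ){
    $cur_b = $1;
}
if ( $arr[1] =~ $re ){
    $next_b = $1;
}

print "map:   $cur_a $next_a\n";
print "plain: $cur_b $next_b\n";
</c>
Both print statements show 202 and 203. The difference shows up only when something goes wrong: in the plain version a warning points to the exact line of the failing step.<BR><BR>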
HtH<BR>
L*<BR>
<div class="pmsig"><div class="pmsig-174111">
There are no rules, there are no thumbs..<BR>
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
</div></div>
1148176
1148176