rkk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have a file like the following

ID1 ch1 70 mir abc xyz ch2 2050

ID2 ch1 90 mir abc xyz ch2 4000

ID3 ch1 100 mir abc xyz ch2 2045

ID4 ch1 120 mir abc xyz ch2 2025

I need to sort the last column and after sorting if the difference between first and last element is >=500 (in this example second line) that should be omitted and remaining three lines should be printed.

Output should be:

ID1 ch1 70 mir abc xyz ch2 2050

ID3 ch1 100 mir abc xyz ch2 2045

ID4 ch1 120 mir abc xyz ch2 2025

Any help in this task will be appreciated.

Thanks in advance,

Replies are listed 'Best First'.
Re: Array reference
by roboticus (Chancellor) on Jun 12, 2013 at 23:53 UTC

    rkk:

    OK, let's do a little debugging. First, I took your script and added "use warnings" at the top, closed a couple curly braces, and added "my" before the @s_array. That was enough to make it compile cleanly.

    When I run it, my screen fills with:

    Use of uninitialized value in string eq at 1038612.pl line 21, <FILE> +line 4. Use of uninitialized value in string eq at 1038612.pl line 21, <FILE> +line 4. Use of uninitialized value in subtraction (-) at 1038612.pl line 21, < +FILE> line 4. Use of uninitialized value in subtraction (-) at 1038612.pl line 21, < +FILE> line 4.^C

    It doesn't stop, so there's apparently an infinite loop. Well, let's run it under the debugger and see what the program is doing:

    $ perl -d 1038612.pl Loading DB routines from perl5db.pl version 1.32 Editor support available. Enter h or `h h' for help, or `man perldebug' for more help. main::(1038612.pl:4): open (FILE, "ip.txt") || die "can't: $!"; DB<1> n main::(1038612.pl:6): my @lines=(); DB<1> main::(1038612.pl:7): my @l1=(); DB<1> main::(1038612.pl:8): my @l2=(); DB<1> main::(1038612.pl:9): @lines=<FILE>; DB<1> main::(1038612.pl:10): my %hash=(); DB<1>

    The 'n' command tells it to execute the next instruction. After that, pressing the Enter key repeats the command, so here we've just executed lines 4 through 9 and are ready to execute line 10.

    The debugging session continues...

    Before we run the next line, let's take a quick peek at what's in the @lines variable:

    OK, now that I've done my bit, here's an observation on your code: It seems like you're trying to use @arr1, @arr2 and @arr3 to contain data that's logically tied together. If so, it would probably benefit you to get used to using an array of arrays, so each slot always has the related values you want. In other words, you can turn this:

    push(@arr1,$l1[3]); push(@arr2,$l1[7]); push(@arr3,$lines[$i]); print "first item: $arr1[0], second item: $arr2[0], third item: $arr3[ +0]\n";

    Into this:

    push @arr, [ $l1[3], $l1[7], $lines[$i] ]; print "first item: $arr[0][0], second item: $arr[0][1], third item: $a +rr[0][2]\n";

    That way, if you need to interchange two rows in your arrays, instead of this:

    # swap items 1 and 3 my $temp=$arr1[1]; $arr1[1]=$arr1[3]; $arr1[3]=$temp; $temp=$arr2[1]; $arr2[1]=$arr2[3]; $arr2[3]=$temp; $temp=$arr3[1]; $arr3[1]=$arr3[3]; $arr3[3]=$temp;

    you can do this:

    # swap items 1 and 3 my $temp=$arr[1]; $arr[1]=$arr[3]; $arr[3]=$temp;

    This way, you're swapping around an array of related items, without a bunch of copypasta.

    Note: I didn't write the swaps in the best way, I did it as a beginner might. But in fact perl lets you assign a list to a list of values, so it's even easier to do it like this:

    # swap items 1 and 3 ($arr[1], $arr[3]) = ($arr[3], $arr[1]);

    Well, I hope you find my rambling useful in some way. Let me know if you get stuck again.

    Update: Added a missing code tag, moved one sentence ("the debugging continues" moved up a line), added another sentence for clarity ("Now that we can see line 21").

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Array reference
by kcott (Archbishop) on Jun 13, 2013 at 00:18 UTC

    G'day rkk,

    Welcome to the monastery.

    You seem to have got off on the wrong foot somewhat. Here's a few tips:

    Here's the guts of the code I might have used to tackle this problem:

    #!/usr/bin/env perl -l use strict; use warnings; my $first; print join ' ' => @$_ for grep { $_->[-1] - $first <= 500 } sort { $b->[-1] <=> $a->[-1] } map { $first = $_->[-1] unless $first; $_ } map { [ split ] } <DATA>; __DATA__ ID1 ch1 70 mir abc xyz ch2 2050 ID4 ch1 120 mir abc xyz ch2 2025 ID2 ch1 90 mir abc xyz ch2 4000 ID3 ch1 100 mir abc xyz ch2 2045

    After excluding the ID2 line, your data was already sorted. I've changed the order of the lines under __DATA__ to show that the sort works.

    Here's the output:

    $ pm_sort_exclude.pl ID1 ch1 70 mir abc xyz ch2 2050 ID3 ch1 100 mir abc xyz ch2 2045 ID4 ch1 120 mir abc xyz ch2 2025

    Update: I left the 500 out of the comparison. Fixed: s/$_->[-1] <= $first/$_->[-1] - $first <= 500/

    -- Ken

      Thanks all for the suggestions..

Re: Array reference
by hbm (Hermit) on Jun 12, 2013 at 23:55 UTC

    Classic map-sort-map:

    1. Turn each line into an anonymous array with two items - the last column, and the full line.
    2. Sort those arrays by their first element (the last column).
    3. Print just the second element (the full line) for each anonymous array.

    The only trick is, in #1, keep track of the smallest last column.

    my $least; print map { abs($_->[0]-$least)<500 && $_->[1] } sort{ $a->[0] <=> $b->[0] } map { /(\d+)$/; $least = $1 if !defined $least or $1<$least; [ $1, $_] } <DATA> __DATA__ ID1 ch1 70 mir abc xyz ch2 2050 ID2 ch1 90 mir abc xyz ch2 4000 ID3 ch1 100 mir abc xyz ch2 2045 ID4 ch1 120 mir abc xyz ch2 2025 # prints: ID4 ch1 120 mir abc xyz ch2 2025 ID3 ch1 100 mir abc xyz ch2 2045 ID1 ch1 70 mir abc xyz ch2 2050
Re: Array reference
by GrandFather (Saint) on Jun 13, 2013 at 00:15 UTC

    If I understand your requirements correctly the following sample should do it for you:

    #!/usr/bin/perl use strict; use warnings; my $inData = <<IN; ID1 ch1 70 mir abc xyz ch2 2050 ID2 ch1 90 mir abc xyz ch2 4000 ID3 ch1 100 mir abc xyz ch2 2045 ID4 ch1 120 mir abc xyz ch2 2025 IN open my $inFile, '<', \$inData; my @lines = sort {$a->[1] <=> $b->[1]} map {/(\d+)$/; [$_, $1]} <$inFi +le>; pop @lines if $lines[-1][1] - $lines[0][1] >= 500; print for sort map {$_->[0]} @lines;

    Prints:

    ID1 ch1 70 mir abc xyz ch2 2050 ID3 ch1 100 mir abc xyz ch2 2045 ID4 ch1 120 mir abc xyz ch2 2025

    The trick is to pair the line text with the sort key (done in the map) then subsequently use the key or the original line as required.

    True laziness is hard work
Re: Array reference
by LanX (Saint) on Jun 12, 2013 at 22:37 UTC
    from Do your own work ¹ ²

    > Asking for help is one thing, but requesting us to just do your work for you for free is not going to happen - you should ask the kind folks at the perlguru's I need a program that... forum instead.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    ¹) for those getting locked out, sorry! Deep linking [id://172086#your_work] didn't work! And this is to general.

    ²) corrected link, thx ano-monk for link to workaround

      well.. I also know that no body wants to write a code for free.. I have asked for the help not for the entire code..

      #!/usr/bin/perl use strict; open (FILE, "ip.txt") || die "can't: $!"; open (OUTPUT,">op.txt") || die "can't: $!"; my @lines=(); my @l1=(); my @l2=(); @lines=<FILE>; my %hash=(); my $i=0; my $j=$i+1; my @arr1=(); my @arr2=(); my @arr3=(); while($j <= $#lines) { #print "$j\n"; @l1=split(/\t/,"$lines[$i]"); @l2=split(/\t/,"$lines[$j]"); if(abs($l2[3]-$l1[3]) <= 500 && $l1[6] eq $l2[6]) { $hash{$lines[$j]}=1; if(!exists $hash{$lines[$i]}) { $hash{$lines[$i]}=1; push(@arr1,$l1[3]); push(@arr2,$l1[7]); push(@arr3,$lines[$i]); #print OUTPUT "$lines[$i]"; } #print OUTPUT "$lines[$j]"; push(@arr1,$l2[3]); push(@arr2,$l2[7]); push(@arr3,$lines[$j]) } else { @s_arr2=sort{$a<=>$b}@arr2; if($s_arr2[-1]-$s_arr2[0] <= 500) { //got struck $i=$j; $j=$i+1; }

      Thanks.

        Now that I have some script to examine, I have a question and a few observations.

        From the input and desired output you posted, it appears that you want to sort on the last field in each array, then remove any row based on the difference in that last field. Is that correct ?

        First, my @s_arr2 must be declared for this line

        @s_arr2=sort{$a<=>$b}@arr2;

        Then, on this line

        if(abs($l2[3]-$l1[3]) <= 500 && $l1[6] eq $l2[6])

        are you intending to subtract text fields in the abs section ? Based on your data, you're subtracting "mir" from "mir" because Perl array indices start with zero.

        Also on that same line, if you meant

        abs($l2[2]-$l1[2]) <= 500

        to do subtractions on the 70, 90, 100, 120 values, those differences won't come close to 500, so no row would be deleted based on that result.

        Dyslexics Untie !!!
Re: Array reference
by JockoHelios (Scribe) on Jun 12, 2013 at 22:34 UTC
    I see your input and output, what script have you written to try to produce that output ?

    Please post your script. If you don't have one, don't expect someone to just write one for you. There are sites where you can hire someone to write Perl script to your specifications. But you have to pay them.

    Dyslexics Untie !!!

      well.. I also know that no body wants to write a code for free.. I have asked for the help not for the entire code..

      #!/usr/bin/perl use strict; open (FILE, "ip.txt") || die "can't: $!"; open (OUTPUT,">op.txt") || die "can't: $!"; my @lines=(); my @l1=(); my @l2=(); @lines=<FILE>; my %hash=(); my $i=0; my $j=$i+1; my @arr1=(); my @arr2=(); my @arr3=(); while($j <= $#lines) { #print "$j\n"; @l1=split(/\t/,"$lines[$i]"); @l2=split(/\t/,"$lines[$j]"); if(abs($l2[3]-$l1[3]) <= 500 && $l1[6] eq $l2[6]) { $hash{$lines[$j]}=1; if(!exists $hash{$lines[$i]}) { $hash{$lines[$i]}=1; push(@arr1,$l1[3]); push(@arr2,$l1[7]); push(@arr3,$lines[$i]); #print OUTPUT "$lines[$i]"; } #print OUTPUT "$lines[$j]"; push(@arr1,$l2[3]); push(@arr2,$l2[7]); push(@arr3,$lines[$j]) } else { @s_arr2=sort{$a<=>$b}@arr2; if($s_arr2[-1]-$s_arr2[0] <= 500) { //got struck $i=$j; $j=$i+1; }

      Thanks.