in reply to Re^2: Combined lines from a file into one
in thread Combined lines from a file into one

#!/usr/bin/perl
# http://perlmonks.org/?node_id=1137428
use strict;
use warnings;

open (my $rangeFixed, '<', "rangeFile.txt")
  or die "Cannot Open File: rangeFile.txt: $!";
$_ = join '', <$rangeFixed>;    # slurp the whole file into $_
close $rangeFixed;

s/^(IMB,\d+,V1\s,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3/gmx;

open (my $rangeOutput, '>', "test.txt")
  or die "Cannot Open File: test.txt: $!";
print $rangeOutput $_;
close $rangeOutput;
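To see what the single substitution does, here is a minimal sketch that runs the same regex on a few records held in a string instead of a file. The sample lines are taken from the `__DATA__` section in poj's reply further down; `$data` and `$merged` are hypothetical names, and the records assume exactly one space after `V1`, since that is all `\s` matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records with exactly one space after V1, which is what \s matches.
my $data = <<'END';
IMB,060410,V1 ,371094378,371096338,1961
IMB,060410,V1 ,371096340,371096486,147
IMB,107951,V1 ,981157588,981164939,7352
IMB,107951,V1 ,981164941,981165606,666
END

# ^(IMB,\d+,V1\s,)  capture the prefix of the first line of a group as \1
# (\d+),\K          keep the start value; nothing before \K is replaced
# (?:.*\n)+         greedily swallow whole lines ...
# \1\d+,(\d+)       ... back to the last line starting with the same prefix,
#                   capturing that line's end value as $3
# .*                the rest of that line (",count") is replaced as well
(my $merged = $data) =~ s/^(IMB,\d+,V1\s,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3/gmx;

print $merged;
```

This prints `IMB,060410,V1 ,371094378,371096486` and `IMB,107951,V1 ,981157588,981165606`: each group keeps the first start and the last end, and the trailing count field disappears because `.*` consumes it. Note that the greedy `(?:.*\n)+` jumps to the *last* matching line, so it assumes records with the same prefix are adjacent in the file.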

Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 17:31 UTC
    IMB,060410|,folded ,307057959,307058193,235
    IMB,060410|,selfmail ,307058194,307066458,8265
    IMB,107951|,folded ,958090350,958091491,1142
    IMB,107951|,selfmail ,958091492,958132856,41365
    SEQ,,folded ,000000001,000001377,1377
    SEQ,,selfmail ,000001378,000051007,49630
    It is not always V1; sometimes it might be a different name. What I just noticed is that it won't work if the name is different, as in the example above: it deletes the first line. In my case, the ultimate solution is for the code to look at each line and, if there are lines with the same prefix, like

    IMB,060410|,folded ,

    IMB,060410|,folded ,

    then combine them. If they are different, as in the example above, just keep them as they are and do nothing.

      Assuming your file is not millions of lines long, try this:

      #!perl
      use strict;
      use warnings;

      my $infile  = 'rangeFile.txt';
      my $outfile = 'test.txt';

      open IN,  '<', $infile  or die "Cannot Open InFile: $infile : $!";
      open OUT, '>', $outfile or die "Cannot Open OutFile: $outfile : $!";

      my %out = ();
      my @key = ();
      my $count_in  = 0;
      my $count_out = 0;

      # input
      while (<IN>){
        chomp;
        ++$count_in;
        my @in  = split ',',$_;
        my $key = join ',',@in[0..2];
        if ( ! defined $out{$key} ){
          # initialise
          @{$out{$key}} = @in[3..4];
          push @key,$key; # preserve order
        } else {
          # min
          if ($in[3] < $out{$key}[0]){
            $out{$key}[0] = $in[3];
          }
          # max
          if ($in[4] > $out{$key}[1]){
            $out{$key}[1] = $in[4];
          }
        }
      }

      # output
      for my $key (@key){
        print OUT join ',',$key,@{$out{$key}},"\n";
        ++$count_out;
      }
      close IN;
      close OUT;

      print "\n$count_in records read from $infile\n$count_out records written to $outfile\n";

      __DATA__
      IMB,060410,V1 ,371094378,371096338,1961
      IMB,060410,V1 ,371096340,371096486,147
      IMB,107951,V1 ,981157588,981164939,7352
      IMB,107951,V1 ,981164941,981165606,666
      IMB,107951,V1 ,981165608,981175100,9493
      IMB,107951,V1 ,981175102,981176199,1098
      IMB,060410|,folded ,307057959,307058193,235
      IMB,060410|,selfmail ,307058194,307066458,8265
      IMB,107951|,folded ,958090350,958091491,1142
      IMB,107951|,selfmail ,958091492,958132856,41365
      SEQ,,folded ,000000001,000001377,1377
      SEQ,,selfmail ,000001378,000051007,49630
      poj
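The hash-based idea in poj's script can also be wrapped in a subroutine that takes a list of lines, which makes it easy to try on a few sample records. This is a minimal sketch, not poj's code: `merge_ranges` is a hypothetical name, and recomputing the last field as end - start + 1 is an assumption based on the sample rows (for example 371096338 - 371094378 + 1 = 1961). Records whose first three fields differ, such as the `folded` and `selfmail` lines, pass through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge records whose first
# three fields match, keeping the smallest start, the largest end and
# the order in which each key was first seen.
sub merge_ranges {
    my @lines = @_;
    my (%range, @order);
    for my $line (@lines) {
        my @f   = split /,/, $line;
        my $key = join ',', @f[0 .. 2];
        if (!exists $range{$key}) {
            $range{$key} = [ @f[3, 4] ];
            push @order, $key;    # preserve input order
            next;
        }
        $range{$key}[0] = $f[3] if $f[3] < $range{$key}[0];    # min start
        $range{$key}[1] = $f[4] if $f[4] > $range{$key}[1];    # max end
    }

    my @out;
    for my $key (@order) {
        my ($start, $end) = @{ $range{$key} };
        # Assumption: the last field is a count equal to end - start + 1.
        push @out, join ',', $key, $start, $end, $end - $start + 1;
    }
    return @out;
}

my @merged = merge_ranges(
    'IMB,060410,V1 ,371094378,371096338,1961',
    'IMB,060410,V1 ,371096340,371096486,147',
    'IMB,060410|,folded ,307057959,307058193,235',
    'IMB,060410|,selfmail ,307058194,307066458,8265',
);
print "$_\n" for @merged;
```

Here the two V1 records collapse into `IMB,060410,V1 ,371094378,371096486,2109`, while the `folded` and `selfmail` records are printed as they were.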
Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 15:46 UTC
    It didn't work. The output contained all the lines; it didn't combine them.

      Can you put your data in code tags? It looks like you have more than one space here:
      V1      ,
      If so, the regex should have a + here: V1\s+,
      poj
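A quick way to confirm this kind of mismatch is to test both patterns against one sample record. A minimal sketch, assuming the record has several spaces between `V1` and the comma, as suspected above; the line is built from the sample data and `$one`/`$many` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A record with several spaces between V1 and the comma.
my $line = 'IMB,060410,V1      ,371094378,371096338,1961';

# \s matches exactly one whitespace character; \s+ matches one or more.
my $one  = $line =~ /^IMB,\d+,V1\s,/  ? 1 : 0;
my $many = $line =~ /^IMB,\d+,V1\s+,/ ? 1 : 0;

print "V1\\s,  matches: $one\n";
print "V1\\s+, matches: $many\n";
```

Only the `\s+` version matches, which is why the original substitution left every line untouched.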

Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 16:03 UTC
    It works now; it was missing the space. Thanks for the catch. Now I want to try plugging in variables. Can I use variables for this:
    s/^(IMB,\d+,V1\s+,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3/gmx;
    s/^($type,\d+,$version,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3/gmx;
    Do you think it will work?
      Yes, if you mean like this:
      my $type    = 'IMB';
      my $version = 'V1';
      s/^($type,\d+,$version\s+,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3/gmx;

      Or do you mean the file contains records other than IMB,V1?
      poj
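When values are interpolated into a pattern like this, any regex metacharacters they contain are treated as regex syntax, not as literal text. This matters here because the sample data has fields such as `060410|`. A minimal sketch, assuming a hypothetical `$id` variable holding such a value: wrapping it in `\Q...\E` (the same escaping `quotemeta` does) makes Perl match it literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $id   = '060410|';    # hypothetical value containing a regex metacharacter
my $line = 'IMB,107951|,folded ,958090350,958091491,1142';

# Unquoted, '|' becomes alternation: /^IMB,060410|,/ means
# "starts with IMB,060410" OR "contains a comma", so this line matches.
my $raw = $line =~ /^IMB,$id,/ ? 1 : 0;

# \Q...\E escapes the interpolated text, so '|' is matched literally.
my $quoted = $line =~ /^IMB,\Q$id\E,/ ? 1 : 0;

print "unquoted: $raw\n";
print "quoted:   $quoted\n";
```

The unquoted pattern gives a false positive on a record with a different ID, while the quoted one correctly rejects it. For plain values like `IMB` and `V1` both forms behave the same.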
Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 16:45 UTC
    I tried it with variables and it worked fine, but I have one problem: I am losing the comma at the end of the second line. The first line comes out fine, but the second line loses its comma:
    IMB,060410|,V1 ,371096340,371096486,147
    IMB,107951|,V1 ,981157588,981176199
    How can I add a comma to the end of the second line?
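In the substitution from this thread, the trailing `.*` matches everything after the captured end value, including the comma and the count, and all of it is replaced by `$3`. A minimal sketch of one possible fix, assuming the merged record should simply end with a comma: put the comma back in the replacement as `$3,`. The sample records and the `$data`/`$merged` names are illustrative only, and the old count is not restored, since it would no longer match the merged range.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = <<'END';
IMB,107951,V1 ,981157588,981164939,7352
IMB,107951,V1 ,981164941,981165606,666
END

# Same pattern as in the thread; the replacement re-adds the comma
# that the trailing .* consumed along with the count.
(my $merged = $data) =~ s/^(IMB,\d+,V1\s+,)(\d+),\K (?:.*\n)+ \1\d+,(\d+).*/$3,/gmx;

print $merged;
```

This prints `IMB,107951,V1 ,981157588,981165606,` with the trailing comma kept.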
Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 15:44 UTC
    It didn't work; it prints all the lines.
Re^4: Combined lines from a file into one
by emadmahou (Acolyte) on Aug 05, 2015 at 16:00 UTC
    It works now; it was missing the space.