barbar's post looked like this when I approved it:
I have been struggling with sorting and merging data for some time. I managed to find a script by roboticus on PerlMonks (node_id=596095), which said:
mergefile.1

    15 20 foo
    22 30 bar
    30 33 baz
    14 22 fubar

mergefile.2

    alpha baz 17.30
    gamma foobar 22.35
    gamma bar 19.01
    delta fromish 33.03
    sigma bear 14.56

mergefile.out

    bar 22 30 gamma 19.01
    baz 30 33 alpha 17.30
    bear null null sigma 14.56
    foo 15 20 null null
    foobar null null gamma 22.35
    fromish null null delta 33.03
    fubar 14 22 null null

The code used to merge mergefile.1 and mergefile.2 to create mergefile.out is below:
    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open F1, 'sort -k3 mergefile.1|' or die "opening file 1";
    open F2, 'sort -k2 mergefile.2|' or die "opening file 2";

    open OUF, '>', 'mergefile.out' or die "opening output file";

    my @in1;
    my @in2;

    sub getrec1 {
        @in1 = ();
        if (!eof(F1)) {
            (@in1) = split /\t/, <F1>;
            chomp $in1[2];
        }
    }

    sub getrec2 {
        @in2 = ();
        if (!eof(F2)) {
            (@in2) = split /\t/, <F2>;
            chomp $in2[2];
        }
    }

    sub write1 {
        print OUF "$in1[2]\t$in1[0]\t$in1[1]\tnull\tnull\n";
        getrec1;
    }

    sub write2 {
        print OUF "$in2[1]\tnull\tnull\t$in2[0]\t$in2[2]\n";
        getrec2;
    }

    sub writeboth {
        print OUF "$in1[2]\t$in1[0]\t$in1[1]\t$in2[0]\t$in2[2]\n";
        getrec1;
        getrec2;
    }

    # Prime the pump
    getrec1;
    getrec2;

    while (1) {
        last if $#in1<0 and $#in2<0;

        if ($#in1<0 or $#in2<0) {
            # Only one file is left...
            write2 if $#in1<0;
            write1 if $#in2<0;
        }
        elsif ($in1[2] eq $in2[1]) {
            # Matching records, merge & write 'em
            writeboth;
        }
        elsif ($in1[2] lt $in2[1]) {
            # unmatched item in file 1, write it & get next rec
            write1;
        }
        else {
            # unmatched item in file 2, write it & get next rec
            write2;
        }
    }

My question is: how can I get this to work?
I have these files saved in a directory, and when I try to run this from the Unix command line it errors: "Input file specified two times."
If I run this in Korn Shell it comes up with the warning "sort: last character not record delimiter" and the output is:
    bar null null gamma 19.01
    bar 22 30 null null
    baz null null alpha 17.30
    baz 30 33 null null
    bear null null sigma 14.56
    foo 15 20 null null
    foobar null null gamma 22.35
    fromish null null delta 33.03
    fubar 14 22 null null

Which shows that the code is not working.
I altered the line "chomp $in2[2];" to read "chomp $in2[1];". I also changed the delimiters in the files to be commas, and changed the script from \t to ,
I really think this script will be able to solve my problem of sorting and merging large files, if only I could get it to work and understand why it was not working in the first place.
Please can anyone help me by giving me any pointers?????
(I happened to have left the browser tab open and that is how I produced this copy.)
Note to barbar: Please don't go in and completely alter a node once posted. It messes with people's heads. And mine is definitely messed up enough already. Thank you.
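In the output quoted above, bar and baz each appear twice instead of being merged, and in both cases the record from mergefile.2 comes first. One possible explanation, and it is only an assumption since nothing in the post confirms it, is that the key taken from the last field of mergefile.1 still carries a trailing character (for example a "\r" from CRLF line endings) after chomp, so the eq test fails and the shorter key sorts first under lt. The sketch below uses made-up sample lines and a hypothetical helper named clean_key, which is not part of roboticus's script, to show the effect:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the quoted script): strip trailing CR, LF,
# or other whitespace so that "bar\r" and "bar" compare equal.
sub clean_key {
    my ($key) = @_;
    $key =~ s/\s+\z//;
    return $key;
}

# Assumed sample lines with CRLF endings, built in memory for illustration.
my $line1 = "22\t30\tbar\r\n";        # key is the LAST field
my $line2 = "gamma\tbar\t19.01\r\n";  # key is the MIDDLE field

my @in1 = split /\t/, $line1;
my @in2 = split /\t/, $line2;
chomp $in1[2];    # removes only "\n", so the key is still "bar\r"
chomp $in2[2];    # the key $in2[1] was never affected by the line ending

print "raw keys equal:     ", ($in1[2] eq $in2[1] ? "yes" : "no"), "\n";
print "cleaned keys equal: ", (clean_key($in1[2]) eq clean_key($in2[1]) ? "yes" : "no"), "\n";
```

Run as is, this prints "no" for the raw comparison and "yes" for the cleaned one. If the real files do turn out to have such trailing characters, applying the same cleanup to the key fields in getrec1 and getrec2 would be one way to test the theory.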
HTH,
planetscape