in reply to Re^5: Data munging
in thread Data munging

BrowserUk,
Thank you for trying my code and posting your version. My code has been working at my end, so until now I couldn't figure out why you were getting the errors listed in your post. I copied ref.txt and query.txt from my earlier post, ran my code on them, and got the same error as you. I realized that when the post was formatted, all the tabs in ref.txt and query.txt were replaced by spaces, while my local copies were still tab-delimited. Once I put the tabs back, i.e. made ref.txt and query.txt tab-delimited again, my code worked.
Maybe I didn't explain the desired output properly.
1. Match keys between ref and query and, using the start and end values, count the overlapping fragments between the two.
2. If there is overlap, then for each key and each start in the query, report the number of overlapping fragments in the reference.
3. If there are no overlaps, insert a 0 in the desired column.
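To illustrate the tab-versus-space problem, here is a minimal sketch. It assumes the failing code split each line on a literal tab, which the post does not actually show; the sample record is simply one query row written both ways.

```perl
use strict;
use warnings;

# One record, once tab-delimited and once with the tabs turned into
# spaces (as happens when the file is pasted into a post).
my $tabbed = "c1\t100\t12000\t+\tAT";
my $spaced = "c1 100 12000 + AT";

# Assumption: the original code split on a literal tab.
my @tab_on_tabbed = split /\t/, $tabbed;   # five fields
my @tab_on_spaced = split /\t/, $spaced;   # one field: the whole line

# split ' ' splits on any run of whitespace, so both copies parse alike.
my @ws_on_tabbed = split ' ', $tabbed;     # five fields
my @ws_on_spaced = split ' ', $spaced;     # five fields

printf "split /\\t/: %d vs %d fields\n", scalar @tab_on_tabbed, scalar @tab_on_spaced;
printf "split ' ' : %d vs %d fields\n", scalar @ws_on_tabbed,  scalar @ws_on_spaced;
```

Splitting on whitespace makes the parser indifferent to how the files were copied, at the cost of not allowing empty or space-containing fields.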
Below is the desired output:
c1 100 12000 4 + AT
c1 19800 20000 1 - AG
c1 20049 20800 0 - GC
c10 10080 10000 0 - TT
c11 10078 14008 0 - TG
c15 10078 14008 0 - TC
c9 10078 14008 1 - AG
c9 1078 10008 1 - TA
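The counting rule in steps 1 to 3 can be sketched in isolation as follows. This is only an illustration: the reference intervals and query rows are hypothetical (ref.txt is not reproduced in this post), `count_overlaps` is a made-up helper name, and treating the endpoints as inclusive is an assumption.

```perl
use strict;
use warnings;

# Count how many reference fragments overlap the query fragment
# [$start, $end]. Endpoints are treated as inclusive (an assumption).
sub count_overlaps {
    my( $start, $end, @fragments ) = @_;
    my $count = 0;
    for my $frag ( @fragments ) {
        my( $s, $e ) = @$frag;
        # Two fragments overlap unless one lies wholly before the other.
        ++$count unless $start > $e or $end < $s;
    }
    return $count;
}

# Hypothetical reference data, keyed by the first column.
my %ref = (
    c1 => [ [ 50, 150 ], [ 200, 300 ], [ 5000, 6000 ] ],
    c9 => [ [ 9000, 11000 ] ],
);

# Hypothetical query rows: key, start, end.
for my $query ( [ c1 => 100, 250 ], [ c9 => 100, 200 ], [ c15 => 1, 10 ] ) {
    my( $key, $start, $end ) = @$query;
    # A key that is missing from %ref yields 0 overlaps (step 3).
    my $n = count_overlaps( $start, $end, @{ $ref{ $key } || [] } );
    print join( "\t", $key, $start, $end, $n ), "\n";
}
```

With this made-up data the c1 query overlaps two of the three c1 fragments, while the c9 and c15 queries report 0.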

Re^7: Data munging
by BrowserUk (Patriarch) on Jan 25, 2010 at 23:19 UTC

    Slightly different ordering, but same data:

    #! perl -slw
    use strict;

    my %ref;
    open REF, '<', 'ref.txt' or die $!;
    while( <REF> ) {
        chomp;
        my @cols = split ' ';
        push @{ $ref{ $cols[0] } }, [ @cols[ 1, 2 ] ];
    }
    close REF;

    open QUERY, '<', 'query.txt' or die $!;
    while( <QUERY> ) {
        chomp;
        my @cols = split ' ';
        my $overlaps = 0;
        for my $ref ( @{ $ref{ $cols[ 0 ] } } ) {
            my( $sRef, $eRef ) = @$ref;
            next if $cols[ 1 ] > $eRef or $cols[ 2 ] < $sRef;
            ++$overlaps;
        }
        print join "\t", @cols[ 0 .. 2 ], $overlaps, @cols[ 3 .. $#cols ];
    }
    close QUERY;
    __END__
    c:\test>819005
    c1 100 12000 4 + AT
    c1 19800 20000 1 - AG
    c1 20049 20800 0 - GC
    c9 10078 14008 1 - AG
    c11 10078 14008 0 - TG
    c15 10078 14008 0 - TC
    c9 1078 10008 1 - TA
    c10 10080 10000 0 - TT
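The ordering difference looks like a matter of sorting only: the desired output listed earlier in the thread appears to be these same rows in plain string order. That is an observation about the two listings, not something either poster states. A minimal sketch under that assumption, using the rows printed above:

```perl
use strict;
use warnings;

# Rows as printed by the script above, in query-file order.
my @rows = (
    "c1 100 12000 4 + AT",
    "c1 19800 20000 1 - AG",
    "c1 20049 20800 0 - GC",
    "c9 10078 14008 1 - AG",
    "c11 10078 14008 0 - TG",
    "c15 10078 14008 0 - TC",
    "c9 1078 10008 1 - TA",
    "c10 10080 10000 0 - TT",
);

# The default sort compares whole lines as strings, so "c10" sorts
# before "c9" and "10078" sorts before "1078".
my @sorted = sort @rows;
print "$_\n" for @sorted;
```

If a numeric ordering of keys or start positions were wanted instead, a custom comparison block would be needed; the string sort is used here only because it reproduces the listing in the request.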

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.