Peter Keystrokes has asked for the wisdom of the Perl Monks concerning the following question:
...which is populated through a loop and if statements
So I have some data:

    >hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD
    -4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
    -5.5 11 .. 36 xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
    -7.8 10 .. 37 xxxxxxGTGTGTGGT CTGC TTCAGTGACTTCGAGGCGCG GCAG CTGCTCCGAGTCCTC
    -4.3 9 .. 38 xxxxxxxGTGTGTGG TCTGC TTCAGTGACTTCGAGGCGCG GCAGC TGCTCCGAGTCCTCC
    -4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT
    -5.7 28 .. 44 CTTCAGTGACTTCGA GGCGCGG CAG CTGCTCC GAGTCCTCCCCTGCA
    -5.1 20 .. 49 GTGGTCTGCTTCAGT GACTT CGAGGCGCGGCAGCTGCTCC GAGTC CTCCCCTGCAACCAT
    -4.3 27 .. 56 GCTTCAGTGACTTCG AGGCG CGGCAGCTGCTCCGAGTCCT CCCCT GCAACCATGAGTTCC
    -5.6 31 .. 58 CAGTGACTTCGAGGC GCGG CAGCTGCTCCGAGTCCTCCC CTGC AACCATGAGTTCCAC
    -5.4 72 .. 82 GCAACCATGAGTTCC ACAC CAA GTGT GTTGACAAGTGGTTG
    -7.7 71 .. 83 TGCAACCATGAGTTC CACAC CAA GTGTG TTGACAAGTGGTTGA
    -4.2 70 .. 84 CTGCAACCATGAGTT CCACAC CAA GTGTGT TGACAAGTGGTTGAA
    >hsa_circ_0014931|chr1:160293220-160293320-|NM_001098398|COPA FORWARD
    -5.5 11 .. 36 xxxxxGGTCACGATC GTG GAGTAAACTGGGCTGCCTTC CAC CCCACTATGCCCCTT
    -4.5 22 .. 40 GATCGTGGAGTAAAC TGGGCTG CCTTC CACCCCA CTATGCCCCTTATTG
    -4.1 11 .. 41 xxxxxGGTCACGATC GTGGAG TAAACTGGGCTGCCTTCCAC C-CCAC TATGCCCCTTATTGT
And I want to create a hash which has the name of the sequence ('hsa_circ...') as the key; the value for each key will be an array containing a series of 'starts' and 'ends', which are chosen after going through some conditional if statements.
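To make the target structure concrete, here is a minimal sketch of what such a hash of arrays could look like. Storing each start/end as a two-element array reference is an assumption for illustration (a flat list of numbers would also work), and the two pairs are just sample values taken from the data above.

```perl
use strict;
use warnings;

# Assumed layout: sequence name => array of [start, end] pairs.
my %HoA_sequences;
my $name = 'hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD';

# push autovivifies the array reference the first time a key is seen,
# so the value does not have to be initialised beforehand.
push @{ $HoA_sequences{$name} }, [ 9,  38 ];
push @{ $HoA_sequences{$name} }, [ 31, 41 ];

for my $pair ( @{ $HoA_sequences{$name} } ) {
    my ( $start, $end ) = @$pair;
    print "$start .. $end\n";
}
```

Keeping start and end together in one pair means they cannot drift out of step, which a single flat array of alternating numbers would not guarantee.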
For example, I want to capture hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD as the hash key. Then, as I read the file in line by line, I want to test the start and end position values with if statements and push them into the array belonging to their respective key.
So my problem is twofold:
(1) I want to be able to create a hash of arrays that is added to as the file is read
(2) I want to be able to read the file in line by line, using if statements to check whether the start and end values are within a positional vicinity of 6 of those on the neighbouring lines, such that they are essentially 1 hairpin; in that case I push into my array only the start and end values which have the greatest range.
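For point (1), one possible approach is sketched below. It assumes each record sits on a single line with the start and end following the energy value as "start .. end", and it reads from an in-memory string so that it runs on its own; a real script would open "test.hairpin" instead. The variable `$current` is a name introduced here for illustration. No filtering is done yet, every pair is stored.

```perl
use strict;
use warnings;

# Sample records copied from the post; replace with the real file.
my $data = <<'END';
>hsa_circ_0014931|chr1:160293220-160293320-|NM_001098398|COPA FORWARD
-5.5 11 .. 36 xxxxxGGTCACGATC GTG GAGTAAACTGGGCTGCCTTC CAC CCCACTATGCCCCTT
-4.5 22 .. 40 GATCGTGGAGTAAAC TGGGCTG CCTTC CACCCCA CTATGCCCCTTATTG
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %HoA_sequences;
my $current;    # name of the sequence whose records are being read

while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^>(\S.*)/ ) {
        # Header line: remember the name and start with an empty array.
        $current = $1;
        $HoA_sequences{$current} = [];
    }
    elsif ( defined $current
        and $line =~ /^\s*-?\d+(?:\.\d+)?\s+(\d+)\s+\.\.\s+(\d+)\s/ )
    {
        # Record line: store the start and end under the current name.
        push @{ $HoA_sequences{$current} }, [ $1, $2 ];
    }
}
close $fh;

for my $name ( sort keys %HoA_sequences ) {
    print "$name\n";
    print "  $_->[0] .. $_->[1]\n" for @{ $HoA_sequences{$name} };
}
```

Remembering the most recent header in a scalar is what lets every following record line find its key without re-reading anything.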
For example, in the following records the start and end positions surrounded by bold tags are essentially the same hairpin, so all except 9 and 38 will be ignored, since it is this one that captures the hairpin best:

    -4.6 <b>12 .. 35</b> xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
    -5.5 <b>11 .. 36</b> xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
    -7.8 <b>10 .. 37</b> xxxxxxGTGTGTGGT CTGC TTCAGTGACTTCGAGGCGCG GCAG CTGCTCCGAGTCCTC
    -4.3 <b>9 .. 38</b> xxxxxxxGTGTGTGG TCTGC TTCAGTGACTTCGAGGCGCG GCAGC TGCTCCGAGTCCTCC
    -4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT
I am struggling to articulate these if statements programmatically, though. Essentially I am trying to read each line, capture the values, and assess whether the next line bears values which are $start-1 and $end+1. If they are, they become the new $start and $end, and this is repeated until -/+6 of the original $start and $end values is reached.
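One way this rule might be expressed is sketched below, under a slightly looser reading: consecutive records count as the same hairpin when both their start and end lie within 6 of the first record of the group, and only the pair with the widest range is kept. The function name `collapse_hairpins` and its `$window` parameter are made up for this illustration, and the input pairs are the start and end values from the example records.

```perl
use strict;
use warnings;

# Collapse consecutive [start, end] pairs that lie within $window of the
# first pair of their group, keeping the pair with the widest range.
sub collapse_hairpins {
    my ( $pairs, $window ) = @_;
    my @kept;
    my ( $anchor, $best );

    for my $pair (@$pairs) {
        my ( $start, $end ) = @$pair;
        if (    defined $anchor
            and abs( $start - $anchor->[0] ) <= $window
            and abs( $end - $anchor->[1] ) <= $window )
        {
            # Same hairpin: remember this pair only if it spans more bases.
            $best = $pair
              if $end - $start > $best->[1] - $best->[0];
        }
        else {
            # A new hairpin starts: store the previous winner, reset the anchor.
            push @kept, $best if defined $best;
            $anchor = $best = $pair;
        }
    }
    push @kept, $best if defined $best;
    return \@kept;
}

# Start and end values from the example records in the post.
my @pairs = ( [ 12, 35 ], [ 11, 36 ], [ 10, 37 ], [ 9, 38 ], [ 31, 41 ] );
my $kept = collapse_hairpins( \@pairs, 6 );

print "$_->[0] .. $_->[1]\n" for @$kept;
# prints:
# 9 .. 38
# 31 .. 41
```

Comparing against the first record of the group, rather than against the previous line, is what enforces the -/+6 limit; a stricter version could additionally require each step to be exactly $start-1 and $end+1.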
Trying to populate the hash of arrays:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $hairpin_file, '<', "test.hairpin" or die $!;
    my %HoA_sequences;
    while (my $line = <$hairpin_file>) {
        if ($line =~ /^>hsa/) {
            chomp $line;    # drop the newline so it does not end up in the key
            ## Do I provide the key for the hash as soon as I read in the
            ## line bearing the sequence name?
            ## And can I create a hash with a key but no value? I'm guessing not. :/
            $HoA_sequences{$line} = [];    # placeholder: empty array, to be filled later
        }
    }

Trying to develop the if statements:

    use Regexp::Common;    # provides $RE{num}{real}

    if ($line =~ /$RE{num}{real}\s+($RE{num}{real})\s+\.\.\s+($RE{num}{real})/) {
        my ($start, $end) = ($1, $2);
        ## Once captured how do I continue to the next line to query if the
        ## value is - or + of the start and end values
    }