...which is populated through a loop and if statements

So I have some data:
>hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD
-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
-5.5 11 .. 36 xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
-7.8 10 .. 37 xxxxxxGTGTGTGGT CTGC TTCAGTGACTTCGAGGCGCG GCAG CTGCTCCGAGTCCTC
-4.3 9 .. 38 xxxxxxxGTGTGTGG TCTGC TTCAGTGACTTCGAGGCGCG GCAGC TGCTCCGAGTCCTCC
-4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT
-5.7 28 .. 44 CTTCAGTGACTTCGA GGCGCGG CAG CTGCTCC GAGTCCTCCCCTGCA
-5.1 20 .. 49 GTGGTCTGCTTCAGT GACTT CGAGGCGCGGCAGCTGCTCC GAGTC CTCCCCTGCAACCAT
-4.3 27 .. 56 GCTTCAGTGACTTCG AGGCG CGGCAGCTGCTCCGAGTCCT CCCCT GCAACCATGAGTTCC
-5.6 31 .. 58 CAGTGACTTCGAGGC GCGG CAGCTGCTCCGAGTCCTCCC CTGC AACCATGAGTTCCAC
-5.4 72 .. 82 GCAACCATGAGTTCC ACAC CAA GTGT GTTGACAAGTGGTTG
-7.7 71 .. 83 TGCAACCATGAGTTC CACAC CAA GTGTG TTGACAAGTGGTTGA
-4.2 70 .. 84 CTGCAACCATGAGTT CCACAC CAA GTGTGT TGACAAGTGGTTGAA
>hsa_circ_0014931|chr1:160293220-160293320-|NM_001098398|COPA FORWARD
-5.5 11 .. 36 xxxxxGGTCACGATC GTG GAGTAAACTGGGCTGCCTTC CAC CCCACTATGCCCCTT
-4.5 22 .. 40 GATCGTGGAGTAAAC TGGGCTG CCTTC CACCCCA CTATGCCCCTTATTG
-4.1 11 .. 41 xxxxxGGTCACGATC GTGGAG TAAACTGGGCTGCCTTCCAC C-CCAC TATGCCCCTTATTGT

And I want to create a hash which has the name of the sequence 'hsa_circ...' as the key and whose value is an array, containing a series of 'starts' and 'ends' which are chosen after going through some conditional if statements.

For example, I want to capture hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD as the hash key. Then, as I read the file in line by line, I want to apply if statements to the start and end position values and push them into the array belonging to their respective hash key.
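For illustration, here is a minimal sketch of how such a hash of arrays could be built up while reading. It assumes the header line minus the leading '>' is the key, that each data line starts with an energy value followed by 'start .. end', and that each start/end pair is stored as a small two-element array reference. The sample lines are copied from the data above (with the line-wrap '+' markers joined) and inlined instead of read from test.hairpin; the variable $current is a name introduced only for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records inlined for illustration; a real script would read test.hairpin.
my @lines = (
    '>hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD',
    '-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC',
    '-4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT',
    '>hsa_circ_0014931|chr1:160293220-160293320-|NM_001098398|COPA FORWARD',
    '-5.5 11 .. 36 xxxxxGGTCACGATC GTG GAGTAAACTGGGCTGCCTTC CAC CCCACTATGCCCCTT',
);

my %HoA_sequences;   # sequence name => [ [start, end], ... ]
my $current;         # name of the record currently being read (hypothetical helper)

for my $line (@lines) {
    chomp $line;
    if ($line =~ /^>(.+)/) {
        # Header line: remember the name and start with an empty array.
        $current = $1;
        $HoA_sequences{$current} = [];
    }
    elsif (defined $current && $line =~ /^\S+\s+(\d+)\s+\.\.\s+(\d+)/) {
        # Data line: append the start/end pair to the current record's array.
        push @{ $HoA_sequences{$current} }, [ $1, $2 ];
    }
}

for my $name (sort keys %HoA_sequences) {
    print "$name\n";
    print "  $_->[0] .. $_->[1]\n" for @{ $HoA_sequences{$name} };
}
```

Assigning the empty array reference up front is optional, since push would autovivify the array on first use; it only guarantees that a sequence with no data lines still appears as a key.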

So my problem is twofold:

(1) I want to be able to create a hash of arrays that is added to as the file is read

(2) I want to be able to read in line by line, using if statements to test whether a line's start and end values lie within a positional vicinity of 6 of the previous ones, such that the lines are essentially 1 hairpin. In that case I push into my array only the start and end values which have the greatest range.

For example in:
-4.6 <b>12 .. 35</b> xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
-5.5 <b>11 .. 36</b> xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
-7.8 <b>10 .. 37</b> xxxxxxGTGTGTGGT CTGC TTCAGTGACTTCGAGGCGCG GCAG CTGCTCCGAGTCCTC
-4.3 <b>9 .. 38</b> xxxxxxxGTGTGTGG TCTGC TTCAGTGACTTCGAGGCGCG GCAGC TGCTCCGAGTCCTCC
-4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT
The start positions and end positions surrounded by bold tags are essentially the same hairpin, so all except 9 and 38 will be ignored since it is this one that captures the hairpin best.

I am struggling to express these if statements in code, though. Essentially I am trying to read each line, capture the values, and check whether the next line bears values which are $start-1 and $end+1. If they are, they become the new $start and $end, and this is repeated until -/+6 of the original $start and $end values is reached.
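One possible way to express that rule is sketched below. It is a loose interpretation rather than a definitive implementation: a pair is treated as the same hairpin when it encloses the widest pair seen so far and stays within a window of 6 positions of the first pair in the run (so it also accepts steps larger than 1). The function name collapse_hairpins and the $window parameter are made up for this sketch.

```perl
use strict;
use warnings;

# Collapse runs of nested start/end pairs into the widest pair of each run.
# $pairs  : array ref of [start, end] pairs in file order
# $window : how far a run may grow from its first start and end (assumed: 6)
sub collapse_hairpins {
    my ($pairs, $window) = @_;
    my @kept;
    my ($first, $widest);   # first pair of the current run, widest pair so far

    for my $pair (@$pairs) {
        my ($start, $end) = @$pair;
        if (   defined $widest
            && $start <= $widest->[0]              # encloses the widest pair so far
            && $end   >= $widest->[1]
            && $first->[0] - $start <= $window     # still close to where the run began
            && $end - $first->[1]   <= $window)
        {
            $widest = [ $start, $end ];            # same hairpin, wider range
            next;
        }
        push @kept, $widest if defined $widest;    # previous run is finished
        $first = $widest = [ $start, $end ];       # start a new run
    }
    push @kept, $widest if defined $widest;        # flush the last run
    return \@kept;
}

my @pairs = ( [12, 35], [11, 36], [10, 37], [9, 38], [31, 41] );
my $kept  = collapse_hairpins(\@pairs, 6);
print join(', ', map { "$_->[0]..$_->[1]" } @$kept), "\n";   # prints "9..38, 31..41"
```

Because the state lives in $first and $widest, there is no need to look ahead to the next line: each line is compared with what has already been seen, and a run is stored only once a line arrives that does not belong to it.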

Trying to populate the hash of arrays:
#!/usr/bin/perl
use strict;
use warnings;

open my $hairpin_file, '<', "test.hairpin", or die $!;

my %HoA_sequences;
while (my $line = <$hairpin_file>){
    if ($line =~ /^>hsa/){
        $HoA_sequences{$line} =
        ## Do I provide the key for the hash as soon as I read in the line bearing the sequence name?
        ## And can I create a hash with a key but no hash? I'm guessing not. :/
    }
}

Trying to develop the if statements:
## Assumes "use Regexp::Common;" at the top of the script.
if ($line =~ /^$RE{num}{real}\s+($RE{num}{real})\s+\.\.\s+($RE{num}{real})/){
    my ($start, $end) = ($1, $2);
    ## Once captured how do I continue to the next line to query if the
    ## value is - or + of the start and end values
}
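As a hedged sketch of the capture step, the snippet below swaps Regexp::Common for a plain core-Perl regex and assumes every data line has the shape 'energy start .. end sequence...'. Instead of reading ahead, it keeps the previous line's values in variables declared outside the loop and compares each new line against them. The names parse_hairpin_line, $prev_start, $prev_end and @extended are hypothetical.

```perl
use strict;
use warnings;

# Return (energy, start, end) for a data line, or an empty list otherwise.
sub parse_hairpin_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(-?\d+(?:\.\d+)?)\s+(\d+)\s+\.\.\s+(\d+)\b/;
    return ($1, $2, $3);
}

my @lines = (
    '>hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD',
    '-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC',
    '-5.5 11 .. 36 xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT',
    '-4.6 31 .. 41 CAGTGACTTCGAGGC GCGG CAG CTGC TCCGAGTCCTCCCCT',
);

my ($prev_start, $prev_end);   # values from the previous data line
my @extended;                  # messages about lines that extend the previous one

for my $line (@lines) {
    my @fields = parse_hairpin_line($line) or next;   # skip header lines
    my ($energy, $start, $end) = @fields;

    # Compare with the remembered values rather than peeking at the next line.
    if (   defined $prev_start
        && $start == $prev_start - 1
        && $end   == $prev_end + 1)
    {
        push @extended, "$start..$end extends $prev_start..$prev_end";
    }
    ($prev_start, $prev_end) = ($start, $end);
}

print "$_\n" for @extended;   # prints "11..36 extends 12..35"
```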

In reply to Making use of a hash of an array... by Peter Keystrokes
