I'm trying to capture the start and end values. With a simple file such as:
```
>hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD
-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
-5.5 11 .. 36 xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
```

Regex used:

```
/^$RE{num}{real}\s+(\d+)\s+\.\.\s+(\d+)\s*/
```
I am able to capture the start and end values (Dumper output):

```
$VAR1 = 'hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD';
$VAR2 = [
          {
            'end' => '35',
            'start' => '12'
          },
          {
            'end' => '36',
            'start' => '11'
          }
        ];
```
But when I make a slight amendment to the regex to account for lines that begin with whitespace, such as this one (note the leading whitespace):

```
    -5 56 .. 70 CTATGCCCCTTATTG TATCTG GGG CAGATG ATCGTCAAGTGAAGA
```

New regex:

```
/^(\s+)?$RE{num}{real}\s+(\d+)\s+\.\.\s+(\d+)\s*/ ## addition of (\s+)? to the beginning
```
The start values become undefined:

```
$VAR125 = 'hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD';
$VAR126 = [
            {
              'end' => '6',
              'start' => undef
            }
```

Are the brackets used for optional capture at the beginning of my regex confusing what is captured by my $start and $end variables?
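A minimal sketch of how Perl numbers capture groups, which bears directly on that question. Groups are numbered by their opening parenthesis from left to right, so `(\s+)?` becomes `$1` and the two `(\d+)` groups become `$2` and `$3`; a match in list context returns all of them (with `undef` for a group that did not take part), and `my ($start, $end) = ...` keeps only the first two. The sample lines come from the post, but `$real` is a hypothetical stand-in for `$RE{num}{real}` so the snippet runs without Regexp::Common, and the amount of leading whitespace on the indented line is an assumption. `\s*` is shown as one way to allow leading whitespace without adding a group; the non-capturing `(?:\s+)?` behaves the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $RE{num}{real} (Regexp::Common), so this runs
# without that module: an optionally signed integer or decimal, no captures.
my $real = qr/[-+]?\d+(?:\.\d+)?/;

my $plain    = '-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC';
my $indented = '   -5 56 .. 70 CTATGCCCCTTATTG TATCTG GGG CAGATG ATCGTCAAGTGAAGA';

# (\s+)? is capture group 1, so a list-context match returns ($1, $2, $3).
# On a line without leading whitespace, $1 is undef.
my @with_group = $plain =~ /^(\s+)?$real\s+(\d+)\s+\.\.\s+(\d+)/;

# Assigning that list to two variables keeps only the first two values:
# undef (the whitespace group) and 12 (the real start value).
my ($bad_start, $bad_end) = @with_group;

# \s* allows leading whitespace without creating a capture group,
# so the two (\d+) groups are $1 and $2 again.
my ($start,  $end)  = $plain    =~ /^\s*$real\s+(\d+)\s+\.\.\s+(\d+)/;
my ($start2, $end2) = $indented =~ /^\s*$real\s+(\d+)\s+\.\.\s+(\d+)/;

printf "with (\\s+)?: start=%s end=%s\n", $bad_start // 'undef', $bad_end;
printf "with \\s*:    start=%s end=%s\n", $start,  $end;
printf "indented:    start=%s end=%s\n", $start2, $end2;
```

With the `(\s+)?` version the start comes out as `undef` and the real start value lands in the end slot, which is the same shift visible in the Dumper output above; the `\s*` version gives 12/35 and 56/70.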
Whole script so far:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Regexp::Common qw /number/;

open my $hairpin_file, '<', "new_xt_spacer_results.hairpin", or die $!;

my %HoA_sequences;
my $curkey;

while (<$hairpin_file>){
    chomp;
    if (/^>(\w+\d+\|\w+:\d+-\d+[-|+]\|\w+\|\w+\s+\w+$)/){
        $curkey = $1;
    }elsif (my ($start, $end) = /^(\s+)?$RE{num}{real}\s+(\d+)\s+\.\.\s+(\d+)\s*/ ) {
        die "value seen before header: '$_'" unless defined $curkey;
        push @{ $HoA_sequences{$curkey}}, { start=>$start, end=>$end };
    } else {
        die "don't know how to parse: '$_'"
    }
}

print Dumper(%HoA_sequences);
```
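For comparison, a self-contained sketch of the same loop with `\s*` in place of `(\s+)?`. Assumptions: `$real` is a hypothetical stand-in for `$RE{num}{real}` so only core modules are needed, the input is the two RNF44 lines from the post read through an in-memory filehandle rather than new_xt_spacer_results.hairpin, and the second data line has been indented on purpose to exercise the leading-whitespace case. It also passes `\%HoA_sequences` to Dumper, which prints one nested structure instead of separate `$VAR1`, `$VAR2` entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for $RE{num}{real}; with Regexp::Common installed,
# $RE{num}{real} can be used here instead.
my $real = qr/[-+]?\d+(?:\.\d+)?/;

# The two RNF44 lines from the post; the second is indented deliberately
# (an assumption for illustration) to exercise leading whitespace.
my $sample = <<'END';
>hsa_circ_0075116|chr5:175956288-175956388-|NM_014901|RNF44 FORWARD
-4.6 12 .. 35 xxxxGTGTGTGGTCT GC TTCAGTGACTTCGAGGCGCG GC AGCTGCTCCGAGTCC
    -5.5 11 .. 36 xxxxxGTGTGTGGTC TGC TTCAGTGACTTCGAGGCGCG GCA GCTGCTCCGAGTCCT
END

# Read the string through an in-memory filehandle instead of a file on disk.
open my $hairpin_file, '<', \$sample or die "Cannot open sample data: $!";

my %HoA_sequences;
my $curkey;

while (<$hairpin_file>) {
    chomp;
    if (/^>(\w+\d+\|\w+:\d+-\d+[-|+]\|\w+\|\w+\s+\w+$)/) {
        $curkey = $1;
    }
    # \s* replaces (\s+)?, so the two (\d+) groups are the only captures
    elsif (my ($start, $end) = /^\s*$real\s+(\d+)\s+\.\.\s+(\d+)/) {
        die "value seen before header: '$_'" unless defined $curkey;
        push @{ $HoA_sequences{$curkey} }, { start => $start, end => $end };
    }
    else {
        die "don't know how to parse: '$_'";
    }
}
close $hairpin_file;

# Sorted keys keep the dump stable; the reference gives a single $VAR1.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%HoA_sequences);
```

The RNF44 key should end up holding two hashes, start 12 / end 35 and start 11 / end 36, with no `undef` values even though the second line is indented.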
Re^3: Making use of a hash of an array...
by 1nickt (Canon) on Jul 19, 2017 at 20:47 UTC
by Peter Keystrokes (Beadle) on Jul 19, 2017 at 20:57 UTC