in reply to Using regular expression to extract info from input line

I fixed a syntax error, added a leading '>' character to your input string, used perltidy to neaten up the indentation, added the strictures and used the DATA handle to create a self-contained, runnable example.
use strict;
use warnings;

my %loc;
while (<DATA>) {
    chomp;
    if (/^>(\w+)\s\w+\=(chr\w+)\:(\d+)\-(\d+).*\+(.*)/) {
        $loc{$1} = "$2:$3:$4:$5";
        print $loc{$1} . "\n";
    }
}

__DATA__
>hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none
Prints:
chr1:67208779:67210057: repeatMasking=none
Also, there is no reason to back-whack = or : or - in your regex. This will also work:
if (/^>(\w+)\s\w+=(chr\w+):(\d+)-(\d+).*\+(.*)/) {
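As a quick check of that claim, here is a minimal sketch that runs both forms of the pattern against the sample line from this thread and compares the captures. The variable names ($line, $escaped, $unescaped) are just for illustration. Outside a character class, = and : and - have no special meaning, so escaping them changes nothing; the + still needs its backslash because a bare + is a quantifier.

```perl
use strict;
use warnings;

# Sample header line from the example above.
my $line = ">hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none";

# Original pattern, with =, : and - escaped.
my $escaped   = qr/^>(\w+)\s\w+\=(chr\w+)\:(\d+)\-(\d+).*\+(.*)/;

# Same pattern without the unnecessary backslashes.
# The \+ stays escaped because + is a quantifier.
my $unescaped = qr/^>(\w+)\s\w+=(chr\w+):(\d+)-(\d+).*\+(.*)/;

my @a = $line =~ $escaped;
my @b = $line =~ $unescaped;

# Compare the two capture lists as joined strings.
my $same = "@a" eq "@b" ? "same" : "different";
print "captures are $same\n";    # prints "captures are same"
```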

Re^2: Using regular expression to extract info from input line
by biobee07 (Novice) on Mar 19, 2010 at 01:23 UTC
    Hi Toolic, thanks for making the code look neat and self-contained. I used your corrected code and made a small modification based on jethro's advice:
    if (/^>(\w+)\s\w+\=(chr\w+)\:(\d+)\-(\d+).*strand=(.)/)
    and it works perfectly. Thanks a lot.
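For anyone following along, a minimal sketch of why the strand=(.) version helps, using the sample line from this thread (the $line variable is just for illustration). In the earlier pattern, the greedy .* runs to the last + in the line and the final group captures everything after it, which is why " repeatMasking=none" showed up in the output. Anchoring on the literal text strand= and capturing a single character picks up the strand itself.

```perl
use strict;
use warnings;

# Sample header line from the example earlier in the thread.
my $line = ">hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none";

my %loc;

# strand=(.) captures exactly one character after "strand=".
if ($line =~ /^>(\w+)\s\w+=(chr\w+):(\d+)-(\d+).*strand=(.)/) {
    $loc{$1} = "$2:$3:$4:$5";
    print "$loc{$1}\n";    # prints "chr1:67208779:67210057:+"
}
```

If the strand were ever absent or spelled differently in the input, the match would simply fail and nothing would be stored, so it may be worth adding an else branch that reports unmatched lines.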