perl_n00b has asked for the wisdom of the Perl Monks concerning the following question:

Thank you all for your help so far! I just have 2 more questions about my code.
So far I have this...
use strict;
use warnings;

my $dir = 'c:\seqs';
opendir(DIR,$dir) or die "unable to open directory $dir:$!";
foreach my $file ( grep{/\.seq$/ && -f "$dir/$_"}readdir DIR) {
    my $output_path = "$dir/$file.seq";
    my $input_path = "$dir/$file";
    open (my $in, '<', $input_path) or die "Unable to open $input_path: $!";
    open (my $out, '>', $output_path) or die "Unable to open $output_path: $!";
    while (<$in>) {
        my $locus =~ /LOCUS\s*(\w+)/;
        s/\^\^/>$locus\n^^\n/g;
        print $out $_;
    }
    close ($in);
    close ($out);
    unlink ($input_path) or die "unable to unlink $input_path: $!";
    rename ($output_path, $input_path) or die "Unable to rename $output_path to $input_path:$!";
}
If ^^ is not present I need to add it in with a new line character before the sequence.
Something like...
while (<$in>) {
    if $in =~ /^^/ {
        s/\^\^/\n^^/g;
        print $out $_;
    else {
        s/[a]|[c]|[t]|[g]{6}/^^\n
}

How would I correctly write the else?
I was thinking I could just find a series of bases and add the ^^ before it but I don't know how to get the same series of bases in the substitution.
Also, I need to capture the locus name and place it before the ^^ like this, but for the life of me I cannot get it to capture.
while (<$in>) {
    my $locus =~ /LOCUS\s*(\w+)/;
    s/\^\^/>$locus\n^^\n/g;
    print $out $_;
}

Within the file, the locus name is formatted like this:
Created: Tuesday, July 12, 2005 4:17 PM
LOCUS AJ877263 663 bp DNA linear INV 15-APR-2005

Thanks for any help in advance!

Replies are listed 'Best First'.
Re: Writing to file part deux
by ikegami (Patriarch) on Jul 23, 2009 at 18:25 UTC
    (It's "deux".)

    but I don't know how to get the same series of bases in the substitution.

    s/([actg]{6})/...$1.../
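
    To make the capture-and-reuse idea concrete, here is a minimal sketch. It assumes the sequence is lowercase or uppercase a/c/g/t and that the marker should go on its own line directly before the first run of six or more bases. The helper name add_marker is hypothetical, not from the original script, and in a real file you would only call it on the line where the sequence starts, since other lines could also contain such a run.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: put a "^^" marker line in front of the first run
    # of six or more bases, unless the line already contains "^^".
    sub add_marker {
        my ($line) = @_;
        return $line if $line =~ /\^\^/;     # marker already present, leave as is
        $line =~ s/([acgt]{6,})/^^\n$1/i;    # $1 puts the matched bases back
        return $line;
    }

    print add_marker("gattacagattaca\n");    # prints "^^", then the bases
    print add_marker("^^\n");                # printed unchanged
    ```

    The parentheses capture whatever the character class matched, and $1 in the replacement re-inserts exactly that text, so nothing from the sequence is lost.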

    Also I need to capture the locus name and place it before the ^^

    You should have gotten a warning saying you were matching against an undefined value.
    my ($locus) = /LOCUS\s*(\w+)/;
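
    For anyone wondering why the parentheses matter: `my $locus =~ /.../` binds the regex to a brand-new, undefined $locus and never assigns anything, whereas a match evaluated in list context returns its captures. A small sketch, using a sample line modeled on the one quoted in the question (the variable names are just for illustration):

    ```perl
    use strict;
    use warnings;

    # Sample line modeled on the format quoted in the question.
    my $line = 'LOCUS AJ877263 663 bp DNA linear INV 15-APR-2005';

    # List context: the match returns its captures, so $locus gets group 1.
    my ($locus) = $line =~ /LOCUS\s*(\w+)/;

    # Scalar context: the match only reports success or failure.
    my $matched = $line =~ /LOCUS\s*(\w+)/;

    print "locus:   $locus\n";     # AJ877263
    print "matched: $matched\n";   # 1
    ```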
      I had tried my ($locus) = /LOCUS\s*(\w+)/; but it still doesn't capture it.
      I have just realized this is because the variable is overwritten on every line as the loop reads through <$in>.
      How do I get around that?
        my $locus;
        while (...) {
            ...
            ($locus) = /.../;
            ...
        }
Re: Writing to file part duex
by graff (Chancellor) on Jul 24, 2009 at 03:58 UTC
    Have you tried your while loop like this?
    my $locus;
    while (<$in>) {
        if ( /LOCUS\s+(\w+)/ ) {
            $locus = $1;
        }
        elsif ( /\^\^/ ) {
            if ( $locus ) {
                s/\^\^/>$locus\n^^\n/g;
            }
            else {
                warn "Found ^^ without a preceding LOCUS value\n";
            }
        }
        print $out $_;
    }
    Part of the problem is that you only want to assign to $locus when you have just read that particular sort of line from the file. If you do this sort of assignment on every line:
    ( $locus ) = ( /LOCUS\s+(\w+)/ );
    then $locus will get cleared every time your regex fails to match a given line of input. So see if the match will work first, and if it does, then assign the result to $locus.
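
    The difference is easy to see on two lines of made-up input (the data and the names $always and $guarded are only for illustration): an unconditional list assignment wipes the value on the first non-matching line, while the guarded assignment keeps it.

    ```perl
    use strict;
    use warnings;

    # Made-up input: a LOCUS line followed by a marker line.
    my @lines = (
        "LOCUS AJ877263 663 bp DNA linear INV 15-APR-2005\n",
        "^^\n",
    );

    my ($always, $guarded);
    for (@lines) {
        # A failed match returns an empty list, so $always becomes undef.
        ($always) = /LOCUS\s+(\w+)/;
        # Assigned only when the match succeeds, so the value survives.
        $guarded = $1 if /LOCUS\s+(\w+)/;
    }

    print "always:  ", (defined $always ? $always : 'undef'), "\n";   # undef
    print "guarded: $guarded\n";                                      # AJ877263
    ```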

    Another part is that you'll want to know (via STDERR) if you get a situation where a locus value should be plugged in but you don't have one yet.

    (I hope I understood the question -- not sure that I did...)

      @Graff Bingo!

      Thank you everyone!