rna_follower has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to modify my file by using a regex to replace/substitute strings and numbers:

Example file:

>Sample_1_x80

AGGGGGGGGGTTCCC

>Sample_2_x85

TTTCCCGGGAAAA

>sample_3_x112

GGCCCCTTTGAGG

And I want to modify it to print like this (the ID line should be tab-delimited):

>ID1 80

AGGGGGGGGGTTCCC

>ID2 85

TTTCCCGGGAAAA

and so on...

My best effort:

#!usr/bin/perl
$file;
@files;
$filename;
$filename = <STDIN>;
open(FILENAME, "<$filename") or die "can't open file";
while($file = <FILENAME>){
    chomp $file;
    $file =~ s/sample\_\d\_x?/ID\t/;
    print $file, "\n";
}

Re: modifying a file with regex!
by tobyink (Canon) on Mar 16, 2012 at 21:51 UTC

    This is how I'd do it...

#!/usr/bin/perl
use autodie;  # Automatic errors on file problems.
use strict;

# This is the name of the file we want to modify.
my $filename = 'modify-file.txt';

# We're going to create a temporary file. This avoids us having
# to build up a potentially large string in memory.
my $tempname = $filename . '.tmp';

do {
    # Open both files. Doing this using lexical file handles
    # within a "do" block means that when the end of the block
    # is reached, the files will be closed.
    open my $input_h,  '<', $filename;  # input handle
    open my $output_h, '>', $tempname;  # output handle

    # Loop through each line of input.
    while (<$input_h>) {
        # Modify the line
        s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;

        # Write it out.
        print $output_h $_;
    }
};

# Delete the original file.
unlink $filename while -f $filename;

# Rename the temporary file to the original filename.
rename $tempname => $filename;
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      This extraneous "do" block is completely unnecessary. It actually harms the code by adding an extra level of indentation, which hinders readability.

        It eliminates two calls to close and allows some lexical variables ($input_h and $output_h) to live in a smaller scope.

        Indent it however you like; this ain't Python.
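To see the scoping point in action, here is a minimal sketch (the scratch file and its contents are made up for the demo): a lexical handle opened inside a "do" block is closed, and its buffer flushed, as soon as the block ends, so the data can be read back without an explicit close.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file, used only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";

do {
    open my $out_h, '>', $path or die "Can't write $path: $!";
    print $out_h ">ID1\t80\n";
    # No explicit close: $out_h goes out of scope at the end of this
    # block, which closes the handle and flushes its buffer.
};

# The handle is already closed here, so the data is on disk.
open my $in_h, '<', $path or die "Can't read $path: $!";
my $content = do { local $/; <$in_h> };
close $in_h;

print $content;
```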

Re: modifying a file with regex!
by JavaFan (Canon) on Mar 16, 2012 at 21:50 UTC
    Untested:
    perl -i.bak -pe 's/^>Sample_([0-9]+)_x([0-9]+)$/>ID$1 $2/i' filename
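The -i.bak behaviour can also be had inside a script by setting $^I and localizing @ARGV, the idiom perlfaq5 describes for using -i from within a program. A minimal sketch, assuming a hypothetical file reads.fa that the script first writes into a temporary directory; /i lets the pattern match the lowercase sample_3 header, and the leading ">" is kept in the replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical input file, written here so the sketch is self-contained.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/reads.fa";
open my $fh, '>', $file or die "Can't write $file: $!";
print $fh ">Sample_1_x80\nAGGGGGGGGGTTCCC\n>sample_3_x112\nGGCCCCTTTGAGG\n";
close $fh;

{
    # Setting $^I enables in-place editing for the <> loop, like -i.bak;
    # localizing @ARGV makes <> read only $file.
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s/^>Sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
        print;    # with $^I set, print writes to the new copy of $file
    }
}

# Read the edited file back.
open $fh, '<', $file or die "Can't read $file: $!";
my @lines = <$fh>;
close $fh;
print @lines;
```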
Re: modifying a file with regex!
by Anonymous Monk on Mar 16, 2012 at 22:11 UTC

    Some issues with your effort:

    • the shebang is not an absolute path
    • not using strict/warnings. Read this if you want to cut your development time in half!
    • you're using <STDIN> instead of @ARGV (as in perl myprogram.pl)
    • you're reading from FILEHANDLE but you're printing to STDOUT
    • you're using FILEHANDLE instead of $filehandle
    • your regular expression is case sensitive and it doesn't match your sample data
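To check that last point against the sample headers, here is a small sketch comparing the original pattern with one possible fix (the fix is a suggestion, not taken from the thread): only ">sample_3_x112" matches the case-sensitive original, while /i, \d+ and two captures produce the tab-delimited headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header lines taken from the question's example file.
my @headers = ('>Sample_1_x80', '>Sample_2_x85', '>sample_3_x112');

# The original pattern is case-sensitive, so "Sample_..." never matches.
my @original = grep { /sample\_\d\_x?/ } @headers;

# Suggested fix: /i for case, \d+ for multi-digit numbers, and
# captures so both numbers can be reused in the replacement.
my @fixed = map {
    my $h = $_;
    $h =~ s/^>sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
    $h;
} @headers;

printf "original pattern matched %d of %d headers\n",
    scalar @original, scalar @headers;
print "$_\n" for @fixed;
```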

    The general steps for editing a file are:

    • read from original-file
    • modify data
    • write to new-file
    • rename new-file to original-file

    So you might write that as:

#!/usr/bin/perl --
use strict;
use warnings;
use autodie 2.1001;
use File::Temp qw/ tempfile /;
use File::Copy qw/ move /;
use autodie qw/ move /;

Main( @ARGV );
exit( 0 );

sub Main {
    return Usage() unless @_ ;
    for my $file ( @_ ) {
        print "Converting $file \n";
        ConvertFile( $file );
    }
}

sub ConvertFile {
    my $infilename = shift;
    my ($outfh, $outfilename) = tempfile();
    open my($infh), '<', $infilename;    # autodie dies on error
    while( my $line = <$infh> ){
        chomp $line;
        # capture both numbers so ">Sample_1_x80" becomes ">ID1<TAB>80"
        $line =~ s/^>sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
        print $outfh $line, "\n";
    }
    close $infh;
    close $outfh;
    move( $outfilename, $infilename );    # autodie dies on error
}

sub Usage {
    print <<"__USAGE__";
$0
$0 modify/this/file
perl ${\__FILE__}
perl ${\__FILE__} modify/this/file
__USAGE__
} ## end sub Usage

__END__

    See use, autodie, open, File::Copy, File::Temp, strict, warnings, perlintro, perlretut, perlrequick, YAPE::Regex::Explain, Beginning Perl (free) Chapter 6: Files and Data, Modern Perl: Chapter 9: Managing Real Programs > Files

Re: modifying a file with regex!
by Marshall (Canon) on Mar 16, 2012 at 22:19 UTC
    There is no need to substitute anything. Capture what is necessary and re-format the ">" line.
    No need to be overly tricky when a couple of straightforward lines of code will do.
#!/usr/bin/perl -w
use strict;

my $ID = 1;
while (<DATA>) {
    # this regex captures the trailing number if
    # the line starts with a ">"
    # the .*? means a "minimal match" of anything while
    # allowing the rest of the regex to succeed.
    # the \n is counted as white space, a \s* character
    #
    if (my ($number) = $_ =~ /^>.*?(\d+)\s*$/) {
        print '>ID'.$ID++," $number\n";
    }
    else {
        print;
    }
}

=prints
>ID1 80
AGGGGGGGGGTTCCC
>ID2 85
TTTCCCGGGAAAA
>ID3 112
GGCCCCTTTGAGG
=cut

__DATA__
>Sample_1_x80
AGGGGGGGGGTTCCC
>Sample_2_x85
TTTCCCGGGAAAA
>sample_3_x112
GGCCCCTTTGAGG
    Well, if you want to get the sample number from the ">" line, then:
while (<DATA>) {
    if (my ($sample, $number) = $_ =~ /^>.*?(\d+).*?(\d+)\s*$/) {
        print '>ID'.$sample," $number\n";
    }
    else {
        print;
    }
}
    which will print the same thing
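Why the minimal .*? matters here: with a greedy .* the leading part swallows every character it can, so (\d+) is left with only the last digit. A small sketch on one header from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $header = ">Sample_1_x80\n";

# Greedy .* backtracks only as far as it must, leaving (\d+) one digit.
my ($greedy)  = $header =~ /^>.*(\d+)\s*$/;

# Minimal .*? stops as early as possible while the rest still matches,
# so (\d+) receives the whole trailing number.
my ($minimal) = $header =~ /^>.*?(\d+)\s*$/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```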
      Thanks, everyone, for your useful comments and code!
Re: modifying a file with regex!
by linuxkid (Sexton) on Mar 17, 2012 at 17:30 UTC
    easy.
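#!/usr/bin/perl -w -i.bak
$regexfile = shift @ARGV;    # @ARGV, not @argv: variable names are case-sensitive
open (FH, $regexfile) or die "can't open $regexfile: $!";
@regexen = <FH>;
close FH;
chomp @regexen;              # drop the newline from each "pattern<TAB>replacement" line
while (<>) {
    foreach $regex (@regexen) {
        ($a, $b) = split /\t/, $regex;
        s/$a/$b/g;           # no /o: the pattern changes on every pass
    }
    print;                   # with -i, print writes the edited line back
}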
    this should work, but it may not.
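If the rules do come from a file of "pattern<TAB>replacement" lines, one way to apply them is to compile each pattern once with qr// and loop over the pairs; adding /o to s/$a/$b/ would instead keep reusing whichever pattern was interpolated first. A minimal sketch with two made-up rules held in an array rather than a file. Note that the replacement is used as plain text, so a $1 inside it would not be expanded, and this format cannot put a tab in a replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule lines in the "pattern<TAB>replacement" format.
my @rule_lines = ("(?i)^>sample_\t>ID\n", "_x\t \n");

my @rules;
for my $line (@rule_lines) {
    chomp $line;
    # Limit 2 keeps a replacement intact even if it is a single space.
    my ($pattern, $replacement) = split /\t/, $line, 2;
    push @rules, [ qr/$pattern/, $replacement ];  # compile each pattern once
}

my @output;
for my $text ('>Sample_1_x80', '>sample_3_x112', 'AGGGGGGGGGTTCCC') {
    local $_ = $text;
    for my $rule (@rules) {
        my ($re, $replacement) = @$rule;
        s/$re/$replacement/g;   # apply each rule in order to $_
    }
    push @output, $_;
}
print "$_\n" for @output;
```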

    Original code restored below by GrandFather

#!/usr/bin/perl -w
$regexfile = shift @argv;
open (FH, $regexfile);
@regexen = <FH>;
close FH;
while (<>) {
    foreach $regex (@regexen) {
        $regex;
    }
}

    --linuxkid


    imrunningoutofideas.co.cc

      I don't have three whole years of Perl experience, but I think this is broken. What are you expecting it to do? $regex will contain a line (at least a newline) from the first file given on the command line. What's Perl supposed to do with that when you give it as an expression by itself?

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

        an s/// regex left without being bound to a variable with =~ just acts upon $_
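An unbound s/// does act on $_, but only when it is written in the program as an operator. In the restored code the rule is text read from a file, and naming the variable that holds it just evaluates a string. A small sketch of the difference (the strings are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A literal s/// with no =~ binding does operate on $_ ...
$_ = "Sample_1_x80";
s/Sample_/ID/;
my $after_literal = $_;

# ... but text that merely looks like s/// is ordinary string data.
my $rule = 's/ID/Sample_/';
{
    no warnings 'void';
    $rule;    # evaluates the string and discards it; $_ is untouched
}
my $after_string = $_;

print "$after_literal\n$after_string\n";
```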
