modifying a file with regex!

rna_follower has asked for the wisdom of the Perl Monks concerning the following question:

Re: modifying a file with regex! by tobyink (Canon) on Mar 16, 2012 at 21:51 UTC
This is how I'd do it...

```perl
#!/usr/bin/perl

use autodie; # Automatic errors on file problems.
use strict;

# This is the name of the file we want to modify.
my $filename = 'modify-file.txt';

# We're going to create a temporary file. This avoids us having
# to build up a potentially large string in memory.
my $tempname = $filename . '.tmp';

do {
    # Open both files. Doing this using lexical file handles
    # within a "do" block means that when the end of the block
    # is reached, the files will be closed.
    open my $input_h,  '<', $filename; # input handle
    open my $output_h, '>', $tempname; # output handle

    # Loop through each line of input.
    while (<$input_h>) {
        # Modify the line
        s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;

        # Write it out.
        print $output_h $_;
    }
};

# Delete the original file.
unlink $filename while -f $filename;

# Rename the temporary file to the original filename.
rename $tempname => $filename;
```

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
Re^2: modifying a file with regex! by Marshall (Canon) on Mar 16, 2012 at 22:56 UTC
This extraneous "do" is completely unnecessary. It actually "harms" by introducing an unnecessary level of indentation - which is a hindrance to readability.
Re^3: modifying a file with regex! by tobyink (Canon) on Mar 16, 2012 at 23:44 UTC
It eliminates two calls to `close` and allows some lexical variables (`$input_h` and `$output_h`) to live in a smaller scope. Indent it however you like; this ain't Python.
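The scoping point can be checked with a small sketch. It assumes nothing from the thread beyond the idea itself: the helper name `write_lines` and the temporary file are hypothetical, and a bare block is used here where the original post used `do`. When the lexical handle goes out of scope, Perl flushes and closes the file without an explicit `close`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/ tempfile /;

# Hypothetical helper: write lines through a lexical handle that
# only exists inside the inner block.
sub write_lines {
    my ($path, @lines) = @_;
    {
        open my $out_h, '>', $path or die "Cannot open $path: $!";
        print {$out_h} "$_\n" for @lines;
    }   # $out_h goes out of scope here, so the file is flushed and closed

    # The data is already on disk, although close was never called.
    return -s $path;
}

# Assumption for the demo: a throwaway temp file, removed at exit.
my (undef, $path) = tempfile( UNLINK => 1 );
my $size = write_lines( $path, '>ID1 80', 'AGGGGGGGGGTTCCC' );
print "wrote $size bytes\n";
```

One trade-off worth knowing: an implicit close does not report write errors, whereas an explicit `close` (checked by hand or by autodie) does.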
Re^4: modifying a file with regex! by Marshall (Canon) on Mar 17, 2012 at 00:31 UTC
Re^5: modifying a file with regex! by Anonymous Monk on Mar 17, 2012 at 00:42 UTC
Re: modifying a file with regex! by JavaFan (Canon) on Mar 16, 2012 at 21:50 UTC
Untested: `perl -i.bak -pe 's/^>Sample_([0-9]+)_x([0-9]+)$/ID$1 $2/' filename`
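The same in-place edit can also be written as a script instead of a one-liner. The following is only a sketch under a few assumptions: the sub name `rename_headers_in_place` is hypothetical, the leading `>` is kept in the output (as in the other replies), and matching is case-insensitive so that a header such as `>sample_3_x112` is also rewritten. It sets `$^I`, the variable behind the `-i` switch, so each file is edited in place and a `.bak` copy is left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite ">Sample_N_xM" headers as ">IDN M"
# in every named file, keeping a .bak backup of each original.
sub rename_headers_in_place {
    my @files = @_;
    return unless @files;    # never fall back to reading STDIN

    # $^I turns on in-place editing for the <> operator;
    # local restores both variables when the sub returns.
    local ($^I, @ARGV) = ('.bak', @files);

    while (<>) {
        s/^>Sample_(\d+)_x(\d+)$/>ID$1 $2/i;
        print;               # goes to the file being rewritten
    }
    return;
}

rename_headers_in_place(@ARGV) if @ARGV;
```

Saved under a name of your choosing, it would be run as `perl script.pl filename`.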
Re: modifying a file with regex! by Anonymous Monk on Mar 16, 2012 at 22:11 UTC
Some issues with your effort:

- the shebang is not an absolute path
- not using strict/warnings (Read this if you want to cut your development time in half!)
- you're using `<STDIN>` instead of @ARGV (as in `perl myprogram.pl`)
- you're reading from FILEHANDLE but you're printing to STDOUT
- you're using FILEHANDLE instead of $filehandle
- your regular expression is case sensitive and it doesn't match your sample data

The general steps for editing a file are:

- read from original-file
- modify data
- write to new-file
- rename new-file to original-file

So you might write that as

```perl
#!/usr/bin/perl --
use strict;
use warnings;
use autodie 2.1001;
use File::Temp qw/ tempfile /;
use File::Copy qw/ move /;
use autodie qw/ move /;

Main( @ARGV );
exit( 0 );

sub Main {
    return Usage() unless @_ ;
    for my $file ( @_) {
        print "Converting $file \n";
        ConvertFile( $file );
    }
}

sub ConvertFile {
    my $infilename = shift;
    my ($outfh, $outfilename) = tempfile();
    open my($infh), '<', $infilename; # autodie dies on error
    while( my $line = <$infh> ){
        chomp $line;
        $line =~ s/sample\_\d\_x?/ID\t/i;
        print $outfh $line, "\n";
    }
    close $infh;
    close $outfh;
    move( $outfilename, $infilename ); # autodie dies on error
}

sub Usage {
    print <<"__USAGE__";
$0
$0 modify/this/file
perl ${\__FILE__}
perl ${\__FILE__} modify/this/file
__USAGE__
} ## end sub Usage
__END__
```

See use, autodie, open, File::Copy, File::Temp, strict, warnings, perlintro, perlretut, perlrequick, YAPE::Regex::Explain, Beginning Perl (free) Chapter 6: Files and Data, Modern Perl: Chapter 9: Managing Real Programs > Files
Re: modifying a file with regex! by Marshall (Canon) on Mar 16, 2012 at 22:19 UTC
There is no need to substitute anything. Capture what is necessary and re-format the ">" line. No need to be overly tricky when a couple of straight-forward lines of code will do.

```perl
#!/usr/bin/perl -w
use strict;

my $ID = 1;
while (<DATA>)
{
   # this regex captures the trailing number if
   # the line starts with a ">"
   # the .*? means a "minimal match" of anything while
   # allowing the rest of the regex to succeed.
   # the \n is counted as white space, a \s character
   #
   if (my ($number) = $_ =~ /^>.*?(\d+)\s*$/)
   {
      print '>ID'.$ID++," $number\n";
   }
   else
   {
      print;
   }
}

=prints
>ID1 80
AGGGGGGGGGTTCCC
>ID2 85
TTTCCCGGGAAAA
>ID3 112
GGCCCCTTTGAGG
=cut

__DATA__
>Sample_1_x80
AGGGGGGGGGTTCCC
>Sample_2_x85
TTTCCCGGGAAAA
>sample_3_x112
GGCCCCTTTGAGG
```

Well, if you want to get the sample number from the ">" line then:

```perl
while (<DATA>)
{
   if (my ($sample, $number) = $_ =~ /^>.*?(\d+).*?(\d+)\s*$/)
   {
      print '>ID'.$sample," $number\n";
   }
   else
   {
      print;
   }
}
```

which will print the same thing
Re^2: modifying a file with regex! by rna_follower (Initiate) on Mar 17, 2012 at 00:15 UTC
Thanks everyone for your useful comments/codes!
Re: modifying a file with regex! by linuxkid (Sexton) on Mar 17, 2012 at 17:30 UTC
easy.

```perl
#!/usr/bin/perl -w -i.bak
$regexfile = shift @argv;
open (FH, $regexfile);
@regexen = <FH>;
close FH;
while (<>) {
    foreach $regex (@regexen) {
        ($a, $b) = split /\t/, $regex;
        s/$a/$b/og;
    }
}
```

this should work, but it may not.

Original code restored below by GrandFather

```perl
#!/usr/bin/perl -w
$regexfile = shift @argv;
open (FH, $regexfile);
@regexen = <FH>;
close FH;
while (<>) {
    foreach $regex (@regexen) {
        $regex;
    }
}
```

--linuxkid imrunningoutofideas.co.cc
Re^2: modifying a file with regex! by aaron_baugher (Curate) on Mar 17, 2012 at 19:32 UTC
I don't have three whole years of Perl experience, but I think this is broken. What are you expecting it to do? `$regex` will contain a line (at least a newline) from the first file given on the command line. What's Perl supposed to do with that when you give it as an expression by itself? Aaron B. My Woefully Neglected Blog, where I occasionally mention Perl.
Re^3: modifying a file with regex! by linuxkid (Sexton) on Mar 17, 2012 at 21:31 UTC
a s/// regex left without being bound to a variable with `=~` just acts upon `$_` --linuxkid imrunningoutofideas.co.cc
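A literal `s///` does act on `$_`, but a string that merely contains substitution text is not executed when it appears as a bare expression. One way to apply rules read from a file is to split each line into a pattern and a replacement and feed those to a real `s///`. The sketch below assumes a tab-separated "pattern, replacement" layout like the one in the first version of the post; the sub names `parse_rules` and `apply_rules` and the two sample rules are hypothetical, and replacements are treated as literal text (no `$1` back-references).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "pattern<TAB>replacement" lines into [compiled regex, replacement] pairs.
sub parse_rules {
    my @rules;
    for my $line (@_) {
        chomp( my $copy = $line );    # drop the newline read from the file
        next unless length $copy;     # skip empty lines
        my ($pattern, $replacement) = split /\t/, $copy, 2;
        push @rules, [ qr/$pattern/, defined $replacement ? $replacement : '' ];
    }
    return @rules;
}

# Apply every rule, in order, to one line of text and return the result.
sub apply_rules {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($regex, $replacement) = @$rule;
        $text =~ s/$regex/$replacement/g;    # no /o, so each rule uses its own pattern
    }
    return $text;
}

# Assumed sample rules; in practice these lines would come from the rules file.
my @rules = parse_rules( "(?i)^>sample_\t>ID\n", "_x\t \n" );
print apply_rules( ">Sample_1_x80\n", @rules );
```

Compiling each pattern once with `qr//` avoids the `/o` modifier, which would otherwise freeze the first pattern for every later rule.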
Re^4: modifying a file with regex! by choroba (Cardinal) on Mar 18, 2012 at 00:18 UTC
Re^4: modifying a file with regex! by aaron_baugher (Curate) on Mar 18, 2012 at 01:03 UTC