Re: modifying a file with regex!
by tobyink (Canon) on Mar 16, 2012 at 21:51 UTC
#!/usr/bin/perl
use autodie; # Automatic errors on file problems.
use strict;

# This is the name of the file we want to modify.
my $filename = 'modify-file.txt';

# We're going to create a temporary file. This avoids us having
# to build up a potentially large string in memory.
my $tempname = $filename . '.tmp';

do
{
    # Open both files. Doing this using lexical file handles
    # within a "do" block means that when the end of the block
    # is reached, the files will be closed.
    open my $input_h, '<', $filename; # input handle
    open my $output_h, '>', $tempname; # output handle

    # Loop through each line of input.
    while (<$input_h>)
    {
        # Modify the line
        s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;

        # Write it out.
        print $output_h $_;
    }
};

# Delete the original file.
unlink $filename while -f $filename;

# Rename the temporary file to the original filename.
rename $tempname => $filename;
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
The "do" block is unnecessary. It actually does harm by introducing an extra level of indentation, which hinders readability.
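For comparison, here is a minimal sketch of the same approach without the "do" block. It assumes the file handles are closed explicitly before the original is deleted and the temporary file renamed, which matters on platforms that refuse to delete or rename open files. The sub name rewrite_headers is made up for illustration and does not come from the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # open, close, unlink and rename die on failure

# Hypothetical helper name; the steps mirror the script above.
sub rewrite_headers {
    my ($filename) = @_;
    my $tempname = $filename . '.tmp';

    open my $input_h,  '<', $filename;
    open my $output_h, '>', $tempname;

    while (my $line = <$input_h>) {
        $line =~ s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;
        print {$output_h} $line;
    }

    # No enclosing block closes the handles for us, so do it explicitly.
    close $input_h;
    close $output_h;

    # Replace the original with the rewritten copy.
    unlink $filename;
    rename $tempname, $filename;
    return;
}

# Process every file named on the command line (does nothing if none).
rewrite_headers($_) for @ARGV;
```

The explicit close calls take over the job that the end of the "do" block was doing, at the cost of two extra lines.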
Re: modifying a file with regex!
by JavaFan (Canon) on Mar 16, 2012 at 21:50 UTC
perl -i.bak -pe 's/^>Sample_([0-9]+)_x([0-9]+)$/>ID$1 $2/i' filename
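For anyone unfamiliar with the switches: -p wraps the code in a loop that reads each line and prints it afterwards, and -i.bak edits the file in place while keeping the original under a .bak suffix. A rough script equivalent might look like the sketch below. The sub name edit_in_place is hypothetical, and the substitution is written to keep the leading ">" and to ignore case, which is an assumption based on the sample data shown elsewhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: edits the named files in place, keeping .bak copies.
sub edit_in_place {
    my @files = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN

    # $^I is the variable that the -i.bak switch sets.
    local ($^I, @ARGV) = ('.bak', @files);

    while (<>) {             # -p supplies this loop ...
        s/^>Sample_([0-9]+)_x([0-9]+)$/>ID$1 $2/i;
        print;               # ... and this print, which goes to the file
    }
    return;
}

edit_in_place(@ARGV);
```

Saved as, say, fix-headers.pl, it would be run as perl fix-headers.pl filename.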
Re: modifying a file with regex!
by Anonymous Monk on Mar 16, 2012 at 22:11 UTC
Some issues with your effort:
- the shebang is not an absolute path
- not using strict/warnings (see "Read this if you want to cut your development time in half!")
- you're using <STDIN> instead of @ARGV (as in perl myprogram.pl)
- you're reading from FILEHANDLE but you're printing to STDOUT
- you're using FILEHANDLE instead of $filehandle
- your regular expression is case sensitive and it doesn't match your sample data
The general steps for editing a file are
- read from original-file
- modify data
- write to new-file
- rename new-file to original-file
So you might write that as
#!/usr/bin/perl --
use strict; use warnings;
use autodie 2.1001;
use File::Temp qw/ tempfile /;
use File::Copy qw/ move /;
use autodie qw/ move /;

Main( @ARGV );
exit( 0 );

sub Main {
    return Usage() unless @_ ;
    for my $file ( @_) {
        print "Converting $file \n";
        ConvertFile( $file );
    }
}

sub ConvertFile {
    my $infilename = shift;
    my ($outfh, $outfilename) = tempfile();
    open my($infh), '<', $infilename; # autodie dies on error
    while( my $line = <$infh> ){
        chomp $line;
        $line =~ s/sample\_\d\_x?/ID\t/i;
        print $outfh $line, "\n";
    }
    close $infh;
    close $outfh;
    move( $outfilename, $infilename ); # autodie dies on error
}

sub Usage {
    print <<"__USAGE__";
$0
$0 modify/this/file
perl ${\__FILE__}
perl ${\__FILE__} modify/this/file
__USAGE__
} ## end sub Usage
__END__
See use, autodie, open, File::Copy, File::Temp, strict, warnings, perlintro, perlretut, perlrequick, YAPE::Regex::Explain, Beginning Perl (free) Chapter 6: Files and Data, Modern Perl: Chapter 9: Managing Real Programs > Files
Re: modifying a file with regex!
by Marshall (Canon) on Mar 16, 2012 at 22:19 UTC
There is no need to substitute anything. Capture what is necessary and re-format the ">" line.
No need to be overly tricky when a couple of straightforward lines of code will do.
#!/usr/bin/perl -w
use strict;

my $ID = 1;
while (<DATA>)
{
    # This regex captures the trailing number if
    # the line starts with a ">".
    # The .*? means a "minimal match" of anything while
    # allowing the rest of the regex to succeed.
    # The trailing \n counts as white space, so \s* matches it.
    #
    if (my ($number) = $_ =~ /^>.*?(\d+)\s*$/)
    {
        print '>ID'.$ID++," $number\n";
    }
    else
    {
        print;
    }
}

=prints
>ID1 80
AGGGGGGGGGTTCCC
>ID2 85
TTTCCCGGGAAAA
>ID3 112
GGCCCCTTTGAGG
=cut

__DATA__
>Sample_1_x80
AGGGGGGGGGTTCCC
>Sample_2_x85
TTTCCCGGGAAAA
>sample_3_x112
GGCCCCTTTGAGG
Well, if you want to get the sample number from the ">" line, then:
while (<DATA>)
{
    if (my ($sample, $number) = $_ =~ /^>.*?(\d+).*?(\d+)\s*$/)
    {
        print '>ID'.$sample," $number\n";
    }
    else
    {
        print;
    }
}
which will print the same thing
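To see why the minimal .*? matters in these patterns, the sketch below compares it with a greedy .* on two of the sample header lines. The helper names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: trailing number using the minimal .*? quantifier.
sub trailing_number_minimal {
    my ($line) = @_;
    my ($number) = $line =~ /^>.*?(\d+)\s*$/;
    return $number;    # undef if the line is not a ">" header
}

# Hypothetical helper: the same pattern with a greedy .* instead.
sub trailing_number_greedy {
    my ($line) = @_;
    my ($number) = $line =~ /^>.*(\d+)\s*$/;
    return $number;
}

for my $header (">Sample_1_x80\n", ">sample_3_x112\n") {
    printf "minimal: %s  greedy: %s\n",
        trailing_number_minimal($header),
        trailing_number_greedy($header);
}
# prints:
# minimal: 80  greedy: 0
# minimal: 112  greedy: 2
```

With the greedy version, .* swallows as many characters as it can and leaves only the final digit for the capture group.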
Thanks everyone for your useful comments and code!
Re: modifying a file with regex!
by linuxkid (Sexton) on Mar 17, 2012 at 17:30 UTC
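#!/usr/bin/perl -w -i.bak
use strict;
# First argument: a file of tab-separated "pattern<TAB>replacement" lines.
my $regexfile = shift @ARGV;    # @ARGV, not @argv: Perl is case sensitive
open(my $fh, '<', $regexfile) or die "Cannot open $regexfile: $!";
chomp(my @regexen = <$fh>);     # drop the newline from each replacement
close $fh;
while (<>) {
    foreach my $regex (@regexen) {
        my ($pattern, $replacement) = split /\t/, $regex, 2;
        # No /o here: it would keep reusing the first pattern.
        s/$pattern/$replacement/g;
    }
    print;    # with -i, this writes the line back to the file
}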
this should work, but it may not.
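One way the table-driven idea could be fleshed out is sketched below, working on strings in memory rather than on files. It assumes each rule line holds a pattern and a literal replacement separated by a tab. The names parse_rules and apply_rules are hypothetical, and capture references such as $1 in the replacement are not expanded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns "pattern<TAB>replacement" lines into
# [compiled regex, replacement string] pairs.
sub parse_rules {
    my @rules;
    for my $rule_line (@_) {
        chomp(my $copy = $rule_line);
        my ($pattern, $replacement) = split /\t/, $copy, 2;
        next unless defined($pattern) && length($pattern);   # skip blank lines
        $replacement = '' unless defined $replacement;
        push @rules, [ qr/$pattern/, $replacement ];
    }
    return @rules;
}

# Hypothetical helper: applies every rule, in order, to one line of text.
sub apply_rules {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($regex, $replacement) = @$rule;
        $text =~ s/$regex/$replacement/g;
    }
    return $text;
}

# Two example rules: rename the header prefix, then turn "_x" into a space.
my @rules = parse_rules("(?i)^>sample_\t>ID\n", "_x\t \n");
print apply_rules($_, @rules) for ">Sample_1_x80\n", "AGGGGGGGGGTTCCC\n";
```

Compiling each pattern once with qr// avoids recompiling it for every input line, which is what the /o flag was presumably meant to achieve.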
Original code restored below by GrandFather
#!/usr/bin/perl -w
$regexfile = shift @argv;
open (FH, $regexfile);
@regexen = <FH>;
close FH;
while (<>) {
foreach $regex (@regexen) {
$regex;
}
}
--linuxkid
imrunningoutofideas.co.cc