One problem:
my $find = <STDIN>;
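After this line, the scalar $find still ends with the newline you typed. You need to chomp this string (as you did with the filename) before you use it in the regex; otherwise the pattern can only match where the sequence itself contains that newline.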
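To make the effect concrete, here is a minimal sketch. The helper name clean_pattern and the sample strings are made up for illustration; the string "agt\n" stands in for what <STDIN> returns after you type agt and press Enter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the input line without its trailing newline.
sub clean_pattern {
    my ($line) = @_;
    chomp $line;    # removes one trailing "\n" (the value of $/), if present
    return $line;
}

my $raw  = "agt\n";                # stand-in for a line read from <STDIN>
my $find = clean_pattern($raw);

my $seq = "ccagttt";               # made-up sample sequence
print "without chomp: ", ( $seq =~ /$raw/  ? "match" : "no match" ), "\n";
print "with chomp:    ", ( $seq =~ /$find/ ? "match" : "no match" ), "\n";
```

The unchomped pattern fails because the regex then demands a literal newline right after "agt", which the sequence does not have.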
You can avoid the "newline" problem (and have a much more useful script) if you learn to use @ARGV (array of command-line parameters). Another useful trick for this script is to read the file in "slurp mode":
#!/usr/bin/perl
use strict;
use warnings;
if ( @ARGV != 2 ) {
    die "Usage: $0 pattern file.name\n";
}
my ( $find, $filename ) = @ARGV;
open( my $fh, "<", $filename ) or die "Cannot open $filename: $!\n";
$/ = undef; # set INPUT_RECORD_SEPARATOR to "slurp-mode"
my $seq = <$fh>; # entire file is now in $seq;
$seq =~ tr/\n//d; # remove all newlines (I think you don't want spaces)
open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";
# $find is used as a regex; each hit needs two characters on either side
while ( $seq =~ /(..)$find(..)/g ) {
    print "before = $1 ; after = $2\n";
    print $out "$1\n$2\n";
}
You then provide the "find" pattern and the input file name after the name of the script when you run it:
your_script_name agt input_file.name
# or:
perl your_script_name agt input_file.name
Hi Monk,
This is one of the best scripts for pattern finding. But how can I print the location/position of each pattern match in my sequence?
Thanks
www.bioinformaticsonline.com
Hi Monk,
This is really the best pattern-finding script. Can you please tell me how to print the location/position of each match in the sequence?
Thanks
...
my $seq = <$fh>; # entire file is now in $seq;
$seq =~ tr/\n//d; # remove all newlines
open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";
my @chunks = split /(..$find..)/, $seq;
my $offset = 0;
for my $chunk ( @chunks ) {
    if ( $chunk =~ /(..)$find(..)/ ) {
        printf( "%s occurs at character offset %d, between %s and %s\n",
            $find, $offset + 2, $1, $2 );
    }
    $offset += length( $chunk );
}
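A different way to get the same offsets, if you would rather keep the original while loop than split the string, is Perl's built-in @- array: after a successful match, $-[0] holds the offset where the whole match started. This is only a sketch of that alternative, not the split approach shown above. The helper name find_hits and the sample sequence are made up, and it assumes $find contains no capturing groups of its own (otherwise $2 would no longer be the "after" pair).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [offset, before, after] for every hit of $find
# that has two characters on each side of it.
sub find_hits {
    my ( $seq, $find ) = @_;
    my @hits;
    while ( $seq =~ /(..)$find(..)/g ) {
        # $-[0] is where the whole match starts; the pattern itself
        # begins two characters later, after the "before" pair.
        push @hits, [ $-[0] + 2, $1, $2 ];
    }
    return @hits;
}

# Made-up sample sequence with two hits.
for my $hit ( find_hits( "ccagtttccagtaa", "agt" ) ) {
    printf "agt occurs at character offset %d, between %s and %s\n", @$hit;
}
```

For this sample it reports offsets 2 and 9. Like the split version, it only finds non-overlapping hits, so two occurrences closer together than the two-character context on each side will not both be reported.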
Of course, if you want character offsets to be "accurate" relative to the input file, you'll have a problem: any newlines in the original have been deleted, and are not being counted in the offset values being printed as output. But maybe you just want offsets to be accurate relative to the non-whitespace content, or something like that?
If you want other information about the context around each "hit", you should be able to work out what to do in that for loop for the chunks between the hits.
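If you do need offsets that are accurate relative to the input file, one possible approach (a sketch only, with made-up sample data) is to remember, while stripping the newlines, where each surviving character sat in the original text. The helper name strip_with_map is hypothetical, and the sketch assumes plain "\n" line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip newlines from $text and, for every character
# that survives, record its position in the original text.
sub strip_with_map {
    my ($text) = @_;
    my $seq = '';
    my @orig_pos;
    for my $i ( 0 .. length($text) - 1 ) {
        my $ch = substr( $text, $i, 1 );
        next if $ch eq "\n";
        $seq .= $ch;
        push @orig_pos, $i;    # index in @orig_pos == index in $seq
    }
    return ( $seq, \@orig_pos );
}

my $text = "ccag\ntttcc\nagtaa\n";    # made-up file contents
my ( $seq, $map ) = strip_with_map($text);

while ( $seq =~ /(..)agt(..)/g ) {
    my $stripped = $-[0] + 2;         # offset of "agt" in the stripped string
    printf "stripped offset %d = file offset %d\n",
        $stripped, $map->[$stripped];
}
```

Here the second hit sits at offset 9 in the stripped sequence but at offset 11 in the original text, because two newlines come before it. The per-character loop is slow for very large files; it is meant to show the idea rather than to be efficient.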