in reply to Re^2: PERL STRING QUESTION
in thread PERL STRING QUESTION

You can avoid the "newline" problem (and have a much more useful script) if you learn to use @ARGV (the array of command-line arguments). Another useful trick for this script is to read the file in "slurp mode":
```perl
#!/usr/bin/perl
use strict;
use warnings;

if ( @ARGV != 2 ) {
    die "Usage: $0 pattern file.name\n";
}
my ( $find, $filename ) = @ARGV;

open( my $fh, "<", $filename ) or die "Cannot open $filename: $!\n";
$/ = undef;          # set INPUT_RECORD_SEPARATOR to "slurp mode"
my $seq = <$fh>;     # entire file is now in $seq
$seq =~ tr/\n//d;    # remove all newlines (I think you don't want spaces)

open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";
while ( $seq =~ /(..)$find(..)/g ) {
    print "before = $1 ; after = $2\n";
    print $out "$1\n$2\n";
}
close($out);
```
You then provide the "find" pattern and the input file name after the name of the script when you run it:
```sh
your_script_name agt input_file.name
# or:
perl your_script_name agt input_file.name
```
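A minimal sketch of the same steps split into two subroutines, so the search can be tried on a plain string without a file. The names slurp_file and find_flanks are hypothetical, not from the post. The main difference is "local $/" inside a do-block: it turns on slurp mode only for that one read, instead of leaving $/ undefined for the rest of the program the way the global assignment does.

```perl
use strict;
use warnings;

# Read a whole file into one string. Inside the do-block, "local $/"
# sets the input record separator to undef (slurp mode) only for this read.
sub slurp_file {
    my ($filename) = @_;
    open( my $fh, "<", $filename ) or die "Cannot open $filename: $!\n";
    my $text = do { local $/; <$fh> };
    close($fh);
    return $text;
}

# Return [before, after] pairs: the two characters on each side of every
# non-overlapping match of $find, after newlines have been removed.
sub find_flanks {
    my ( $find, $seq ) = @_;
    ( my $flat = $seq ) =~ tr/\n//d;
    my @pairs;
    while ( $flat =~ /(..)$find(..)/g ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Demo on an in-memory string instead of a file
for my $pair ( find_flanks( 'agt', "ccagt\nggttagtaa" ) ) {
    print "before = $pair->[0] ; after = $pair->[1]\n";
}
```

In the full script, the body would then reduce to calling slurp_file($filename) and looping over find_flanks($find, $seq) to print and write each pair.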

Re^4: PERL STRING QUESTION
by Anonymous Monk on Apr 05, 2011 at 23:56 UTC
    Hi Monk, this is one of the best scripts for pattern finding. But how can I print the location/position of the pattern match in my sequence? Thanks www.bioinformaticsonline.com
Re^4: PERL STRING QUESTION
by bol (Initiate) on Apr 06, 2011 at 00:14 UTC
    Hi Monk, this is really the best pattern-finding script. Can you please tell me how to print the location/position of the match in your sequence? Thanks
      Can you please tell me how to print the location/position of the match in your sequence?

      If I understand the question, you could use split instead of a while loop:

```perl
...
my $seq = <$fh>;     # entire file is now in $seq
$seq =~ tr/\n//d;    # remove all newlines
open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";

# With a capturing group in the pattern, split() keeps each hit
# (..$find..) as its own element, alongside the text between hits.
my @chunks = split /(..$find..)/, $seq;
my $offset = 0;
for my $chunk (@chunks) {
    if ( $chunk =~ /(..)$find(..)/ ) {
        # $offset is where this chunk starts; $find begins 2 characters in
        printf( "%s occurs at character offset %d, between %s and %s\n",
            $find, $offset + 2, $1, $2 );
    }
    $offset += length($chunk);
}
```
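For comparison, a sketch of a different route (not the one suggested in the post): keep the original while loop and read the position from Perl's @+ array, which holds the end offset of each capture group of the last successful match. The end of the first group (the two "before" characters) is exactly where $find starts, so this gives the same 0-based offsets as "$offset + 2" in the split version. The name match_positions is hypothetical.

```perl
use strict;
use warnings;

# Collect [offset, before, after] for each non-overlapping match of
# /(..)$find(..)/ in $seq. Offsets are 0-based positions of $find itself.
sub match_positions {
    my ( $find, $seq ) = @_;
    my @hits;
    while ( $seq =~ /(..)$find(..)/g ) {
        # $+[1] is where group 1 ends, i.e. where $find begins
        push @hits, [ $+[1], $1, $2 ];
    }
    return @hits;
}

for my $hit ( match_positions( 'agt', 'ccagtggttagtaa' ) ) {
    printf "agt occurs at character offset %d, between %s and %s\n", @$hit;
}
```

For the demo string this reports offsets 2 and 9, the same values the split loop would print.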
      Of course, if you want character offsets to be "accurate" relative to the input file, you'll have a problem: any newlines in the original have been deleted, so they are not counted in the printed offsets. But maybe you just want offsets to be accurate relative to the non-whitespace content, or something like that?
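If file-relative offsets are what you need, one possible approach (a sketch, assuming every line ends in a single "\n" with no "\r\n") is to remember where each newline was deleted and add one back for every deleted newline that came before a hit. The same count also gives the line number. The helper names newline_cuts and original_position are hypothetical.

```perl
use strict;
use warnings;

# For each "\n" in $text, record the offset it would have in the
# newline-stripped copy, i.e. the number of non-newline characters before it.
sub newline_cuts {
    my ($text) = @_;
    my @cuts;
    my $removed = 0;
    while ( $text =~ /\n/g ) {
        push @cuts, $-[0] - $removed;    # $-[0]: start of this newline
        $removed++;
    }
    return @cuts;
}

# Convert an offset in the stripped string back to the original text:
# every newline deleted at or before that point shifts it right by one.
# That count + 1 is also the (1-based) line number of the character.
sub original_position {
    my ( $flat_offset, @cuts ) = @_;
    my $shift = grep { $_ <= $flat_offset } @cuts;
    return ( $flat_offset + $shift, $shift + 1 );
}

my $original = "ccagt\nggttagtaa";
( my $flat = $original ) =~ tr/\n//d;
my @cuts = newline_cuts($original);

while ( $flat =~ /(..)agt(..)/g ) {
    my $flat_offset = $+[1];    # end of the first group = start of "agt"
    my ( $file_offset, $line ) = original_position( $flat_offset, @cuts );
    print "agt: stripped offset $flat_offset, file offset $file_offset, line $line\n";
}
```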

      If you want other information about the context around each "hit", you should be able to work out what to do in that for loop for the chunks between the hits.
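As one illustration of using the chunks between the hits, here is a sketch that reports how much sequence separates each hit from the previous one. It relies on how split behaves with exactly one capturing group: the result alternates gap, hit, gap, hit, ..., so odd indices are hits and the element just before each hit is the stretch leading up to it. This assumes $find contains no capturing parentheses of its own; hits_with_gaps is a hypothetical name.

```perl
use strict;
use warnings;

# List each hit with its 0-based offset and the length of the text
# between it and the previous hit (or the start of the sequence).
# Assumes $find has no capturing parentheses of its own.
sub hits_with_gaps {
    my ( $find, $seq ) = @_;
    my @chunks = split /(..$find..)/, $seq;
    my $offset = 0;
    my @report;
    for my $i ( 0 .. $#chunks ) {
        if ( $i % 2 ) {    # odd index: a hit such as "ccagtgg"
            push @report, {
                hit    => $chunks[$i],
                offset => $offset + 2,                  # where $find starts
                gap    => length( $chunks[ $i - 1 ] ),  # text before this hit
            };
        }
        $offset += length( $chunks[$i] );
    }
    return @report;
}

for my $r ( hits_with_gaps( 'agt', 'ccagtggcccttagtaa' ) ) {
    printf "%s at offset %d, %d characters of gap before it\n",
        $r->{hit}, $r->{offset}, $r->{gap};
}
```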

Re^4: PERL STRING QUESTION
by Anonymous Monk on Apr 05, 2011 at 23:54 UTC
    Hi Monk, this is one of the best scripts for pattern finding. But how can I print the location/position of the pattern match in my sequence? Thanks www.bioinformaticsonline.com