Re^3: PERL STRING QUESTION
by AnomalousMonk (Archbishop) on Apr 01, 2010 at 08:30 UTC

    One problem:
        my $find = <STDIN>;
    The scalar $find will be terminated with a newline. You need to chomp this string (as you did with the filename) before you use it in the regex.
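    To illustrate the point, here is a small self-contained sketch. It simulates the line that <STDIN> returns when the user types "agt" and presses Enter, and counts matches against a hypothetical sequence before and after chomp:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulate what <STDIN> returns when the user types "agt" and presses Enter.
my $find = "agt\n";
my $seq  = "ccagtggaagtcc";    # hypothetical sample sequence, no newlines

# Assigning a /g match to an empty list, in scalar context, counts the matches.
my $before = () = $seq =~ /$find/g;    # the trailing "\n" is part of the pattern
chomp $find;                           # strip the trailing newline
my $after = () = $seq =~ /$find/g;

print "matches before chomp: $before, after chomp: $after\n";
```

    Because the newline is still part of the pattern before chomp, the regex can only match where "agt" is immediately followed by a newline, and this sequence contains none.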

Re^3: PERL STRING QUESTION
by graff (Chancellor) on Apr 02, 2010 at 01:53 UTC
    You can avoid the "newline" problem (and have a much more useful script) if you learn to use @ARGV (array of command-line parameters). Another useful trick for this script is to read the file in "slurp mode":
        #!/usr/bin/perl
        use strict;
        use warnings;

        if ( @ARGV != 2 ) {
            die "Usage: $0 pattern file.name\n";
        }
        my ( $find, $filename ) = @ARGV;
        open( my $fh, "<", $filename ) or die "Cannot open $filename: $!\n";
        $/ = undef;        # set INPUT_RECORD_SEPARATOR to "slurp-mode"
        my $seq = <$fh>;   # entire file is now in $seq;
        $seq =~ tr/\n//d;  # remove all newlines (I think you don't want spaces)
        open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";
        while ( $seq =~ /(..)$find(..)/g ) {
            print "before = $1 ; after = $2\n";
            print $out "$1\n$2\n";
        }
    You then provide the "find" pattern and the input file name after the name of the script when you run it:
    your_script_name agt input_file.name # or: perl your_script_name agt input_file.name
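    A small variation on the slurp step, offered only as a sketch: assigning $/ = undef at file scope keeps slurp mode on for every later read in the program. Localizing $/ inside a do block limits it to the single read. The helper name slurp_file and the sample data are hypothetical; File::Temp (a core module) is used only to create a demo file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole file as one string, newlines removed.
sub slurp_file {
    my ($filename) = @_;
    open( my $fh, "<", $filename ) or die "Cannot open $filename: $!\n";
    my $text = do { local $/; <$fh> };    # $/ is undef only inside this do block
    close $fh;
    $text =~ tr/\n//d;                    # same newline removal as the script above
    return $text;
}

# Demo on a temporary file holding hypothetical data.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print $tmp_fh "ccagt\nggaagt\ncc\n";
close $tmp_fh;

my $seq = slurp_file($tmp_name);
print "$seq\n";
print "record separator is back to a newline\n" if $/ eq "\n";
```

    Once the do block exits, $/ reverts to its previous value, so any later line-by-line reading in the same script still works as usual.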
      Hi Monk, this is one of the best scripts for pattern finding. But how can I print the location/position of the pattern match in my sequence? Thanks www.bioinformaticsonline.com
      Hi Monk, this is really the best pattern-finding script. Can you please tell me how to print the location/position of match in your sequence? Thanks
        Can you please tell me how to print the location/position of match in your sequence.

        If I understand the question, you could use split instead of a while loop:

            ...
            my $seq = <$fh>;    # entire file is now in $seq;
            $seq =~ tr/\n//d;   # remove all newlines
            open( my $out, ">", "write.txt" ) or die "Cannot create write.txt: $!\n";
            my @chunks = split /(..$find..)/, $seq;
            my $offset = 0;
            for my $chunk ( @chunks ) {
                if ( $chunk =~ /(..)$find(..)/ ) {
                    printf( "%s occurs at character offset %d, between %s and %s\n",
                            $find, $offset + 2, $1, $2 );
                }
                $offset += length( $chunk );
            }
        Of course, if you want character offsets to be "accurate" relative to the input file, you'll have a problem: any newlines in the original have been deleted, and are not being counted in the offset values being printed as output. But maybe you just want offsets to be accurate relative to the non-whitespace content, or something like that?
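        If you do need offsets relative to the raw file, one possible approach (a sketch, not what the code above does) is to remember, while deleting newlines, where each kept character sat in the raw text. The helper strip_newlines and the sample data are hypothetical. Match starts come from Perl's @- array, where $-[0] is the offset at which the whole match begins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete "\n" from $raw and also return an array ref
# whose $i-th element is the offset in $raw of the $i-th character kept.
sub strip_newlines {
    my ($raw) = @_;
    my $seq = '';
    my @raw_offset;
    for my $i ( 0 .. length($raw) - 1 ) {
        my $c = substr( $raw, $i, 1 );
        next if $c eq "\n";
        $seq .= $c;
        push @raw_offset, $i;
    }
    return ( $seq, \@raw_offset );
}

my $raw  = "ccagt\nggaagt\ncc\n";    # hypothetical file contents
my $find = "agt";                    # hypothetical pattern
my ( $seq, $raw_offset ) = strip_newlines($raw);

my @report;    # each entry: [ offset without newlines, offset in the raw text ]
while ( $seq =~ /$find/g ) {
    my $k = $-[0];    # where this match starts in $seq
    push @report, [ $k, $raw_offset->[$k] ];
}

printf "%s at offset %d without newlines, %d in the raw text\n", $find, @$_
    for @report;
```

    The mapping array costs one integer per character, which may matter for very large sequence files; for those, storing only the positions of the deleted newlines would be a leaner variant.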

        If you want other information about the context around each "hit", you should be able to work out what to do in that for loop for the chunks between the hits.
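        As one example of using the chunks between hits, this sketch (hypothetical pattern and sequence, newlines assumed already removed) keeps the split loop from the reply above and records, for each hit, the two flanking characters and how many unmatched characters follow it before the next hit or the end of the sequence:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $find = "agt";                        # hypothetical pattern
my $seq  = "ttcagtgcccaagtaaggagtcc";    # hypothetical sequence, no newlines

# The capturing group makes split return the hits as well as the text between them.
my @chunks = split /(..$find..)/, $seq;
my $offset = 0;
my @hits;
for my $chunk (@chunks) {
    if ( $chunk =~ /(..)$find(..)/ ) {
        push @hits,
          { offset => $offset + 2, before => $1, after => $2, gap_after => 0 };
    }
    elsif (@hits) {
        # A non-hit chunk after a hit is the stretch before the next hit (or the end).
        $hits[-1]{gap_after} = length $chunk;
    }
    $offset += length $chunk;
}

for my $h (@hits) {
    printf "%s at offset %d (%s..%s), followed by %d unmatched chars\n",
      $find, @{$h}{qw(offset before after gap_after)};
}
```

    Note that two hits that touch each other produce an empty chunk between them, so their gap is reported as 0.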
