TJCooper has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have the following code which currently searches for specific substrings within an input file (DNA sequence):
use warnings;
use strict;
use File::Basename;

my $name  = basename($0);
my $usage = "\nUsage (OSX Terminal/Windows Cmd-Prompt): perl <$name> <.FASTA or .FA File> <Results Directory>\n\n";

# Scanning for restriction sites and length-output
my $infile1 = shift or die $usage;
open(my $in,  "<", $infile1);    # read the file named by the first argument
open(my $out, ">", shift);

my $DNA = read_fasta($in);
my $len = length($$DNA);
print "\n FASTA/Sequence Length is: $len bp \n";

my @pats = qw( GATCR GGCC );
for (@pats) {
    # Expand IUPAC ambiguity codes into character classes
    s/K/[GT]/g;
    s/M/[AC]/g;
    s/Y/[CT]/g;
    s/S/[CG]/g;
    s/W/[AT]/g;
    s/B/[CGT]/g;
    s/V/[ACG]/g;
    s/H/[ACT]/g;
    s/D/[AGT]/g;
    s/X/[AGCT]/g;
    s/R/[AG]/g;
    s/N/[AGCT]/g;
}

for (@pats) {
    my $m = () = $$DNA =~ /$_/gi;
    print "\n Total DNA matches to $_ are: $m \n";
}

my $pat    = join("|", @pats);
my @cutarr = split(/$pat/, $$DNA);
for (@cutarr) {
    my $len = length($_);
    print $out "$len \n";
}

close($out);
close($in);

# Subfunction - Reading formatted FASTA/FA files
sub read_fasta {
    my ($in) = @_;
    my $sequence = "";
    while (<$in>) {
        my $line = $_;
        chomp($line);
        if ($line =~ /^>/) { next }
        else               { $sequence .= $line }
    }
    return (\$sequence);
}

I'm not at all familiar with user input in Perl, as I never usually bother with it; however, in this case it's required. What would be the best way to incorporate user input into this, so that the user can type in the search patterns (GATCR, GGCC, etc.)? Thanks!

Replies are listed 'Best First'.
Re: Searching for strings specified via user input
by AppleFritter (Vicar) on May 01, 2014 at 11:36 UTC

    You could do at least two things:

    1) Introduce additional command-line arguments, allowing the user to specify a pattern to search for on the command line when invoking the script. Depending on the size of your input file(s) and on the number of searches you want to perform on each, this may be inefficient, since every invocation of the script has to read the input file again.

    If you do want to do this, and if you eventually want to add more options, you could look into the Getopt family of modules, e.g. Getopt::Std or Getopt::Long.
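A minimal sketch of option 1 with Getopt::Long, for illustration only. The repeatable --pattern option and the parse_args helper are assumptions made for this example, not part of the original script. GetOptionsFromArray is used so the demo can run on a fixed argument list; a real script would pass @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a hypothetical, repeatable --pattern option from an argument list.
# Returns the patterns (array ref) and the arguments left over (array ref).
sub parse_args {
    my @args = @_;
    my @pats;
    GetOptionsFromArray(\@args, 'pattern=s' => \@pats)
        or die "Usage: $0 --pattern PAT [--pattern PAT ...] <in.fasta> <out.txt>\n";
    die "No --pattern given\n" unless @pats;
    return (\@pats, \@args);
}

# Demo with a fixed argument list; a real script would pass @ARGV instead.
my ($pats, $rest) = parse_args(qw(--pattern GATCR --pattern GGCC in.fasta out.txt));
print "Patterns: @$pats\n";
print "Files:    @$rest\n";
```

Binding the option to an array makes it repeatable, so each --pattern adds one search term and the remaining arguments can still be shifted off as file names.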

    2) Add a loop where you present a prompt, read a pattern from STDIN and then search for that, something along the lines of the following:

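    while (1) {
        print "Enter a pattern to search for, or QUIT to quit > ";
        my $input = <STDIN>;
        last unless defined $input;    # stop at end of input (e.g. Ctrl-D)
        chomp $input;                  # drop the trailing newline
        $input =~ m/^QUIT$/ and exit(0);
        search_for_pattern($input);
    }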

    With a suitable search_for_pattern(), of course. HTH!
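One possible shape for search_for_pattern(), as a sketch rather than a definitive implementation. It assumes the routine receives a reference to the sequence (as read_fasta() returns one) plus the pattern, and that it should return the match count; the signature, the input check and the demo sequence are all made up for this example. The IUPAC table mirrors the substitutions in the original script.

```perl
use strict;
use warnings;

# IUPAC ambiguity codes mapped to character classes (same table as the
# substitutions in the original script).
my %iupac = (
    K => '[GT]',  M => '[AC]',   Y => '[CT]',  S => '[CG]',
    W => '[AT]',  B => '[CGT]',  V => '[ACG]', H => '[ACT]',
    D => '[AGT]', X => '[AGCT]', R => '[AG]',  N => '[AGCT]',
);

# Hypothetical helper: count the matches of one pattern in a sequence.
sub search_for_pattern {
    my ($dna_ref, $pat) = @_;
    $pat =~ s/\s+//g;    # drop whitespace, including the newline from <STDIN>

    # Accept only bases and ambiguity codes, so that user input can never
    # inject regex syntax into the pattern.
    die "Invalid pattern '$pat'\n"
        unless $pat =~ /\A[ACGTKMYSWBVHDXRN]+\z/i;

    (my $re = uc $pat) =~ s/([KMYSWBVHDXRN])/$iupac{$1}/g;
    my $count = () = $$dna_ref =~ /$re/gi;    # list assignment counts matches
    return $count;
}

my $dna = "AAGATCGTTGGCCAGATCATT";    # made-up sequence for the demo
printf "%s: %d\n", $_, search_for_pattern(\$dna, $_) for qw(GATCR GGCC);
```

On the demo sequence this prints 2 for GATCR and 1 for GGCC. Since the routine dies on invalid input, an interactive loop would probably want to wrap the call in eval and print the error instead of exiting.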

      Thank you! I have taken your example and adapted it to my own purposes. I was wondering how I could use a space as the delimiter between search strings. In my original code I would specify each substring via my @pats=qw( GATCR GGCC ); however, I now need the user to enter the substrings separated by spaces, and the script to treat each one as a separate search to carry out. Thanks again!

        You're welcome! Just to avoid unnecessary confusion, when the user enters multiple strings, are you interested only in matches that match all these, or all matches that match any of these?

        Either way you'd want to split the user-supplied search string along whitespace. I'd do that in your hypothetical search_for_pattern routine; in the latter case, you could also do it after the user supplied a search string, and then use a loop to call search_for_pattern for each individual search term, but this could be inefficient if you have a lot of data to deal with.
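The splitting step and the two readings of a multi-term search could be sketched like this. The helper names parse_terms, any_regex and matches_all are made up for this example, and the terms are matched literally here; the IUPAC expansion from the original script is left out to keep the sketch short.

```perl
use strict;
use warnings;

# Split one line of user input into separate search terms.
# split ' ' skips leading whitespace and splits on runs of whitespace,
# so the trailing newline from <STDIN> disappears as well.
sub parse_terms {
    my ($line) = @_;
    return split ' ', $line;
}

# "Any" semantics: one alternation that matches wherever any term matches.
sub any_regex {
    my @terms = @_;
    my $alt = join '|', map { quotemeta($_) } @terms;
    return qr/$alt/i;
}

# "All" semantics: true only if every term occurs somewhere in the text.
sub matches_all {
    my ($text, @terms) = @_;
    return !grep { $text !~ /\Q$_\E/i } @terms;
}

my @terms = parse_terms("  GATC   GGCC\n");
my $seq   = "AAGATCGTTGGCCAGATCATT";    # made-up sequence for the demo
my $re    = any_regex(@terms);
my $hits  = () = $seq =~ /$re/g;

print "Terms: @terms\n";
print "Matches to any term: $hits\n";
print "Contains all terms: ", (matches_all($seq, @terms) ? "yes" : "no"), "\n";
```

Building a single alternation means the data is scanned once for all terms, which is the cheaper route when there is a lot of data; calling a search routine once per term scans it once per term.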

Re: Searching for strings specified via user input
by Anonymous Monk on May 01, 2014 at 11:56 UTC

    There's the core module Term::ReadLine, and the CPAN module Term::Prompt - see their documentation for examples. Once you've got the string from the user, you can split it to get your @pats. Something like this:

    use Term::ReadLine;
    my $term  = Term::ReadLine->new("prompt");
    my $input = $term->readline("Enter search pattern: ")
        or die "no search pattern entered";
    my @pats = split ' ', $input;

    Another note: I see you're not checking the return values of your open calls for errors, which you may want to do:

    my $infile = shift;
    open(my $in, "<", $infile) or die "Failed to open $infile: $!";
    my $outfile = shift;
    open(my $out, ">", $outfile) or die "Failed to open $outfile: $!";

    Either that, or add a use autodie; at the top of your script - see autodie.
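A small sketch of what autodie changes, for illustration. The try_open helper and the file names are made up for this example; the eval is there only so the demo can report a failure instead of stopping at it.

```perl
use strict;
use warnings;
use autodie;    # open, close, etc. now throw an exception on failure
use File::Temp qw(tempfile);

# Hypothetical helper: try to open a file for reading and report the outcome.
sub try_open {
    my ($path) = @_;
    my $ok = eval {
        open(my $fh, "<", $path);    # no "or die" needed under autodie
        close($fh);
        1;
    };
    return $ok ? "opened" : "failed";
}

# One file that exists (a temporary file) and one that does not.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print "existing file: ", try_open($tmp_name), "\n";
print "missing file:  ", try_open("/no/such/dir/missing.fasta"), "\n";
```

Without the eval, the failed open would end the script with a message naming the file and the reason, much like the hand-written or die lines above.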