
I like the post from Athanasius so much that here is a quote:

The best way is to break your program down into its smallest parts, and work on each part until it does what you want and you understand how it works.

That is great advice!

However, before doing that, the very first step in writing software is to be very, very clear about what you intend the program to do. Right off the bat, I see some issues. Your OP (Original Post) talks about sentences, but your code talks about "lines". The OP also talks about non-English "sentences" that can be terminated prematurely by a \n. Which is it: a complete English sentence, or a line?

Example:

Bob is tall. Mary is short. Fred is
medium height.

Is this 3 or 4 "sentences"? By the OP's rules, I see 3 "sentences" in line 1 and one more in line 2, for a total of 4. A normal English interpretation of these 2 lines gives 3 sentences, not 4.
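To make the difference concrete, here is a minimal sketch that counts the example text both ways. The sub names are made up for illustration, the OP's rule is assumed to be "a sentence ends at a period or at a newline", and neither version tries to handle ? or ! or abbreviations like "Dr.".

```perl
#!/usr/bin/perl
use warnings;
use strict;

# The example text from above: two lines, and the third sentence wraps.
my $text = "Bob is tall. Mary is short. Fred is\nmedium height.\n";

# ASSUMED OP rule: a sentence ends at a period OR at a newline.
sub count_op_rule
{
   my ($str) = @_;
   my @pieces = grep { /\S/ } split /[.\n]/, $str;  # drop empty/blank pieces
   return scalar @pieces;
}

# Normal English rule: a newline is just whitespace; only periods end sentences.
sub count_english_rule
{
   my ($str) = @_;
   (my $joined = $str) =~ s/\s*\n\s*/ /g;   # join wrapped lines with a space
   my @pieces = grep { /\S/ } split /\./, $joined;
   return scalar @pieces;
}

print "OP rule:      ", count_op_rule($text), " sentences\n";       # 4
print "English rule: ", count_english_rule($text), " sentences\n";  # 3
```

The two answers differ (4 versus 3), which is exactly the question that needs an answer before the real parser gets written.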

For the implementation, I would break this down into steps, perhaps:

1. Write a program to get the input parameters. I give a massive hint below. A "standard" command-line interface requires a number of steps, and we want to do simple validation before we start doing a lot of "real work".
2. Write a program to parse an input file into an array of "sentences".
3. Write a program to print every sentence in @sentences that contains one of the search tokens.
4. Start integrating the steps together.
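Steps 2 and 3 can be sketched as two small subs. This is only a sketch under assumptions I made up: parse_sentences and matching_sentences are hypothetical names, sentences are split at periods and at end of line (the OP's rule as I read it), and a token matches as a whole word, ignoring case. An in-memory filehandle (opening a reference to a scalar) stands in for the real input file, so the test data is hard-coded.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Step 2 (sketch): read a filehandle and split its text into "sentences".
# ASSUMPTION: a sentence ends at a period or at the end of a line.
sub parse_sentences
{
   my ($fh) = @_;
   my @sentences;
   while (my $line = <$fh>)
   {
      foreach my $piece (split /\./, $line)
      {
         $piece =~ s/^\s+//;   # trim leading whitespace
         $piece =~ s/\s+$//;   # trim trailing whitespace, including "\n"
         push @sentences, $piece if length $piece;
      }
   }
   return @sentences;
}

# Step 3 (sketch): keep every sentence that contains at least one token.
# ASSUMPTION: tokens match as whole words, ignoring case.
sub matching_sentences
{
   my ($sentences_ref, $tokens_ref) = @_;
   my @hits;
   foreach my $sentence (@$sentences_ref)
   {
      # grep in boolean context: true if any token matches this sentence
      if (grep { $sentence =~ /\b\Q$_\E\b/i } @$tokens_ref)
      {
         push @hits, $sentence;
      }
   }
   return @hits;
}

# Hard-coded test data: an in-memory filehandle stands in for the real file.
my $text = "Bob is tall. Mary is short. Fred is\nmedium height.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @sentences = parse_sentences($fh);
my @hits      = matching_sentences(\@sentences, ['mary', 'height']);
print "$_\n" foreach @hits;
```

With the sample text and the tokens 'mary' and 'height', this should print "Mary is short" and "medium height". Once it works, the in-memory filehandle can be swapped for a real open of $in_filename.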

Here is some code for the UI (User Interface) part to get you started. I think this is the hardest part; the rest should be easier.

#!/usr/bin/perl
use warnings;
use strict;

my $in_filename = shift @ARGV;
my @extra_parms = @ARGV;

sub usage
{
   print "Usage:\n Searches for selected 'sentences' in input file\n";
   print " A sentence ends with a period(.) or could be the\n";
   print " end of line '\\n'\n";
   # Other text explaining what the "rules" are goes here....
   # give an example of the command.
   print " Example: mysearch infilename\n\n";
   print " The program will prompt user for the search terms.\n";
   
   exit(1); #this is an error exit (non-zero return value)
}

# The very basic "sanity" checks to display the usage() message.

if (!defined($in_filename) or @extra_parms>0
    or $in_filename =~ /^(-)?\?/ or $in_filename =~ /^\s*-(-)?h(elp)?/i)
{
    usage();
}

if (! -e $in_filename)
{
   print "Error! input file name: $in_filename does not exist\n\n";
   usage();
}

print "Enter search parameter(s), one per line; type END when done\n";

my $input;
my @search_tokens;

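# Prompt, read a line, and stop at END or at end-of-file (Ctrl-D / Ctrl-Z),
# so that an undefined $input can never loop forever.
while ( (print "search for: "), defined($input = <STDIN>)
        && $input !~ /^\s*END\s*$/i)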
{

    next if $input =~ /^\s*$/; # re-prompt on a blank input line
    $input =~ s/^\s*//;        # delete leading spaces
    $input =~ s/\s*$//;        # delete trailing spaces
 
    if ($input =~ /^\S+\s+\S+/)
    {
       print "Error! Only one search term per input line!\n";
       next;  
    }

    # Note: here tr// counts the number of characters which are not
    # in the set, without modifying the $input variable.
    
    if ($input =~ tr/A-Za-z0-9_//c) # any character outside the set
                                    # is an error
    {
       print "Error! Illegal character! Only A-Za-z0-9_ allowed!\n";
       next;
    }
    
    if ($input =~ /^\d/) # a token may not start with a digit
    {
       print "Error! Token cannot start with a number!\n";
       next;
    }
    
    push (@search_tokens, $input);
}

if (@search_tokens==0)
{
   print "Error! No search tokens entered..exiting..\n";
   exit (2);    # another error exit with error code 2
}

# For debugging, dump the tokens back out

print "\n"; # just a blank line
print "Search Terms are:\n";

foreach my $token (@search_tokens)
{
   print "token=\'$token\'\n";
}

__END__

Get the above working and tested, then move on to the next section
of the program. You can make a separate test program that just
hard-codes $in_filename and @search_tokens. Get that
working, then move on to another step.
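One way to test the token rules above without typing at the prompt every time is to move them into a sub. check_token is a hypothetical name; it returns an error message for a bad token and undef for a good one, and it assumes the caller has already trimmed whitespace and skipped blank lines, as the prompt loop does.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: the same checks as the prompt loop, moved into a sub.
# Returns an error message, or undef if the token is acceptable.
# Assumes leading/trailing whitespace was trimmed and blank lines skipped.
sub check_token
{
   my ($token) = @_;
   return "Only one search term per input line!" if $token =~ /\s/;
   # tr///c counts characters NOT in the set, without changing $token
   return "Illegal character! Only A-Za-z0-9_ allowed!"
      if $token =~ tr/A-Za-z0-9_//c;
   return "Token cannot start with a number!" if $token =~ /^\d/;
   return;   # undef in scalar context: token is OK
}

# Hard-coded inputs instead of typing them at the prompt.
foreach my $candidate ('Mary', 'two words', 'semi;colon', '9lives', 'ok_123')
{
   my $error = check_token($candidate);
   printf "%-12s %s\n", "'$candidate'",
      defined $error ? "rejected: $error" : "accepted";
}
```

The prompt loop could then call check_token($input), print the message, and do next whenever the result is defined, so the interactive code and the test share one set of rules.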

In reply to Re^3: PERL searching through a file by Marshall
in thread PERL searching through a file by ssimone
