Hmmm.

# while not EOF keep going
while ( <DATA> ) {
    $lineCount++;   # increment the lineCount
    for my $word ( @words ) {
        my $pat = $regex{ lc($word) };
        # skip this keyword unless the current line matches it; otherwise store a report in @found
        next unless ( /$pat/ );
        @found = (@found, "\nError in line $lineCount of file $file occurrence of \"$word\" :\n\t@words\n");
        $foundCount++; # increment total found words
    }
}

First, let's make this a bit less painful. Sorry, this does not directly answer your problem; I will get to that in a bit.

First and foremost, you need to stop doing that @found assignment. It is really painful to look at. The Perlish way (well optimized, and it reads better) is to use push, like this:

push @found, "\nError in line $lineCount of file $file occurrence of \"$word\" :\n\t@words\n";

The way you did it previously makes perl expand the array into a list, add the new element, and then copy the whole list back into the array. That copy gets more costly as @found grows. push doesn't bother with any of that; it just tacks the data onto the end of the array.
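To convince yourself the two forms build the same array, here is a minimal sketch. The sub names append_by_copy and append_by_push are mine, made up for illustration; they are not part of your script.

```perl
use strict;
use warnings;

# Append by rebuilding the whole array (the pattern being replaced).
sub append_by_copy {
    my ($items, $new) = @_;
    @$items = (@$items, $new);   # flattens the array into a list, then reassigns it
    return scalar @$items;
}

# Append in place with push.
sub append_by_push {
    my ($items, $new) = @_;
    push @$items, $new;          # adds one element at the end
    return scalar @$items;
}

my (@copied, @pushed);
for my $n (1 .. 5) {
    append_by_copy(\@copied, "report $n");
    append_by_push(\@pushed, "report $n");
}

print "same contents\n" if "@copied" eq "@pushed";
```

Both arrays end up identical; the only difference is how much work perl does on each append.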

You can also get rid of the lineCount variable if you wish. Perl automagically keeps track of the current line number and stores it in the special variable $. (the line number of the filehandle most recently read).
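A small sketch of $. in action, assuming an in-memory filehandle (perl 5.8 or later) standing in for your real file. The helper name matching_line_numbers is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line numbers on which $pattern matches.
sub matching_line_numbers {
    my ($text, $pattern) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $. if $line =~ $pattern;   # $. is the current line number of $fh
    }
    close $fh;
    return @hits;
}

my @hits = matching_line_numbers("alpha\nbeta\nalphabet\n", qr/alpha/);
print "matched on lines: @hits\n";   # prints "matched on lines: 1 3"
```

In your loop that would mean using $. in the report string instead of $lineCount.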

Now, on to your problem. I am worried about this line:

# read in all the keywords from the configFile and put all the words into a hash
%regex = map { $_ => qr/$_/ } init_keywords ($ConfigFile);


# init_keywords just returns an array of keywords all in lower case.
 ....

This means you will only match words when they appear in lower case in the file. Without knowing your data set, I cannot say for certain that this is your problem, but that is my guess. You can solve this in many ways, but frankly I think something like

%regex = map { $_ => qr/$_/i } init_keywords ($ConfigFile);

is the cleanest way to do it. The additional /i modifier tells the regex to ignore case when doing the match.
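Here is a rough side-by-side of the two hashes. I do not have your init_keywords or your config file, so the hard-coded @keywords list and the keywords_in helper below are stand-ins I made up.

```perl
use strict;
use warnings;

# Stand-in for init_keywords($ConfigFile): a hard-coded lower-case list.
my @keywords = qw(error warning);

# Build one compiled pattern per keyword, without and with /i.
my %strict  = map { $_ => qr/$_/  } @keywords;
my %relaxed = map { $_ => qr/$_/i } @keywords;

# Hypothetical helper: list the keywords whose pattern matches $line.
sub keywords_in {
    my ($line, $patterns) = @_;
    return grep { $line =~ $patterns->{$_} } sort keys %$patterns;
}

my $line = "ERROR: disk full, Warning ignored";
print "case-sensitive:   ", join(",", keywords_in($line, \%strict)),  "\n";
print "case-insensitive: ", join(",", keywords_in($line, \%relaxed)), "\n";
```

With the sample line above, the case-sensitive hash finds nothing, while the /i version reports both keywords.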

Beyond this, either you will need somebody better at this stuff than I am (and there are plenty of them here), or I will need a sample of your data: both the keywords and the files you are parsing.

mikfire

In reply to RE: RE: RE: Re: Search Algorithm by mikfire
in thread Search Algorithm by tenfourty
