in reply to Searching through a document and reporting results.
I second the suggestion to consider an HTML parsing module. You may also want to consider replacing your sentence and word splits with a more sophisticated grammar for parsing sentences and words, using a module like Parse::RecDescent. For example, the period character is not a sentence terminator when it appears in an ellipsis, as a decimal point in a number, etc.

    # each on an array requires Perl 5.12 or later
    while (my ($key, $value) = each @sentence) {
        print "$key: $value\n" if has_one_or_more_keywords($value);
    }
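To make the point about periods concrete, here is a minimal sketch. It is a plain-regex stand-in, not a Parse::RecDescent grammar. It only splits on whitespace that follows a terminator, so decimal points are left alone, and it refuses to split after two or more periods in a row, so an ellipsis stays inside its sentence. The names split_sentences and the two-argument has_one_or_more_keywords are hypothetical, made up for illustration.

```perl
use 5.012;      # each on arrays needs Perl 5.12 or later
use strict;
use warnings;

# Hypothetical helper: split text into sentences.
# A split happens only at whitespace that follows ., ! or ?,
# and not when the two preceding characters are ".." (an ellipsis).
# A decimal point such as the one in 3.14 is never followed by
# whitespace, so it is never a split point.
sub split_sentences {
    my ($text) = @_;
    return split /(?<=[.!?])(?<!\.\.)\s+/, $text;
}

# Hypothetical stand-in for the keyword check: returns the number of
# keywords found in the sentence (case-insensitive substring match).
sub has_one_or_more_keywords {
    my ($sentence, @keywords) = @_;
    return scalar grep { index(lc($sentence), lc($_)) >= 0 } @keywords;
}

my @sentence = split_sentences(
    "Pi is about 3.14 today. Well... maybe not. Done!"
);

# Report the index and text of each sentence containing a keyword.
while (my ($key, $value) = each @sentence) {
    print "$key: $value\n" if has_one_or_more_keywords($value, 'pi', 'maybe');
}
```

This is deliberately crude: a sentence that really does end in an ellipsis will be merged with the next one, and abbreviations like "Dr." will still cause a false split. Those are the cases where a real grammar starts to pay off.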