Hello lobs,

Where possible, it's generally better to use an existing module than to reinvent the wheel. In this case, there are modules on CPAN, such as Text::Sentence, that do most of the work for you:

use strict;
use warnings;
use Text::Sentence qw( split_sentences );

my $sent = 'George Washington';

# Slurp the whole document from the DATA section in one read
my $doc = do { local $/; <DATA>; };

my @sentences = split_sentences($doc);

# Print the first sentence that opens with the (optionally quoted) name
# followed by "is" or "was"; \Q...\E keeps the name literal in the regex
for (@sentences)
{
    if (/^'?\Q$sent\E'?\s+(?:is|was)/)
    {
        print "FOUND:\n$_\n";
        last;
    }
}

__DATA__
The quick brown fox jumped over the unfortunate dog.
'George Washington' was the first President of the United States, the Commander-in-Chief of the Continental Army during the American Revolutionary War, and one of the Founding Fathers of the United States. He presided over the convention that drafted the current United States Constitution and during his lifetime was called the "father of his country".
Widely admired for his strong leadership qualities, Washington was unanimously elected president in the first two national elections. He oversaw the creation of a strong, well-financed national government that maintained neutrality in the French Revolutionary Wars, suppressed the Whiskey Rebellion, and won acceptance among Americans of all types.[5] Washington's incumbency established many precedents, still in use today, such as the cabinet system, the inaugural address, and the title Mr. President.[6][7] His retirement from office after two terms established a tradition that lasted until 1940, when Franklin Delano Roosevelt won an unprecedented third term. The 22nd Amendment (1951) now limits the president to two elected terms.

Output:

14:11 >perl 1601_SoPW.pl
FOUND:
'George Washington' was the first President of the United States, the Commander-in-Chief of the Continental Army during the American Revolutionary War, and one of the Founding Fathers of the United States.
14:11 >
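
If you'd like to experiment with just the matching step before installing anything, here is a minimal sketch that uses core Perl only. It assumes the text has already been split into sentences (the list is hard-coded here, standing in for the output of split_sentences), and find_definition is a hypothetical helper name of my own, not part of any module. It also wraps the name in \Q...\E so that any regex metacharacters in it are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first sentence that starts with the
# (optionally single-quoted) subject followed by "is" or "was".
sub find_definition {
    my ($subject, @sentences) = @_;
    for my $s (@sentences) {
        # \Q...\E quotes the subject so it is matched as literal text
        return $s if $s =~ /^'?\Q$subject\E'?\s+(?:is|was)\b/;
    }
    return;    # undef in scalar context when nothing matches
}

# Hard-coded stand-in for the list a sentence splitter would return
my @sentences = (
    "The quick brown fox jumped over the unfortunate dog.",
    "'George Washington' was the first President of the United States.",
    "Washington was unanimously elected president.",
);

my $hit = find_definition('George Washington', @sentences);
print defined $hit ? "FOUND:\n$hit\n" : "No match\n";
```

The anchor at the start of the pattern is what keeps the third sentence from matching: it mentions Washington, but does not begin with the full name.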

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,


In reply to Re: Regular expression for Wikipedia Module by Athanasius
in thread Regular expresión for Wikipedia Module by lobs
