Hello, I need help with a regular expression for the Wikipedia module. What I am trying to implement is a simple question/answer system. I have parsed the question to get the keyword, searched Wikipedia with it, and got a page back. All I need is the first sentence, since that is usually where the answer is. The problem I am having is that the complete first sentence is not being returned. For example, George Washington:
(The page has a lot more text, but this is the part I am trying to parse.)
'George Washington' was the first President of the United States , the Commander-in-Chief of the Continental Army during the American Revolutionary War, and one of the Founding Fathers of the United States. He presided over the convention that drafted the current United States Constitution and during his lifetime was called the "father of his country".<ref name="Grizzard105"></ref> ...
Even with this short excerpt I am having a lot of trouble extracting just the first sentence.
The closest I have gotten is with this code, where $sent is the keyword I am searching for (always the title of the wiki article itself). In this example, $sent = 'George Washington':
if($doc =~ /'$sent' (is|was) ([\w]+[\s]*[,|;|"|'|\-]?[\s]*)+\./)
{
    $reform = "$sent $1 $2.\n\nFOUND!!";
}
This returned:
$reform = George Washington was States.
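A likely explanation for the truncated result, offered as a minimal sketch rather than a confirmed diagnosis: when a capturing group is itself repeated with +, Perl keeps only the text matched by the last repetition, which here is the final word before the period. The sample string below is a shortened, hypothetical stand-in for the real page text, the variable names $last and $all are mine, and \Q...\E is added only so that $sent is matched literally.

```perl
use strict;
use warnings;

# Hypothetical sample text, shortened from the excerpt above.
my $doc  = "'George Washington' was the first President of the United States. He presided over the convention.";
my $sent = 'George Washington';

# Same shape as the pattern above: the second group is repeated by the trailing +,
# so $2 only holds what the final repetition matched.
my ($verb, $last) = $doc =~ /'\Q$sent\E' (is|was) ([\w]+[\s]*[,|;|"|'|\-]?[\s]*)+\./;

# Making the repeated group non-capturing and wrapping the whole repetition
# in an outer capturing group keeps every word of the run.
my ($verb2, $all) = $doc =~ /'\Q$sent\E' (is|was) ((?:[\w]+[\s]*[,|;|"|'|\-]?[\s]*)+)\./;

print "last iteration: $last\n";
print "whole run:      $all\n";
```

With the outer group, $all holds the whole run of words up to the period instead of only the last one. The character class still has to list every punctuation mark that can appear inside the sentence, so this sketch may stop early on real article text.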
This is what I was using earlier, but it gave me the whole article in $reform:
if($doc =~ /'$sent' (is|was) (.*)\./)
{
    $reform = "$sent $1 $2.\n\nFOUND!!";
}
So can someone help me with my regex match, please?
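For reference, one direction that might work, sketched under assumptions: the greedy .* runs to the last period in the document, while a non-greedy .*? stops at the first period that satisfies the rest of the pattern. The lookahead requiring whitespace or end of text after the period is my own guess at a reasonable sentence boundary, and the sample text is trimmed from the excerpt above. Abbreviations such as "Jr." or "U.S." followed by a space would still end the match too early.

```perl
use strict;
use warnings;

# Sample text trimmed from the excerpt above (the wiki markup after the second sentence is left out).
my $doc = q{'George Washington' was the first President of the United States , the Commander-in-Chief of the Continental Army during the American Revolutionary War, and one of the Founding Fathers of the United States. He presided over the convention that drafted the current United States Constitution.};
my $sent = 'George Washington';

my $reform;
# .*? is non-greedy: it stops at the first period followed by whitespace or end of text.
# \Q...\E quotes any regex metacharacters in the keyword; /s lets . match newlines too.
if ($doc =~ /'\Q$sent\E' (is|was) (.*?)\.(?=\s|\z)/s) {
    $reform = "$sent $1 $2.";
}
print "$reform\n" if defined $reform;
```

On this sample the match ends after "Founding Fathers of the United States." and does not run into the second sentence.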