The Specimen
Before ...
With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world’s largest city in terms of population. Sao Paulo(Brazil), the world’s second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.
After ...
Seoul has a population of more than 10.2 million.
Seoul is the capital of South Korea.
Seoul is the world’s largest city in terms of population.
Sao Paulo(Brazil) is the world’s second-largest city.
Sao Paulo(Brazil) has a population of just over ten million.
Bombay(India) has grown to more than nine million people.
Jakarta(Indonesia) has grown to more than nine million people.
Karachi(Pakistan) has grown to more than nine million people.
The Question
How can I use Perl to generate AFTER from BEFORE automatically (or at least partially)?
The Background
There is a guy who wants to do this sort of thing, with the following disclaimers:
- The guy is not a linguistics professor
- The guy wants to spit out questions for a 'flashcard' type thingy
- The guy prefers practical nuts and bolts examples to pie-in-the-sky visions of 'AI'
- The guy wishes to avoid esoteric concepts beyond the grasp of a moderately competent college student who knows some perl.
- The guy admits this is the stuff of decades of research, multitudinous PhD theses, and towering artifices of herculean intellectual endeavor, but still wants a simple solution from perlmonks.org.
The Speculations
The guy has toyed with the following speculations:
- Build a 'corpus' of domain-compatible 'trigger words' and use 'split' with those as delimiters (e.g. 'has a', 'is a', 'having a', 'has grown', etc.)
- Simply split the BEFORE text on punctuation, call those 'the building blocks' and randomly generate different structures based on those building blocks, discarding (by hand) all but those which make sense.
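The first speculation can be roughed out in a few lines. What follows is only a minimal sketch, not a real solution: the trigger list, the sub name split_on_triggers, and the naive sentence splitter are all assumptions made up for illustration. It cuts each sentence at the first trigger phrase and keeps the pieces, which is enough to spit out fill-in-the-blank cards for simple sentences. It does not untangle appositives or shared predicates such as the "Three other cities ..." sentence.

```perl
use strict;
use warnings;

# Hypothetical trigger phrases (an assumption, tune per domain).
my @triggers = ('has grown to', 'have grown to', 'has a', 'is the', 'is a');

# Longest phrase first, so a longer trigger wins over a shorter one
# that starts at the same position.
my $trigger_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @triggers;

# Split text into sentences, then cut each sentence at its first trigger.
# Returns a list of hashrefs with keys: subject, trigger, rest.
sub split_on_triggers {
    my ($text) = @_;
    my @cards;

    # Naive sentence split: a period followed by whitespace.
    # "10.2" survives because no whitespace follows its period.
    for my $sentence (split /(?<=\.)\s+/, $text) {
        $sentence =~ s/\.\s*$//;    # drop the final period

        if ($sentence =~ /^(.*?)\s*\b($trigger_re)\b\s*(.*)$/s) {
            my ($subject, $trigger, $rest) = ($1, $2, $3);
            $subject =~ s/,\s*$//;  # drop a dangling comma
            push @cards,
                { subject => $subject, trigger => $trigger, rest => $rest };
        }
    }
    return @cards;
}

# Usage: print one fill-in-the-blank card per sentence.
my $before = 'Seoul is the capital of South Korea. '
           . 'Sao Paulo(Brazil) has a population of just over ten million.';

for my $card (split_on_triggers($before)) {
    print "____ $card->{trigger} $card->{rest}  (answer: $card->{subject})\n";
}
```

Sentences with no trigger phrase are silently skipped, so anything the list does not cover still has to be handled by hand.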
The Disclaimer
Yes, the guy has seen the nodes on NLP and searched around a bit, but answers always seem shrouded in a funk of elaborately ornate statistical contrivances that seem overly complicated for the task at hand. The guy was reluctant to ask this question, but WTH, someone might be able to help with a breakthrough.