LoneRanger has asked for the wisdom of the Perl Monks concerning the following question:
I have to parse an email that is sent out daily. The people who send it out just copy and paste from websites around the internet, so the formatting is terrible. It contains various articles on different health-related topics.
The emails are always different, except for a few things: there is a block of text listing all the article titles; a row of 10 = signs separates the title list from the articles; each article starts with its title, always in all caps, followed by a header of sorts with source information, etc., then the actual article, followed by 2 \n (an example is below).
I need to know how to approach this, and perhaps some methods for working the problem out. I'm currently thinking that this can only be solved by implementing a state machine, but I'm not sure.
Thanks, LoneRanger
FSNET NOVEMBER 29, 1999
Cyclosporiasis: Ontario
Cyclosporiasis: Guatemala
==========
CYCLOSPORIASIS: ONTARIO
November 26, 1999
Infectious Disease News Brief
Health Canada
An outbreak of enteric infection due to Cyclospora cayetanensis diarrhea occurred in Ontario in the spring of 1999, the fourth consecutive year of spring-time outbreaks of this parasitic infection in this province. The

CYCLOSPORIASIS: GUATEMALA
November 26, 1999
Infectious Disease News Brief
Health Canada
CDC conducted a study in health-care facilities and among raspberry farm
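For what it's worth, the state machine idea can be quite small. The following is only a rough sketch of one possible approach, not a tested solution for the real emails. It assumes the message arrives as a single string with its line breaks intact, that the separator is a line of exactly ten = signs, and that any line containing letters but no lowercase letters is an article title. The name parse_digest and the returned data layout are made up for illustration. Since the rule for where the source header ends and the article body begins is not known, the sketch keeps both together as the article's lines.

```perl
use strict;
use warnings;

# Hypothetical helper: split one digest into the lines before the
# separator and a list of articles found after it.
sub parse_digest {
    my ($text) = @_;
    my $state = 'index';    # 'index' before the separator, 'articles' after
    my (@index, @articles);

    for my $line (split /\r?\n/, $text) {
        if ($state eq 'index') {
            if ($line =~ /^={10}\s*$/) {
                $state = 'articles';      # row of ten = signs: switch state
            }
            elsif ($line =~ /\S/) {
                push @index, $line;       # masthead line and title list
            }
            next;
        }

        # Assumption: letters present and none lowercase means a title.
        # A short all-caps line inside an article would be misread as one.
        if ($line =~ /[A-Z]/ && $line !~ /[a-z]/) {
            push @articles, { title => $line, lines => [] };
        }
        elsif (@articles && $line =~ /\S/) {
            # Header and body are kept together; blank lines are skipped.
            push @{ $articles[-1]{lines} }, $line;
        }
    }
    return (\@index, \@articles);
}

# Usage with a shortened, made-up version of the sample.
my $sample = join "\n",
    'FSNET NOVEMBER 29, 1999',
    'Cyclosporiasis: Ontario',
    'Cyclosporiasis: Guatemala',
    '==========',
    'CYCLOSPORIASIS: ONTARIO',
    'November 26, 1999',
    'An outbreak occurred in Ontario.',
    '',
    'CYCLOSPORIASIS: GUATEMALA',
    'November 26, 1999',
    'CDC conducted a study.';

my ($index, $articles) = parse_digest($sample);
for my $article (@$articles) {
    print $article->{title}, ': ', scalar @{ $article->{lines} }, " lines\n";
}
```

If the all-caps test proves too loose, one option would be to accept a title only when it matches, ignoring case, one of the entries collected from the title list, though that depends on the senders keeping the two in sync.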