Dear Monks I have been struggling for a while with the negative look ahead regex. I am trying extract certain excerpts from financial statements. Specifically, I would like to extract "Item 1. Business." Here is the file link http://sec.gov/Archives/edgar/data/931683/000115752309002434/a5927574.txt The way I extract the Item 1 section is using boundaries starting from "Item 1" to either "Item 1A," "Item 2", or "Item 3." Unfortunately, the regex does not extract the whole Item 1 because it matches on "Item 3" mentioned in the excerpt, specifically it stops at "discussed more thoroughly in Item 3" I have tried using negative lookahead to make it match all the way but I can't get my code to work. Here is my code.
if($data=~m/(?:Item|ITEM)[Ss]?\s?(?:\.|\-|\:|\-|\,)?\s?(?:1|I)\s?(?:\. +|\-|\:|\,|\-\-)?\s?(?:Description|DESCRIPTION|Discussion|DISCUSSION)? +\s?(?:[Oo]f|OF)?\s?(?:[Tt][Hh][Ee])?\s?(?:Our|OUR)?\s?(?:Busine\s?ss| +BUSINE\s?SS|Company|COMPANY)\s?(?:\.|\-|\:|\-\-|\,)?(.*?)(?!discussed +\s?more\s?in\s?|set\s?forth\s?in\s?|see\s?)(?:Item|ITEM)\s?(?:\.|\-|\ +:|\-\-|\,)?\s?(?:I|1A|1B|2|3)\s?(?:\.|\-|\:|\-\|\,)?/xisg)
It would be great if you can help me figure out a way to make the regex only match the real end, (beginning of item 3) not the words "item 3" inside the excerpts of "item 1." Thank you so much!
In reply to Help with negative look ahed by eversuhoshin
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |