in reply to Re^2: Extracting a (UK) Address
in thread Extracting a (UK) Address
So you are looking for three or more consecutive lines, the last ending in something that looks like a post code. The following seemed to do the trick, with the entire letter read into $letter:

$letter =~ m/((?:[^\n]+\n){2,}[^\n]*?[a-zA-Z]+[0-9]+\s+[0-9]+[a-zA-Z]+\s*?\n)\s*?\n/

Obviously this will miss addresses with no post code, or with really rubbish post codes. You could instead extract all groups of three or more lines and then apply a more cunning address recogniser to the result, perhaps from one of the modules recommended elsewhere.
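A minimal, self-contained sketch of running that regex over a whole letter. The sample letter, the `@addresses` array and the `/g` loop (to catch more than one address per letter) are illustrative additions; the pattern itself is unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample letter. In real use the whole file would be slurped
# into $letter, e.g. with: my $letter = do { local $/; <$fh> };
my $letter = <<'END';
Dear Sir,

Please reply to:

Mr J Smith
12 High Street
Anytown
AB12 3CD

Yours faithfully
END

my @addresses;

# (?:[^\n]+\n){2,}  : two or more non-empty lines
# [^\n]*?...\n      : then one more line ending in letters+digits, space, digits+letters
# \s*?\n after the ): followed by a blank (or whitespace-only) line
# /g                : keep scanning, in case the letter holds several addresses
while ($letter =~ m/((?:[^\n]+\n){2,}[^\n]*?[a-zA-Z]+[0-9]+\s+[0-9]+[a-zA-Z]+\s*?\n)\s*?\n/g) {
    push @addresses, $1;
}

print "Found address:\n$_\n" for @addresses;
```

Two things worth knowing when trying it: the pattern needs a blank line after the post code line, so an address at the very end of the file with no trailing empty line is skipped; and post codes whose first half ends in a letter, such as `W1A 1AA`, don't match, because `[a-zA-Z]+[0-9]+` must be followed directly by whitespace. Adding `[a-zA-Z]?` after the first `[0-9]+` would be one way to cover that form.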
(I haven't tried to figure out how much work this is asking the regex engine to do on difficult input. I'd worry about that only if it becomes a problem.)
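The alternative suggested above (pull out every group of three or more lines, then filter) can be sketched with Perl's paragraph mode, where `$/ = ''` makes each read return one blank-line-separated block. Since each test then runs against one short block rather than the whole letter, this should also keep the regex work modest. `looks_like_address` is a hypothetical stand-in for the "more cunning" recogniser: it only checks that the last line ends in a roughly UK-shaped post code, and that pattern is my own assumption, not the one from the regex above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical recogniser: true if the last line ends in something shaped like
# a UK post code (A9 9AA, A99 9AA, AA9 9AA, AA99 9AA, A9A 9AA, AA9A 9AA).
# Replace with a module-based check if you have one.
sub looks_like_address {
    my (@lines) = @_;
    return $lines[-1] =~ /[A-Za-z]{1,2}[0-9][A-Za-z0-9]?\s+[0-9][A-Za-z]{2}\s*$/;
}

# Hypothetical sample letter.
my $letter = <<'END';
Dear Sir,

Please reply to:

Mr J Smith
12 High Street
Anytown
AB12 3CD

Thank you for your letter of
the 3rd. I will reply
in due course.

Yours faithfully
END

my @addresses;
open my $fh, '<', \$letter or die "Cannot open in-memory file: $!";
{
    local $/ = '';    # paragraph mode: one blank-line-separated block per read
    while (my $block = <$fh>) {
        my @lines = grep { /\S/ } split /\n/, $block;
        next unless @lines >= 3;    # only groups of three or more lines
        push @addresses, join("\n", @lines) if looks_like_address(@lines);
    }
}
close $fh;

print "Found address:\n$_\n\n" for @addresses;
```

Here the three-line paragraph starting "Thank you" passes the line-count test but is rejected by the recogniser, which is the point of splitting the job in two: the grouping step stays dumb and the recogniser can be swapped out independently.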
Re^4: Extracting a (UK) Address
by jvector (Friar) on Jan 04, 2009 at 20:33 UTC