Re^2: regex to extract text

Replies are listed 'Best First'.
Re^3: regex to extract text by graff (Chancellor) on Jan 19, 2009 at 07:46 UTC
Note that CountZero's solution (based on your initial attempt, just adding the necessary "s" modifier) does a greedy match with '(.*)'. This means that if two or more instances of '`</div>`' follow the address section, the match will extend to the farthest one. Using '(.*?)' instead, to specify a non-greedy match, will do what you really want. Still, as pointed out already, you should probably get acquainted with proper HTML parsing. It takes a bit of learning to catch on, but in the long run a parsing module will lead you to quicker and better solutions than regex matching can.
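To make the greedy vs. non-greedy difference concrete, here is a minimal sketch. The HTML snippet, the `class="address"` attribute and the variable names are made up for illustration and are not taken from the original question; only the `(.*)` / `(.*?)` contrast and the "s" modifier come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: an address block followed by a second, unrelated </div>.
my $html = <<'HTML';
<div class="address">
12 Example Street
Sampletown
</div>
<div class="footer">Contact us</div>
HTML

# Greedy: with the "s" modifier, (.*) also crosses newlines and
# runs on to the LAST </div> in the string.
my ($greedy) = $html =~ m{<div class="address">(.*)</div>}s;

# Non-greedy: (.*?) stops at the FIRST </div> after the opening tag.
my ($lazy) = $html =~ m{<div class="address">(.*?)</div>}s;

print "greedy capture:\n$greedy\n";
print "non-greedy capture:\n$lazy\n";
```

With this sample, the greedy capture should swallow the footer markup as well, while the non-greedy capture should hold only the address lines. It still breaks on nested divs, which is one reason a real parsing module is the better long-term choice.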
