in reply to Re^3: parsing a line with $1, $2, $3
in thread parsing a line with $1, $2, $3

I don't disagree with any of that. I'd use an HTML parser to get the content out of the various tags, which is the easy part. The harder (and even more fragile) part is separating the generic names, doses, and forms, because there are no markup indicators at all, and there's probably just enough inconsistency in the source data to make you crazy.
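
To make the harder part concrete, here is a minimal sketch of the kind of pattern I have in mind. Everything in it is an assumption: the sample strings are made up, the "name, then number plus unit, then form" layout is a guess at what the data looks like, and `split_entry` is a hypothetical helper name. It only uses core Perl and works on text that has already been pulled out of the tags.

```perl
use strict;
use warnings;

# Assumed layout: generic name, then a dose (number plus unit), then the form.
# The unit list is illustrative only; real data will need more units.
my $ENTRY_RE = qr{
    ^\s*
    (.+?)                         # capture 1: generic name, as short as possible
    \s+
    (                             # capture 2: dose
        \d+(?:\.\d+)?             #   number, optional decimal part
        \s*(?:mcg|mg|g|mL)        #   unit
        (?:/\d*\s*(?:mL|L))?      #   optional "per volume" part, such as /5 mL
    )
    \s+
    (.+?)                         # capture 3: form, the rest of the line
    \s*$
}xi;

# Hypothetical helper: returns (name, dose, form), or an empty list on no match.
sub split_entry {
    my ($text) = @_;
    return $text =~ $ENTRY_RE ? ($1, $2, $3) : ();
}

# Made-up sample strings, not taken from the real source data.
my @samples = (
    'acetaminophen 500 mg tablet',
    'amoxicillin 250 mg/5 mL oral suspension',
    'mystery entry with no dose',
);

for my $line (@samples) {
    if (my ($name, $dose, $form) = split_entry($line)) {
        print "name=[$name] dose=[$dose] form=[$form]\n";
    }
    else {
        # Do not guess: set aside anything that does not fit for a human to check.
        print "NO MATCH: $line\n";
    }
}
```

The point of the `else` branch is that inconsistent entries get reported instead of being silently mangled, which is about the best you can do when the data carries no markup to lean on.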

I don't use Perl a lot, but it's what I reach for when I need to deal with HTML, XML, or structured stuff that I can coerce into looking like HTML or XML. Something out of CPAN will usually be better behaved, and take less time, than anything I could write myself.