You make a very good point about different chunks having to be processed separately. I have started a separate thread on Marking up alternatives, as this is more of a meditation on the nature of markups.
Thanks
UnderMine