in reply to 2 remaining problems (regex)

Collapsing each run of extra whitespace into a single space can be done with this regexp:
s/\s+/ /g;

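\s+ is already greedy, so that catches every whitespace run in one go. If you only want to collapse runs that are followed by a non-whitespace character, you could also try:
s/\s+(\S)/ $1/g;

(\S is shorthand for [^\s].) Note that this second pattern leaves trailing whitespace untouched. Neither pattern actually removes leading or trailing whitespace; the first one just squeezes it down to a single space.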
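Here's a quick sketch showing the difference between the two (the sample string and variable names are just made up for illustration). The last substitution, s/^\s+|\s+$//g, isn't one of the patterns above; it's the usual idiom for stripping leading and trailing whitespace, added in case you need a real trim:

```perl
use strict;
use warnings;

# Hypothetical sample input: leading, inner (spaces and tabs), and trailing whitespace.
my $text = "  foo   bar\t\tbaz  ";

# Pattern 1: every whitespace run, leading and trailing included,
# becomes a single space.
(my $collapsed = $text) =~ s/\s+/ /g;    # " foo bar baz "

# Pattern 2: only runs followed by a non-whitespace character are
# collapsed, so the trailing run is left exactly as it was.
(my $inner = $text) =~ s/\s+(\S)/ $1/g;  # " foo bar baz  "

# Extra step (not from the patterns above): strip whatever single
# space is left at either end after collapsing.
(my $trimmed = $collapsed) =~ s/^\s+|\s+$//g;    # "foo bar baz"

print "[$collapsed]\n[$inner]\n[$trimmed]\n";
```

The (my $copy = $orig) =~ s/.../.../ form keeps the original string intact; on Perl 5.14 or later you could use the /r modifier instead.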
HTH,
--isotope