I do something a bit different when working in vim: I take text pasted from an MS Word document and translate the few oddball characters I frequently encounter into HTML entities.
It took a good bit of experimentation to work out, but it translates reliably on the fly, either per line or per selection.
From my .vimrc.web:
let myentity = "–—“”‘’«»…ãáçêé¼½¾¿°"
nmap <buffer> <silent> <localleader>utf :.!perl -MHTML::Entities -Mutf8 -lne 'utf8::decode($_); print encode_entities($_, qq{<C-R>=g:myentity<CR>} );'<CR>
vmap <buffer> <silent> <localleader>utf :!perl -MHTML::Entities -Mutf8 -lne 'utf8::decode($_); print encode_entities($_, qq{<C-R>=g:myentity<CR>});'<CR>
Translated and removed from its vim environment, the line would look something like:
perl -MHTML::Entities -Mutf8 -lne 'utf8::decode($_); print encode_entities($_, qq{–—“”‘’«»…ãáçêé¼½¾¿°});' <yourfile>
As always, your mileage may vary, but you should find this useful and consistent. :-)
Update: The above are supposed to be the actual UTF-8 literals. In other words, the whole list should display as literal characters, like "ãáçêé¼½¾¿°", with none of "–—“”‘’" escaped into entity form.
Had I marked the above as <code>, the literals would all have been escaped, obscuring the whole point of this reply.
In reply to Re: character encoding ambiguities when performing regexps with html entities
by WebDragon
in thread character encoding ambiguities when performing regexps with html entities
by angelixd