In any event, you will have to read the data using UTF-8 encoding. My Perl dev environment can only do ASCII, so I cannot easily write code for this.
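A minimal sketch of what "reading using UTF-8 encoding" can look like, assuming the input is a file (here a hypothetical temp file created just for the demo). The em dash is written as escaped bytes, so the source itself stays pure ASCII:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample input: write a line whose em dash is stored as the
# raw UTF-8 bytes E2 80 94, so this source file itself stays pure ASCII.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "2019\xE2\x80\x942020\n";
close $out or die "close $path: $!";

# Read it back through a UTF-8 decoding layer: the three bytes become
# one character, U+2014.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

printf "%d characters\n", length $line;   # prints "9 characters"
```

Without the :encoding(UTF-8) layer the same line would be 11 characters long, one per byte.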
As far as regex goes:
You need to group the or'd alternatives, something like this: (-|em_dash)
To make the group non-capturing: (?:-|em_dash)
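A small sketch of the difference between the two groupings, assuming for illustration that em_dash stands for the character U+2014 (held in a hypothetical $em_dash variable) and using made-up sample ranges:

```perl
use strict;
use warnings;

# Illustrative assumption: em_dash is the character U+2014, written as an
# escape so this source file stays pure ASCII.
my $em_dash = "\x{2014}";

my @ranges = ("10-20", "30${em_dash}40");   # hypothetical sample data
my (@cap, @noncap);

for my $r (@ranges) {
    # Capturing group: $2 holds whichever separator matched,
    # so the second number ends up in $3.
    if ($r =~ /^(\d+)(-|$em_dash)(\d+)$/) {
        push @cap, [$1, sprintf('U+%04X', ord $2), $3];
    }
    # Non-capturing group: same alternation, but it uses no capture
    # slot, so the second number is $2.
    if ($r =~ /^(\d+)(?:-|$em_dash)(\d+)$/) {
        push @noncap, [$1, $2];
    }
}

print "capturing:     @$_\n" for @cap;
print "non-capturing: @$_\n" for @noncap;
```

Both samples match both patterns; only the numbering of the captures changes.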
The question is what "em_dash" should be, and how that relates to the decoding that was used when the data was read.
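Update: once the data has been decoded from UTF-8, an em dash is the single character \x{2014} (U+2014).
As far as I know, you do not need "use utf8;" for the \x{2014} escape itself; that pragma only matters if you type a literal em dash into your source code.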
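A rough sketch of how the decoding step decides whether \x{2014} matches. The $raw string is a hypothetical stand-in for text read without any decoding layer; Encode's decode simulates reading it with one:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: text as it would arrive from a file opened WITHOUT
# a decoding layer, so the em dash is the three UTF-8 bytes E2 80 94.
my $raw = "pages 5\xE2\x80\x949";

# Decoded text: the em dash is now one character, code point U+2014.
my $decoded = decode('UTF-8', $raw);

# The pattern uses only an ASCII escape, so this file needs no "use utf8;".
my $raw_hit     = $raw     =~ /\x{2014}/ ? 1 : 0;
my $decoded_hit = $decoded =~ /\x{2014}/ ? 1 : 0;

print "raw bytes: $raw_hit, decoded: $decoded_hit\n";   # raw bytes: 0, decoded: 1
```

If the data was never decoded, the em dash could only be matched as the byte sequence \xE2\x80\x94, which is why the choice of em_dash depends on how the file was read.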
Some Monks here are quite experienced with utf8 encoding.
Bring it on!