When you say "perl spits that line back out as", do you mean "prints to your terminal as"? Perl may well be printing the character, but your terminal cannot display it.
If I take this page and save it as emdash.html, and run:
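# Print each matching line, then the decimal value of every character in it.
open(my $fh, '<', 'emdash.html') or die "Cannot open emdash.html: $!";
while (<$fh>) {
    if (/Pierrefonds/) {
        print;
        print join(' ', map { ord } split //), "\n";
    }
}
close($fh);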
both lines containing Bernard Patry's riding name appear in my xterm as "PierrefondsDollard". However, looking at the character values printed below each line, I can see that the first one contains an extra, undisplayed character with decimal value 151. That's the em dash.
The fun part is that character 151 (0x97) isn't an em dash in ISO-8859-1, where that position belongs to the C1 control codes. The em dash at 151 comes from the Windows Latin 1 character set (Windows-1252), which isn't directly compatible with ISO-8859-1 in the range 128 to 159. This could explain why it doesn't display correctly in your (or at least, my) terminal. See http://www.cs.tut.fi/~jkorpela/www/windows-chars.html for more details.
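If the data really is Windows-1252, one way to confirm it is to decode the bytes with the core Encode module and look at the resulting code point. This is only a sketch under that assumption: the sample string below is made up to mimic the line in the page (byte 0x97 between the two names), and whether re-encoding to UTF-8 helps depends on your terminal actually expecting UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the riding name with raw byte 151 (0x97) in the middle,
# as it would appear in the saved file.
my $bytes = "Pierrefonds\x97Dollard";

# Interpreted as Windows-1252, byte 0x97 decodes to U+2014 EM DASH.
my $text = decode('cp1252', $bytes);

# List the code points of all non-ASCII characters in the decoded string.
my @code_points = map { sprintf 'U+%04X', ord } grep { ord > 127 } split //, $text;
print "@code_points\n";    # prints "U+2014"

# Re-encode as UTF-8 for output (assumes a UTF-8 terminal).
print encode('UTF-8', $text), "\n";
```

If the decoded character comes out as U+2014, the byte was a Windows-1252 em dash and the display problem is on the terminal side, not in Perl.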