cosmicperl has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
  I've moved from a Red Hat 9 server to a Fedora Core 3 server, and now something is causing me problems. Fedora seems to have a problem with the broken pipe character (¦), which unfortunately I use quite a lot.
I have a separate variables file with the line:-
$requiredjoinfields = "name¦address1¦city¦area¦postcode¦country";
If I upload this file all works fine. Now the problem is when I have this variable appear in a text box in a browser. Instead of name¦address1, etc, it has name?address1, etc. This text box is linked to a save routine that saves the value of the text box back to the variables file.
If I don't change a thing and just save, it changes to:-
name�address1�
If I change the ? in the text box to ¦ and save, it changes to:-
nameÂ¦address1Â¦
????? Does that make any sense to anyone? I checked the $LANG setting, which was en_US.UTF-8, so I tried changing it to en_US, but it made no difference.
Stranger still, if I download the file off the server and open it locally, it opens as:-
nameÂ¦address1Â¦
If I upload this file back, it makes no changes. If I use nano through SSH to remove the Â before the pipes, the script works properly again, but it still shows in the browser as name?address1?.

I'm going mad. Please tell me someone is familiar with this and has something I can do to stop it? (Apart from not using the broken pipe, as I've used it in almost all the scripts I've ever made!)

Lyle

Replies are listed 'Best First'.
Re: Fedora and broken pipe ¦
by halley (Prior) on Aug 12, 2005 at 22:18 UTC
    It looks to me like you're not using regular pipe symbols, and you say "broken pipe" in your post, so maybe you're aware of it. Cutting and pasting from your posting, I get ¦ for your "pipes" and | for my pipes. Do those look different to you with your fonts?
    ¦¦¦¦¦¦¦¦¦¦¦¦¦ |||||||||||||

    Your "broken pipe" character is not in the ASCII range, so it's probably getting screwed up with various encoding differences (utf8, latin1, shift-jis, whatever). Maybe you're using some heavyweight word-processing garbage like Word or OpenOffice Writer for your code editor or something?
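
    To make the difference concrete, here is a minimal sketch (not taken from the original poster's scripts) that prints the bytes each character becomes on disk. It assumes the "broken pipe" is U+00A6 BROKEN BAR and uses the core Encode module; the helper name hex_bytes is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the character in the post is U+00A6 BROKEN BAR.
my $broken_bar = "\x{A6}";
my $pipe       = "|";        # U+007C, the ordinary ASCII pipe

# Hypothetical helper: hex-dump the bytes a string becomes in an encoding.
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

printf "pipe,       ASCII:      %s\n", hex_bytes('ascii',      $pipe);
printf "broken bar, ISO-8859-1: %s\n", hex_bytes('iso-8859-1', $broken_bar);
printf "broken bar, UTF-8:      %s\n", hex_bytes('UTF-8',      $broken_bar);
```

    The ASCII pipe is the single byte 7C in every one of these encodings. The broken bar is one byte (A6) in ISO-8859-1 but two bytes (C2 A6) in UTF-8, which is why it only survives when every tool in the chain agrees on the encoding.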

    --
    [ e d @ h a l l e y . c c ]

Re: Fedora and broken pipe ¦
by jfroebe (Parson) on Aug 12, 2005 at 21:28 UTC

    Umm.... is this perl related? Can you provide an example CGI script where this is occurring, along with the browser name (& version)?

    Jason L. Froebe

    Team Sybase member

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neill, Stargate SG-1

Re: Fedora and broken pipe ¦
by graff (Chancellor) on Aug 13, 2005 at 00:40 UTC
    You need to learn about controlling character-set (encoding) selection in your browser, in your cgi script(s), and in whatever text editor you're using. You probably also want to look at how to specify which character set you're actually using when you generate HTML content from a perl cgi script, so that when a browser receives the content, it will know (because you've told it) how to display it correctly.
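
    As a rough sketch of what "telling the browser" can look like, the script below names the charset in both the HTTP Content-Type header and a meta tag. The function name cgi_response, the form layout, and the assumption that the data is stored as ISO-8859-1 are all illustrative, not taken from the original scripts. The field value is also assumed to contain no HTML special characters, since nothing is escaped here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response whose HTTP header and <meta> tag both name
# the same charset. Both arguments are hypothetical, for illustration only.
sub cgi_response {
    my ($charset, $field_value) = @_;
    return join '',
        "Content-Type: text/html; charset=$charset\r\n\r\n",
        qq{<html><head>\n},
        qq{<meta http-equiv="Content-Type" content="text/html; charset=$charset">\n},
        qq{</head><body><form method="post">\n},
        qq{<input type="text" name="requiredjoinfields" value="$field_value">\n},
        qq{</form></body></html>\n};
}

# Assumption: the variables file is stored as ISO-8859-1, where the broken
# bar is the single byte 0xA6.
my $requiredjoinfields = "name\xA6address1\xA6city";

binmode STDOUT;   # write the bytes exactly as they are
print cgi_response('ISO-8859-1', $requiredjoinfields);
```

    When the HTTP header carries a charset, browsers use it in preference to a meta tag in the page, so the header is the part that has to be right.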

    Whenever you expect to see a non-ASCII character but see a question mark instead, this is a very good clue that the display tool (your browser, for example) is expecting utf8, and is getting some character data that is not parsable as utf8 -- e.g. there are bytes with the eighth bit set that were created/intended as some legacy encoding (e.g. cp1251 or iso-8859-1 or whatever); non-ASCII characters in utf8 must always be conveyed by at least two bytes.

    (Another symptom of that same problem is when you expect to see a string of two or more meaningful non-ASCII characters, and instead you see a smaller number of nonsensical non-ASCII characters. But it is fairly rare that non-utf8 data happens to fall into a pattern that could be parsed as utf8 without errors, so you usually do see one or more "?" in the mix.)

    OTOH, whenever you expect to see a single (meaningful) non-ASCII character but you see a string of two or three nonsensical non-ASCII characters instead, this is a good clue that your display tool is expecting data in a legacy single-byte character set (cp125*, iso-8859-*) and has received utf8 data.
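
    Both symptoms can be reproduced in a few lines. This is a minimal sketch using the core Encode module, assuming the character involved is U+00A6 BROKEN BAR and the legacy encoding is ISO-8859-1; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "name\x{A6}address1";   # contains U+00A6 BROKEN BAR

my $latin1_bytes = encode('iso-8859-1', $text);   # bar stored as byte A6
my $utf8_bytes   = encode('UTF-8',      $text);   # bar stored as bytes C2 A6

# Symptom 1: ISO-8859-1 bytes read as UTF-8. A lone A6 byte is not valid
# UTF-8, so the decoder substitutes U+FFFD, which many tools show as "?".
my $as_utf8 = decode('UTF-8', $latin1_bytes);

# Symptom 2: UTF-8 bytes read as ISO-8859-1. Each of the two bytes becomes
# its own character: U+00C2 (A with circumflex) followed by U+00A6.
my $as_latin1 = decode('iso-8859-1', $utf8_bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "latin1 read as utf8: $as_utf8\n";
print "utf8 read as latin1: $as_latin1\n";
```

    The first line of output has a replacement character where the bar should be, and the second has a stray Â in front of each bar.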

      Hi Graff,
         You're right. All browsers were defaulting to UTF-8 even though the page had a meta tag for ISO-8859-1. When I compared the Apache httpd.conf files for RH9 and Fedora, I saw that Fedora defaulted to UTF-8 while RH9 went for the standard ISO-8859-1. I updated the Apache config and restarted, and all is now fine. Thanks for pointing me in the right direction.
      AddDefaultCharset UTF-8
      changed to
      AddDefaultCharset ISO-8859-1

      Lyle