mgdude has asked for the wisdom of the Perl Monks concerning the following question:

Earlier today I posted about a problem with strings that contain a number preceded by some mystery character being evaluated as having a numeric value of 0. RMGir's advice did get rid of one error, the one tied to Perl thinking my delimited text files were in UTF-8 format. I still have the problem, though, and have traced the mystery character to whatever HTML uses as a nonbreaking space character (&nbsp;). Does anyone know why Perl 5.8 on my Unix box can't force numbers preceded by this character to behave as numbers (e.g. "&nbsp;7" == 0), but Perl 5.6 on my WinNT box can? Does anyone have a script for stripping this character out of text files? Your help is much appreciated. Thanks, Carl

Replies are listed 'Best First'.
Re: HTML nonbreaking space character (isspace)
by tye (Sage) on Aug 05, 2003 at 23:21 UTC

    I suspect that one platform considers it to be whitespace while the other does not. There is quite a maze of twisty little C macros involved, but isSPACE() is used in the Perl source code to skip leading spaces on strings being interpreted as numbers.

    I could certainly see Win32's isspace() (C macro/function) knowing that "\xA0" is whitespace but Unix's not realizing this.

    Then there is the twisty maze to consider. When Perl is built, if it appears that locales are supported, then different code might get used for isSPACE().

    I certainly don't consider it a bug either way. I find it reasonable to treat "\xA0" as whitespace (a more modern approach) and reasonable to not treat it as such (a more traditional approach). That different builds of Perl might disagree on this point does not surprise me.

                    - tye
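
A minimal sketch of the stripping script the question asks for, so that the result no longer depends on whether a given build treats the character as whitespace. It assumes the mystery character is either the single byte "\xA0" (Latin-1 no-break space) or the literal entity text "&nbsp;"; if the files hold UTF-8 encoded text, the no-break space is two bytes and the data would need decoding first. The helper name strip_nbsp is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove no-break spaces from a field so that it
# numifies the same way on every platform.
# Assumption: the character is the byte "\xA0" or the literal text "&nbsp;".
sub strip_nbsp {
    my ($field) = @_;
    $field =~ s/(?:\xA0|&nbsp;)//g;   # drop every no-break space
    $field =~ s/^\s+//;               # drop ordinary leading whitespace
    $field =~ s/\s+$//;               # drop ordinary trailing whitespace
    return $field;
}

my $raw   = "\xA07";
my $clean = strip_nbsp($raw);
print $clean + 0, "\n";               # prints 7
```

For whole files, the same substitution can be applied in place from the command line, for example perl -pi.bak -e 's/\xA0//g' file, which keeps a backup copy with a .bak suffix.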
Re: HTML nonbreaking space character
by waswas-fng (Curate) on Aug 05, 2003 at 21:39 UTC
    On Unix you can use od -cb to get an octal dump, along with the escaped form, of the characters you are dealing with. After that it may be easier to get an idea of what is going on. The command would be:
    od -cb < file
    or
    cat file | od -cb


    -Waswas
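
The same kind of inspection can be done from inside Perl, which helps when the suspect string is already in a variable. This is only a sketch: the helper name hex_bytes and the sample line are made up for illustration, and the output is hex rather than the octal that od -cb prints.

```perl
use strict;
use warnings;

# Hypothetical helper: show each character of a string as a two-digit
# hex code, so invisible characters such as "\xA0" or "\n" stand out.
sub hex_bytes {
    my ($str) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $str;
}

# Sample line (made up): a no-break space before the 7, newline at the end.
print hex_bytes("\xA07|abc\n"), "\n";   # prints a0 37 7c 61 62 63 0a
```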
      Also, you do not seem to chomp in your original post. Is the number that you are trying to match at the end of the |-separated line? Could the issue be a \n appended to the number?

      -Waswas
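
A short sketch of the chomp point, using a made-up |-separated line: without chomp, the last field keeps its trailing "\n", so a string comparison against that field fails even though the digits match. The variable names and sample data are illustrative only.

```perl
use strict;
use warnings;

my $line = "widget|blue|7\n";          # made-up sample record

# Without chomp the last field still carries the newline.
my @raw_fields = split /\|/, $line;

# With chomp the record separator is removed before splitting.
chomp(my $copy = $line);
my @fields = split /\|/, $copy;

print $raw_fields[-1] eq '7' ? "match\n" : "no match\n";   # prints no match
print $fields[-1]     eq '7' ? "match\n" : "no match\n";   # prints match
```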