d-napizzle has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I have a string like 6 1/2 which, according to emacs, has a _ between the 6 and the 1. However, emacs chooses to highlight this mysterious underscore in red, and when I replace it by typing _, I see a normal _ in black. Also, when I paste that string into non-emacs things, as you can see, there is no underscore, but there is whitespace.

Believe it or not, all I want to do is
$str = "6 1/2";   # actually it comes from a file of tokenized strings
$str =~ s/_/-/;
I want to replace that mystery character between the 6 and the 1 with a -. That's it. But figuring out what that character is is driving me crazy.

I'd happily post this on like UTF8Monks if there was one. :)

Thanks, D

Re: Weird underscore/whitespace failing regex
by JavaFan (Canon) on May 12, 2009 at 23:03 UTC
    Also, when I paste that string into non-emacs things, as you can see, there is no underscore, but there is whitespace.
    Look, puny mortal, are you questioning Emacs? The Holy Developers Environment written by God[1] Himself? If Emacs shows a red underscore, there is a red underscore; do not question Emacs by using software written by false gods[2] which cannot display red underscores. The correct way to replace a red underscore is:
       $str =~ s/_/-/;
    
    Getting a red underscore in Emacs is easy - just hit the right 7 keys at once, and Emacs will start up the underscore wizard. You find the keys to hit in the info page (never use the manual page!). Inferior editors will not be able to do this.

    [1] Beard, long hair, open arms, must be a god.
    [2] Shaven, has seen a haircutter recently, black turtleneck, cannot be a true god.

      What does god need with a starship editor?</Kirk>

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

Re: Weird underscore/whitespace failing regex
by almut (Canon) on May 12, 2009 at 22:50 UTC

    Just a wild guess...  Maybe there really is no mysterious character between 6 and 1, and it's just some ultra-clever Emacs mode (inadvertently activated) that's trying to alert you that there should be something other than a space between 6 and 1 (maybe it's expecting an operator, or some such?)...  What happens when you delete the weird character and enter a new space (instead of an underscore)?

Re: Weird underscore/whitespace failing regex
by moritz (Cardinal) on May 12, 2009 at 22:39 UTC
    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    print Dumper($str);

    This will show you what's inside your $str.
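A minimal sketch of what that one-liner reveals, assuming a hypothetical string in which the "underscore" is really the single byte 0xA0. With Useqq set, Data::Dumper writes control and high-bit bytes as escapes instead of passing them through raw, so the invisible character becomes visible. The exact escape style may differ between Data::Dumper versions, so treat the comment below as typical rather than guaranteed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical input: "6", one NO-BREAK SPACE byte (0xA0), then "1/2".
my $str = "6\xa01/2";

# Useqq makes Dumper emit a double-quoted string with escapes
# for non-printable and high-bit characters.
$Data::Dumper::Useqq = 1;
my $dump = Dumper($str);

print $dump;    # the 0xA0 byte shows up as an escape, typically \240
```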

Re: Weird underscore/whitespace failing regex (nbsp)
by tye (Sage) on May 12, 2009 at 23:56 UTC

    I bet it is a non-breaking space. Try s/\xa0/-/.
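A short sketch contrasting the original attempt with tye's suggestion, using a hypothetical string that holds a 0xA0 byte where Emacs draws the red underscore. The literal `_` in the first pattern never matches, because there is no underscore in the data; matching the character by its code does.

```perl
use strict;
use warnings;

# Hypothetical input with a NO-BREAK SPACE (0xA0) between "6" and "1/2".
my $str = "6\xa01/2";

# The original attempt: looks for a literal underscore and finds none.
(my $with_underscore = $str) =~ s/_/-/;

# tye's suggestion: match the 0xA0 character itself.
(my $fixed = $str) =~ s/\xa0/-/;

print "$fixed\n";    # prints "6-1/2"
```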

    Instead of emacs, use something that doesn't mind making things uglier but clearer, such as the tools recommended elsewhere in this thread.

    - tye        

Re: Weird underscore/whitespace failing regex
by przemo (Scribe) on May 12, 2009 at 22:20 UTC

    I'm not sure I understand correctly, but please send us a hex dump of the file (e.g. with hd -C file on Linux). Then it will be evident which mysterious bytes live under your text.
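If the text is already in a Perl variable rather than a file, a rough in-Perl stand-in for the hex dump is to print each character's hex value. This sketch assumes a hypothetical, undecoded string, so each character corresponds to one byte, much like the hex column of `hexdump -C`.

```perl
use strict;
use warnings;

my $str = "6\xa01/2";    # hypothetical undecoded input

# One two-digit hex value per character (one byte each for undecoded data).
my $hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $str;

print "$hex\n";    # prints "36 a0 31 2f 32"
```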

      Since we're dealing with a var,

      use Data::Dumper qw( Dumper );
      $Data::Dumper::Useqq = 1;
      print(Dumper($str));
      or
      use Devel::Peek qw( Dump );
      Dump($str);
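Once Dumper or Dump shows which code point is hiding in the string, the core charnames module can put a name to it. A small sketch with a hypothetical string; `charnames::viacode` returns undef for code points that have no name, so the code falls back to a placeholder.

```perl
use strict;
use warnings;
use charnames ();    # makes charnames::viacode available without importing anything

my $str = "6\xa01/2";    # hypothetical input

# Report each character's code point and its Unicode name.
my @report = map {
    my $name = charnames::viacode(ord $_) // 'unnamed';
    sprintf 'U+%04X %s', ord $_, $name;
} split //, $str;

print "$_\n" for @report;
```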
Re: Weird underscore/whitespace failing regex
by d-napizzle (Initiate) on May 13, 2009 at 13:54 UTC
    Wow, I didn't think anyone would actually respond to this. @JavaFan Best. Response. Ever.

    I tried running
    use Data::Dumper; $Data::Dumper::Useqq = 1; print Dumper($str);
    from the command line with
    print Dumper("6 1/2")
    but what's interesting is that every time I would copy/paste "6 1/2" from the Most Holiest of Editors Emacs, it would come out as "1/26" on the command line. Then Dumper would say it just contains "1/26", which is not helpful.

    So I gave up on that and put the code right in my script. The results from that were much more interesting:
    $VAR1 = "6\2401\\/2";
    So, it looks like I have a \240 running amok in my data. Googling tells me that \240 is octal for 0xA0, a non-breaking space, and The Most Righteous of Editors was displaying my data in iso-latin-1-unix encoding, as opposed to straight-up UTF-8.
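The escape in that Dumper output is octal: \240 is 0xA0 (decimal 160), U+00A0 NO-BREAK SPACE. The sketch below uses the core Encode module to show why a single \240 byte is consistent with Latin-1 data: in UTF-8 the same character would have appeared as two bytes, \302\240. It assumes, hypothetically, that the tokenized file is ISO-8859-1 (as Emacs's iso-latin-1-unix coding suggests) and pastes the Dumper output back in as a Perl string literal, which reproduces the original data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octal 240 from the Dumper output is 0xA0, decimal 160.
my $code_point = oct '240';

# The same character is two bytes in UTF-8 but one byte in Latin-1.
my $utf8_bytes   = encode('UTF-8',      "\x{a0}");
my $latin1_bytes = encode('ISO-8859-1', "\x{a0}");

# Assumption: this raw line came from a Latin-1 file read without decoding.
my $raw   = "6\2401\\/2";
my $chars = decode('ISO-8859-1', $raw);
(my $fixed = $chars) =~ s/\x{a0}/-/;

printf "0x%X %s %s\n", $code_point, unpack('H*', $utf8_bytes), $fixed;
# prints "0xA0 c2a0 6-1\/2"
```

If the file turned out to be UTF-8 instead, decoding with 'UTF-8' before the substitution would give the same character string; substituting on the undecoded UTF-8 bytes would leave a stray \302 behind.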

    Thanks so much, monks.