rimvydazas has asked for the wisdom of the Perl Monks concerning the following question:

Has anyone experienced a problem like this one? Look at the script and the string I have in $msg:
#!/usr/bin/perl -w
use strict;

my $msg = '1324^@^@^@';
my $lng = length($msg);
print "The length of the message is: $lng\n";
print "This is the message itself: [$msg]\n";
When I execute it, I get the following:
cdruser1@cdr1:~/cdr/scripts> ./test1.pl
The length of the message is: 7
This is the message itself: [1324]
The problem is with those ^@^@^@ characters. When I print them, I can't see anything! However, the length, as you can see, includes them. What I really want to do is remove them from my string, but I am not sure what pattern to use in the substitution. By the way, on Windows those characters are printed as spaces.

Replies are listed 'Best First'.
Re: Weird characters' issue
by ikegami (Patriarch) on Dec 06, 2007 at 19:16 UTC

    ^@ is how your editor represents the NUL character.
    In the file, each of those ^@ is the single byte chr(0).

    >perl -e"printf qq{^%s %2d\n}, chr($_+64), $_ for 0..31"
    ^@  0
    ^A  1
    ^B  2
    ..
    ^Y 25
    ^Z 26
    ^[ 27
    ^\ 28
    ^] 29
    ^^ 30
    ^_ 31
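
The same mapping can be used in the other direction to make control characters visible when printing a string. This is only a minimal sketch: the variable names are illustrative, and the string is built with "\0" escapes on the assumption that the original file contains three real NUL bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the message holds three real NUL bytes, written here as \0.
my $msg = "1324\0\0\0";

# Replace each control character (codes 0..31) with caret notation,
# i.e. '^' followed by the character 64 code points higher.
(my $visible = $msg) =~ s/([\x00-\x1f])/'^' . chr(ord($1) + 64)/ge;

print "This is the message itself: [$visible]\n";
```

Printed this way, the NULs show up in the output instead of vanishing on the terminal.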
Re: Weird characters' issue
by FunkyMonk (Bishop) on Dec 06, 2007 at 19:21 UTC
    They look like chr(0)'s to me. You can get rid of them using $msg =~ s/\0//g;.
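
As a rough illustration of that suggestion, the sketch below counts and then deletes the NULs. It assumes the string really contains chr(0) bytes (written as "\0" here) and uses tr/\0//d, which behaves the same as s/\0//g for this purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: three real NUL bytes follow the digits.
my $msg = "1324\0\0\0";

# tr/// without /d and with an empty replacement only counts matches.
my $nuls = ($msg =~ tr/\0//);
printf "before: length %d, NULs %d\n", length $msg, $nuls;

# With /d, every NUL is deleted; the copy leaves $msg untouched.
(my $clean = $msg) =~ tr/\0//d;
printf "after:  length %d [%s]\n", length $clean, $clean;
```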

      That sounds like an awful idea. Adding s/\0//g might make the program *appear* to work, but would almost certainly add a bug. It would be best to find out why they're there in the first place. For example, if they're present after every other character or so, it's because text encoded as UTF-16 or UCS-2 is being treated as text in another encoding (probably iso-latin-1).
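
To make the UTF-16 case concrete, here is a small sketch using the core Encode module. The sample text and the alternating-NUL check are illustrative assumptions, not something taken from the poster's data; the point is only that UTF-16LE text shows a NUL after every ASCII character and should be decoded rather than stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative sample: "1324" encoded as UTF-16LE (no byte order mark).
my $text  = '1324';
my $bytes = encode('UTF-16LE', $text);

# Show the raw bytes in hex; each digit is followed by a 00 byte.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;
print "raw bytes: $hex\n";

# Crude heuristic: does every non-NUL byte have a NUL right after it?
my $looks_utf16  = ($bytes        =~ /\A(?:[^\0]\0)+\z/) ? 1 : 0;
my $trailing_run = ("1324\0\0\0" =~ /\A(?:[^\0]\0)+\z/) ? 1 : 0;
print "alternating NULs: $looks_utf16, trailing run only: $trailing_run\n";

# Decoding recovers the text without deleting anything by hand.
my $decoded = decode('UTF-16LE', $bytes);
print "decoded: [$decoded]\n";
```

A run of NULs at the end of a string, as in the original question, does not match the alternating pattern, which hints at padding rather than a mis-decoded encoding.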
        I use s/\0+//g right now and it seems to work fine. Where do I get those strings? They arrive on the socket I am listening to. The strings are generated by a phone switch. NUL characters are always added whenever the date is inserted into a string, as in the example below:
        120507 1324 01300 6417 13639820545 801<CR><LF>13:24 12/05<CR><LF>^@^@^@13:24 12/05<CR><LF>^@^@^@120507 1324 00456 2867416007 808 14972681971 801<CR><LF>13:24 12/05<CR><LF>^@^@^@
        I don't see those characters anywhere else (not in the data I am capturing, or anywhere else), only when the timestamp is added to the string. I guess it is safe to use s/\0+//g in this case, am I right?
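
If the NULs really only ever follow the timestamp line, one possible middle ground is to remove them only in that position and report any that turn up elsewhere. The sketch below is an assumption-laden illustration: the buffer is modelled on the first part of the sample above with real CR, LF and NUL bytes, and the rule "NUL runs directly after CRLF are padding" is a guess about the switch's behaviour, not a documented fact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modelled on the captured sample, with <CR>, <LF> and ^@ as real bytes.
my $buf = "120507 1324 01300 6417 13639820545 801\r\n"
        . "13:24 12/05\r\n\0\0\0"
        . "13:24 12/05\r\n\0\0\0";

# Remove only NUL runs that directly follow a CRLF (assumed padding).
(my $clean = $buf) =~ s/(?<=\r\n)\0+//g;

# Count whatever is left; a non-zero value means the assumption was wrong.
my $stray = ($clean =~ tr/\0//);
warn "unexpected NUL bytes remain: $stray\n" if $stray;

print "cleaned length: ", length($clean), "\n";
```

Unlike a blanket s/\0+//g, this leaves unexpected NULs in place so that a change in the switch's output gets noticed instead of silently hidden.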