Strange character beginning text files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Strange character beginning text files by beable (Friar) on Jul 20, 2004 at 02:49 UTC
You should try using the `ord($ch)` function to find out what the character is. It looks like it might be ASCII character number 1, but it's hard to tell on a webpage. If it is an unprintable ASCII character, you could use the `tr` operator to delete it: `#!/usr/bin/perl use strict; use warnings; my $string = "\001hello\002world\003lookit\004these\005weird\006characters"; my $string2 = $string; # replace ASCII chars from 0 to 8 with spaces $string =~ tr[\000-\010][ ]; # or delete weird chars: $string2 =~ tr[\000-\010][]d; print "string is $string\n"; print "string2 is $string2\n"; __END__`
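A minimal sketch of the `ord` suggestion above, assuming the suspect line has already been read into a string. The helper name `leading_codes` and the sample data are hypothetical, not from the post; the idea is just to print the code of each of the first few characters so an invisible one can be looked up in an ASCII table.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the first $n characters of $text as
# "decimal \octal 0xhex" (fewer entries if the string is shorter).
sub leading_codes {
    my ($text, $n) = @_;
    my @chars = split //, substr($text, 0, $n);
    return map {
        my $code = ord $_;
        sprintf "%d \\%03o 0x%02x", $code, $code, $code;
    } @chars;
}

# Sample data (an assumption): a line that starts with ASCII 1 (SOH).
my $line = "\001hello";
print "$_\n" for leading_codes($line, 3);
```

The octal column can be dropped straight into a `tr` range like the one in the post above.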
Re^2: Strange character beginning text files by tachyon (Chancellor) on Jul 20, 2004 at 05:08 UTC
To limit data to the ASCII printable set you can just do `tr/\011\012\015\040-\176//cd`. You may want to lose or change one of the CR LF chars as well. Oops, forgot tab, thanks beable. cheers tachyon
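As a rough sketch of how that `tr` expression behaves, here it is wrapped in a sub. The name `keep_printable` and the sample string are made up for illustration. With `c` the listed set is complemented and with `d` the matches are deleted, so everything except tab, LF, CR and space through tilde goes away, and `tr` returns how many characters it removed.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is not tab, LF, CR,
# or printable ASCII (space through tilde). Returns the cleaned string
# and the number of characters removed.
sub keep_printable {
    my ($text) = @_;
    # c = complement the listed set, d = delete what matches;
    # tr returns the number of characters it matched (here: deleted).
    my $removed = ($text =~ tr/\011\012\015\040-\176//cd);
    return ($text, $removed);
}

# Sample data (an assumption): one SOH and one NUL mixed into the text.
my ($clean, $removed) = keep_printable("\001hello\000 world\n");
print "removed $removed: $clean";
```

Because the sub works on its own copy of the argument, the caller's string is left untouched.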
Re^3: Strange character beginning text files by beable (Friar) on Jul 20, 2004 at 07:55 UTC
Tut tut, sirrah. \011 is printable. From "man ascii": `Oct Dec Hex Char 011 9 09 HT '\t'` </nitpick>
Re^4: Strange character beginning text files by tachyon (Chancellor) on Jul 21, 2004 at 00:58 UTC
Re: Strange character beginning text files by crabbdean (Pilgrim) on Jul 20, 2004 at 04:37 UTC
Well, if chomp is not stripping it then it's not a return character. You could try chop, if it's always on the end of the line. Alternatively you could convert it to bits like this: `map { print unpack "B*", chr } qw\0001\` Or to just characters like this: `map {print chr }qw/0001/` ... and then check it against an ASCII table on the net and see what it converts to.

Interestingly, I ran this on the character and got a NULL string, that is, 00000000. Hence why chomp mightn't be picking it up. What you are seeing could be how your NULL appears in your flat file, which would also explain why it's probably not showing in Notepad. Additionally, when I attempt to convert it to a character using "chr" I get nothing appearing on my console, which also fits with it being a NULL character.

Furthermore, C-style strings in memory are terminated with a NULL character, which the program uses to signify the end of the string. If your flat file is the result of something that was written to it from another program, the characters could very well be NULLs at the end of each string. Again, all this is hypothesis.

How to remove them depends on where they are appearing in the flat file. If you are the creator of the flat file, try amending the program that writes it to chop the last character from each string/line before writing it to the flat file. Or convert the whole file to bits, delete all nulls, and then convert back to characters (probably not required unless you're really desperate).

As a side note, the 1 on the end of this 0001 suggests to me it could also be the 00000001 character, which is the "Start of heading" character, unless your question is simply relating to the box () which for me comes out as 00000000.

Dean
The Funkster of Mirth
Programming these days takes more than a lone avenger with a compiler. - sam
RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers
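For what it's worth, a small sketch of the two ideas in this post: viewing a character as bits with `unpack`, and removing NULs if that is what the character turns out to be. The sub names `bits_of` and `strip_nuls` and the sample strings are hypothetical. It also assumes single-byte characters, and it deletes the NULs directly with `tr` rather than going through a bit string.

```perl
use strict;
use warnings;

# Hypothetical helper: 8-character bit string for a single-byte character.
sub bits_of {
    my ($char) = @_;
    return unpack "B8", $char;
}

# Hypothetical helper: return a copy of $text with every NUL (ASCII 0)
# deleted, wherever it appears in the string.
sub strip_nuls {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\000//d;
    return $copy;
}

print bits_of("\000"), "\n";                 # NUL
print bits_of("\001"), "\n";                 # SOH, "start of heading"
print strip_nuls("one\000two\000"), "\n";    # sample data, an assumption
```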
Re^2: Strange character beginning text files by tachyon (Chancellor) on Jul 20, 2004 at 04:59 UTC
Actually chomp typically eats \n only, which is the line feed char LF, not the carriage return char CR: `printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r"); printf "LF \\n \\%03o 0x%02x\n\n", ord("\n"), ord("\n"); my $str = "str\015\012"; for( 1..2 ) { print "string '$str'\n"; print "length ", length $str, "\n"; chomp $str; print "string '$str'\n"; print "length ", length $str, "\n\n"; }` Technically chomp removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module). cheers tachyon
Re^3: Strange character beginning text files by graff (Chancellor) on Jul 20, 2004 at 05:12 UTC
Well, to be exact, chomp removes whatever string happens to match the current value of "$/" (input record separator), which defaults to ~~"\015\012" for windows text-mode,~~ "\n" for unix. (update: see replies below for correct info) And it only does this when the string matching $/ happens to occur at the end of the scalar value being chomped. `perl -e '$/ = "\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "str15"` `perl -e '$/ = "\r\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "str"` `perl -e '$/="\r\n"; $_ = "foo\015\012str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "foo1512str"` Update: Honest, I really did (start to) post this before tachyon made it redundant. And I confess I was not speaking from personal experience (lucky me) about the default value of $/ on ms-win -- thanks to tachyon for the correction.
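The same point as a script rather than one-liners, offered only as a sketch. The sub name `chomp_with` is invented here. It sets `$/` with `local` so the change is undone when the sub returns, chomps, and then makes any surviving CR or LF visible as a literal `\r` or `\n`.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp $text with $/ temporarily set to $sep and
# return the result with CR and LF shown as the literal text \r and \n.
sub chomp_with {
    my ($text, $sep) = @_;
    local $/ = $sep;          # restored automatically when the sub returns
    chomp $text;              # removes $sep only if it ends the string
    $text =~ s/\r/\\r/g;
    $text =~ s/\n/\\n/g;
    return $text;
}

my $crlf_line = "str\015\012";
print chomp_with($crlf_line, "\n"), "\n";        # the CR survives
print chomp_with($crlf_line, "\015\012"), "\n";  # the whole CRLF is removed
```

A separator in the middle of the string is left alone either way, which matches the third one-liner above.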
Re^4: Strange character beginning text files by tachyon (Chancellor) on Jul 20, 2004 at 05:26 UTC
Re^5: Strange character beginning text files by wfsp (Abbot) on Jul 20, 2004 at 07:28 UTC
Re^4: Strange character beginning text files by tachyon (Chancellor) on Jul 20, 2004 at 05:17 UTC