PerlMonks  

Re^4: getting rid of UTF-8

by BernieC (Pilgrim)
on Nov 25, 2022 at 02:57 UTC ( [id://11148368] )


in reply to Re^3: getting rid of UTF-8
in thread getting rid of UTF-8

I'll try to put something together and post a hex dump. But I know there's nothing in there but plain 7-bit ASCII, the lower 128 characters (I only mentioned ISO-Latin out of habit). All of the data is something I entered myself; there's nothing in the CSV files that I didn't type in. I have no idea why there's a BOM in the middle of the first record... I'll get the dump.
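The claim that the files contain only 7-bit ASCII is easy to check mechanically: any byte at 0x80 or above falls outside that range. Below is a minimal Perl sketch. It assumes the file is read with a `:raw` layer so that Perl does no decoding. `find_non_ascii` is a hypothetical helper name, not an existing module function.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, hex] for every byte >= 0x80 in an
# undecoded byte string. Pure 7-bit ASCII yields an empty list.
sub find_non_ascii {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([\x80-\xFF])/g) {
        # $-[0] is the offset where the current match starts
        push @hits, [ $-[0], sprintf('%02X', ord $1) ];
    }
    return @hits;
}

# Scan the file named on the command line, if any, read as raw bytes.
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    printf "offset %d: 0x%s\n", @$_ for find_non_ascii($bytes);
}
```

Run it with the CSV file as its only argument. On the file dumped below, it would report at least the EF BB BF at offsets 0 to 2 and the second EF BB BF inside the first data record.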

Re^5: getting rid of UTF-8
by BernieC (Pilgrim) on Nov 25, 2022 at 03:25 UTC
    OK. I've got a hex dump. Here's what the file looks like in a text editor:
    Importance,"First Name","Middle Name","Last Name","Full Name",Company,Department,"Job Title","Street (b.)","City (b.)","State (b.)","ZIP Code (b.)","Country/Region (b.)","Home Phone","Business Phone","Mobile Phone","Business Phone 2","Business Phone 3","Business Phone 4","Business Fax","Business Web Page","Street (h.)","City (h.)","State (h.)","ZIP Code (h.)","Country/Region (h.)","Home Phone 2","Home Phone 3","Home Phone 4","Home Fax","Personal Web Page","Mobile Phone 2","Mobile Phone 3","Mobile Phone 4",E-mail,"E-mail 2","E-mail 3","E-mail 4",x,y,z,w,Office,Supervisor,Assistant,Salutation,Nickname,Gender,Spouse,Birthday,Anniversary,Family,Hobbies,Specialty,Strengths,Personality,Notes,"Custom 2","Custom 3","Custom 4","Custom 5","Custom 6","Custom 7","Custom 8",Comment,Group,"Birthday Reminder On/Off","Anniversary Reminder On/Off"
    Normal,,,,"A-1 Heating and Cooling","A-1 Heating and Cooling",,,"PO Box 94",Newport,Virginia,24128,"United States",,544-7810,,,,,,,,,,,"United States",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"953-1513
    Scott - service mgr - cell 540 357 2816","Emergency  contacts",No,No
    And here's the hex dump of it
    EF BB BF 49 6D 70 6F 72 74 61 6E 63 65 2C 22 46 69 72 73 74 20 4E 61 6D 65 22 2C 22 4D 69 64 64 6C 65 20 4E 61 6D 65 22 2C 22 4C 61 73 74 20 4E 61 6D 65 22 2C 22 46 75 6C 6C 20 4E 61 6D 65 22 2C 43 6F 6D 70 61 6E 79 2C 44 65 70 61 72 74 6D 65 6E 74 2C 22 4A 6F 62 20 54 69 74 6C 65 22 2C 22 53 74 72 65 65 74 20 28 62 2E 29 22 2C 22 43 69 74 79 20 28 62 2E 29 22 2C 22 53 74 61 74 65 20 28 62 2E 29 22 2C 22 5A 49 50 20 43 6F 64 65 20 28 62 2E 29 22 2C 22 43 6F 75 6E 74 72 79 2F 52 65 67 69 6F 6E 20 28 62 2E 29 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 22 2C 22 4D 6F 62 69 6C 65 20 50 68 6F 6E 65 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 20 32 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 20 33 22 2C 22 42 75 73 69 6E 65 73 73 20 50 68 6F 6E 65 20 34 22 2C 22 42 75 73 69 6E 65 73 73 20 46 61 78 22 2C 22 42 75 73 69 6E 65 73 73 20 57 65 62 20 50 61 67 65 22 2C 22 53 74 72 65 65 74 20 28 68 2E 29 22 2C 22 43 69 74 79 20 28 68 2E 29 22 2C 22 53 74 61 74 65 20 28 68 2E 29 22 2C 22 5A 49 50 20 43 6F 64 65 20 28 68 2E 29 22 2C 22 43 6F 75 6E 74 72 79 2F 52 65 67 69 6F 6E 20 28 68 2E 29 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 20 32 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 20 33 22 2C 22 48 6F 6D 65 20 50 68 6F 6E 65 20 34 22 2C 22 48 6F 6D 65 20 46 61 78 22 2C 22 50 65 72 73 6F 6E 61 6C 20 57 65 62 20 50 61 67 65 22 2C 22 4D 6F 62 69 6C 65 20 50 68 6F 6E 65 20 32 22 2C 22 4D 6F 62 69 6C 65 20 50 68 6F 6E 65 20 33 22 2C 22 4D 6F 62 69 6C 65 20 50 68 6F 6E 65 20 34 22 2C 45 2D 6D 61 69 6C 2C 22 45 2D 6D 61 69 6C 20 32 22 2C 22 45 2D 6D 61 69 6C 20 33 22 2C 22 45 2D 6D 61 69 6C 20 34 22 2C 78 2C 79 2C 7A 2C 77 2C 4F 66 66 69 63 65 2C 53 75 70 65 72 76 69 73 6F 72 2C 41 73 73 69 73 74 61 6E 74 2C 53 61 6C 75 74 61 74 69 6F 6E 2C 4E 69 63 6B 6E 61 6D 65 2C 47 65 6E 64 65 72 2C 53 70 6F 75 73 65 2C 42 69 72 74 68 64 61 79 2C 41 6E 6E 69 76 65 72 73 61 72 79 2C 46 61 6D 69 6C 79 2C 48 6F 62 62 69 65 73 2C 53 70 65 63 69 61 6C 74 79 2C 53 74 72 65 6E 67 74 68 73 2C 50 65 72 73 6F 6E 61 6C 69 74 79 2C 4E 6F 74 65 73 2C 22 43 75 73 74 6F 6D 20 32 22 2C 22 43 75 73 74 6F 6D 20 33 22 2C 22 43 75 73 74 6F 6D 20 34 22 2C 22 43 75 73 74 6F 6D 20 35 22 2C 22 43 75 73 74 6F 6D 20 36 22 2C 22 43 75 73 74 6F 6D 20 37 22 2C 22 43 75 73 74 6F 6D 20 38 22 2C 43 6F 6D 6D 65 6E 74 2C 47 72 6F 75 70 2C 22 42 69 72 74 68 64 61 79 20 52 65 6D 69 6E 64 65 72 20 4F 6E 2F 4F 66 66 22 2C 22 41 6E 6E 69 76 65 72 73 61 72 79 20 52 65 6D 69 6E 64 65 72 20 4F 6E 2F 4F 66 66 22 0D 0A
    4E 6F 72 6D 61 6C 2C 2C 2C 2C 22 41 2D 31 20 48 65 61 74 69 6E 67 20 61 6E 64 20 43 6F 6F 6C 69 6E 67 22 2C 22 41 2D 31 20 48 65 61 74 69 6E 67 20 61 6E 64 20 43 6F 6F 6C 69 6E 67 22 2C 2C 2C 22 50 4F 20 42 6F 78 20 39 34 22 2C 4E 65 77 70 6F 72 74 2C 56 69 72 67 69 6E 69 61 2C 32 34 31 32 38 2C 22 55 6E 69 74 65 64 20 53 74 61 74 65 73 22 2C 2C 35 34 34 2D 37 38 31 30 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 22 55 6E 69 74 65 64 20 53 74 61 74 65 73 22 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 22 EF BB BF 39 35 33 2D 31 35 31 33 0D 0A
    53 63 6F 74 74 20 2D 20 73 65 72 76 69 63 65 20 6D 67 72 20 2D 20 63 65 6C 6C 20 35 34 30 20 33 35 37 20 32 38 31 36 22 2C 22 45 6D 65 72 67 65 6E 63 79 20 20 63 6F 6E 74 61 63 74 73 22 2C 4E 6F 2C 4E 6F 0D 0A
    4E 6F 72 6D 61 6C 2C 2C 2C 2C 22 41 62 69 6E 67 64 6F 6E 20 45 71 75 69 70 6D 65 6E 74 22 2C 22
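A dump like the one above can also be produced from Perl itself. The following is a minimal sketch, not the tool used to make the dump above. It reads the file with a `:raw` layer, so no decoding layer can turn EF BB BF into a single character, and it prints 16 bytes per row with each row's starting offset. `hex_dump` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: format a raw byte string as rows of 16 upper-case
# hex bytes, each row prefixed with its starting offset.
sub hex_dump {
    my ($bytes) = @_;
    my @rows;
    for (my $off = 0; $off < length $bytes; $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $chunk;
        push @rows, sprintf('%08X  %s', $off, $hex);
    }
    return @rows ? join("\n", @rows) . "\n" : '';
}

# Dump the file named on the command line, if any, read as raw bytes.
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    print hex_dump($bytes);
}
```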
    Notice from the dump that there's another EF BB BF farther on, at the start of a quoted field in the first data record. And I tried to brute-force it, and it didn't work! I did
    $line =~ s/\xef\xbb\xbf//
    and it didn't remove the characters! I'll try again...
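Here is one plausible reason a single substitution leaves BOM bytes behind. Without the /g modifier, s/// removes only the first match, and the dump shows two BOMs. That matters whenever one string holds both, for example if the whole file is read at once. Below is a minimal sketch on made-up sample bytes modelled on the dump.

```perl
use strict;
use warnings;

my $BOM = "\xEF\xBB\xBF";

# Sample bytes (made up, modelled on the dump): one BOM at the start and a
# second one inside a quoted field.
my $text = "${BOM}Importance\r\nNormal,\"${BOM}953-1513\"\r\n";

(my $once = $text) =~ s/\xEF\xBB\xBF//;     # no /g: first match only
(my $all  = $text) =~ s/\xEF\xBB\xBF//g;    # /g: every match

# index() returns -1 once the byte sequence is gone
printf "without /g: BOM still present? %s\n", index($once, $BOM) >= 0 ? 'yes' : 'no';
printf "with /g:    BOM still present? %s\n", index($all,  $BOM) >= 0 ? 'yes' : 'no';
```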
      I did the $line =~ s/\xef\xbb\xbf// and it didn't remove the characters!

      Using the advice from kcott here to use /g, it works for me. If it really doesn't work for you, then perhaps the data you have in your Perl string is not what you think it is. See my node here for advice on how to show us the real data, in particular Devel::Peek, and make sure to provide an SSCCE that we can run to see the problem for ourselves.
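The point above, that the string may not hold what you think, can be made concrete. If the file is read through a decoding layer such as `:encoding(UTF-8)`, the three bytes EF BB BF become the single character U+FEFF. The byte pattern `\xef\xbb\xbf` then no longer matches, while `\x{FEFF}` does.

Below is a minimal sketch of this. It uses `Encode::decode` as a stand-in for the I/O layer and `Devel::Peek`'s `Dump`, as suggested above. The sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);
use Devel::Peek;

my $raw = "\xEF\xBB\xBF953-1513";    # bytes, as read with '<:raw'
# Decode as an :encoding(UTF-8) layer would; LEAVE_SRC keeps $raw intact.
my $chars = decode('UTF-8', $raw, FB_CROAK | LEAVE_SRC);

for my $case ([raw => $raw], [decoded => $chars]) {
    my ($name, $s) = @$case;
    printf "%-7s length %2d  bytes-pattern %-3s  U+FEFF-pattern %s\n",
        $name, length $s,
        ($s =~ /\xEF\xBB\xBF/ ? 'yes' : 'no'),
        ($s =~ /\x{FEFF}/     ? 'yes' : 'no');
}

Dump($chars);    # to STDERR: shows the UTF8 flag and the stored bytes
```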

        I must be doing something stupid. Here's my little test program:
        #!/usr/bin/perl
        use v5.10;
        use strict;
        use warnings;

        my $BOM = "\xef\xbb\xbf";    # UTF-8 bytes of U+FEFF (the BOM)

        die "no args\n" unless @ARGV == 2;
        open(my $i, "<", $ARGV[0]) or die "Can't open $ARGV[0]\n";
        open(my $o, ">", $ARGV[1]) or die "Can't write to $ARGV[1]\n";

        say "marker is";
        printhex($BOM);
        say "";
        while (my $line = <$i>) {
            my $newline = $line;
            printhex($newline);    # hex of each character, no trailing newline
            $newline =~ s/$BOM//g;
            die "didn't change" if $newline eq $line;    # dies if this line held no BOM
            print $o $newline;
        }
        close $i;
        close $o;
        exit;

        sub printhex {
            my $str = $_[0];
            for my $chr (split(//, $str)) {
                printf("%x ", ord($chr));
            }
        }
        And when I run it on one of the BOM'ed files, I get:
        marker is
        ef bb bf
        didn't change at D:\Desktop\striputf.pl line 19, <$i> line 3.
        ef bb bf 49 6d 70 6f 72 74 61 6e 63 65 2c 46 69 72 73 74 20 4e 61 6d 65 2c 4d 69 64 64 6c 65 20 4e 61 6d 65 2c 4c 61 73 74 20 4e 61 6d 65 2c 46 75 6c 6c 20 4e 61 6d 65 2c 43 6f 6d 70 61 6e 79 2c 44 65 70 61 72 74 6d 65 6e 74 2c 4a 6f 62 20 54 69 74 6c 65 2c 53 74 72 65 65 74 20 28 62 2e 29 2c 43 69 74 79 20 28 62 2e 29 2c 53 74 61 74 65 20 28 62 2e 29 2c 5a 49 50 20 43 6f 64 65 20 28 62 2e 29 2c 43 6f 75 6e 74 72 79 2f 52 65 67 69 6f 6e 20 28 62 2e 29 2c 48 6f 6d 65 20 50 68 6f 6e 65 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 2c 4d 6f 62 69 6c 65 20 50 68 6f 6e 65 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 32 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 33 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 34 2c 42 75 73 69 6e 65 73 73 20 46 61 78 2c 42 75 73 69 6e 65 73 73 20 57 65 62 20 50 61 67 65 2c 53 74 72 65 65 74 20 28 68 2e 29 2c 43 69 74 79 20 28 68 2e 29 2c 53 74 61 74 65 20 28 68 2e 29 2c 5a 49 50 20 43 6f 64 65 20 28 68 2e 29 .....
        What am I getting wrong/missing?
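The output above can be read as follows. This is a hedged reading, not a confirmed diagnosis.

- **What the die means.** The program only dies when `s/$BOM//g` found nothing on a line. So lines 1 and 2 did lose their BOMs, and line 3 simply contains none.
- **Why the hex comes after the die message.** `printhex` never prints a newline, so its output for those lines stays in STDOUT's buffer. It only appears after the die message, which goes to unbuffered STDERR.
- **Why line 3 has no BOM.** If this file is laid out like the dumped one, the first record has a CRLF inside its quoted Comment field. Line 3 would then be the `Scott - service mgr ...` continuation.

Below is a minimal sketch that counts BOMs per line instead of dying. It runs on made-up in-memory data modelled on the dump. `strip_boms` is a hypothetical helper name.

```perl
use strict;
use warnings;

my $BOM = "\xEF\xBB\xBF";

# Hypothetical helper: copy lines from $in to $out with every BOM removed,
# returning how many BOMs were removed from each line (0 is not an error).
sub strip_boms {
    my ($in, $out) = @_;
    my @counts;
    while (my $line = <$in>) {
        # s///g returns the number of substitutions made, or '' if none
        my $n = ($line =~ s/$BOM//g) || 0;
        push @counts, $n;
        print {$out} $line;
    }
    return @counts;
}

# Sample bytes (made up, modelled on the dump): the first record has a CRLF
# inside its quoted Comment field, so readline splits it over lines 2 and 3.
my $data = "${BOM}Importance,Comment,Group\r\n"
         . "Normal,\"${BOM}953-1513\r\n"
         . "Scott - service mgr\",\"Emergency  contacts\"\r\n";

open(my $in, '<', \$data) or die "Can't read sample: $!";
my $cleaned = '';
open(my $out, '>', \$cleaned) or die "Can't write buffer: $!";
my @counts = strip_boms($in, $out);
close $in;
close $out;

printf "line %d: %d BOM(s) removed\n", $_ + 1, $counts[$_] for 0 .. $#counts;
```

Whether a BOM-free line should count as an error at all is a design choice; this sketch only counts such lines. Pointed at real files, `strip_boms` would take the two handles opened in the test program.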
