
I can't seem to wrap my brain around how Perl translates Unicode. I'm trying to gather SMTP addresses from a source and insert them into a Mailsweeper config file. Mailsweeper, however, seems to store its config files in Unicode rather than ASCII. So when the following script is run, all of the data appears to be there, but there's a blank space between every character, except in the portion that I inserted. Help is much appreciated.
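A likely explanation for the "blank space between every character", assuming the file is UTF-16LE (the usual meaning of "Unicode" for Windows text files, though that is not confirmed for Mailsweeper here): each ASCII character is stored as its byte followed by a zero byte, and anything reading the file as single bytes shows those zero bytes as blanks. The core Encode module makes this visible; the address below is a placeholder.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode a placeholder address as UTF-16LE to see the raw byte layout.
my $text  = 'user@example.com';
my $bytes = encode('UTF-16LE', $text);

# Each character becomes two bytes: its code, then a zero byte (75 00 73 00 65 00 ...).
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";

# Decoding with the matching encoding recovers the original string.
my $round_trip = decode('UTF-16LE', $bytes);
print "$round_trip\n";
```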

$filesenders = "z:/train/whitesenders.txt";
$fileaddrlist = "z:/Program Files/Mailsweeper for SMTP/Config/shared/addrlist.test";

#Read and arrayify list of SMTP addresses
open (SENDERSREAD, $filesenders) or die "unable to open sender file: $!";
foreach $newsender (<SENDERSREAD>) {
  chomp $newsender;
  push @senders, $newsender;
}
#Open MailSweeper Addrlist.cfg and pull address lists into a hash of arrays
open (CONFIGREAD, "<:utf8", $fileaddrlist) or die "Unable to open config file for read: $!";
print "Reading Config File...\n";
foreach (<CONFIGREAD>) {
  chomp;
  if (/\[(.*)\]/) {
    $currhead = $1;
    print "$currhead\n";
    next;
  }
  next if (!(/./)); #Check for and omit blank lines
  push @{$config{$currhead}}, $_;
}
close CONFIGREAD;

#Put senders captured from file into appropriate list
foreach (@senders) {
  push @{$config{'AddressList\Scripted Whitelist'}}, "v:Member=\$S\"$_\"";
}
#write updated config to file, overwriting existing file
open (CONFIGWRITE, ">:encoding(UTF-8)", $fileaddrlist) or die "unable to open config file for write: $!";
foreach (keys %config) {
  print CONFIGWRITE "[$_]\n";
  foreach (@{$config{$_}}) {
    print CONFIGWRITE $_;
    print CONFIGWRITE "\n";
  }
  print CONFIGWRITE "\n";
}
close CONFIGWRITE;
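Before picking a layer such as `<:utf8`, it can help to check what the file actually starts with. The sketch below uses a hypothetical helper, `detect_bom`, that reads the first bytes in `:raw` mode and maps a byte-order mark (BOM) to an encoding name. It assumes the config file carries a BOM, which is not confirmed here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: guess a file's encoding from its byte-order mark.
# Returns an encoding name usable in :encoding(...), or undef if none is found.
sub detect_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Unable to open $path: $!";
    my $got = read($fh, my $head, 3);
    close $fh;
    return unless $got;    # empty file or read error
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
    return;
}

# Usage: write a tiny UTF-16LE file with a BOM, then sniff it.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFF\xFE", "[\0A\0]\0";
close $tmp_fh;

my $enc = detect_bom($tmp_path);
print defined $enc ? $enc : 'unknown', "\n";    # UTF-16LE
```

A UTF-32LE file also begins with FF FE, so this simple check would mislabel one; for a Windows text config that is unlikely but worth knowing.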

=====

Or, to make it even simpler: how do I get this snippet to print out a copy of the config file?

open (READ, "<", "z:/test.txt") or die "unable to open file: $!";

foreach (<READ>) {
  push @array, $_;
}
foreach (@array) {
  print $_;
}
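The short snippet can be made to print the file by telling Perl which encoding to decode. A sketch, assuming the file is UTF-16LE with a BOM and Windows CRLF line endings: `:raw` clears the default layers, `:encoding(UTF-16LE)` decodes, and `:crlf` on top handles the line endings. It builds its own temporary file so it runs anywhere; with the real data you would open `z:/test.txt` instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small UTF-16LE test file (BOM + CRLF endings) so the sketch is
# self-contained; with the real data you would open "z:/test.txt" instead.
my (undef, $path) = tempfile(UNLINK => 1);
open(my $out, '>:raw:encoding(UTF-16LE)', $path)
    or die "Unable to write $path: $!";
print {$out} "\x{FEFF}[AddressList]\r\nv:Member=example\r\n";
close $out or die "Error closing $path: $!";

# :raw drops the default layers, :encoding(UTF-16LE) decodes the bytes,
# and :crlf on top turns the decoded "\r\n" pairs into "\n".
open(my $in, '<:raw:encoding(UTF-16LE):crlf', $path)
    or die "Unable to open $path: $!";
my @lines = <$in>;
close $in;

# The BOM decodes to U+FEFF at the start of the first line; strip it.
$lines[0] =~ s/\A\x{FEFF}// if @lines;

# Encode output as UTF-8 so STDOUT does not warn about wide characters.
binmode STDOUT, ':encoding(UTF-8)';
print for @lines;
```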

If requested, I can make a sample of the source file available.

=====

OK, further work, further progress. I've finally determined that if I read the file in with UTF-16LE encoding, I get the strings in usable form. But if I write the file out using :encoding(UTF-16LE), I'm back to having a space between every character. Is this because Perl uses UTF-8 internally? If so, do I have to use Unicode::String to convert the strings before sending them back to the filehandle?
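Perl's internal representation is most likely not the problem, since the `:encoding` layer converts characters on the way out, so Unicode::String should not be needed. Two likely (not certain) causes are worth checking. First, on Windows, opening with just `>:encoding(UTF-16LE)` leaves the default `:crlf` layer *below* the encoder, so each encoded newline byte 0x0A is expanded to 0x0D 0x0A afterwards; that single extra byte throws the two-byte alignment off. Second, correct UTF-16LE output still has a zero byte after every ASCII character, so a viewer that doesn't recognise UTF-16 will show "spaces" anyway; writing a BOM helps editors (and possibly Mailsweeper) detect the encoding. The sketch uses a hypothetical `%config` entry shaped like the one in the script above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical config data shaped like %config in the script above.
my %config = (
    'AddressList\Scripted Whitelist' => ['v:Member=$S"user@example.com"'],
);

my (undef, $path) = tempfile(UNLINK => 1);

# :raw first, so no default :crlf layer sits *below* the encoder; :crlf on top
# turns "\n" into "\r\n" before encoding, keeping UTF-16 byte pairs aligned.
open(my $out, '>:raw:encoding(UTF-16LE):crlf', $path)
    or die "Unable to open $path for write: $!";
print {$out} "\x{FEFF}";    # UTF-16LE does not write a BOM by itself
for my $head (sort keys %config) {
    print {$out} "[$head]\n";
    print {$out} "$_\n" for @{ $config{$head} };
    print {$out} "\n";
}
close $out or die "Error closing $path: $!";

# Inspect the first four raw bytes: the BOM, then '[' as 5B 00.
open(my $raw, '<:raw', $path) or die "Unable to reopen $path: $!";
read($raw, my $head4, 4);
close $raw;
print join('.', map { sprintf '%02X', ord } split //, $head4), "\n";    # FF.FE.5B.00
```

Reading the file back with the same `:raw:encoding(UTF-16LE):crlf` stack keeps input and output symmetric.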

In reply to Reading and writing to unicode files by cbingel
