Morgon - Thanks for your help. When I run it with UTF-16, Perl finishes without any error message, but my output contains junk like the following:

Original data line:

    5||1|||SFDC||

New data line:

    簀㄀簀㄀㐀㘀簀簀䠀䠀簀簀ഀഊ409||1|||SFDC||

Here is my code. It works fine on ANSI text files.

    #!/usr/bin/perl -w
    # this program is to fix line breaks in a "|" (or "," with minor changes in the code) delimited file
    ##############################################################################
    # Following are some parameters you may need to change before running program
    ##############################################################################
    $file_folder="C:\\Test";
    $log_folder="C:\\Test";
    $ori_datafile="CT_Vendor_Summary.txt";
    $new_datafile="CT_Vendor_Summary_new.txt";
    $fix_log="LineBreak_fix_log.txt";
    $error_log="LineBreak_error_log.txt";
    $rptfile="LineBreak_fix_report.txt";
    ########################################################################

    $brklinenum=0;
    $fixlinenum=0;
    $newline="";

    #open FH, "$file_folder\\$ori_datafile" or die "can't open file";
    open FH, "<:encoding(UTF-16)", "$file_folder\\$ori_datafile" or die "can't open file";
    $/="\n";
    $ori_line_number=0;
    $pipe_thisline=0;
    $pipe_sum=0;
    $right_pipes=7;

    #open (NEWFILE, ">$file_folder\\$new_datafile") or die "can't open file";
    open (NEWFILE, ">:encoding(UTF-16)","$file_folder\\$new_datafile") or die "can't open file";
    open (FIXLOG, ">$log_folder\\$fix_log") or die "can't open file";
    open (ERRLOG, ">$log_folder\\$error_log") or die "can't open file";
    open (FIXRPT, ">$log_folder\\$rptfile") or die "can't open file";

    while (<FH>) {
        chop;    # avoid \n in last field
        $ori_line_number=$ori_line_number+1;
        # if ($_ =~ /\r/) {print OUT1 "$count\n";}
        $pipe_thisline=($_ =~ tr/\|/\|/);
        if ($ori_line_number eq 1) {
            print FIXRPT "Report on Fixing Line Breaks in CT_Vendor_Summary File\n\n";
            print FIXRPT "Correct number of pipes in each line is $right_pipes \n\n";
        }
        if ($pipe_thisline eq $right_pipes) {
            if ($pipe_sum eq 0) {
                print NEWFILE "$_" . "\n";
            }
            else {
                print ERRLOG "A: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                print ERRLOG " " . "$_" . "\n";
            }
        }
        ###
        else {
            if ( $pipe_thisline > $right_pipes ) {
                print ERRLOG "B: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                print ERRLOG " " . "$_" . "\n";
            }
            else {
                if ($pipe_sum eq 0 ) {
                    $pipe_sum=$pipe_thisline;
                    print FIXLOG "Break: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                    print FIXLOG " " . "$_"."\n";
                    $newline = $_;
                    $brklinenum=$brklinenum+1;
                }
                else {
                    $pipe_sum = $pipe_sum + $pipe_thisline;
                    $newline = "$newline" . " " . "$_";
                    $brklinenum=$brklinenum+1;
                    if ($pipe_sum > $right_pipes) {
                        print ERRLOG "C: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                        print ERRLOG " " . "$_" . "\n";
                    }
                    elsif ($pipe_sum eq $right_pipes ) {
                        $fixlinenum=$fixlinenum+1;
                        print NEWFILE "$newline" . "\n";
                        print FIXLOG "Fixed: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                        print FIXLOG " " . "$newline" . "\n";
                        $pipe_sum=0;
                        $newline="";
                    }
                    else {
                        print FIXLOG "Break: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                        print FIXLOG " " . "$newline" . "\n";
                    }
                }
            }
        }
    }

    $newlinenum=$ori_line_number + $fixlinenum - $brklinenum;
    print FIXRPT "Total Line # in Original File: $ori_line_number\n";
    print FIXRPT "Total #of Line breaks: $brklinenum \n";
    print FIXRPT "After fixing: $fixlinenum \n";
    print FIXRPT "Total Line # in New File: $newlinenum \n";
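The garbage is ordinary ASCII read one byte out of alignment: 簀 is U+7C00, i.e. the byte 0x7C of a "|" paired with a neighbouring zero byte. A frequently reported cause on Windows (an assumption here, not confirmed for this particular file) is that Perl's default :crlf layer sits below :encoding(UTF-16), so every encoded "\n" gets a single extra 0x0D byte, which shifts all following text by one byte. Also, :encoding(UTF-16) writes big-endian output with a BOM, while Notepad's "Unicode" format is little-endian. The minimal sketch below assumes UTF-16LE input that starts with a BOM and moves :crlf above the encoding layer; copy_utf16le and its $fix callback are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a UTF-16LE text file (assumed to start with a BOM,
# as Notepad's "Unicode" option writes it) line by line, passing each line
# through the callback $fix.
#
# ':raw' disables the CRLF translation Perl on Windows applies directly above
# the OS handle; the ':crlf' re-added *above* ':encoding(...)' then translates
# "\r\n" <-> "\n" on decoded characters instead of on already-encoded bytes.
sub copy_utf16le {
    my ($in_path, $out_path, $fix) = @_;

    open my $in, '<:raw:encoding(UTF-16LE):crlf', $in_path
        or die "can't open $in_path: $!";
    open my $out, '>:raw:encoding(UTF-16LE):crlf', $out_path
        or die "can't open $out_path: $!";

    # An explicit UTF-16LE encoding neither writes nor strips a BOM,
    # so both are handled by hand.
    print {$out} "\x{FEFF}";
    while (my $line = <$in>) {
        $line =~ s/^\x{FEFF}// if $. == 1;    # drop the input BOM
        chomp $line;                          # "\r\n" already became "\n"
        print {$out} $fix->($line), "\n";     # written back as "\r\n"
    }
    close $in;
    close $out or die "can't close $out_path: $!";
}
```

For a plain copy this could be called as copy_utf16le("$file_folder\\$ori_datafile", "$file_folder\\$new_datafile", sub { $_[0] }); in the posted script, the same layer strings would take the place of the two "encoding(UTF-16)" open modes, with the BOM handled as above.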
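Independently of the encoding, the repair logic itself can be restated as a small pure function that is easy to test without any files. The sketch below is a hypothetical restatement (rejoin_records and its return values are made-up names), not the posted script: it tracks the pending fragment explicitly instead of testing $pipe_sum == 0, so a fragment with no pipes at all is not mistaken for "nothing pending", and it rejects a partial record as soon as its pipe count overshoots (the script keeps accumulating in that case).

```perl
use strict;
use warnings;

# Hypothetical helper sketching the line-break repair logic.
# A record is complete when it holds exactly $want "|" delimiters; shorter
# fragments are joined with a space until their counts add up to $want.
# Returns two array refs: complete/repaired records and rejected text.
sub rejoin_records {
    my ($want, @lines) = @_;
    my (@fixed, @errors);
    my ($pending, $sum);    # partial record and its pipe count (undef = none)

    for my $line (@lines) {
        my $n = ($line =~ tr/|//);    # count pipes; $line is not modified

        if ($n >= $want) {
            # Complete (or over-full) line. As in the posted script, a
            # complete line arriving while a record is pending is an error.
            if ($n == $want && !defined $pending) {
                push @fixed, $line;
            }
            else {
                push @errors, $line;
            }
        }
        elsif (!defined $pending) {
            ($pending, $sum) = ($line, $n);    # start of a broken record
        }
        else {
            $pending .= " $line";
            $sum     += $n;
            if ($sum == $want) {               # fragments add up: emit record
                push @fixed, $pending;
                undef $pending;
            }
            elsif ($sum > $want) {             # cannot be repaired: reject it
                push @errors, $pending;
                undef $pending;
            }
        }
    }
    push @errors, $pending if defined $pending;    # unfinished at end of input
    return (\@fixed, \@errors);
}
```

Lines are expected to be chomped already, e.g. rejoin_records(7, @lines), where 7 corresponds to $right_pipes in the script.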

In reply to Re^2: Perl to Read/Write Window Unicode Text files by maylin
in thread Perl to Read/Write Window Unicode Text files by maylin
