Morgon - Thanks for your help.
When I run it with UTF-16, Perl finishes without any error message, but my output includes some junk like the following:
original data line:
5||1|||SFDC||
new data line:
簀簀㐀㘀簀簀䠀䠀簀簀ഀഊ409||1|||SFDC||
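A likely explanation for this kind of junk, assuming the script runs on Windows, is that Perl's default layers there include `:crlf`, and `:encoding(UTF-16)` is pushed on top of it. The CRLF translation then works on raw bytes instead of characters. In an ANSI file every character is one byte, so inserting 0x0D before each 0x0A is harmless. In UTF-16 every character is two bytes, so one extra byte shifts every following byte pair by one position. The sketch below simulates that with the core Encode module; the two sample records are illustrative, modeled on the data line above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Two sample records with LF line endings (illustrative data only).
my $text = "5||1|||SFDC||\n409||1|||SFDC||\n";

# In UTF-16LE every character here becomes two bytes; "\n" is 0A 00.
my $bytes = encode('UTF-16LE', $text);

# Simulate a byte-oriented CRLF layer running *below* the encoding layer:
# it inserts 0x0D before every 0x0A byte, unaware that the data is UTF-16.
(my $mangled = $bytes) =~ s/\x0A/\x0D\x0A/g;

# Decoding pairs the bytes up off by one after the first line ending,
# so "|" (7C 00) is read as U+7C00, "\r" as U+0D00, and so on.
my $decoded = decode('UTF-16LE', $mangled);

printf "original: %d bytes, mangled: %d bytes\n", length $bytes, length $mangled;
printf "U+%04X ", ord $_ for split //, $decoded;
print "\n";
```

Running it shows the first record intact, followed by code points such as U+7C00 (簀) and U+0D00 (ഀ) instead of the second record, the same kind of characters that appear in the new data line above.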
Following is my code. It works fine on an ANSI text file.
#!/usr/bin/perl -w
# This program fixes line breaks in a "|"-delimited file (or a ","-delimited one, with minor changes to the code)
##############################################################################
# Following are some parameters you may need to change before running program
##############################################################################
$file_folder="C:\\Test";
$log_folder="C:\\Test";
$ori_datafile="CT_Vendor_Summary.txt";
$new_datafile="CT_Vendor_Summary_new.txt";
$fix_log="LineBreak_fix_log.txt";
$error_log="LineBreak_error_log.txt";
$rptfile="LineBreak_fix_report.txt";
########################################################################
$brklinenum=0;
$fixlinenum=0;
$newline = "";
#open FH, "$file_folder\\$ori_datafile" or die "can't open file";
open FH, "<:encoding(UTF-16)", "$file_folder\\$ori_datafile" or die "can't open $file_folder\\$ori_datafile: $!";
$/="\n";
$ori_line_number=0;
$pipe_thisline=0;
$pipe_sum=0;
$right_pipes=7;
#open (NEWFILE, ">$file_folder\\$new_datafile") or die "can't open file";
open (NEWFILE, ">:encoding(UTF-16)", "$file_folder\\$new_datafile") or die "can't open $file_folder\\$new_datafile: $!";
open (FIXLOG, ">$log_folder\\$fix_log") or die "can't open $log_folder\\$fix_log: $!";
open (ERRLOG, ">$log_folder\\$error_log") or die "can't open $log_folder\\$error_log: $!";
open (FIXRPT, ">$log_folder\\$rptfile") or die "can't open $log_folder\\$rptfile: $!";
while (<FH>) {
    s/\r?\n\z//;    # strip the line ending (LF or CRLF) so it does not end up in the last field
    $ori_line_number = $ori_line_number + 1;
    # if ($_ =~ /\r/) {print OUT1 "$count\n";}
    $pipe_thisline = ($_ =~ tr/|//);    # tr/// returns the number of "|" characters without changing $_
    if ($ori_line_number == 1) {
        print FIXRPT "Report on Fixing Line Breaks in CT_Vendor_Summary File\n\n";
        print FIXRPT "Correct number of pipes in each line is $right_pipes \n\n";
    }
    if ($pipe_thisline == $right_pipes) {
        if ($pipe_sum == 0) {
            # complete line and no pending fragment: copy it unchanged
            print NEWFILE "$_\n";
        }
        else {
            print ERRLOG "A: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
            print ERRLOG " $_\n";
        }
    }
    ###
    else {
        if ($pipe_thisline > $right_pipes) {
            print ERRLOG "B: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
            print ERRLOG " $_\n";
        }
        elsif ($pipe_sum == 0) {
            # first fragment of a broken line: start collecting it
            $pipe_sum = $pipe_thisline;
            print FIXLOG "Break: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
            print FIXLOG " $_\n";
            $newline = $_;
            $brklinenum = $brklinenum + 1;
        }
        else {
            # continuation fragment: append it and check the running pipe count
            $pipe_sum = $pipe_sum + $pipe_thisline;
            $newline = "$newline $_";
            $brklinenum = $brklinenum + 1;
            if ($pipe_sum > $right_pipes) {
                print ERRLOG "C: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                print ERRLOG " $_\n";
            }
            elsif ($pipe_sum == $right_pipes) {
                $fixlinenum = $fixlinenum + 1;
                print NEWFILE "$newline\n";
                print FIXLOG "Fixed: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                print FIXLOG " $newline\n";
                $pipe_sum = 0;
                $newline = "";
            }
            else {
                print FIXLOG "Break: Original Line #: $ori_line_number; Pipes this line: $pipe_thisline; \$pipe_sum: $pipe_sum\n";
                print FIXLOG " $newline\n";
            }
        }
    }
}
$newlinenum=$ori_line_number + $fixlinenum - $brklinenum;
print FIXRPT "Total Line # in Original File: $ori_line_number\n";
print FIXRPT "Total # of Line breaks: $brklinenum \n";
print FIXRPT "After fixing: $fixlinenum \n";
print FIXRPT "Total Line # in New File: $newlinenum \n";
close FH;
close NEWFILE;
close FIXLOG;
close ERRLOG;
close FIXRPT;
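A commonly suggested fix, sketched here under two assumptions: the input is UTF-16 with a byte-order mark (as Notepad's "Unicode" option saves it), and CRLF line endings are wanted in the output. Open each UTF-16 file with `:raw` first, so the platform's `:crlf` layer is removed, then `:encoding(...)`, then `:crlf` on top, so line-ending translation happens on characters instead of bytes. In the script above, that means changing the mode strings of the `FH` and `NEWFILE` opens. The temporary file names and the sample records are made up for this demonstration. `UTF-16LE` is used for output so the byte order is explicit, and the BOM is then printed by hand because the LE/BE variants do not add one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical paths in a temporary directory, just for this demonstration.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = File::Spec->catfile($dir, 'in.txt');
my $out_path = File::Spec->catfile($dir, 'out.txt');

# Create a sample UTF-16LE input with a BOM and CRLF line endings.
# :raw means no CRLF translation here, so "\r\n" is written literally.
open my $mk, '>:raw:encoding(UTF-16LE)', $in_path or die "can't create $in_path: $!";
print {$mk} "\x{FEFF}5||1|||SFDC||\r\n6||2|||SFDC||\r\n";
close $mk or die "can't close $in_path: $!";

# :raw drops the default :crlf layer, :encoding handles UTF-16,
# and the :crlf pushed on top translates CRLF <-> "\n" on characters.
open my $in, '<:raw:encoding(UTF-16):crlf', $in_path
    or die "can't open $in_path: $!";
open my $out, '>:raw:encoding(UTF-16LE):crlf', $out_path
    or die "can't open $out_path: $!";

print {$out} "\x{FEFF}";      # UTF-16LE does not write a BOM on its own
my @records;
while (my $line = <$in>) {
    chomp $line;              # the top :crlf layer already turned CRLF into "\n"
    push @records, $line;
    print {$out} "$line\n";   # written back out as 0D 00 0A 00
}
close $in;
close $out or die "can't close $out_path: $!";

print "read: $_\n" for @records;
```

Because no byte-level CRLF layer sits underneath the encoding layer, every character stays a whole 2-byte pair, and the output file ends each record with a correctly aligned CRLF.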