redskie007 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I was given a file to reformat, the sample is shown below:

MSC_name,chunum,date,count
MSCBCR1,1,01-JUL-03,168
MSCBCR1,1,02-JUL-03,163
MSCBCR1,1,03-JUL-03,166
MSCBCR1,1,04-JUL-03,159
MSCBCR1,1,05-JUL-03,161
MSCBCR1,1,06-JUL-03,161
MSCBCR1,1,07-JUL-03,159
MSCBCR1,1,08-JUL-03,158
MSCBCR1,1,09-JUL-03,160
MSCBCR1,1,10-JUL-03,161
MSCBCR1,1,11-JUL-03,164
MSCBCR1,1,12-JUL-03,166
MSCBCR1,1,13-JUL-03,165
MSCBCR1,1,14-JUL-03,160
MSCBCR1,1,15-JUL-03,163
MSCBCR1,1,16-JUL-03,170
MSCBCR1,1,17-JUL-03,162
MSCBCR1,1,18-JUL-03,161
MSCBCR1,1,19-JUL-03,165
MSCBCR1,1,20-JUL-03,162
MSCBCR1,1,21-JUL-03,155
MSCBCR1,1,22-JUL-03,160
MSCBCR1,1,23-JUL-03,161
MSCBCR1,1,24-JUL-03,159
MSCBCR1,1,25-JUL-03,160
MSCBCR1,1,26-JUL-03,161
MSCBCR1,1,27-JUL-03,170
MSCBCR1,1,28-JUL-03,170
MSCBCR1,1,29-JUL-03,170
MSCBCR1,1,30-JUL-03,170
MSCBCR2,1,01-JUL-03,168
MSCBCR2,1,02-JUL-03,163
MSCBCR2,1,03-JUL-03,166
MSCBCR2,1,04-JUL-03,159
MSCBCR2,1,05-JUL-03,161
MSCBCR2,1,06-JUL-03,161
MSCBCR2,1,07-JUL-03,159
MSCBCR2,1,08-JUL-03,158
MSCBCR2,1,09-JUL-03,160
MSCBCR2,1,10-JUL-03,161
MSCBCR2,1,11-JUL-03,164
MSCBCR2,1,12-JUL-03,166
MSCBCR2,1,13-JUL-03,165
MSCBCR2,1,14-JUL-03,160
MSCBCR2,1,15-JUL-03,163
MSCBCR2,1,16-JUL-03,170
MSCBCR2,1,17-JUL-03,162
MSCBCR2,1,18-JUL-03,161
MSCBCR2,1,19-JUL-03,165
MSCBCR2,1,20-JUL-03,162
MSCBCR2,1,21-JUL-03,155
MSCBCR2,1,22-JUL-03,160
MSCBCR2,1,23-JUL-03,161
MSCBCR2,1,24-JUL-03,159
MSCBCR2,1,25-JUL-03,160
MSCBCR2,1,26-JUL-03,161
MSCBCR2,1,27-JUL-03,170
MSCBCR2,1,28-JUL-03,170
MSCBCR2,1,29-JUL-03,170
MSCBCR2,1,30-JUL-03,170
The output that will be needed will be:
MSC_name,count_per_date1...end_date(e.g. 30-JUL-03)
total_cf_produced,total_per_date...total_per_date(e.g. end_date 30-JUL-03
I've been trying out this code but can't get it to work:

#!c:/perl/bin/perl open (INFILE,"c:/playgnd/sql_result_cf.txt") || die("Could not open fi +le!"); while( <INFILE> ) { next if $. == 1; # skip first line chomp; my %F; @F{qw(msc_name chunum date count_per_date)} = split /\s*,\s*/; $Data{$F{date}} = { } unless exists $Data{$F{msc_name}}; $Data{$F{date}}->{$F{msc_name}} = [('')x5] unless ref( $Data{$F{date}}->{$F{msc_name}}); ${$Data{$F{date}}->{$F{msc_name}}}[$F{hour}] = $F{count_per_hour}; } foreach my $date (keys %Data) { foreach my $msc_name (keys %{$Data{$date}}) { print join(",", $date, $msc_name, @{$Data{$date}->{$msc_name}}), +"\n" ; } } close(INFILE);

Edit: Added <code> tags, larsen.

Replies are listed 'Best First'.
Re: Formating Text file
by CountZero (Bishop) on Aug 15, 2003 at 10:00 UTC

    What do you think of this:

    use strict; use warnings; my %msc_names; my %dates; my $grand_total; <DATA>; #throw away first line while (<DATA>) { chomp; my ($msc_name, $undef, $date, $count) = split /,/; #we don't use c +hunum, so we throw it away $date=~s/(\d\d)-(\d\d)-(\d\d)/$3-$2-$1/; #normalize date so it is +easier to sort $grand_total += $count; $msc_names{$msc_name}{$date} += $count; $dates{$date} += $count; } print "The grand total is: $grand_total\n\n"; print "Detailed results (by msc_name):\n"; for my $msc_name (sort keys %msc_names) { print "\t$msc_name:\n"; for my $date (sort keys %{$msc_names{$msc_name}}) { my $date2=$date; $date2=~s/(\d\d)-(\d\d)-(\d\d)/$3-$2-$1/; #reformat the date print "\t\t$date2: $msc_names{$msc_name}{$date}\n"; } } print "\nDetailed results (by date):\n"; for my $date (sort keys %dates) { my $date2=$date; $date2=~s/(\d\d)-(\d\d)-(\d\d)/$3-$2-$1/; #reformat the date print "\t$date2: $dates{$date}\n"; } __DATA__ MSC_name,chunum,date,count MSCBCR1,1,01-JUL-03,168 MSCBCR1,1,02-JUL-03,163 MSCBCR1,1,03-JUL-03,166 MSCBCR1,1,04-JUL-03,159 MSCBCR1,1,05-JUL-03,161 MSCBCR1,1,06-JUL-03,161 MSCBCR1,1,07-JUL-03,159 MSCBCR1,1,08-JUL-03,158 MSCBCR1,1,09-JUL-03,160 MSCBCR1,1,10-JUL-03,161 MSCBCR1,1,11-JUL-03,164 MSCBCR1,1,12-JUL-03,166 MSCBCR1,1,13-JUL-03,165 MSCBCR1,1,14-JUL-03,160 MSCBCR1,1,15-JUL-03,163 MSCBCR1,1,16-JUL-03,170 MSCBCR1,1,17-JUL-03,162 MSCBCR1,1,18-JUL-03,161 MSCBCR1,1,19-JUL-03,165 MSCBCR1,1,20-JUL-03,162 MSCBCR1,1,21-JUL-03,155 MSCBCR1,1,22-JUL-03,160 MSCBCR1,1,23-JUL-03,161 MSCBCR1,1,24-JUL-03,159 MSCBCR1,1,25-JUL-03,160 MSCBCR1,1,26-JUL-03,161 MSCBCR1,1,27-JUL-03,170 MSCBCR1,1,28-JUL-03,170 MSCBCR1,1,29-JUL-03,170 MSCBCR1,1,30-JUL-03,170 MSCBCR2,1,01-JUL-03,168 MSCBCR2,1,02-JUL-03,163 MSCBCR2,1,03-JUL-03,166 MSCBCR2,1,04-JUL-03,159 MSCBCR2,1,05-JUL-03,161 MSCBCR2,1,06-JUL-03,161 MSCBCR2,1,07-JUL-03,159 MSCBCR2,1,08-JUL-03,158 MSCBCR2,1,09-JUL-03,160 MSCBCR2,1,10-JUL-03,161 MSCBCR2,1,11-JUL-03,164 MSCBCR2,1,12-JUL-03,166 MSCBCR2,1,13-JUL-03,165 MSCBCR2,1,14-JUL-03,160 MSCBCR2,1,15-JUL-03,163 MSCBCR2,1,16-JUL-03,170 MSCBCR2,1,17-JUL-03,162 MSCBCR2,1,18-JUL-03,161 MSCBCR2,1,19-JUL-03,165 MSCBCR2,1,20-JUL-03,162 MSCBCR2,1,21-JUL-03,155 MSCBCR2,1,22-JUL-03,160 MSCBCR2,1,23-JUL-03,161 MSCBCR2,1,24-JUL-03,159 MSCBCR2,1,25-JUL-03,160 MSCBCR2,1,26-JUL-03,161 MSCBCR2,1,27-JUL-03,170 MSCBCR2,1,28-JUL-03,170 MSCBCR2,1,29-JUL-03,170 MSCBCR2,1,30-JUL-03,170

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Um, this part won't work:
      $date=~s/(\d\d)-(\d\d)-(\d\d)/$3-$2-$1/; #normalize date so it is +easier to sort
      ... at least, not until you do something to convert the 3-letter month abbrevs in your DATA (and in the OP) into two-digit month numbers. (Date::Calc is the do-all module, of course, but for something like this, just a 12-element lookup hash would do fine.)
Re: Formating Text file
by esh (Pilgrim) on Aug 15, 2003 at 09:30 UTC

    1. When you post code surround it by <code> and </code>. This way the indentation and square brackets will not get lost.

    2. Always use the "-w" option and "use strict;" in your programs. This will help identify bugs. The "-T" option is also fun to add to keep yourself honest. If/when you get warnings about the use of tainted data then you sit down and think about the security implications of your program.

    3. In your code you create a data structure which has an array of five elemnts which are indexed by a field called "hour". (a) I find it strange that "hour" would only have five values. (b) There is no field called "hour" in the input you parse, nor does the data itself seem to have anything resembling an hour.

    4. There seems to be an extra dereferencing on the last line of the first loop.

    5. The print statement does not seem to match the number/types of fields you describe should be output (though it's not easy for me to understand from the description exactly what is desired).

    Here is your program with a few tweaks. It generates output, but has the warnings about uninitialized values caused by the fact that there is no "hour" input field.

    #!c:/perl/bin/perl -wT use strict; my %Data; open (INFILE,"c:/playgnd/sql_result_cf.txt") || die("Could not open file!"); my $header = <INFILE>; while( <INFILE> ) { chomp; my %F; @F{qw(msc_name chunum date count_per_date)} = split /\s*,\s*/; $Data{$F{date}} = { } unless exists $Data{$F{msc_name}}; $Data{$F{date}}->{$F{msc_name}} = [('')x5] unless ref( $Data{$F{date}}->{$F{msc_name}}); $Data{$F{date}}->{$F{msc_name}}[$F{hour}] = $F{count_per_hour}; } foreach my $date (keys %Data) { foreach my $msc_name (keys %{$Data{$date}}) { print join(",", $date, $msc_name, @{$Data{$date}->{$msc_name}}), "\n" ; } } close(INFILE);

    This obviously does not do what you want, but it runs and is perhaps it is further in the right direction.

    -- Eric Hammond

Re: Formating Text file
by redskie007 (Initiate) on Aug 15, 2003 at 06:10 UTC
    my code was:
    #!c:/perl/bin/perl open (INFILE,"c:/playgnd/sql_result_cf.txt") || die("Could not open fi +le!"); while( <INFILE> ) { next if $. == 1; # skip first line chomp; my %F; @F{qw(msc_name chunum date count_per_date)} = split /\s*,\s*/; $Data{$F{date}} = { } unless exists $Data{$F{msc_name}}; $Data{$F{date}}->{$F{msc_name}} = [('')x5] unless ref( $Data{$F{date}}->{$F{msc_name}}); ${$Data{$F{date}}->{$F{msc_name}}}[$F{hour}] = $F{count_per_hour}; } foreach my $date (keys %Data) { foreach my $msc_name (keys %{$Data{$date}}) { print join(",", $date, $msc_name, @{$Data{$date}->{$msc_name}}), +"\n" ; } } close(INFILE);

    edited by ybiC: balanced <code> tags

      Hi,

      You might want to consider providing more and better formatted information here. I've looked at this, tried to run your code and failed but because I really can't work out exactly what you are trying to do it's really hard to try and fix it.

      Starting with a smaller example of the input data, a corresponding example of what you are trying to output and possibly some comments in your code would go a long way to making it all a lot clearer for people wanting to help you.

      Struan

        Well I'm actually pulling the data from a database and saved it in the format given in the question then display it on the web, the formatting changed to just MSC_name,count_per_date1...end_date(e.g. 30-JUL-03) for example:

        MSC_name,count_per_date1...end_date30(e.g. 30-JUL-03)
        MSCBCR1,168,...,170
        MSCBCR2,178,...189
        Total_cf_produced,346,..,359

        The code I'm currently testing is:

        #!c:/perl/bin/perl open (INFILE,"c:/playgnd/sql_result_cf.txt") || die("Could not open fi +le!"); while( <INFILE> ) { next if $. == 1; # skip first line chomp; my %F; @F{qw(msc_name chunum date count_per_date)} = split /\s*,\s*/; $Data{$F{date}} = { } unless exists $Data{$F{msc_name}}; $Data{$F{date}}->{$F{msc_name}} = [('')x31] unless ref( $Data{$F{date}}->{$F{msc_name}}); ${$Data{$F{date}}->{$F{msc_name}}}[$F{chunum}] = $F{count_per_date}; } foreach my $date (keys %Data) { foreach my $msc_name (keys %{$Data{$date}}) { print join(",",$msc_name,$date, @{$Data{$date}->{$msc_name}}), "\ +n" ; } } close(INFILE);
        Thanks.
Re: Formating Text file
by redskie007 (Initiate) on Aug 18, 2003 at 01:00 UTC
    Well I'm actually pulling the data from a database and saved it in the format given in the question then display it on the web, the formatting changed to just MSC_name,count_per_date1...end_date(e.g. 30-JUL-03) for example:
    MSC_name,count_per_date1...end_date30(e.g. 30-JUL-03)
    MSCBCR1,168,...,170
    MSCBCR2,178,...189
    Total_cf_produced,346,..,359
    Thanks.