Lotus1 has asked for the wisdom of the Perl Monks concerning the following question:

I started off with short, simple scripts to modify thousands of XML files. I would put simple print statements like print "text,text,text\n"; in them to log what was happening in each file, in CSV format. The more I learned about XML::LibXML, the more complex my scripts became and the more data points I wanted to log, so I switched to print "$str1,$str2,$str3,$str4\n";. All the action was being done in a 'run_per_file' type sub. I ended up with a couple of hundred lines of code in that sub, which is getting hard to keep track of. I wanted to break it up into smaller subs, but the whole print-statement problem was too much hassle. If something was found in an XML file that made me need to skip to the next file, I couldn't just use "return", since that would only return from the inner sub, not from the 'run_per_file' subroutine. That means I needed to test the return value of each sub from within the 'run_per_file' subroutine.

So I have two problems here:

  • Logging in CSV format without having to change a bunch of print statements whenever I add a column. I also need to reinitialize the column values each time the 'run_per_file' subroutine starts.
  • Breaking up a large subroutine while still being able to print, and to skip to the next file, from within a nested sub (see the sketch after this list).

I have looked at log4perl, but by the time you have formatted the string you pass to it, you have these same problems.
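
For the skipping problem, one standard escape (not what my script below uses) is to die from deep inside the helpers and catch it with eval in the file loop. A minimal sketch, with made-up file names and a made-up SKIP_FILE marker:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Deeply nested helper: on a show-stopper, die with a marker prefix.
    sub check_device {
        my ($file) = @_;
        die "SKIP_FILE: unexpected device type in $file\n"
            if $file eq 'bad.xml';    # stand-in for a real XML check
    }

    sub run_per_file {
        my ($file) = @_;
        check_device($file);          # may die several calls deep
        print "$file,ok\n";
    }

    foreach my $file (qw( good.xml bad.xml )) {
        eval { run_per_file($file) };
        if ( my $err = $@ ) {
            die $err unless $err =~ /^SKIP_FILE/;    # rethrow real errors
            print "$file,,skipped\n";
        }
    }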

    I'm not thrilled with the approach below: I'm using the %csvdata hash for program variables, and I'm using the global constant inside my subroutines. Is there a more straightforward or standard way to do this?

    #!/usr/bin/perl
    use warnings;
    use strict;

    use constant CSV_COLUMNS => qw( file analog_id analog_cid epp_res_ck note );

    header();

    #my @substation_files = glob( "*_subeditor.xml" );
    my @substation_files = qw( test1.xml test2.xml );
    if ( @substation_files == 0 ) { die "No _subeditor.xml files found.\n" }

    run_per_file($_) foreach @substation_files;

    #**************************************************
    #
    sub run_per_file {
        my ($xml_file) = @_;
        my %csvdata;
        #my @vars = qw( file analog_id analog_cid epp_res_ck note );
        #$csvdata{$_} = '' foreach @vars;
        $csvdata{$_} = '' foreach (CSV_COLUMNS);
        $csvdata{file}       = $xml_file;
        $csvdata{analog_cid} = "test.53.cb.stts";

        $csvdata{epp_res_ck} = find_epp( \%csvdata );
        return unless $csvdata{epp_res_ck} ne '';

        print_line( \%csvdata );
    }

    sub find_epp {
        my $hr = shift;
        if ( $hr->{file} eq "test1.xml" ) {
            return 'DEVTYPE=SW';
        }
        else {
            $hr->{note} .= ":error @ find_epp. ";
            print_line($hr);
            return '';
        }
    }

    sub print_line {
        my $hr = shift;
        #print "$hr->{file}, $hr->{analog_id}, $hr->{analog_cid}, $hr->{epp_res_ck}, $hr->{note}\n";
        my $string = join ',', map { "$hr->{$_}" } (CSV_COLUMNS);
        print $string, "\n";
    }

    #########################################################
    #
    # subroutine to print header information, csv format.
    #
    sub header {
        my $timestamp = localtime;
        $timestamp =~ s/(.*) (\d{4})/$2 $1/;
        print "report generated by $0\n";
        print "$timestamp\n";
        print "This program does something. \n\n";
        #print "Substation filename,analog_id,analog_cid,epp_res_ck,note\n";
        print join ',', (CSV_COLUMNS);
        print "\n";
    }

    __DATA__
    report generated by printsub.pl
    2011 Wed Nov 9 10:16:50
    This program does something.

    file,analog_id,analog_cid,epp_res_ck,note
    test1.xml,,test.53.cb.stts,DEVTYPE=SW,
    test2.xml,,test.53.cb.stts,,:error @ find_epp.
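
    One thing I already know is fragile in the code above: a plain join/print will silently shift columns if a value ever contains a comma. A minimal sketch of print_line using Text::CSV instead (assuming that module is installed; not what my script currently does):

    use Text::CSV;

    my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

    sub print_line {
        my $hr = shift;
        # CSV_COLUMNS() with parens, so Perl calls the constant instead
        # of auto-quoting the bareword as a single hash key.
        $csv->print( \*STDOUT, [ @{$hr}{ CSV_COLUMNS() } ] );
    }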

    Re: rfc: logging changes per file from subroutines
    by RichardK (Parson) on Nov 09, 2011 at 16:58 UTC

      If I've understood you correctly, your problem seems to be that you've decided to log to a CSV file, so it must have a fixed column order. The only suggestion I can make is: don't do that, then :)

      Most logs are free-format text, but if you want to process them later you could tag each item, with something like

      file:testfile.xml, id:42, note:"it all gone wrong"
      file:test3.xml, DEVTYPE:SW, cid:42a, note:"bingo"

      That removes the need to have the items in any particular order, and it makes each line simple to load into a hash, as in the sketch below.
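
      A rough, untested sketch of that (it naively assumes values never contain commas, and $log_fh stands for your opened log file):

      while ( my $line = <$log_fh> ) {
          chomp $line;
          # split into "key:value" items, then split each on the first colon
          my %item = map { split /:/, $_, 2 } split /,\s*/, $line;
          print "note for $item{file}: $item{note}\n" if exists $item{note};
      }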

        I need CSV format so I can make sense of what happened by sorting and browsing in Excel. There are usually 5,000-10,000 rows but, hopefully, only a small number of problems.

        From your suggestion, I could log in free format and then, at the end, sort each line of the logfile into the desired column order, along the lines of the sketch below.
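
        A rough sketch of that post-processing pass (same naive key:value split as above; untested, and $log_fh is hypothetical):

        my @columns = qw( file analog_id analog_cid epp_res_ck note );
        while ( my $line = <$log_fh> ) {
            chomp $line;
            my %item = map { split /:/, $_, 2 } split /,\s*/, $line;
            print join( ',', map { $item{$_} // '' } @columns ), "\n";
        }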

        I'm already loading each data point into a hash and then using a subroutine to print the hash in order. Since I already have a hash, I don't see how that would help.