in reply to Reduce the time taken for Huge Log files

I think that this consolidate-and-split approach to the logs is very inefficient, but I can help somewhat just by cleaning up the code.
#!/usr/bin/perl
use strict;
use warnings;

#-------------------------------------------------------
# Format an epoch time as a YYYYMMDD string (local time).
sub generate_date_str {
    my ($time) = @_;
    my ($mday, $mon, $year) = (localtime($time))[3, 4, 5];
    $year += 1900;    # localtime() years are offsets from 1900
    $mon++;           # localtime() months are 0-based
    return sprintf("%04d%02d%02d", $year, $mon, $mday);
}

#-------------------------------------------------------
# Return the names of entries in $dir whose name matches the
# regex $match_str.
sub get_matching_filenames {
    my ($dir, $match_str) = @_;
    opendir(my $dh, $dir)
        or die "couldn't open directory \"$dir\": $!";
    my @names = grep { /$match_str/ } readdir($dh);
    closedir($dh);
    return @names;
}

#-------------------------------------------------------
# Concatenate every file in $dir whose name matches $filename_str
# into $destination_file (which is truncated first).
sub consolidate_logs {
    my ($destination_file, $dir, $filename_str) = @_;

    my @files = get_matching_filenames($dir, $filename_str);

    open(my $out, '>', $destination_file)
        or die "Could not open file \"$destination_file\" for writing: $!";

    foreach my $source_file (@files) {
        print "Processing of log \"$source_file\" started at " . localtime() . "\n";
        open(my $in, '<', "$dir/$source_file")
            or die "Could not open file \"$dir/$source_file\" for reading: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close($in);
        print "Processing of log \"$source_file\" ended at " . localtime() . ".\n";
    }

    # Buffered write errors only surface at close, so check it.
    close($out) or die "Could not close \"$destination_file\": $!";
}

#-------------------------------------------------------
# For each [domain, filename] pair in @$business_list, append every
# line of $source_file containing the domain string (fixed-string
# match via index, not a regex) to $out_dir/$filename_prefix-$filename.
# $out_dir is a new optional parameter that defaults to the previously
# hard-coded feed directory, so existing callers are unaffected.
sub split_logs {
    my ($source_file, $business_list, $filename_prefix, $out_dir) = @_;
    $out_dir = '/inside29/urchin/test/newfeed' unless defined $out_dir;

    foreach my $business (@$business_list) {
        my ($domain, $file) = @$business;
        my $outfile = "$out_dir/$filename_prefix-$file";

        print "Creating of log for $domain started at " . localtime() . "\n";

        open(my $out, '>>', $outfile)
            or die "Could not open out file \"$outfile\" for appending: $!";
        open(my $in, '<', $source_file)
            or die "Could not open the consolidated file \"$source_file\" for reading: $!";
        while (my $line = <$in>) {
            print {$out} $line if index($line, $domain) > -1;
        }
        close($in);
        close($out) or die "Could not close \"$outfile\": $!";

        print "Log for $domain created at " . localtime() . "\n";
    }
}

#-------------------------------------------------------
# NOTE(review): the original listing was missing the closing quote on
# the scotland.gcf entry -- fixed here.  The embedded \" characters
# are deliberate: the log fields being matched are themselves quoted.
my @businesses = (
    [ "\"corp.home.ge.com\"",         "new_corp_home_ge_com.log"         ],
    [ "\"scotland.gcf.home.ge.com\"", "new_scotland_gcf_home_ge_com.log" ],
    [ "\"marketing.ge.com\"",         "new_marketing_ge_com.log"         ],
);

my $consolidated_log = "consolidatedlog.txt";
my $logfiles_dir     = '/inside29/urchin/test/logfiles';

my $today     = generate_date_str( time() );
my $yesterday = generate_date_str( time() - (24 * 60 * 60) );

consolidate_logs($consolidated_log, $logfiles_dir, $yesterday);
split_logs($consolidated_log, \@businesses, $today);
Hope that helps.
bartek

Replies are listed 'Best First'.
Re^2: Reduce the time taken for Huge Log files
by Anonymous Monk on Mar 21, 2005 at 01:39 UTC
    The consolidate_logs function could then be optimized to this:
    #-------------------------------------------------------
    # Concatenate every file in $dir whose name matches $filename_str
    # into $destination_file, delegating the bulk copy to cat(1).
    sub consolidate_logs {
        my ($destination_file, $dir, $filename_str) = @_;

        my @files = get_matching_filenames($dir, $filename_str);

        # Truncate the destination once up front and release the handle;
        # the original opened OUT here but never wrote through it, which
        # left a stale handle open while the shell appended to the file.
        open(my $out, '>', $destination_file)
            or die "Could not open file \"$destination_file\" for writing: $!";
        close($out);

        foreach my $source_file (@files) {
            print "Processing of log \"$source_file\" started at " . localtime() . "\n";
            # \Q...\E neutralises shell metacharacters and whitespace in
            # the interpolated paths (the original was injectable).
            system("cat \Q$dir/$source_file\E >> \Q$destination_file\E") == 0
                or die "cat of \"$dir/$source_file\" failed: $?";
            print "Processing of log \"$source_file\" ended at " . localtime() . ".\n";
        }
    }
    (using the external "cat" program instead of Perl code to do the simple job of transferring large quantities of data)
    or even to this
    #-------------------------------------------------------
    # Concatenate every file in $dir whose name matches the regex
    # $filename_str onto the end of $destination_file (append, matching
    # the original's ">>" semantics).
    #
    # The original shelled out to "ls | grep | xargs -iX cat" with the
    # paths interpolated unquoted: a shell-injection risk, broken on
    # whitespace in names, non-portable (-iX is deprecated GNU xargs),
    # and all failures were silently ignored.  Plain Perl with a large
    # read buffer is just as fast and has none of those problems.
    # NOTE(review): unlike ls, readdir also returns dotfiles; for a
    # date-shaped $filename_str that makes no difference.
    sub consolidate_logs {
        my ($destination_file, $dir, $filename_str) = @_;

        opendir(my $dh, $dir)
            or die "couldn't open directory \"$dir\": $!";
        my @files = grep { /$filename_str/ } readdir($dh);
        closedir($dh);

        open(my $out, '>>', $destination_file)
            or die "Could not open \"$destination_file\" for appending: $!";

        foreach my $name (@files) {
            open(my $in, '<', "$dir/$name")
                or die "Could not open \"$dir/$name\" for reading: $!";
            my $buf;
            # 1 MiB chunks: bulk transfer without line-splitting overhead.
            while (read($in, $buf, 1 << 20)) {
                print {$out} $buf;
            }
            close($in);
        }
        close($out) or die "Could not close \"$destination_file\": $!";
    }
    The split_logs function could be simplified to this:
    #-------------------------------------------------------
    # For each [name, filename] pair in @$business_list, append every
    # line of $source_file containing $name (fixed-string containment
    # test, like the upstream index()-based version) to
    # $out_dir/$filename_prefix-$filename.
    #
    # $out_dir is a new optional parameter defaulting to the previously
    # hard-coded feed directory, so existing callers are unaffected.
    #
    # The original shelled out to grep with $name interpolated unquoted:
    # a shell-injection risk, and the shell stripped the embedded quote
    # characters out of the pattern, silently changing what was matched.
    # Failures of the external command were also ignored.
    sub split_logs {
        my ($source_file, $business_list, $filename_prefix, $out_dir) = @_;
        $out_dir = '/inside29/urchin/test/newfeed' unless defined $out_dir;

        foreach my $business (@$business_list) {
            my ($name, $file) = @$business;
            my $outfile = "$out_dir/$filename_prefix-$file";

            print "Creating of log for $name started at " . localtime() . "\n";

            open(my $out, '>>', $outfile)
                or die "Could not open out file \"$outfile\" for appending: $!";
            open(my $in, '<', $source_file)
                or die "Could not open \"$source_file\" for reading: $!";
            while (my $line = <$in>) {
                # index() is a literal substring test -- no regex, no shell.
                print {$out} $line if index($line, $name) > -1;
            }
            close($in);
            close($out) or die "Could not close \"$outfile\": $!";

            print "Log for $name created at " . localtime() . "\n";
        }
    }
    Again, this uses an external program ("grep" this time) to do simple string matching over large quantities of data.

    bartek