I think your efficiency concern is about the wrong thing. Look at the results of this benchmark (call it like your program, with one extra argument giving the number of iterations to perform):

#!/usr/bin/perl -w

use strict;
use Benchmark;

my $uclc    = shift or Usage();
my $infile  = shift or Usage();
my $outfile = shift or Usage();

my $iterations = shift || 10;

# Both comparisons must be spelled out; "$uclc eq 'lc' or 'uc'" is always true.
Usage() unless ($uclc eq 'lc' or $uclc eq 'uc');

timethese($iterations, {
    original => sub { setup(); original(); teardown(); },
    Masem_1  => sub { setup(); Masem_1();  teardown(); },
    Masem_2  => sub { setup(); Masem_2();  teardown(); },
    clemburg => sub { setup(); clemburg(); teardown(); },
});

sub original {
my @in = <IN>;
my @munged;
for(@in) {
   # Declare first: "my ... if ..." has undefined behavior (see perlsyn).
   my $munged;
   $munged = lc() if ($uclc eq 'lc');
   $munged = uc() if ($uclc eq 'uc');
   push @munged, $munged;
}
print OUT for(@munged);
}

sub Masem_1 {
my @in = <IN>;
@in = map { $uclc eq 'lc' ? lc : uc } @in;
print OUT for(@in);
}

sub Masem_2 {
my @in = <IN>;
@in = $uclc eq 'lc' ? map { lc } @in : map { uc } @in;
print OUT for(@in);
}

sub clemburg {
while (<IN>) {
   # Declare first: "my ... if ..." has undefined behavior (see perlsyn).
   my $munged;
   $munged = lc() if ($uclc eq 'lc');
   $munged = uc() if ($uclc eq 'uc');
   print OUT $munged;
}
}

sub setup {
open (IN, "< $infile") 
    or die "Error opening $infile for read: $!";
open (OUT, "> $outfile") 
    or die "Error opening $outfile for write: $!";
}


sub teardown {
close IN 
    or die "Error closing $infile after read: $!";
close OUT 
    or die "Error closing $outfile after write: $!";
}

######################################################
sub Usage {
    die "\n Usage: $0 (lc|uc) infile outfile [iterations]\n";
}
######################################################

The results for 10 iterations on my machine are like this, using a 1MB text file with mixed case and some markup characters:

> perl benchmark.pl lc terms.por terms.try 10
Benchmark: timing 10 iterations of Masem_1, Masem_2, clemburg, original...
   Masem_1: 14 wallclock secs (11.01 usr +  1.34 sys = 12.36 CPU)
   Masem_2: 16 wallclock secs ( 9.88 usr +  0.84 sys = 10.72 CPU)
  clemburg: 10 wallclock secs ( 5.11 usr +  0.75 sys =  5.86 CPU)
  original: 32 wallclock secs (10.52 usr +  0.98 sys = 11.50 CPU)

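Obviously, the approach you and Masem use is very resource-intensive, because it reads the whole file into memory before doing any work. In the run above, the line-by-line version (clemburg) needed about half the CPU time of the others (5.86 s against 10.72 to 12.36 s). Slurping could also kill your process, or cause worse things, if you work with really large files.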
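
For what it's worth, here is a minimal standalone sketch of the line-by-line idea, separate from the benchmark script. The sub name munge_stream and the sample data are made up for illustration, and it assumes Perl 5.8 or later for lexical and in-memory filehandles. Only one line is held in memory at a time, so memory use does not grow with file size.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the benchmark script): copy $in_fh to
# $out_fh one line at a time, changing case according to $mode.
sub munge_stream {
    my ($mode, $in_fh, $out_fh) = @_;
    die "mode must be 'lc' or 'uc'\n"
        unless $mode eq 'lc' or $mode eq 'uc';

    my $lines = 0;
    while (my $line = <$in_fh>) {
        # Only the current line is in memory; it is written out immediately.
        print {$out_fh} ($mode eq 'lc' ? lc($line) : uc($line));
        $lines++;
    }
    return $lines;    # number of lines processed
}

# Demo on in-memory filehandles, so no real files are needed to try it.
my $input  = "Hello World\nFOO bar\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

my $count = munge_stream('uc', $in, $out);

close $in  or die "Error closing input: $!";
close $out or die "Error closing output: $!";

print $output;
```

For real files, open the handles with the three-argument form instead (open my $in, '<', $infile) and pass them in the same way.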

Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com

In reply to Re: More efficient munging if infile very large by clemburg
in thread More efficient munging if infile very large by ybiC
