Re: script optimization
by MidLifeXis (Monsignor) on Dec 15, 2011 at 14:15 UTC
#!/usr/local/bin/perl -w
#
$fn1 = '/in.CSV';
# Useless stringification, and check your errors
# open (INST,"$fn1")
open(INST, '<', $fn1) or die "Unable to open '$fn1' for reading: $!";
# Preference, but possibly a better habit
# consider: open(ABI, ">$foo") where $foo
# contains ">blah".
# Also - check your errors
# open (ABI,">/out.ins");
open(ABI,">", "/out.ins") or die "Unable to open /out.ins: $!";
while (<INST>) {
# Moved from below - fail fast [1]
next if /^"Branchno"\t/;
next if /^""\t/;
# Not necessary [3]
# chomp;
# Fishy - does this remove the last \t? [2]
# chop;
s/\õ/\ä/g;
s/\"-/\"Ä/g;
s/\--/\-Ä/g;
s/\ - /\*-*/g;
s/\ -/\ Ä/g;
# Restore the placeholder exactly: \*-* would also match every lone "*"
s/\*-\*/\ - /g;
s/\"_/\"Ü/g;
s/\ _/\ Ü/g;
s/\__/\_Ü/g;
s/\³/\ü/g;
s/\§/\õ/g;
# @array = ' '; # Not necessary, useless
# no longer necessary [1,2]
# @array = split(/\t/);
# Moved to top of loop - fail fast [1]
# if ($array[0] eq "\"Branchno\"") { next; }
# if ($array[0] eq "\"\"" ) {next;}
# No longer necessary, assuming chop above removed \t [2]
# $result = join ("|",@array)."|";
# Replace split / join with another s/// [2]
s/\t/|/g;
# Work on $_, no longer need $result
# $result =~ s/\"//g;
# print ABI $result,"\n";
s/"//g;
# No explicit newline needed since the chomp was removed [3]
# Print to ABI; a bare "print;" would go to STDOUT
print ABI $_;
}
close (INST);
close (ABI);
[1] If you fail fast, you avoid running the s/// substitutions on lines you discard anyway.
[2] If the chop was removing a trailing "\t", the s/\t/|/g turns that tab into the final "|" that the join used to append.
[3] If you don't chomp, you don't need to print a newline, because it is still at the end of $_.
There are also some tr/// uses that could make this faster (see trizen's post). However, given the kind of input data I assume from your code, this is a very fragile solution.
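The [2] claim can be checked on a made-up sample line. This is only an illustrative sketch: it assumes every data line ends in a tab before the newline (which is what the chop appears to remove), and it gives split a limit of -1. Without that limit, as in the original script, split drops trailing empty fields, so a line ending in empty fields would come out shorter than with s/\t/|/g.

```perl
use strict;
use warnings;

# Original style: drop the newline and the (assumed) trailing tab, split on
# tabs, join with "|" and append one more "|". The -1 limit keeps trailing
# empty fields; the original split had no limit and would drop them.
sub via_split_join {
    my ($line) = @_;
    chomp $line;
    chop $line;    # assumption: this removes a trailing "\t"
    return join("|", split(/\t/, $line, -1)) . "|";
}

# Post's style: every tab, including the trailing one, becomes "|" and the
# newline stays in place (chomped here only so the results compare equal).
sub via_substitution {
    my ($line) = @_;
    $line =~ s/\t/|/g;
    chomp $line;
    return $line;
}

my $sample       = qq{"001"\t"Main St"\t""\t\n};
my $joined       = via_split_join($sample);
my $substituted  = via_substitution($sample);
print "$joined\n";         # prints: "001"|"Main St"|""|
print "$substituted\n";    # prints: "001"|"Main St"|""|
```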
Re: script optimization
by Tux (Canon) on Dec 15, 2011 at 14:18 UTC
Your loop does chomp AND chop. Why? Is there a trailing character that needs to be removed?
Your loop initializes @array twice in every iteration. That takes unneeded time (you asked for speedups).
You escape too many characters that do not need escaping.
You can combine single character replacements into a single tr/// call.
use strict;
use warnings;
my ($fi, $fo) = ("/in.CSV", "/out.ins");
open my $hi, "<", $fi or die "$fi: $!\n";
open my $ho, ">", $fo or die "$fo: $!\n";
while (<$hi>) {
chomp;
chop; # <-- is this really needed?
tr/õ³§/äüõ/;
s/ - /*-*/g;
s/\*-*/ - /g; # <-- is this really what you want?
s/([" _])-/$1Ä/g;
s/([" _])_/$1Ü/g;
my @array = split /\t/ => $_, -1;
$array[0] =~ m/^"(?:Branchno)?"$/ and next;
(my $result = join "|" => @array, "") =~ tr/"//d;
print $ho $result, "\n";
}
close $hi;
close $ho;
Enjoy, Have FUN! H.Merijn
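The single tr/// call is safe here only because of the order of the original substitutions: § is mapped to õ last, after õ has already become ä, and tr/// likewise never re-translates its own output. A small check, written with \x{..} escapes so it does not depend on the encoding the script is saved in (the sub names are illustrative, not from the posts):

```perl
use strict;
use warnings;

# \x{f5} = õ, \x{e4} = ä, \x{b3} = ³, \x{fc} = ü, \x{a7} = §

# Three passes, in the order of the original script.
sub with_substitutions {
    my ($s) = @_;
    $s =~ s/\x{f5}/\x{e4}/g;    # õ -> ä
    $s =~ s/\x{b3}/\x{fc}/g;    # ³ -> ü
    $s =~ s/\x{a7}/\x{f5}/g;    # § -> õ, runs last so the new õ survives
    return $s;
}

# One pass: tr/// maps each listed character and never re-reads its output.
sub with_tr {
    my ($s) = @_;
    $s =~ tr/\x{f5}\x{b3}\x{a7}/\x{e4}\x{fc}\x{f5}/;
    return $s;
}

my $in   = "a\x{f5}b\x{b3}c\x{a7}d";
my $same = with_substitutions($in) eq with_tr($in);
my $msg  = $same ? "same" : "different";
print "$msg\n";    # prints: same
```

If § were mapped to õ before õ was mapped to ä, the sequential version would turn § into ä and the two would no longer agree.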
Re: script optimization
by BrowserUk (Patriarch) on Dec 15, 2011 at 13:42 UTC
Re: script optimization
by marto (Cardinal) on Dec 15, 2011 at 13:46 UTC
Re: script optimization
by Anonymous Monk on Dec 15, 2011 at 13:45 UTC
my %Replacement = (
q/"-/ => q/"Ä/,
...
);
...
while ...
s/($rere)/$Replacement{$1}/g;
...
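The outline above leaves $rere undefined. One way to build it, assuming the keys are plain strings, is to quotemeta each key and join them into one alternation, longest first. The table below is a hypothetical subset of the original pairs, and replace_all is an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical subset of the original pairs; \x{c4} = Ä, \x{dc} = Ü.
my %Replacement = (
    q/"-/ => qq/"\x{c4}/,
    q/--/ => qq/-\x{c4}/,
    q/"_/ => qq/"\x{dc}/,
    q/ _/ => qq/ \x{dc}/,
    q/__/ => qq/_\x{dc}/,
);

# quotemeta escapes regex metacharacters in each key; sorting longest first
# stops a short key from shadowing a longer one that starts the same way.
my $alternation = join "|",
    map { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %Replacement;
my $rere = qr/($alternation)/;

# One left-to-right pass; each match is looked up in the table.
sub replace_all {
    my ($line) = @_;
    $line =~ s/$rere/$Replacement{$1}/g;
    return $line;
}

# replace_all(q/"-a "_b __c/) returns qq/"\x{c4}a "\x{dc}b \x{dc}_c/
```

A single pass is not always identical to running the substitutions one after another: where matches overlap (for example "--" right after a space) the results can differ, and the " - " placeholder trick does not map directly onto a lookup table. Compare the output against the old script on real data before switching.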
Re: script optimization
by trizen (Hermit) on Dec 15, 2011 at 14:01 UTC
#!/usr/bin/perl
use warnings;
use strict;
my $fn1 = '/in.CSV';
open INST, '<', $fn1 or die $!;
open ABI, '>', '/out.ins' or die $!;
while (defined($_ = <INST>)) {
next if substr($_, 0, 3) eq qq{""\t};
next if substr($_, 0, 11) eq qq{"Branchno"\t};
tr/õ³§/äüõ/;
s/(["-])-/$1Ä/g;
s/ - /*-*/g;
s/ -/ Ä/g;
s/\*-\*/ - /g;
s/([" _])_/$1Ü/g;
chomp $_;
chop $_;
tr/"//d;
tr/\t/|/;
print ABI "${_}|\n";
}
close INST;
close ABI;
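The order of the four dash substitutions in the version above matters: the free-standing " - " has to be hidden behind the *-* placeholder before the " -" rule runs, and the placeholder has to be restored with the exact pattern \*-\*. A sketch of just that step, with the loose \*-* pattern from the original script alongside for comparison (fix_dashes and loose_restore are illustrative names):

```perl
use strict;
use warnings;

# Dash handling in the order used above; \x{c4} = Ä.
sub fix_dashes {
    my ($s) = @_;
    $s =~ s/(["-])-/$1\x{c4}/g;    # "- and -- become "Ä and -Ä
    $s =~ s/ - /*-*/g;             # hide a free-standing " - " behind a placeholder
    $s =~ s/ -/ \x{c4}/g;          # any remaining " -" becomes " Ä"
    $s =~ s/\*-\*/ - /g;           # restore only the exact placeholder
    return $s;
}

# The original restore: \*-* means "*" plus zero or more "-", so it also
# rewrites the trailing "*" of the placeholder and any other "*" on the line.
sub loose_restore {
    my ($s) = @_;
    $s =~ s/\*-*/ - /g;
    return $s;
}

# fix_dashes("a - b *x*") leaves the line unchanged;
# loose_restore("a*-*b") returns "a -  - b".
```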
Re: script optimization
by thargas (Deacon) on Dec 15, 2011 at 13:53 UTC
Your script isn't doing much, so I'd guess you have a lot of data or a slow machine to run it on.
Please provide:
- sample data
- expected output for that data
- size of input file (bytes and records)
my ($fi, $fo) = ("/in.CSV", "/out.ins");
open my $hi, "<", $fi or die "$fi: $!\n";
open my $ho, ">", $fo or die "$fo: $!\n";
Just my 2 cents... apart from the chomp/chop issue and the other things already said, I can't see the point of creating two extra variables here. Why not simply do this?
open my $hi, "<", "/in.CSV" or die $!;
open my $ho, ">", "/out.ins" or die $!;
You could also do something about this:
if ($array[0] eq "\"Branchno\"") { next; }
if ($array[0] eq "\"\"" ) {next;}
If you want to discard these lines, do it as early as you can: put both checks before the substitution lines. In a very big file this can make a difference.
while (<INST>) {
next if /^\"(Branchno|)\"/;
s/...
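A sketch of the fail-fast check as a separate sub, run over a made-up in-memory file. It assumes the header line starts with "Branchno" and that the first field is followed by a tab or the end of the line; anchoring on that keeps a field such as "Branchnos" from being dropped. is_skipped is a hypothetical name.

```perl
use strict;
use warnings;

# True for lines whose first field is exactly "Branchno" or "" (empty).
sub is_skipped {
    my ($line) = @_;
    return $line =~ /^"(?:Branchno)?"(?:\t|\n|\z)/;
}

# Made-up sample data: header, empty first field, one real record.
my $data = qq{"Branchno"\t"Name"\n""\t"x"\n"001"\t"Main"\n};
open my $in, '<', \$data or die "in-memory open: $!";
my @kept;
while (my $line = <$in>) {
    next if is_skipped($line);    # discard before doing any other work
    push @kept, $line;
}
close $in;
# @kept now holds only the "001" line
```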
Because the die message does not include the file name.
Another reason is that if you do it really neatly, you also check the close calls, and you still have the file names at hand for the diagnostics if those fail.
A third reason is that it is now easy to change the script to take the file names from the command line, keeping these as the defaults.
Enjoy, Have FUN! H.Merijn
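A sketch of the points above: the file names stay in variables so that open, print and close failures all name the file, and the defaults from the thread can be overridden on the command line. convert is a hypothetical name, the s/\t/|/g is only a stand-in for the real per-line work, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Copies $fi to $fo; every failure message says which file was involved.
sub convert {
    my ($fi, $fo) = @_;
    open my $hi, "<", $fi or die "$fi: $!\n";
    open my $ho, ">", $fo or die "$fo: $!\n";
    while (<$hi>) {
        s/\t/|/g;                       # stand-in for the real per-line work
        print $ho $_ or die "$fo: $!\n";
    }
    close $hi or die "$fi: $!\n";
    close $ho or die "$fo: $!\n";       # buffered write errors can surface here
    return;
}

# Usage, with the thread's names as defaults:
#   my $fi = $ARGV[0] // "/in.CSV";
#   my $fo = $ARGV[1] // "/out.ins";
#   convert($fi, $fo);
```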
Re: script optimization
by mctaylor (Novice) on Dec 16, 2011 at 21:55 UTC
In addition to the other suggestions, I would add that you want to compile each regex only once, since they appear to be unchanging, at least during the execution of the script.
This can be done with the /o modifier, which tells Perl to compile the regex only once, and potentially with the qr// "quote regex" operator (perlop#Regexp-Quote-Like-Operators).
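A note on how the two fit together: a pattern with no variables in it, such as every s/// in the original script, is already compiled once when the script is compiled, and current perls also skip recompiling an interpolated pattern whose text has not changed, so /o mainly matters for patterns built from variables. Those are where qr// fits: build the pattern once, outside the loop. A sketch, where the list of skipped first fields and keep_line are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical list of first fields to skip, turned into one pattern.
my @words   = ("Branchno", "");
my $skip    = join "|", map { quotemeta($_) } @words;
my $skip_re = qr/^"(?:$skip)"\t/;    # interpolated and compiled once, here

# Reuses the precompiled pattern on every call.
sub keep_line {
    my ($line) = @_;
    return $line !~ $skip_re;
}
```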