Dismas has asked for the wisdom of the Perl Monks concerning the following question:

Gentle Monks, sweet purveyors of wisdom...

I'm trying to optimize a subroutine which uses up more than 40% of my code's runtime. I thought that using a regexp to split a line of data would be faster than calling split(), but it doesn't seem to work that way.

Here's the split:
my ($p, $l, $o, $g, $s) = split( /\s+/, $lsl );
and here's the regexp:

my ($p, $l, $o, $g, $s) = ($lsl =~ /^(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*)$/);
To further slow things down, in both versions of the subroutine the value of $lsl is obtained as follows:
my $lsl = `ls -l "$fnm" 2>&1`;
I need all the info provided by that line, but is there a quicker way to get it?

I'm still something of a novice, so perhaps I did something unperlish. I'd appreciate any thoughts or suggestions.

Thanks!

Re: Which witch is the quicker witch?
by broquaint (Abbot) on Aug 15, 2003 at 14:27 UTC
    Your regex will be slower, as it has to do a lot more thinking due to back-tracking, capturing, and the like, whereas split just zips through the provided string, returning chunks at \s+ intervals. As for getting information on files, the opendir and readdir functions will be very useful in conjunction with the -X file test operators.
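    A minimal sketch of that approach (the directory name and the particular file tests here are illustrative, not from the original post):

    opendir my $dh, '.' or die "opendir: $!";
    for my $fnm ( readdir $dh ) {
        next unless -f $fnm;           # plain files only
        my $size = -s _;               # reuse the stat buffer filled by -f
        print "$fnm is $size bytes\n";
    }
    closedir $dh;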
    HTH

    _________
    broquaint

Re: Which witch is the quicker witch?
by bm (Hermit) on Aug 15, 2003 at 14:47 UTC
    Perhaps readdir and stat will be better than your current approach. See:

    perldoc -f readdir
    perldoc -f stat
    Platform-independent as well. HTH.
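    A rough sketch of the stat-based version, assuming $fnm as in the original post; the slice pulls out mode, link count, uid, gid, and size. Note that stat reports the mode as a number, not the rwxr-xr-x string ls prints:

    my ( $p, $l, $o, $g, $s ) = ( stat $fnm )[ 2, 3, 4, 5, 7 ];
    die "stat failed on $fnm: $!" unless defined $p;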
    --
    bm
Re: Which witch is the quicker witch?
by tcf22 (Priest) on Aug 15, 2003 at 14:26 UTC
    In general, simple tasks like splitting strings, or checking whether a string contains an 'a' for example, are quicker with split() or index() than with a regex. So I would stick with the split().
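    For instance (a made-up illustration):

    my $string = 'banana';
    # index() skips the regex engine entirely for a plain substring check
    print "found an 'a'\n" if index( $string, 'a' ) >= 0;   # vs. $string =~ /a/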

    For capturing the output of a system call
    my $lsl = `ls -l "$fnm" 2>&1`;
    looks good to me.
Re: Which witch is the quicker witch?
by ChrisS (Monk) on Aug 15, 2003 at 15:02 UTC

    You might want to try using the Benchmark module to compare the speed of two (or more) approaches.

    It's part of the standard distribution, and pretty useful.

    perldoc Benchmark
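    Something along these lines would compare the two approaches from the question (the sample line is made up):

    use Benchmark qw(cmpthese);
    my $lsl = '-rw-r--r-- 1 dismas users 1234';   # made-up sample line
    cmpthese( -3, {
        split => sub { my ($p, $l, $o, $g, $s) = split /\s+/, $lsl },
        regex => sub { my ($p, $l, $o, $g, $s) =
                         $lsl =~ /^(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*)$/ },
    } );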
Re: Which witch is the quicker witch?
by rir (Vicar) on Aug 15, 2003 at 20:37 UTC
    What people are telling you about split, etc. is good. I would guess that the dog in your routine is the $lsl = `ls -l "$fnm" 2>&1`; line. Invoking a process is expensive, and going to the filesystem is also expensive.

    ... Got called away for a few hours ...

    This is most likely a design issue. Instead of passing one filename to ls repeatedly, it is better to collect your filenames and make one run of ls. Then mangle all that data to your heart's content.
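    A sketch of that idea, assuming @filenames holds the collected names (the example list is just a placeholder):

    my @filenames = ( 'a.txt', 'b.txt' );   # collected earlier
    my $quoted = join ' ', map { qq("$_") } @filenames;
    my @lines  = `ls -ld $quoted 2>&1`;     # one process for all the files
    for my $lsl (@lines) {
        my ( $p, $l, $o, $g, $s ) = split /\s+/, $lsl;
        # process each file's line here, as before
    }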

Re: Which witch is the quicker witch?
by bart (Canon) on Aug 16, 2003 at 11:21 UTC
    To be honest, I don't think your main slowdown is in the splitting up of the data, but in the system call. After all, split and a regex do their job in a matter of microseconds.

    As you're only interested in the properties of one file, stat does indeed look like the more appropriate approach, especially since you seem to care so much about micro-optimizations in speed. It'll return the user and group IDs as integers, so look at getpwuid and getgrgid in scalar context if you want the names.
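    Roughly, again assuming $fnm from the question:

    my ( $uid, $gid, $size ) = ( stat $fnm )[ 4, 5, 7 ];
    my $owner = getpwuid $uid;    # user name, since we call it in scalar context
    my $group = getgrgid $gid;    # group name likewise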

    And finally, as yet another alternative while still using `ls`, you could use unpack, since `ls` produces fixed-width columns. This seems to work for me:

    my ($p, $l, $o, $g, $s) = unpack "A10xA4xA8xA8xA8", `ls -l "$fnm"`;
    It will strip trailing spaces from the names, but it will not strip leading spaces from the numbers. I don't know of a template that does strip leading spaces (see perldoc -f pack for the available templates), but that doesn't prevent you from using these strings as numbers anyway, so I wouldn't worry about them.
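    That is, something like this works even with the leading spaces left in:

    my $bytes = $s + 0;    # " 1234" numifies cleanly to 1234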