Dismas has asked for the wisdom of the Perl Monks concerning the following question:

Gentle Monks, sweet purveyors of wisdom...

I'm trying to optimize a subroutine which uses up more than 40% of my code's runtime. I thought that using a regexp to split a line of data would be faster than calling split(), but it doesn't seem to work that way.

Here's the split:
my ($p, $l, $o, $g, $s) = split( /\s+/, $lsl );
and here's the regexp:

my ($p, $l, $o, $g, $s) = ($lsl =~ /^(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*)$/);
To further slow things down, in both versions of the subroutine the value of $lsl is obtained as follows:
my $lsl = `ls -l "$fnm" 2>&1`;
I need all the info provided by that line, but is there a quicker way to get it?

I'm still something of a novice, so perhaps I did something unperlish. I'd appreciate any thoughts or suggestions.

Thanks!

Re: Which witch is the quicker witch?
by broquaint (Abbot) on Aug 15, 2003 at 14:27 UTC
    Your regex will be slower, as it has to do a lot more thinking due to back-tracking, capturing, and the like, whereas split just zips through the provided string, returning chunks at \s+ intervals. As for getting information on files, the opendir and readdir functions will be very useful in conjunction with the -X file test operators.
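    A minimal sketch of that approach (the directory name and the particular file tests here are illustrative, not from the original post):

    opendir my $dh, '.' or die "opendir: $!";
    for my $fnm ( readdir $dh ) {
        next unless -f $fnm;           # plain files only
        my $size = -s _;               # reuse the stat buffer filled by -f
        print "$fnm is $size bytes\n";
    }
    closedir $dh;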
    HTH

    _________
    broquaint

Re: Which witch is the quicker witch?
by bm (Hermit) on Aug 15, 2003 at 14:47 UTC
    Perhaps readdir and stat will be better than your current approach. See:

    perldoc -f readdir
    perldoc -f stat
    Platform-independent as well. HTH.
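    A rough sketch of the stat-based version, assuming $fnm as in the original post; the slice pulls out mode, link count, uid, gid, and size. Note that stat reports the mode as a number, not the rwxr-xr-x string ls prints:

    my ( $p, $l, $o, $g, $s ) = ( stat $fnm )[ 2, 3, 4, 5, 7 ];
    die "stat failed on $fnm: $!" unless defined $p;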
    --
    bm
Re: Which witch is the quicker witch?
by tcf22 (Priest) on Aug 15, 2003 at 14:26 UTC
    In general, simple tasks like splitting strings, or checking whether a string contains an 'a' for example, are quicker with split() or index() than with a regex. So I would stick with the split().
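    For instance (a made-up illustration):

    my $string = 'banana';
    # index() skips the regex engine entirely for a plain substring check
    print "found an 'a'\n" if index( $string, 'a' ) >= 0;   # vs. $string =~ /a/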

    For capturing the output of a system call
    my $lsl = `ls -l "$fnm" 2>&1`;
    looks good to me.
Re: Which witch is the quicker witch?
by ChrisS (Monk) on Aug 15, 2003 at 15:02 UTC

    You might want to try using the Benchmark module to compare the speed of two (or more) approaches.

    It's part of the standard distribution, and pretty useful.

    perldoc Benchmark
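    Something along these lines would compare the two approaches from the question (the sample line is made up):

    use Benchmark qw(cmpthese);
    my $lsl = '-rw-r--r-- 1 dismas users 1234';   # made-up sample line
    cmpthese( -3, {
        split => sub { my ($p, $l, $o, $g, $s) = split /\s+/, $lsl },
        regex => sub { my ($p, $l, $o, $g, $s) =
                         $lsl =~ /^(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*)$/ },
    } );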
Re: Which witch is the quicker witch?
by rir (Vicar) on Aug 15, 2003 at 20:37 UTC
    What people are telling you about split, etc. is good. I would guess that the dog in your routine is the $lsl = `ls -l "$fnm" 2>&1`; line. Invoking a process is expensive, and going to the filesystem is also expensive.

    ... Got called away for a few hours ...

    This is most likely a design issue. Instead of passing one filename to ls repeatedly, it is better to collect your filenames and make one run of ls. Then mangle all that data to your heart's content.
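    A sketch of that idea, assuming @filenames holds the collected names (the example list is just a placeholder):

    my @filenames = ( 'a.txt', 'b.txt' );   # collected earlier
    my $quoted = join ' ', map { qq("$_") } @filenames;
    my @lines  = `ls -ld $quoted 2>&1`;     # one process for all the files
    for my $lsl (@lines) {
        my ( $p, $l, $o, $g, $s ) = split /\s+/, $lsl;
        # process each file's line here, as before
    }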

Re: Which witch is the quicker witch?
by bart (Canon) on Aug 16, 2003 at 11:21 UTC
    To be honest, I don't think your main slowdown is in the splitting up of the data, but in the system call. After all, split and a regex do their job in a matter of microseconds.

    As you're only interested in the properties of one file, stat does indeed look like the more appropriate approach, especially since you seem to care so much about micro-optimizations in speed. It'll return the user and group IDs as integers, so look at getpwuid and getgrgid in scalar context if you want the names.
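    Roughly, again assuming $fnm from the question:

    my ( $uid, $gid, $size ) = ( stat $fnm )[ 4, 5, 7 ];
    my $owner = getpwuid $uid;    # user name, since we call it in scalar context
    my $group = getgrgid $gid;    # group name likewise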

    And finally, as yet another alternative while still using `ls`, you could use unpack, since `ls` produces fixed-width columns. This seems to work for me:

    my ($p, $l, $o, $g, $s) = unpack "A10xA4xA8xA8xA8", `ls -l "$fnm"`;
    It will strip trailing spaces from the names, but it will not strip leading spaces from the numbers. I don't know of a template that does strip leading spaces (see perldoc -f pack for the available templates), but that doesn't prevent you from using these strings as numbers anyway, so I wouldn't worry about them.
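    That is, something like this works even with the leading spaces left in:

    my $bytes = $s + 0;    # " 1234" numifies cleanly to 1234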