in reply to Help needed with reducing function to a single regex

it needs to *not* match "." or ".." entries

How about getting rid of these just like you do with the other useless lines:

    ...
    foreach (@_) {
        # skip if it doesn't start with either a space or backslash
        next if (!/^[ \\]/);
        # skip if it's just "." or ".."
        next if (/^\s*\.{1,2}\s/);
        if (/^\\/) {
            $path = "$_\\";
        }
        elsif (/^\s*(.*\S)\s{5,}([HDRSA]+)\s*(\d+)\s*(.*)/) {
            my ($file, $att, $size, $date) = ($1, $2, $3, $4);
            my $ext = ( $file =~ /\.([^.]+)$/ ) ? $1 : "DIR";
            print ...;
        }
    }

That last bit about setting $ext follows your assumption that if there's no dot in the name, it must be a directory (but I think this is not a reliable assumption). Note that file names may contain multiple periods, and I think you want $ext to hold just the characters after the last one (in your original version, a file name like "rel_3.1.tar.gz" would set $ext to "1.tar.gz").
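A small sketch comparing the last-dot pattern from the loop with a first-dot pattern. The first-dot pattern `/\.(.*)$/` and both helper names are hypothetical: the original version isn't quoted here, so this only shows the kind of expression that yields "1.tar.gz".

```perl
use strict;
use warnings;

# Extension as in the loop: the characters after the LAST dot,
# or "DIR" when there is no dot (an assumption that may not hold).
sub ext_last_dot {
    my ($file) = @_;
    return ( $file =~ /\.([^.]+)$/ ) ? $1 : "DIR";
}

# Hypothetical first-dot pattern, shown only for contrast: the leftmost
# "." wins, and (.*) swallows everything after it.
sub ext_first_dot {
    my ($file) = @_;
    return ( $file =~ /\.(.*)$/ ) ? $1 : "DIR";
}

for my $name ("rel_3.1.tar.gz", "notes.txt", "README") {
    printf "%-16s last dot: %-4s first dot: %s\n",
        $name, ext_last_dot($name), ext_first_dot($name);
}
```

The `[^.]+` in the last-dot version is what forces the match to the final period: any earlier "." would need a run of non-dots reaching the end of the string, which fails.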

Your expression for getting/setting $path was also a bit odd. The perlre man page says:

Also remember that "|" is interpreted as a literal within square brackets, so if you write "[fee|fie|foe]" you're really only matching "[feio|]".
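A short sketch of what that quote means in practice. The word list is made up; the point is that the bracketed version matches exactly one character from the set f, e, i, o, |, while `(?:...)` gives real alternation between whole words.

```perl
use strict;
use warnings;

my @words = ("fee", "fie", "f", "|", "x");

my %result;
for my $w (@words) {
    # Character class: exactly ONE of the characters f, e, i, o or |
    my $class = $w =~ /^[fee|fie|foe]$/   ? "yes" : "no";
    # Alternation: one of the three whole words
    my $alt   = $w =~ /^(?:fee|fie|foe)$/ ? "yes" : "no";
    $result{$w} = "$class/$alt";
    printf "%-4s class: %-4s alternation: %s\n", $w, $class, $alt;
}
```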
And of course, directory names might include dashes, periods or other punctuation that wouldn't match \w. Your code made it seem like lines with an initial backslash would contain only a path name and nothing else, so I simplified on that basis (but I don't know if this assumption is correct).
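A minimal sketch of the difference, assuming a made-up path line. The `\w`-only pattern is a hypothetical stand-in (the original expression isn't quoted), and the `chomp` assumes lines still carry their newline.

```perl
use strict;
use warnings;

# Hypothetical directory-header line from the listing (one leading backslash).
my $line = "\\src\\my-app\\v1.2\n";
chomp $line;    # assume lines still carry their newline

# A \w-only pattern rejects the dash and the periods in the path.
my $word_only = $line =~ /^\\[\w\\]+$/ ? 1 : 0;

# The simplified approach: any line starting with a backslash is a path.
my $path;
if ($line =~ /^\\/) {
    $path = "$line\\";
}

print "word-only match: $word_only\n";
print "path: $path\n";
```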

Re: Re: Happy fun regexping
by cyberconte (Scribe) on Apr 07, 2002 at 12:06 UTC
    Well, taking into account the many suggestions I've received, both here and from fellow programmers (thank you to everyone who responded), I've done it! This is what I've come up with...
        sub adj3_gi {
            my $path = '\\';
            my $computer = "LAIN";
            foreach (@_) {
                # skip if it doesn't start with either a space or backslash,
                # or if it starts with " ." or " .."
                next if ((/^[^ \\]/) || (/^ {2}\.{1,2}\s/));
                # process path if first char is '\'
                if (/^\\/) {
                    chop;    # removes the last char (presumably the newline); chomp would be safer
                    $path = "$_\\";
                }
                # break apart returned directory and file info
                elsif (/^ {2}(.*?(\.([^\.]+?))?) {5} *([HDRSA]*) +(\d+) {2}(.*)/gi) {
                    #print "{$computer\:$path$1, " . (defined $3 ? $3 : "") . ", $4, $5, $6 }\n";
                }
            }
        }
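A quick way to check what the combined regex captures, using two made-up listing lines (the column layout is an assumption inferred from the pattern itself). A name without a dot leaves `$3` undefined, which is why the commented-out print guards it with `defined`.

```perl
use strict;
use warnings;

# Hypothetical listing lines: two-space indent, name, at least five spaces,
# attributes, size, exactly two spaces, then the date.
my @lines = (
    "  rel_3.1.tar.gz     A    10240  04-07-2002 12:06",
    "  Makefile     A    512  04-07-2002 12:06",
);

my @rows;
for (@lines) {
    if (/^ {2}(.*?(\.([^\.]+?))?) {5} *([HDRSA]*) +(\d+) {2}(.*)/) {
        push @rows, {
            name => $1,
            ext  => defined $3 ? $3 : "",   # $3 is undef when there is no dot
            att  => $4,
            size => $5,
            date => $6,
        };
    }
}

printf "%-16s ext=%-3s att=%s size=%s date=%s\n",
    @{$_}{qw(name ext att size date)} for @rows;
```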
    Putting everything in that one regexp made everything much faster. However, there's one anomaly that I don't quite understand. I played a little with the modifiers at the end of the regexp, mainly "g" and "i".
        with g:    4 wallclock secs ( 3.91 usr +  0.00 sys =  3.91 CPU)
        with gi:   4 wallclock secs ( 3.64 usr +  0.00 sys =  3.64 CPU)
        with i:    7 wallclock secs ( 6.40 usr +  0.01 sys =  6.41 CPU)
        with none: 6 wallclock secs ( 6.77 usr +  0.00 sys =  6.77 CPU)
    I ran this several times, and the results were all similar. I could *possibly* understand the /g making things faster, but the /i? I was always under the impression (from both professors and fellow coders) that /i would make things slower. No one I asked can explain it. Or is this more regexp voodoo? ^_^
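A minimal sketch for re-running the comparison with the core Benchmark module, assuming a made-up sample line. Each variant matches a fresh copy of the line. One thing worth ruling out: with /g in scalar context Perl records `pos()` on the matched string, so if the same strings are fed to the sub on every benchmark pass (for example, the same array passed to `adj3_gi` each time), a /g match could start where the previous pass stopped and fail immediately. Whether that accounts for the numbers above can't be confirmed from the post.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical listing line in the format the regex expects.
my $sample = "  rel_3.1.tar.gz     A    10240  04-07-2002 12:06";

# The combined regex from the post, compiled with and without /i.
my $re  = qr/^ {2}(.*?(\.([^\.]+?))?) {5} *([HDRSA]*) +(\d+) {2}(.*)/;
my $rei = qr/^ {2}(.*?(\.([^\.]+?))?) {5} *([HDRSA]*) +(\d+) {2}(.*)/i;

# Each sub matches a fresh copy, so no pos() carries over between calls.
my %subs = (
    none => sub { my $s = $sample; return $s =~ $re     ? 1 : 0 },
    i    => sub { my $s = $sample; return $s =~ $rei    ? 1 : 0 },
    g    => sub { my $s = $sample; return $s =~ /$re/g  ? 1 : 0 },
    gi   => sub { my $s = $sample; return $s =~ /$rei/g ? 1 : 0 },
);

# Sanity check: every variant must actually match before timing it.
for my $name (sort keys %subs) {
    die "$name did not match\n" unless $subs{$name}->();
}

cmpthese(-1, \%subs);   # run each variant for at least 1 CPU second

# Matching the SAME variable repeatedly with /g in scalar context:
# pos() is remembered after a success, so the next attempt starts at the
# end of the string, fails at once on the ^ anchor, and resets pos().
my $line = $sample;
my @hits = map { $line =~ /$re/g ? 1 : 0 } 1 .. 4;
print "repeated /g on one variable: @hits\n";   # prints "1 0 1 0"
```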

      Do not attempt to remove the . and .. entries with a regexp, because you will no doubt get it wrong, and cheaper alternatives exist. A reasonable way to get rid of them is:

        next if $_ eq '.' or $_ eq '..';

      An even more reasonable way is to isolate yourself from cross-platform differences by using File::Spec.

        use File::Spec;
        next if $_ eq File::Spec->curdir or $_ eq File::Spec->updir;

      See Re: Is readdir ever deterministic? for the canonical question and answer on the subject.
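A minimal sketch of the File::Spec approach in a readdir loop. The `list_entries` helper and the demo file names are made up; File::Temp supplies a throwaway directory. The last lines show how easily a regex filter goes wrong: an anchored `/^\.{1,2}/` also discards dot-files.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: entries of $dir, skipping only the current and
# parent directory entries as reported by File::Spec.
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names;
    while (defined(my $name = readdir $dh)) {
        next if $name eq File::Spec->curdir or $name eq File::Spec->updir;
        push @names, $name;
    }
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

# Demo in a throwaway directory that contains a dot-file.
my $dir = tempdir(CLEANUP => 1);
for my $file ("a.txt", ".hidden") {
    my $full = File::Spec->catfile($dir, $file);
    open(my $fh, '>', $full) or die "Cannot create $full: $!";
    close $fh;
}

my @entries = list_entries($dir);
print "@entries\n";    # prints ".hidden a.txt"

# A tempting but wrong regex filter: it also throws away ".hidden".
my @naive = grep { !/^\.{1,2}/ } ".", "..", @entries;
print "@naive\n";      # prints "a.txt"
```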