Mad_Mac has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am trying to learn to write more compact code. I currently have a bit of a script that builds a hash of all the AVI files in my movies directory (see below). It works just fine, but it seems like I could tighten this up eliminating my initial regex and/or the loop over the @temp array.

In case it's not obvious, the values of the hash are a string with the season and episode (if it exists) of each AVI

Any suggestions on clever ways to make this code more compact?

our $dir = `dir $movies\\*.avi /s`; our %inventory my @temp=$dir =~ /,\d{3} ((.*?)\.avi)/gi; for $temp (@temp) { if (($temp=~/(.+?)(\d{1,2}x\d{1,2})/i) or ($temp=~/(.*?)s\d{1,2}e\d{1,2})/i) ) { $inventory{$1}=$2; } }

If it matters, I am running this script on Windows 7 x64 and Strawberry Perl ver 5.12.1.

Thanks.

Replies are listed 'Best First'.
Re: Writing compact conditional regex's
by moritz (Cardinal) on Feb 06, 2011 at 12:20 UTC
    Not a real answer to your question, but you can get rid of the @temp variable:
    while ($dir =~ /,\d{3} ((.*?)\.avi)/gi) { # work with $1 here }

    Since I have no MS DOS or descendent operation systems available here, I don't know exactly how the data looks, so I don't know if the regexes can be simplified or merged.

    A perlish way to transverse directories recursively is to use the core module File::Find, which eliminates the use for parsing dir /s output entirely, and makes the script more portable across operation systems.

Re: Writing compact conditional regex's
by ambrus (Abbot) on Feb 06, 2011 at 14:40 UTC

    This is not a direct answer, just some points you might want to know about.

    • There's nothing wrong with using two regexen. Don't try to be too clever.
    • However, you can probably unify the two regexes with something like (UNTESTED)
      if ($temp =~ /(.+?)(\d{1,2}x\d{1,2}|s\d{1,2}e\d{1,2})/ix) { $inventory{$1}=$2; }
    • You could also try substitutions like
      if (($temp=~s/(\d{1,2}x\d{1,2})//i) or ($temp=~s/(s\d{1,2}e\d{1,2})//i) ) { $inventory{$_}=$1; }
    • It seems that the second regex in your code is missing an opening parenthesis.
    • You may want to include a \z anchor to your regexen, because the way they are they could match an 1x1 marker inside the string and lose the part after it.
    • The same applies to the regex that removes .avi: it could match .avi in the middle of the string.
    • If you want to unify two regexen but they're not so simple, you may want to use named captures, such as
      $ perl -we 'for (qw"14/3 03-14") { m"^(?:(?<m>\d\d)-(?<d>\d\d)|(?<d>\d +{1,2})/(?<m>\d{1,2}))$" or die; print "$_ => month=$+{m}, day=$+{d}.\ +n"; }' 14/3 => month=3, day=14. 03-14 => month=03, day=14.
      If you didn't have named captures (such as before perl 5.10), you would have to write this in an uglier way:
      perl -we 'for (qw"14/3 03-14") { m"^(?:(\d\d)-(\d\d)|(\d{1,2})/(\d{1,2 +}))$" or die; my $m = $1||$4; my $d = $2||$3; print "$_ => month=$m, +day=$d.\n"; }'
      which is prone to errors if you have to change the regex later so the captures are renumbered.
    • Similarly sometimes the $+ special variable can help.
Re: Writing compact conditional regex's
by furry_marmot (Pilgrim) on Feb 06, 2011 at 20:29 UTC

    If I understand your first two lines correctly, you're trying to isolate just the filename, without '.avi'. You can use the /b switch, along with /s, to get a bare listing, and then parse out the filename on the fly.

    File::Find would also be an excellent choice, but I'm just commenting on the code you have.

    The two patterns are almost identical, so just anchor the second one, use a character class ([xe]), and you should be good.

    And lastly, you can put most of that into the for loop, and use the regexes on $_ instead of creating temporary variables.

    for( map{ s/\\*([^\\]+)\.avi\n/$1/; $_ } `cmd /c dir /s /b *.avi` ) { $inventory{$1} = $2 if /^(.*?)s\d{1,2}[xe]\d{1,2})/i); } # BROKEN OUT -- follow the numbers for( map{ s/\\*([^\\]+)\.avi\n/$1/; # 2: isolate filename between last # backslash (if any) and newli +ne $_ } # 3: pass on $_, not result of s// +/, which is the # number of replacements made `cmd /c dir /s /b *.avi` # 1: get avi files, including sub- +dirs. # dir is a shell (cmd) command +, so # you might need to call it th +is way. ) { $inventory{$1} = $2 if /^(.*?)s\d{1,2}[xe]\d{1,2})/i); }
    --marmot
Re: Writing compact conditional regex's
by Anonymous Monk on Feb 07, 2011 at 03:03 UTC
    An expert would probably use map here and your data might require both regexes but I would write it something like this:
    our $dir = `dir $movies\\*.avi /s`; our $inv; @_ = $dir =~ /,\d{3} ((.*?)\.avi)/gi; for (@_) { $inv->{$1} = $2 if /(.*?)s?(\d{1,2}[ex]\d{1,2})/i }