Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I wanted to develop a capability to generate new, unique filenames, so I thought that I would write a routine to do so. The idea is that it strips the input to its basename, then finds the next value in lexicographic order that a given alphabet can generate. Finally, it pastes the extension back on and sends it back. I took the prudent step of eliminating the letters that can easily be confused with numbers in this list. What I have seems to work, so what I'm looking for are improvements to the style, form, and robustness. This isn't looking very perlish yet. What follows is the caller, the routine, and stdout.

my @filenames = qw/a.png dpq.jpg zx.png ott.jpg zzz.tiff oi.png/; foreach (@filenames) { my $next = next_file($_); say "next is $next"; }
sub next_file { use strict; use warnings; use 5.010; my ($previous) = shift; my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/; $previous =~ /(.*)\.(.*)/; my $word = $1; my $ext = $2; say "word is $word"; my @letters = split( //, $word ); my $z = 0; # main control outer loop...all indexes begin with 0 foreach my $dummy ( 0 .. $#letters ) { foreach my $index ( 0 .. $#alphabet ) { if ( $alphabet[$index] eq $letters[$dummy] ) { my $highest = $#alphabet; if ( $index < $highest ) { $letters[$dummy] = $alphabet[ $index + 1 ]; my $string = join( '', @letters, ".$ext" ); return $string; } else { say "it was a $alphabet[-1]"; $z = $z + 1; } } } } if ( $z eq ( $#letters + 1 ) ) { say "number of zees was $z"; my $newindex = $#letters + 1; my @newword; foreach ( 0 .. $newindex ) { $newword[$_] = $alphabet[0]; } my $newstring = join( '', @newword, ".$ext" ); return $newstring; } else { say "snakeeyes: you hit nothing with $word"; my $string3 = join( "", $alphabet[0], ".$ext" ); return $string3; } }
word is a next is b.png word is dpq next is epq.jpg word is zx it was a z next is zy.png word is ott next is out.jpg word is zzz it was a z it was a z it was a z number of zees was 3 next is aaaa.tiff word is oi snakeeyes: you hit nothing with oi next is a.png

Thanks for your comment.

Replies are listed 'Best First'.
Re: generating unique filenames
by GrandFather (Saint) on Oct 03, 2014 at 03:47 UTC

    You may like:

    use strict; use warnings; use 5.010; printf "%s -> %s\n", $_, next_file($_) for qw/a.png z.jpg az.png zaz.png ott.jpg zzz.tiff oi.png/; sub next_file { my ($previous) = @_; my ($name, $ext) = $previous =~ /(.*)\.(.*)/; if ($name !~ tr/ilo/jmp/) { $name = reverse $name; for my $chIdx (0 .. length($name) - 1) { substr($name, $chIdx, 1) =~ tr/abcdefghjkmnpqrstuvwxy/bcdefghjkmnpqrstuvwxyz/ and last; substr($name, $chIdx, 1) = 'a'; } $name = reverse $name; $name = 'a' x (1 + length $name) if $name =~ /^a+$/; } return "$name.$ext"; }

    Prints:

    a.png -> b.png z.jpg -> aa.jpg az.png -> ba.png zaz.png -> zba.png ott.jpg -> ptt.jpg zzz.tiff -> aaaa.tiff oi.png -> pj.png
    Perl is the programming world's equivalent of English

      Thanks all for replies. I've been doing all I could to follow the logic here, but I think I've got it.

      reverse is a chIdx is 0 a.png -> b.png reverse is z chIdx is 0 z.jpg -> aa.jpg reverse is za chIdx is 0 chIdx is 1 az.png -> ba.png reverse is zaz chIdx is 0 chIdx is 1 zaz.png -> zba.png ott.jpg -> ptt.jpg reverse is zzz chIdx is 0 chIdx is 1 chIdx is 2 zzz.tiff -> aaaa.tiff oi.png -> pj.png reverse is cba chIdx is 0 abc.jpg -> abd.jpg $
      sub start_magick { use strict; use warnings; use 5.010; printf "%s -> %s\n", $_, next_file($_) for qw/a.png z.jpg az.png zaz.png ott.jpg zzz.tiff oi.png abc.jp +g/; } sub next_file { use strict; use warnings; use 5.010; my ($previous) = @_; my ( $name, $ext ) = $previous =~ /(.*)\.(.*)/; if ( $name !~ tr/ilo/jmp/ ) { $name = reverse $name; say "reverse is $name"; for my $chIdx ( 0 .. length($name) - 1 ) { say "chIdx is $chIdx"; substr( $name, $chIdx, 1 ) =~ tr/abcdefghjkmnpqrstuvwxy/bcdefghjkmnpqrstuvwxyz/ and last; substr( $name, $chIdx, 1 ) = 'a'; } $name = reverse $name; $name = 'a' x ( 1 + length $name ) if $name =~ /^a+$/; } return "$name.$ext"; }

      I really like how this works, and it took me all day to figure it out. I like how he deals with the case of the triple z. He turns them all into a's with the tr operator, and since they're the only thing that can do that, he tests for that in his if condition, which, if true, uses the x operator to create what I had to have a full-on loop for. Also nice was how to deal with the case of having 'ilo' right in the test condition for the loop, with the tr operator again, and the loop is entered only if nothing transliterates. If something does, you skip the loop and you're done. Also, the double reverse has us working on the right end of things in order to be in proper lexicographic order as it's thought of in english, unlike mine, which was working left to right. Pretty nifty, all in all.

Re: generating unique filenames
by Athanasius (Archbishop) on Oct 03, 2014 at 02:48 UTC

    Hello Datz_cozee75,

    Here are some observations on coding style:

    1. The calling code can be reduced from 4 lines to 2:

      printf "next is %s\n", next_file($_) for qw( a.png dpq.jpg zx.png ott.jpg zzz.tiff oi.png );
    2. sub next_file { ... my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/;

      creates and populates a new @alphabet on each call to sub next_file. Here is one way to avoid this:

      { my @alphabet; BEGIN { @alphabet = ('a' .. 'h', 'j', 'k', 'm', 'n', 'p' .. 'z'); +} sub next_file { ... } }

      (You could also use state, but as state variables can only be scalars, the variable would have to be an array reference, and you would then incur the additional overhead of a dereference on each access.)

    3. $previous =~ /(.*)\.(.*)/; my $word = $1; my $ext = $2;

      may be written more succinctly as:

      my ($word, $ext) = $previous =~ /(.*)\.(.*)/;
    4. $z = $z + 1; may be written more succinctly as ++$z;

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks, Athanasius, I've been trying to have @alphabet populated before I call next_file, but my caller is a hot mess. I have a main hash for this program, and what I try to do if I think data needs to be persistent over differing subroutine calls, is throw it onto the main hash. It usually starts out ugly, but eventually settles down once I've written the routines better. This is the first time I've tried to throw a reference to an array on this hash. Furthermore, I try to pass a reference to the main hash and a scalar to next_file, and what I have isn't initializing. I list all of start_magick and and then next_file to line 116 of this module, which is where stderr is telling me I'm not catching the passed variables properly.

      sub start_magick { use strict; use warnings; use 5.010; use Path::Class; use File::Copy "cp"; my ($rvars) = shift; my %vars = %$rvars; # this is supposed to be instantiated once my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/; my $ref_alpha = \@alphabet; my $initial = $alphabet[0]; my $path1 = $vars{"to_images"}; my $path2 = $vars{"to_magick"}; #might need creating: my $return = mkdir($path2); say "return on mkdir is $return"; #additions to main hash $vars{"target"} = 100; #k $vars{"bias"} = 2; #k $vars{"pixel_min"} = 600; #k $vars{"previous_file"} = $initial; $vars{"ref_alpha"} = $ref_alpha; #get filenames minus directories my @basenames; opendir my $hh, $path1 or warn "warn $!\n"; while ( defined( $_ = readdir($hh) ) ) { next if ( $_ =~ m/^\./ ); say "default is $_"; push @basenames, $_; } @basenames = sort @basenames; for (@basenames){ my $file1 = file( $path1, $_ ); $_ =~ /(.*)\.(.*)/; my $ext = $2; my $word = join( '', $vars{"previous_file"}, '.', $ext ); my $next = next_file($rvars, $word); say "next is $next"; $next =~ /(.*)\.(.*)/; my $newword = $1; $vars{"previous_file"} = $newword; my $file2 = file( $path2, $word ); cp ("$file1","$file2"); } } sub next_file { use strict; use warnings; use 5.010; my ($rvars, $previous) = @_; my %vars = %$rvars; my $array_ref = $vars{"ref_alpha"};
      default is Screenshot from 2014-08-21 13:10:18.png default is Screenshot from 2014-09-25 17:14:08.png default is Screenshot from 2014-08-21 13:22:42.png default is zbears.jpg default is yjj.jpg Can't use an undefined value as an ARRAY reference at template_stuff/n +ibley1.pm line 116. $

      The loop over @basenames is this tortured munging that one resorts to when unable to pass arguments well. Thanks for your comment.

        Hello Datz_cozee75,

        Consider what the following assignments do:

        my ($rvars) = shift; my %vars = %$rvars;

        The first line shifts an argument off the call stack and copies it to the lexical variable $rvars. So far, so good. But then the second line dereferences $rvars as a hash (using the %{ ... } dereferencing syntax), and copies the contents of that hash into the new lexical hash variable %vars. From this point on, %vars is a local copy of what was in %$vars when the subroutine was called. Any changes made to %vars remain local to the subroutine, and have no effect on the contents of the original hash. So when next_file() is called, it receives $rvars as its first argument, but the hash to which that variable refers has no key named ref_alpha, so the attempt to access it on line 116 fails and produces the error message you are seeing.

        But why do you want to make a copy of the hash %$rvars? Just remove the variable %vars, and access the “master hash” directly:

        my ($rvars) = shift; ... my $path1 = $vars->{"to_images"}; my $path2 = $vars->{"to_magick"}; ...

        Study perlreftut and get comfortable with Perl’s dereferencing syntax. It can be quite confusing at first, but if you persevere there will come a point where it will start to “click”, and then you’ll never look back!

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: generating unique filenames (next letter in list)
by Anonymous Monk on Oct 03, 2014 at 02:30 UTC

    Um, why (what is the use case)? Why is it prudent to eliminate letters that can be easily confused with numbers? Wouldn't a good font solve that problem?

    Most of the time I want this

    $ perl -le " my @f =qw/ abc def gh00 /; for my $f ( @f ){ print qq/\n +$f/; for(1..3){ $f++; print $f; } }" abc abd abe abf def deg deh dei gh00 gh01 gh02 gh03

    The rest of the time I want a UUID

      I stopped improving nixtFile after seeing the results (the rules I don't get)

      #!/usr/bin/perl -- ## ## ## ## perltidy -olq -csc -csci=3 -cscl="sub : BEGIN END " -otr -opr -ce +-nibc -i=4 -pt=0 "-nsak=*" ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if " -otr -opr +-ce -nibc -i=4 -pt=0 "-nsak=*" ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while " -otr + -opr -ce -nibc -i=4 -pt=0 "-nsak=*" #!/usr/bin/perl -- use strict; use warnings; use Data::Dump qw/ dd /; exit newTest(); sub newTest { my @list; my @filenames = qw/a.png dpq.jpg zx.png ott.jpg zzz.tiff oi.png/; for my $curr ( @filenames ) { #~ my $next = next_file( $curr ); #~ my $nixt = nixtFile( $curr ); my $next = next_file( "$curr" ); my $nixt = nixtFile( "$curr" ); my $eq = int( $next eq $nixt ); dd( [ $curr, $eq, $next, $nixt ] ); push @list, [ $curr, $eq, $next, $nixt ]; } dd( \@list ); } ## end sub newTest sub nixtFile { my( $word ) = shift; my( $file, $ext ) = $word =~ m{(.+?)(\.[^.]+)?$}; #~ dd( $file, $ext ); my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/; my %next = map { my $ix = ( $_ + 1 ) % @alphabet; @alphabet[ $_, $ix ]; } 0 .. $#alphabet; #~ die dd( \%next ); my $newfile = ''; for my $char ( split //, $file ) { #~ my $newchar = $next{$char} || 'FUDGE'; my $newchar = $next{$char} || $char; $newfile .= $newchar; } return $newfile . $ext; } ## end sub nixtFile sub next_file { use strict; use warnings; use 5.010; my ($previous) = shift; my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/; $previous =~ /(.*)\.(.*)/; my $word = $1; my $ext = $2; say "word is $word"; my @letters = split( //, $word ); my $z = 0; # main control outer loop...all indexes begin with 0 foreach my $dummy ( 0 .. $#letters ) { foreach my $index ( 0 .. $#alphabet ) { if ( $alphabet[$index] eq $letters[$dummy] ) { my $highest = $#alphabet; if ( $index < $highest ) { $letters[$dummy] = $alphabet[ $index + 1 ]; my $string = join( '', @letters, ".$ext" ); return $string; } else { say "it was a $alphabet[-1]"; $z = $z + 1; } } } } if ( $z eq ( $#letters + 1 ) ) { say "number of zees was $z"; my $newindex = $#letters + 1; my @newword; foreach ( 0 .. $newindex ) { $newword[$_] = $alphabet[0]; } my $newstring = join( '', @newword, ".$ext" ); return $newstring; } else { say "snakeeyes: you hit nothing with $word"; my $string3 = join( "", $alphabet[0], ".$ext" ); return $string3; } } __END__ word is a ["a.png", 1, "b.png", "b.png"] word is dpq ["dpq.jpg", 0, "epq.jpg", "eqr.jpg"] word is zx it was a z ["zx.png", 0, "zy.png", "ay.png"] word is ott ["ott.jpg", 0, "out.jpg", "ouu.jpg"] word is zzz it was a z it was a z it was a z number of zees was 3 ["zzz.tiff", 0, "aaaa.tiff", "aaa.tiff"] word is oi snakeeyes: you hit nothing with oi ["oi.png", 0, "a.png", "oi.png"] [ ["a.png", 1, "b.png", "b.png"], ["dpq.jpg", 0, "epq.jpg", "eqr.jpg"], ["zx.png", 0, "zy.png", "ay.png"], ["ott.jpg", 0, "out.jpg", "ouu.jpg"], ["zzz.tiff", 0, "aaaa.tiff", "aaa.tiff"], ["oi.png", 0, "a.png", "oi.png"], ]
Re: generating unique filenames
by karlgoethebier (Abbot) on Oct 03, 2014 at 11:21 UTC
Re: generating unique filenames
by McA (Priest) on Oct 04, 2014 at 00:30 UTC

    Hi all,

    besides the different code examples for creating "the next" filename I was wondering why the function should be restricted to filenames with file extension?

    Why not allow filenames without file extension?

    What is the behaviour with filenames which have two extensions like bundle.tar.gz? In the latter case I would expect only bundle to be incremented but not tar, destroying the meaning of the filename extensions.

    What do you think of just using an very old core module: File::Basename?

    use File::Basename qw(fileparse); ... my ($word, $path, $ext) = fileparse($filename, qr/\..*/);

    and suddenly you would also have the possibility to create "the next" file in relative and absolute file names. Wouldn't that be a valuable generalisation of the API?

    Best regards
    McA

Re: generating unique filenames
by Anonymous Monk on Oct 04, 2014 at 01:00 UTC

    It doesn't really matter whether code is "Perlish" as long as it's maintainable and it works - as determined by test cases! That'd be my main suggestion, Test::More and friends.

    You could also look into using Perl's magic string increment, as mentioned by the other anon above.

    Another interesting approach is the "odometer" analogy from Chapter 4 of Mark Jason Dominus' Higher-Order Perl - a highly enlightening book overall!