generating unique filenames

Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: generating unique filenames
by GrandFather (Saint) on Oct 03, 2014 at 03:47 UTC

You may like:

use strict;
use warnings;
use 5.010;

printf "%s -> %s\n", $_, next_file($_)
    for qw/a.png z.jpg az.png zaz.png ott.jpg zzz.tiff oi.png/;


sub next_file {
    my ($previous) = @_;
    my ($name, $ext) = $previous =~ /(.*)\.(.*)/;

    if ($name !~ tr/ilo/jmp/) {
        $name = reverse $name;

        for my $chIdx (0 .. length($name) - 1) {
            substr($name, $chIdx, 1) =~
                tr/abcdefghjkmnpqrstuvwxy/bcdefghjkmnpqrstuvwxyz/
                and last;
            substr($name, $chIdx, 1) = 'a';
        }

        $name = reverse $name;
        $name = 'a' x (1 + length $name) if $name =~ /^a+$/;
    }

    return "$name.$ext";
}
[download]

Prints:

a.png -> b.png
z.jpg -> aa.jpg
az.png -> ba.png
zaz.png -> zba.png
ott.jpg -> ptt.jpg
zzz.tiff -> aaaa.tiff
oi.png -> pj.png
[download]

Perl is the programming world's equivalent of English

[reply]
[d/l]
[select]

Re^2: generating unique filenames

by Aldebaran (Curate) on Oct 04, 2014 at 00:46 UTC

Thanks all for replies. I've been doing all I could to follow the logic here, but I think I've got it.

 
reverse is a
chIdx is 0
a.png -> b.png
reverse is z
chIdx is 0
z.jpg -> aa.jpg
reverse is za
chIdx is 0
chIdx is 1
az.png -> ba.png
reverse is zaz
chIdx is 0
chIdx is 1
zaz.png -> zba.png
ott.jpg -> ptt.jpg
reverse is zzz
chIdx is 0
chIdx is 1
chIdx is 2
zzz.tiff -> aaaa.tiff
oi.png -> pj.png
reverse is cba
chIdx is 0
abc.jpg -> abd.jpg
$
[download]

sub start_magick {
    use strict;
    use warnings;
    use 5.010;
    printf "%s -> %s\n", $_, next_file($_)
      for qw/a.png z.jpg az.png zaz.png ott.jpg zzz.tiff oi.png abc.jp
+g/;
}

sub next_file {
    use strict;
    use warnings;
    use 5.010;
    my ($previous) = @_;
    my ( $name, $ext ) = $previous =~ /(.*)\.(.*)/;
    if ( $name !~ tr/ilo/jmp/ ) {
        $name = reverse $name;
        say "reverse is $name";
        for my $chIdx ( 0 .. length($name) - 1 ) {
            say "chIdx is $chIdx";
            substr( $name, $chIdx, 1 ) =~
              tr/abcdefghjkmnpqrstuvwxy/bcdefghjkmnpqrstuvwxyz/
              and last;
            substr( $name, $chIdx, 1 ) = 'a';
        }
        $name = reverse $name;
        $name = 'a' x ( 1 + length $name ) if $name =~ /^a+$/;
    }
    return "$name.$ext";
}
[download]

I really like how this works, and it took me all day to figure it out. I like how he deals with the case of the triple z. He turns them all into a's with the tr operator, and since they're the only thing that can do that, he tests for that in his if condition, which, if true, uses the x operator to create what I had to have a full-on loop for. Also nice was how to deal with the case of having 'ilo' right in the test condition for the loop, with the tr operator again, and the loop is entered only if nothing transliterates. If something does, you skip the loop and you're done. Also, the double reverse has us working on the right end of things in order to be in proper lexicographic order as it's thought of in english, unlike mine, which was working left to right. Pretty nifty, all in all.

[reply]
[d/l]
[select]

Re: generating unique filenames
by Athanasius (Archbishop) on Oct 03, 2014 at 02:48 UTC

Hello Datz_cozee75,

Here are some observations on coding style:

The calling code can be reduced from 4 lines to 2:

printf "next is %s\n", next_file($_)
    for qw( a.png dpq.jpg zx.png ott.jpg zzz.tiff oi.png );
[download]

```
sub next_file {
    ...
    my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/;
[download]
```
creates and populates a new @alphabet on each call to sub next_file. Here is one way to avoid this:
```
{
    my @alphabet;

    BEGIN { @alphabet = ('a' .. 'h', 'j', 'k', 'm', 'n', 'p' .. 'z'); 
+}

    sub next_file
    {
        ...
    }
}
[download]
```
(You could also use state, but as state variables can only be scalars, the variable would have to be an array reference, and you would then incur the additional overhead of a dereference on each access.)

$previous =~ /(.*)\.(.*)/;
my $word = $1;
my $ext  = $2;
[download]

may be written more succinctly as:

my ($word, $ext) = $previous =~ /(.*)\.(.*)/;
[download]

$z = $z + 1; may be written more succinctly as ++$z;

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re^2: generating unique filenames

by Aldebaran (Curate) on Oct 05, 2014 at 19:38 UTC

Thanks, Athanasius, I've been trying to have @alphabet populated before I call next_file, but my caller is a hot mess. I have a main hash for this program, and what I try to do if I think data needs to be persistent over differing subroutine calls, is throw it onto the main hash. It usually starts out ugly, but eventually settles down once I've written the routines better. This is the first time I've tried to throw a reference to an array on this hash. Furthermore, I try to pass a reference to the main hash and a scalar to next_file, and what I have isn't initializing. I list all of start_magick and and then next_file to line 116 of this module, which is where stderr is telling me I'm not catching the passed variables properly.

sub start_magick {
    use strict;
    use warnings;
    use 5.010;
    use Path::Class;
    use File::Copy "cp";

    my ($rvars) = shift;
    my %vars = %$rvars;

    # this is supposed to be instantiated once
    my @alphabet  = qw/a b c d e f g h j k m n p q r s t u v w x y z/;
    my $ref_alpha = \@alphabet;
    my $initial   = $alphabet[0];

    my $path1 = $vars{"to_images"};
    my $path2 = $vars{"to_magick"};

    #might need creating:
    my $return = mkdir($path2);
    say "return on mkdir is $return";

    #additions to main hash
    $vars{"target"}        = 100;          #k
    $vars{"bias"}          = 2;            #k
    $vars{"pixel_min"}     = 600;          #k
    $vars{"previous_file"} = $initial;
    $vars{"ref_alpha"}     = $ref_alpha;

    #get filenames minus directories
    my @basenames;
    opendir my $hh, $path1 or warn "warn  $!\n";
    while ( defined( $_ = readdir($hh) ) ) {
        next if ( $_ =~ m/^\./ );
        say "default is $_";
        push @basenames, $_;
    }
    @basenames = sort @basenames;
    for (@basenames){
        my $file1 = file( $path1, $_ );
        $_ =~ /(.*)\.(.*)/;
        my $ext  = $2;
        my $word = join( '', $vars{"previous_file"}, '.', $ext );
        my $next = next_file($rvars, $word);
        say "next is $next";
        $next =~ /(.*)\.(.*)/;
        my $newword = $1;
        $vars{"previous_file"} = $newword;
        my $file2 = file( $path2, $word );
        cp ("$file1","$file2");
}
}

sub next_file {
    use strict;
    use warnings;
    use 5.010;

    my ($rvars, $previous) = @_;
    my %vars = %$rvars;
    my $array_ref = $vars{"ref_alpha"};
[download]

default is Screenshot from 2014-08-21 13:10:18.png
default is Screenshot from 2014-09-25 17:14:08.png
default is Screenshot from 2014-08-21 13:22:42.png
default is zbears.jpg
default is yjj.jpg
Can't use an undefined value as an ARRAY reference at template_stuff/n
+ibley1.pm line 116.
$
[download]

The loop over @basenames is this tortured munging that one resorts to when unable to pass arguments well. Thanks for your comment.

[reply]
[d/l]
[select]

Re^3: generating unique filenames

by Athanasius (Archbishop) on Oct 06, 2014 at 02:58 UTC

Hello Datz_cozee75,

Consider what the following assignments do:

my ($rvars) = shift;
my %vars = %$rvars;
[download]

The first line shifts an argument off the call stack and copies it to the lexical variable $rvars. So far, so good. But then the second line dereferences $rvars as a hash (using the %{ ... } dereferencing syntax), and copies the contents of that hash into the new lexical hash variable %vars. From this point on, %vars is a local copy of what was in %$vars when the subroutine was called. Any changes made to %vars remain local to the subroutine, and have no effect on the contents of the original hash. So when next_file() is called, it receives $rvars as its first argument, but the hash to which that variable refers has no key named ref_alpha, so the attempt to access it on line 116 fails and produces the error message you are seeing.

But why do you want to make a copy of the hash %$rvars? Just remove the variable %vars, and access the “master hash” directly:

my ($rvars) = shift;
...
my $path1 = $vars->{"to_images"};
my $path2 = $vars->{"to_magick"};
...
[download]

Study perlreftut and get comfortable with Perl’s dereferencing syntax. It can be quite confusing at first, but if you persevere there will come a point where it will start to “click”, and then you’ll never look back!

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: generating unique filenames (next letter in list)
by Anonymous Monk on Oct 03, 2014 at 02:30 UTC

Um, why (what is the use case)? Why is it prudent to eliminate letters that can be easily confused with numbers? Wouldn't a good font solve that problem?

Most of the time I want this

 $ perl -le " my @f =qw/ abc def gh00 /; for my $f ( @f ){ print qq/\n
+$f/; for(1..3){ $f++; print $f; } }"

abc
abd
abe
abf

def
deg
deh
dei

gh00
gh01
gh02
gh03
[download]

The rest of the time I want a UUID

[reply]
[d/l]

Re^2: generating unique filenames (next letter in list)

by Anonymous Monk on Oct 03, 2014 at 03:06 UTC

I stopped improving nixtFile after seeing the results (the rules I don't get)

#!/usr/bin/perl --
## 
## 
## 
## perltidy -olq  -csc -csci=3 -cscl="sub : BEGIN END " -otr -opr -ce 
+-nibc -i=4 -pt=0   "-nsak=*"
## perltidy -olq  -csc -csci=10 -cscl="sub : BEGIN END if " -otr -opr 
+-ce -nibc -i=4 -pt=0   "-nsak=*"
## perltidy -olq  -csc -csci=10 -cscl="sub : BEGIN END if while " -otr
+ -opr -ce -nibc -i=4 -pt=0   "-nsak=*"
#!/usr/bin/perl --
use strict;
use warnings;
use Data::Dump qw/ dd /;

exit newTest();

sub newTest {
    my @list;
    my @filenames = qw/a.png dpq.jpg zx.png ott.jpg zzz.tiff  oi.png/;
    for my $curr ( @filenames ) {

#~         my $next = next_file( $curr );
#~         my $nixt = nixtFile( $curr );
        my $next = next_file( "$curr" );
        my $nixt = nixtFile( "$curr" );
        my $eq   = int( $next eq $nixt );
        dd( [ $curr, $eq, $next, $nixt ] );
        push @list, [ $curr, $eq, $next, $nixt ];
    }
    dd( \@list );
} ## end sub newTest

sub nixtFile {
    my( $word ) = shift;
    my( $file, $ext ) = $word =~ m{(.+?)(\.[^.]+)?$};

#~     dd( $file, $ext );
    my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/;
    my %next = map {
        my $ix = ( $_ + 1 ) % @alphabet;
        @alphabet[ $_, $ix ];
    } 0 .. $#alphabet;

#~     die dd( \%next );

    my $newfile = '';
    for my $char ( split //, $file ) {
#~         my $newchar = $next{$char} || 'FUDGE';
        my $newchar = $next{$char} || $char;
        $newfile .= $newchar;
    }
    return $newfile . $ext;
} ## end sub nixtFile

sub next_file {
    use strict;
    use warnings;
    use 5.010;
    my ($previous) = shift;
    my @alphabet = qw/a b c d e f g h j k m n p q r s t u v w x y z/;
    $previous =~ /(.*)\.(.*)/;
    my $word = $1;
    my $ext  = $2;
    say "word is $word";
    my @letters = split( //, $word );
    my $z = 0;
    # main control outer loop...all indexes begin with 0
    foreach my $dummy ( 0 .. $#letters ) {
        foreach my $index ( 0 .. $#alphabet ) {
            if ( $alphabet[$index] eq $letters[$dummy] ) {
                my $highest = $#alphabet;
                if ( $index < $highest ) {
                    $letters[$dummy] = $alphabet[ $index + 1 ];
                    my $string = join( '', @letters, ".$ext" );
                    return $string;
                }
                else {
                    say "it was a $alphabet[-1]";
                    $z = $z + 1;
                }
            }
        }
    }
    if ( $z eq ( $#letters + 1 ) ) {
        
        say "number of zees was $z";
        my $newindex = $#letters + 1;
        my @newword;
        foreach ( 0 .. $newindex ) {
            $newword[$_] = $alphabet[0];
        }
        my $newstring = join( '', @newword, ".$ext" );
        return $newstring;
    }
    else {
        say "snakeeyes: you hit nothing with $word";
        my $string3 = join( "", $alphabet[0], ".$ext" );
        return $string3;
    }
}

__END__
word is a
["a.png", 1, "b.png", "b.png"]
word is dpq
["dpq.jpg", 0, "epq.jpg", "eqr.jpg"]
word is zx
it was a z
["zx.png", 0, "zy.png", "ay.png"]
word is ott
["ott.jpg", 0, "out.jpg", "ouu.jpg"]
word is zzz
it was a z
it was a z
it was a z
number of zees was 3
["zzz.tiff", 0, "aaaa.tiff", "aaa.tiff"]
word is oi
snakeeyes: you hit nothing with oi
["oi.png", 0, "a.png", "oi.png"]
[
  ["a.png", 1, "b.png", "b.png"],
  ["dpq.jpg", 0, "epq.jpg", "eqr.jpg"],
  ["zx.png", 0, "zy.png", "ay.png"],
  ["ott.jpg", 0, "out.jpg", "ouu.jpg"],
  ["zzz.tiff", 0, "aaaa.tiff", "aaa.tiff"],
  ["oi.png", 0, "a.png", "oi.png"],
]
[download]

[reply]
[d/l]

Re: generating unique filenames
by karlgoethebier (Abbot) on Oct 03, 2014 at 11:21 UTC

Regards, Karl

�The Crux of the Biscuit is the Apostrophe�

[reply]

Re: generating unique filenames
by McA (Priest) on Oct 04, 2014 at 00:30 UTC

Hi all,

besides the different code examples for creating "the next" filename I was wondering why the function should be restricted to filenames with file extension?

Why not allow filenames without file extension?

What is the behaviour with filenames which have two extensions like bundle.tar.gz? In the latter case I would expect only bundle to be incremented but not tar, destroying the meaning of the filename extensions.

What do you think of just using an very old core module: File::Basename?

use File::Basename qw(fileparse);
...
my ($word, $path, $ext) = fileparse($filename, qr/\..*/);
[download]

and suddenly you would also have the possibility to create "the next" file in relative and absolute file names. Wouldn't that be a valuable generalisation of the API?

Best regards
McA

[reply]
[d/l]
[select]

Re: generating unique filenames
by Anonymous Monk on Oct 04, 2014 at 01:00 UTC

It doesn't really matter whether code is "Perlish" as long as it's maintainable and it works - as determined by test cases! That'd be my main suggestion, Test::More and friends.

You could also look into using Perl's magic string increment, as mentioned by the other anon above.

Another interesting approach is the "odometer" analogy from Chapter 4 of Mark Jason Dominus' Higher-Order Perl - a highly enlightening book overall!

[reply]