Hossein has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a list of files.

These files have some mutual frases in their name. It varies from dir to dir.

I need to guess what these mutual frases/patterns are in order to remove them.

Example files: hhk_1sss-(hello).txt hhk_2abc-(hello).txt hhk_3xyz-(hello).txt I need the results (new file names) to be: 1sss.txt 2abc.txt 3xyz.txt

Thanks :)

Replies are listed 'Best First'.
Re: how to guess mutual frases?
by QM (Parson) on Jul 30, 2013 at 09:20 UTC
    Did you write some code? We'd be glad to critique it and give pointers.

    Thinking fuzzily before sufficient coffee, you want something like longest common substring.

    Another way to look at it is to join the names, and match that way. Here's an untried, unsophisticated approach:

    ... # get the list of filenames (somehow) my @files = readdir; # make a copy my @files_copy = @files; # the number of files, less 1 my $file_count_less_one = $#files; # hash map for results my %new_file_names; # loop until explicit last while (1) { # join them with a character that's unlikely to be in the names my $joined = join('|',@files_copy); # look for matches, catch the first one only (they're all the same +) if (my ($match) = $joined =~ m/[^|]*?([^|]+)[^|]*?(?:[|][^|]*?\1[^ +|]*?){$file_count_less_one/) { # remove the matched substrings my @files_new; for my $file (@files_copy) { $file =~ s/$match//; push @files_new, $file; } # get ready for next loop @files_copy = @files_new; next; } else { # no more matches, make a hash map for the rename @new_file_names{@files} = @files_copy; last; # no more matches } } while (my ($old,$new) = each %new_file_names) { unless (rename $old, $new) { warn "Error renaming $old to $new"; } }

    Lots of room for improvement there, including checking whether a new file name already exists, aliasing array elements in for loops, considering dir names in file names, etc.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: how to guess mutual frases?
by choroba (Cardinal) on Jul 30, 2013 at 11:38 UTC
    If the file names are of the same length, you can just walk over them character by character and keep only those that are different.
    #!/usr/bin/perl use warnings; use strict; my @files = qw( hhk_1sss-(hello).txt hhk_2abc-(hello).txt hhk_3xyz-(hello).txt ); my @short = @files; my $pos = 0; my $file = shift @short; while ($pos < length $file and '.' ne substr $file, $pos, 1) { my $char = substr $file, $pos, 1; if (@short == grep $char eq substr($_, $pos, 1), @short) { substr $_, $pos, 1, q() for $file, @short; } else { $pos++; } } unshift @short, $file; while (@files) { my $old = shift @files; my $new = shift @short; print "rename $old $new\n"; }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: how to guess mutual frases?
by arkturuz (Curate) on Jul 30, 2013 at 09:00 UTC
    You could use regexp to extract names:
    my @files = (qw( hhk_1sss-(hello).txt hhk_2abc-(hello).txt hhk_3xyz-(hello).txt )); my @new_files = map { /^.*_(\d[a-z]{3}).*(\.txt)$/; "$1$2" } @files;
Re: how to guess mutual (f)phrases?
by kcott (Archbishop) on Jul 31, 2013 at 10:41 UTC

    G'day Hossein,

    Here's my take on a solution. It involves no guesswork; doesn't require filenames to be of the same length; and preserves the suffixes (which don't need to be the same or, in fact, even present). It only checks for common parts at the beginning and end of the filename: it won't try to remove any common strings from the middle.

    #!/usr/bin/env perl -l use strict; use warnings; use File::Basename; my @paths = qw{hhk_1sss-(hello).txt hhk_2abc-(hello).txt hhk_3xyz-(hel +lo).txt}; my @split_paths = map { [ fileparse($_ => qr{[.]?[^.]*}) ] } @paths; my @names = map { $split_paths[$_][0] } 0 .. $#split_paths; my $common_start = get_common(@names); my $common_end = reverse get_common(map { scalar reverse } @names); for (@split_paths) { print +($_->[0] =~ /^\Q$common_start\E(.*)\Q$common_end\E$/)[0], $ +_->[2]; } sub get_common { my ($control, $compare, @rest) = @_; my $common = ''; my ($control_len, $compare_len) = map { length } $control, $compar +e; for (0 .. ($control_len < $compare_len ? $control_len : $compare_l +en) - 1) { my $control_char = substr $control, $_, 1; last if $control_char ne substr $compare, $_, 1; $common .= $control_char; } return @rest ? get_common($common, @rest) : $common; }

    Sample run:

    $ pm_ex_same_strings.pl 1sss.txt 2abc.txt 3xyz.txt

    If you want to use pathnames with directories, just change "print +(..." to "print $_->[1], (...".

    -- Ken