Depending on what you consider "similar enough", and how much external knowledge you want to invest into the system, the following code comes relatively close for your sample data:

use strict;
my @names = sort map { chomp; $_ } <DATA>;

my $len = 2; # adjust to suit your taste

my @bucket;
FILL: {
  my $prefix = common_prefix(@bucket, $names[0]);
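  # An empty bucket always takes the next name; this also avoids an endless
  # loop when a single name is shorter than $len
  if (!@bucket or length $prefix >= $len) {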
    push @bucket, shift @names;
  } else {
    print common_prefix(@bucket),"\n";
    print join("\n", map { "-- $_" } @bucket), "\n";
    @bucket = ();
  };

  redo FILL while (@names)
};
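# Flush the last bucket, members included
if (@bucket) {
  print common_prefix(@bucket), "\n";
  print join("\n", map { "-- $_" } @bucket), "\n";
}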

=head2 C<< common_prefix LIST >>

Extracts the common prefix out of a list of strings. The strings
may not contain the character C<\x00> because I'm lazy.

=cut

sub common_prefix {
  local $" = "\x00";
  "@_" =~ m!^([^\x00]*)[^\x00]*(\0\1[^\x00]*)*$!sm
    or die "Internal error: '@_' does not match the RE";
  $1;
};

__DATA__
U2 - October
U2 - Rattle and Hum
U2 - The Joshua Tree
Talking Heads - Sand In The Vaseline - Disc 1
Talking Heads - Sand In The Vaseline - Disc 2

Making $len larger than 5 will break the "U2 - " case, because that common prefix is only five characters long. It might well be simpler to invest the knowledge that all directories are of the format $ARTIST - $ALBUM, split each name on that separator, and then simplify the list. But for a braindead approach this script does well enough, and it gave me a nice excuse to employ a regular expression... Of course, without the external knowledge, the pattern matching is not really good, as you can see in the case of Disc 1 vs. Disc 2, where the common prefix runs all the way through "Disc "; a human would have left that part off.
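For comparison, here is a minimal sketch of the "invest the knowledge" variant. It assumes every name really has the form $ARTIST - $ALBUM and splits on the first " - " only. The helper name group_by_artist is made up for this illustration and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: group "ARTIST - ALBUM" names by artist.
# Assumes " - " separates artist and album; only the first
# occurrence is used, so album titles may contain " - " themselves.
sub group_by_artist {
    my @names = @_;
    my (%albums, @order);
    for my $name (@names) {
        next unless length $name;              # skip empty lines
        my ($artist, $album) = split / - /, $name, 2;
        $album = '' unless defined $album;     # name without a separator
        push @order, $artist unless exists $albums{$artist};
        push @{ $albums{$artist} }, $album;
    }
    # One array ref per artist: [ artist, album, album, ... ],
    # in order of first appearance
    return map { [ $_, @{ $albums{$_} } ] } @order;
}

my @sample = (
    'U2 - October',
    'U2 - Rattle and Hum',
    'U2 - The Joshua Tree',
    'Talking Heads - Sand In The Vaseline - Disc 1',
    'Talking Heads - Sand In The Vaseline - Disc 2',
);

for my $group (group_by_artist(sort @sample)) {
    my ($artist, @albums) = @$group;
    print "$artist\n";
    print "-- $_\n" for @albums;
}
```

This groups purely by artist, so "Disc 1" and "Disc 2" end up as two separate entries under Talking Heads; collapsing them into one album would need yet more external knowledge.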

In reply to Re: finding groups in a text list by pattern? by Corion
in thread finding groups in a text list by pattern? by howie
