Filename Parsing

The_Jesus has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: Filename Parsing
by liz (Monsignor) on Oct 03, 2003 at 14:51 UTC

my %filename;
while (<DATA>) {
    warn "No match for $_" unless m#^(\w+?\d+)#;
    $filename{$1}++;
}
print "$_\n" foreach sort keys %filename;
__DATA__
smith_13_503_de7.p
smith_13_502_de7.v
jones_104_503_de7.p
jones_104_502_de7.v
[download]

Please note the use of +? to prevent \w from being greedy and eating everything until the period.

Liz

[reply]
[d/l]

Re: Filename Parsing
by Roger (Parson) on Oct 03, 2003 at 15:02 UTC

use strict;
my %fnames; # use a hash to eliminate duplicates
while (<DATA>) {
   if (/^(\w+_\d+)_\d+\w+/) {
       $fnames{$1} = 1;
   }
}
foreach (keys %fnames) {
   print "$_\n";
}

__DATA__
smith_13_503_de7.p
smith_13_502_de7.v
jones_104_503_de7.p
jones_104_502_de7.v
[download]

[reply]
[d/l]

Re: Re: Filename Parsing

by The_Jesus (Novice) on Oct 03, 2003 at 15:32 UTC

Thank you works great!

[reply]

Re: Filename Parsing
by mirod (Canon) on Oct 03, 2003 at 15:22 UTC

Here are some functions that will do this, but more important maybe, a harness that will let you try your own:

#!/usr/bin/perl -w
use strict; 

use Test::More qw(no_plan);

my @data             = read_data();
my @expected_results = read_data();

foreach my $name qw(cut_join_split cut_regexp cut_substr)
  { no strict 'refs';
    my $func= \&$name;
    my @results= map { $func->($_) } @data;
    ok( eq_array(\@results, \@expected_results), $name)
      or diag( "expected: " . join( ' - ', @expected_results) . "\n",
               "got     : " . join( ' - ', @results) . "\n");
  }
   


sub cut_join_split
  { my $filename= shift;
    return join( '_', (split /_/, $filename)[0..1]);
  }

sub cut_regexp
  { my $filename= shift;
    return $filename=~ m{^([^_]*_[^_]*)};
  }

sub cut_substr
  { my $filename= shift;
    my( $index, $sep, $sep_number)= ( 0, '_', 2); # to find _ number 2
    foreach (1..$sep_number)
      { $index = index( $filename, $sep, $index+1); }
    return substr( $filename, 0, $index);
  }


sub read_data
  { local $/="\n\n"; return grep {$_} map { s{#.*}{}; chomp; $_ } spli
+t( "\n", <DATA>); }


__DATA__
# data
smith_13_503_de7.p
smith_13_502_de7.v
jones_104_503_de7.p
jones_104_502_de7.v
# you might want to add more filenames to your test set

# expected results
smith_13
smith_13
jones_104
jones_104
[download]

[reply]
[d/l]

Re: Filename Parsing
by davido (Cardinal) on Oct 03, 2003 at 15:22 UTC

liz

if ( $string =~ /^([^\W\d_]+_\d+)/ ) {
    print "Matched $1\n";
}
[download]

Because the metacharacter \w embodies [A-Za-z_0-9], and \W is the same as the negated character class, "[^A-Za-z_0-9]", and because \d embodies [0-9], what the negated character class "[^\W\d]" does is it says, "Match anything that's NOT a non-word, and not a digit." The regexp engine then continues matching the first underscore and all subsequent digits, stopping when you reach something that's not a digit.

Also, if your question was about how to obtain the contents of the directory in the first place, I happen to like the standard module File::Find. It's POD is available at your local Perl installation.

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein

[reply]
[d/l]
[select]

Re: Filename Parsing
by blue_cowdawg (Monsignor) on Oct 03, 2003 at 15:26 UTC

What is the best way to do this in perl?

Best way? I don't know if there is a "best way", just lots of ways to do it.

Here is my solution:

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
                                                                      
+          
my %fn=();
while (my $fnam=<DATA>) {
  chomp $fnam;
  next unless $fnam =~ m@
                       ^(            # Begin capture $1
                         [a-zA-Z]+   # One or more alphas
                         \_          # followed by underscore
                         \d+         # one or more digits
                        )            # End capture $1
                        .*$          # Who cares about the rest
                       @x;
  $fn{$1}++;
}
                                                                      
+          
printf "%s\n",join("\n",sort keys %fn);
exit(0);
__END__
smith_13_503_de7.p
smith_13_502_de7.v
jones_104_503_de7.p
jones_104_502_de7.v
[download]

When run it produces:

--$  perl fnamParse.pl
jones_104
smith_13
--$
[download]

Peter L. Berghold -- Unix Professional Peter at Berghold dot Net
	Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.

[reply]
[d/l]
[select]