The_Jesus has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory filled with medical image files. They look like the following:

smith_13_503_de7.p
smith_13_502_de7.v
jones_104_503_de7.p
jones_104_502_de7.v
etc....

I need to parse that filename so that I just have:

smith_13
jones_104

What is the best way to do this in perl? I need something similar to "cut" in a shell script. Thanks in advance.

Replies are listed 'Best First'.
Re: Filename Parsing
by liz (Monsignor) on Oct 03, 2003 at 14:51 UTC
    Something like:
    my %filename; while (<DATA>) { warn "No match for $_" unless m#^(\w+?\d+)#; $filename{$1}++; } print "$_\n" foreach sort keys %filename; __DATA__ smith_13_503_de7.p smith_13_502_de7.v jones_104_503_de7.p jones_104_502_de7.v

    Please note the use of +? to prevent \w from being greedy and eating everything until the period.

    Liz

Re: Filename Parsing
by Roger (Parson) on Oct 03, 2003 at 15:02 UTC
    Keep it nice and simple...
    use strict; my %fnames; # use a hash to eliminate duplicates while (<DATA>) { if (/^(\w+_\d+)_\d+\w+/) { $fnames{$1} = 1; } } foreach (keys %fnames) { print "$_\n"; } __DATA__ smith_13_503_de7.p smith_13_502_de7.v jones_104_503_de7.p jones_104_502_de7.v
      Thank you works great!
Re: Filename Parsing
by mirod (Canon) on Oct 03, 2003 at 15:22 UTC

    Here are some functions that will do this, but more important maybe, a harness that will let you try your own:

    #!/usr/bin/perl -w use strict; use Test::More qw(no_plan); my @data = read_data(); my @expected_results = read_data(); foreach my $name qw(cut_join_split cut_regexp cut_substr) { no strict 'refs'; my $func= \&$name; my @results= map { $func->($_) } @data; ok( eq_array(\@results, \@expected_results), $name) or diag( "expected: " . join( ' - ', @expected_results) . "\n", "got : " . join( ' - ', @results) . "\n"); } sub cut_join_split { my $filename= shift; return join( '_', (split /_/, $filename)[0..1]); } sub cut_regexp { my $filename= shift; return $filename=~ m{^([^_]*_[^_]*)}; } sub cut_substr { my $filename= shift; my( $index, $sep, $sep_number)= ( 0, '_', 2); # to find _ number 2 foreach (1..$sep_number) { $index = index( $filename, $sep, $index+1); } return substr( $filename, 0, $index); } sub read_data { local $/="\n\n"; return grep {$_} map { s{#.*}{}; chomp; $_ } spli +t( "\n", <DATA>); } __DATA__ # data smith_13_503_de7.p smith_13_502_de7.v jones_104_503_de7.p jones_104_502_de7.v # you might want to add more filenames to your test set # expected results smith_13 smith_13 jones_104 jones_104
Re: Filename Parsing
by davido (Cardinal) on Oct 03, 2003 at 15:22 UTC
    I think you're asking how to capture the "alpha" portion, the first underscore, and the first digits up until the next underscore. liz's method works. I just wanted to propose a different regexp that is more explicit (leaves less for the +? and + quantifiers to tug-of-war over).

    if ( $string =~ /^([^\W\d_]+_\d+)/ ) { print "Matched $1\n"; }

    Because the metacharacter \w embodies [A-Za-z_0-9], and \W is the same as the negated character class, "[^A-Za-z_0-9]", and because \d embodies [0-9], what the negated character class "[^\W\d]" does is it says, "Match anything that's NOT a non-word, and not a digit." The regexp engine then continues matching the first underscore and all subsequent digits, stopping when you reach something that's not a digit.

    Also, if your question was about how to obtain the contents of the directory in the first place, I happen to like the standard module File::Find. It's POD is available at your local Perl installation.


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
Re: Filename Parsing
by blue_cowdawg (Monsignor) on Oct 03, 2003 at 15:26 UTC

        What is the best way to do this in perl?

    Best way? I don't know if there is a "best way", just lots of ways to do it.

    Here is my solution:

    #!/usr/bin/perl -w use strict; use warnings; use diagnostics; + my %fn=(); while (my $fnam=<DATA>) { chomp $fnam; next unless $fnam =~ m@ ^( # Begin capture $1 [a-zA-Z]+ # One or more alphas \_ # followed by underscore \d+ # one or more digits ) # End capture $1 .*$ # Who cares about the rest @x; $fn{$1}++; } + printf "%s\n",join("\n",sort keys %fn); exit(0); __END__ smith_13_503_de7.p smith_13_502_de7.v jones_104_503_de7.p jones_104_502_de7.v

    When run it produces:

    --$ perl fnamParse.pl jones_104 smith_13 --$


    Peter L. Berghold -- Unix Professional
    Peter at Berghold dot Net
       Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.