Plotinus has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks. I'm using opendir and readdir to read a directory listing into an array. This leaves me with filename.ext (in this case .pdf). Before processing further I wish to strip off the trailing .pdf extension, for which I thought s/// would be ideal, so I put the following line together:
$approvedpdfs =~ s/[A-Z]*.pdf/[A-Z]*/e;
It don't work - it gives me a syntax error. Just to make it explicit, the filename always begins with a capitalised letter, hence [A-Z], but it also has spaces in it. I thought * was zero or more occurrences of any possible character including the space.

Hmm.

Today is a good day to be stupid.

Plotinus

p.s. this isn't the first version of the string I've tried but haven't noted them down and can't remember what they are because regex's are just *sooo* readable and intuitive.

Re: using s/// to remove file extensions
by polettix (Vicar) on Mar 30, 2005 at 13:23 UTC
    Just delete the extension at the end of the name.
    $approvedpdfs =~ s/\.pdf$//;
    You could find File::Spec useful for this filename stuff, anyway.

    Update: you should also take a look at perldoc perlretut, regexes aren't shell expansion rules :)

    Flavio

    Don't fool yourself.
      File::Basename might also be useful. It's got info on parsing out filenames (directories, basenames and extensions). I use the fileparse function to break apart the file name for renaming purposes.
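A minimal sketch of the fileparse approach mentioned above, assuming a Unix-style path. The sample path is hypothetical; the second argument is a suffix pattern, and fileparse returns the name without the suffix, the directory part, and the matched suffix.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical sample path; any path ending in .pdf would do.
my $path = '/tmp/reports/Annual Report 2005.pdf';

# fileparse returns (name without suffix, directory part, matched suffix).
# The suffix pattern is matched against the end of the file name.
my ($name, $dir, $suffix) = fileparse($path, qr/\.pdf/i);

print "name:   $name\n";
print "dir:    $dir\n";
print "suffix: $suffix\n";
```

If the name does not end in .pdf, the suffix comes back as an empty string and the name is returned whole.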
Re: using s/// to remove file extensions
by davis (Vicar) on Mar 30, 2005 at 13:32 UTC
    You're confusing Perl's regular expressions with shell filename globbing rules. Using YAPE::Regex::Explain thusly:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use YAPE::Regex::Explain;
    my $re = qr/[A-Z]*.pdf/;
    print YAPE::Regex::Explain->new($re)->explain;
    gives
    The regular expression:

    (?-imsx:[A-Z]*.pdf)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [A-Z]*                   any character of: 'A' to 'Z' (0 or more
                               times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      .                        any character except \n
    ----------------------------------------------------------------------
      pdf                      'pdf'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    davis
    Kids, you tried your hardest, and you failed miserably. The lesson is: Never try.
Re: using s/// to remove file extensions
by cog (Parson) on Mar 30, 2005 at 13:32 UTC
    You're confusing wildcards with regular expressions.

    With wildcards, * is definitely "zero or more of any possible character including the space."

    With regular expressions, * is "zero or more of the last token"

    Your last token before the * is [A-Z]; hence, your [A-Z]*.pdf matches things like:

    • HELLO.pdf
    • CAPITAL.pdf

    ...but not something like file.pdf, for instance.

    So regarding your regex, what you really wanted was [A-Z].*\.pdf... you see, a "." means (in Perl's regular expressions), "any single character except for the newline." (this is also why you have to escape the following dot, the one separating the filename from the extension)

    Also, your right-hand side of the substitution won't work... that's going to name the file, literally, [A-Z]*

    Instead, you have to capture the filename in a special variable (with brackets, i.e. parentheses) and use that variable (in this case it's going to be $1, because it's going to be the first bracket counting from the left). You do it like this:

    s/([A-Z].*)\.pdf/$1/;

    There. Your filename, without the extension.

    frodo72's solution is, however, better (and there are others, of course), but this is what you were trying to do, corrected.
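A small sketch comparing the original pattern with the corrected substitution from this reply. The helper names old_match and strip_pdf and the sample file names are hypothetical, made up for illustration. Because neither pattern is anchored, each may match only part of a name.

```perl
use strict;
use warnings;

# Return the text matched by the original pattern, or undef if none.
# The pattern means: a run of capitals, any one character, then "pdf".
sub old_match {
    my ($name) = @_;
    return $name =~ /([A-Z]*.pdf)/ ? $1 : undef;
}

# Apply the corrected substitution: capture everything from the first
# capital letter up to a literal ".pdf" and keep only the capture.
sub strip_pdf {
    my ($name) = @_;
    $name =~ s/([A-Z].*)\.pdf/$1/;
    return $name;
}

# Hypothetical sample names, not taken from the original post.
for my $name ('HELLO.pdf', 'My File.pdf', 'file.pdf') {
    my $matched = old_match($name);
    $matched = '(no match)' unless defined $matched;
    printf "%-12s old pattern matched '%s', corrected gives '%s'\n",
        $name, $matched, strip_pdf($name);
}
```

For 'My File.pdf' the old pattern only finds the trailing '.pdf', while the corrected substitution yields 'My File'. A name with no capital letter, such as 'file.pdf', is left untouched by the corrected substitution because it requires [A-Z].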

Re: using s/// to remove file extensions
by ihb (Deacon) on Mar 30, 2005 at 13:30 UTC

    s/// doesn't work like that. In s/PATTERN/REPLACEMENT/ the PATTERN is a regular expression, like the one you've given, but the REPLACEMENT part isn't. The replacement is just a string, used like a regular double-quoted string. The /e makes the replacement evaluate as code instead of being treated as a string of characters, and [A-Z]* isn't valid Perl code. So remove the e, don't use a pattern in the replacement part, and ultimately fix your pattern. That might include capturing part of the text matched by PATTERN and then interpolating that capture in the replacement part (as the replacement is just a double-quoted string). What you really want to have, I think, is

    $file =~ s/^([A-Z].*)\.pdf\z/$1/;
    Now this introduces quite a few things, like ^, (), .*, \., \z, and $1. You can read more about them in perlretut and other documents that come with Perl. To learn about s/// see perlop.

    Update: Didn't read closely enough; adjusted the pattern to fit the OP.

    ihb

    See perltoc if you don't know which perldoc to read!
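To illustrate the difference between a plain replacement and one using /e, here is a short sketch. The sample file names and the doubling expression are hypothetical and only there for contrast; the anchored pattern is the one from the reply above.

```perl
use strict;
use warnings;

# Without /e the replacement is an interpolated string, so $1 is
# simply inserted as text.
(my $plain = 'Invoice 7.pdf') =~ s/^([A-Z].*)\.pdf\z/$1/;

# With /e the replacement is Perl code whose result becomes the new text.
# Here the captured number is doubled (hypothetical use, for contrast only).
(my $evaluated = 'Invoice 7.pdf') =~ s/(\d+)/$1 * 2/e;

# The \z anchor only allows ".pdf" at the very end of the name,
# so this hypothetical name is left unchanged.
(my $anchored = 'Notes.pdf.bak') =~ s/^([A-Z].*)\.pdf\z/$1/;

print "$plain\n";      # Invoice 7
print "$evaluated\n";  # Invoice 14.pdf
print "$anchored\n";   # Notes.pdf.bak
```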

Re: using s/// to remove file extensions
by Plotinus (Sexton) on Mar 30, 2005 at 13:41 UTC
    Scary people, 5 replies in 3 minutes. I'm having multiple 'doh!' moments whilst reading them. Thank you and I'll play some more.

    Plotinus

      I've recently begun recommending this place in my Learning Perl classes, mostly for the effect you just described. A fairly decent search mechanism, coupled with the "XP feeding frenzy", yields cross-checked high-quality answers to questions at all levels.

      And I do mean all levels... I've asked some tough questions here recently, and gotten some amazingly cool responses. Nice place, vroom.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

Re: using s/// to remove file extensions
by inman (Curate) on Mar 30, 2005 at 13:54 UTC
    You may also wish to check out File::Find as this will help you to recursively scan an entire directory structure and do something with each file that you find. The code below should work OK for you.
    use File::Find;
    my @directories = @ARGV ? @ARGV : ('.');
    find(\&wanted, @directories);
    sub wanted {
        return unless /\.pdf$/i;
        print "working with $File::Find::name\n";
    }
Re: using s/// to remove file extensions
by FitTrend (Pilgrim) on Mar 30, 2005 at 13:29 UTC

    How about using split instead?

    ($fName, $extension) = split (/\./, $FullFileName);
    print scalar $fName;

      Of course I'm assuming none of the files have extra periods in them. A module may be best, unless you know what kind of files to expect consistently.
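The caveat about extra periods can be seen in a short sketch. The file name is hypothetical, and the regex shown as an alternative is one possible way to split at the last period only, not something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical file name containing extra periods.
my $full = 'Report.v2.final.pdf';

# split breaks the name at every period; only the first two pieces are kept.
my ($first, $second) = split /\./, $full;

# Alternative sketch: split at the last period only, using a regex.
# The greedy .* takes as much as possible before the final period.
my ($base, $ext) = $full =~ /^(.*)\.([^.]*)\z/s;

print "split: $first / $second\n";   # Report / v2
print "regex: $base / $ext\n";       # Report.v2.final / pdf
```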