Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

G'day Monks,

I need some help with a regular expression. I need to parse through a scalar value and extract from it the following:
/some/path/to/a/file[some numbers here...don't know how many, but probably at least ten].ext

An example of what I need to extract would be: /home/monks/thanksforhelping1234567890.ext

So, basically, I need to extract the path of a file from this scalar, so I can eventually open a filehandle. I'm pretty new to pattern matching, so I'm having lots of trouble. Can anyone help, please?

Re: Regular Expression Assistance
by dda (Friar) on Jun 10, 2002 at 12:41 UTC
    Simple way of extracting directory name:
    use File::Basename;
    $s = '/home/monks/thanksforhelping1234567890.ext';
    print dirname $s;
Re: Regular Expression Assistance
by Bilbo (Pilgrim) on Jun 10, 2002 at 12:42 UTC
    How about this:
    my $root = "/home/monks/thanksforhelping";
    my $ext = ".ext";
    # \Q...\E quotes metacharacters, so the "." in $ext matches only a literal dot
    $string =~ /(\Q$root\E \d+ \Q$ext\E)/x;
    my $filename = $1;
    Where $string is the string containing the filename to be extracted, $root is the known bit of the filename and $ext is the file extension. The regex matches $root followed by one or more digits followed by $ext. The x on the end of the regex means that spaces in the pattern don't do anything (except make it more readable).
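A minimal sketch of why interpolated text is worth quoting inside a pattern. The sub name extract_path and both sample strings are made up for illustration; only $root and $ext come from the post above.

```perl
use strict;
use warnings;

my $root = "/home/monks/thanksforhelping";
my $ext  = ".ext";

# Return the first "$root<digits>$ext" found in a string, or undef.
# (extract_path is a hypothetical helper, not something from the thread.)
sub extract_path {
    my ($string) = @_;
    # \Q...\E quotes regex metacharacters in the interpolated text,
    # so the "." in $ext can only match a literal dot.
    my ($filename) = $string =~ /(\Q$root\E \d+ \Q$ext\E)/x;
    return $filename;
}

my $good = extract_path("try /home/monks/thanksforhelping1234567890.ext for help");
my $bad  = extract_path("try /home/monks/thanksforhelping123Xext for help");

print defined $good ? "matched: $good\n" : "no match\n";
print defined $bad  ? "matched: $bad\n"  : "no match\n";
```

Without the quoting, the second string would match as well, because an unquoted "." stands for any character.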
Re: Regular Expression Assistance
by moxliukas (Curate) on Jun 10, 2002 at 12:47 UTC

    Well, a quick solution to this could be

    $_ =~ /(\/home\/monks\/thanksforhelping\d+\.ext)/

    However there are some points that I would like to make:

    • This will work only if the scalar contains the path you are searching for
    • If there are two or more paths like this in a scalar, this will match only the first one
    • I am not an expert in regexp either, so always take my advice with a grain of salt ;)
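The first point above can be guarded against by testing whether the match succeeded before touching $1. A rough sketch, with two made-up input lines and m{...} delimiters so the slashes need no escaping:

```perl
use strict;
use warnings;

# Hypothetical sample lines: one contains the path, one does not.
my @lines = (
    "but you might try looking at /home/monks/thanksforhelping1234567890.ext for help",
    "no path on this line",
);

my @found;
for my $line (@lines) {
    # Only trust $1 when the match itself returned true.
    if ($line =~ m{(/home/monks/thanksforhelping\d+\.ext)}) {
        push @found, $1;
        print "found: $1\n";
    }
    else {
        print "no path in: $line\n";
    }
}
```

If the match fails, $1 keeps whatever an earlier successful match left in it, which is why the test matters inside a loop.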
      This is what I thought it should look like. Unfortunately, when I tested it, I also got stuff that was on the same line as the path I wanted, but was not part of the path itself. For instance, I got:

      but you might try looking at /home/monks/thanksforhelping1234567890.ext for help

      Any suggestions on how to get rid of the excess stuff?

        I don't really know what is happening at your end, but this seems

        $_ = "but you might try looking at /home/monks/thanksforhelping1234567890.ext for help";
        $_ =~ /(\/home\/monks\/thanksforhelping\d+\.ext)/;
        print $1;

        to output:

        /home/monks/thanksforhelping1234567890.ext

        So check for typos... that could be your problem

Re: Regular Expression Assistance
by Anonymous Monk on Jun 10, 2002 at 12:27 UTC
    One more thing I should specify: I know what the directory should be and I know the alphabetic part of the filename...the only part I don't know is the number part. So, in the example I gave, I know that the pattern I want to find has /home/monks/thanksforhelping and I know that it ends in .ext, but I don't know that it contains 1234567890...I only know that it contains some random string of numbers.
      I would do it using File::Basename to separate the path from the filename. This simplifies the task a little: you just have to extract the number from the filename:
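      #!/usr/bin/perl -w
      use strict;
      use File::Basename;

      while (<>) {
          chomp;    # drop the newline so the suffix sits at the end of the string
          # fileparse treats the suffix as a pattern, so the dot is escaped
          my ($name, $path, $suffix) = fileparse($_, qr/\.ext/);
          # print only when the name really contains a run of digits
          print "number: $1\n" if $name =~ /\D+(\d+)/;
      }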

      I hope this helps.

      marcos
      If you know the absolute filename and want to open that file, what prevents you from doing just that?

      _Ass_uming safe input data here:

      my $filename = $ARGV[0] or die "No filename given\n";
      open INPUT, "<", $filename or die "could not open $filename\n";
      while (<INPUT>) { print $_ };
      close INPUT;
      Anyway:
      If the complete filename is in $filename you can do:
      my ($file_without_path) = $filename =~ /^.*?\/(\w*?\w*?\.ext)$/;

      and end up with something like: "file23432545335.ext"

      Still, I don't see the point if you trust the filename as being safe, and just want to open it. (Since you will need the path anyway).

      janx
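Putting the two steps together (extract the path, then open it) might look roughly like this. It is only a sketch: the temporary directory, the file contents and the surrounding text are invented so that the example can run anywhere, and it uses a lexical filehandle rather than the bareword INPUT.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a throwaway file so the sketch is runnable (hypothetical data).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/thanksforhelping1234567890.ext";
open my $out, '>', $file or die "could not create $file: $!\n";
print {$out} "hello monks\n";
close $out;

my $text = "but you might try looking at $file for help";

# Extract the path; \Q...\E keeps the directory name literal.
my ($path) = $text =~ m{(\Q$dir\E/thanksforhelping\d+\.ext)}
    or die "no path found\n";

# Three-argument open with a lexical filehandle; $! says why it failed.
open my $in, '<', $path or die "could not open $path: $!\n";
my @content = <$in>;
close $in;

print @content;
```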

Re: Regular Expression Assistance
by insensate (Hermit) on Jun 10, 2002 at 12:54 UTC
    Most of the above will only help if the path stays the same throughout the scalar value...this will grab multiple paths...you can then push $pathtofile onto an array etc...
    for ($scalar) {
        /((?:\/\w+)+\/file\d+\.ext)/;    # dot escaped so it matches only a literal "."
        $pathtofile = $1;
    }

    -Jason
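If the scalar holds several such paths, one way to collect them all is a /g match in list context, which returns every capture from every match. A small sketch with an invented input string (the two paths in it are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical input with two paths in one scalar.
my $scalar = "see /home/monks/file123.ext and /tmp/other/dir/file4567.ext today";

# In list context, m//g returns the captured text of every match.
my @paths = $scalar =~ m{((?:/\w+)+/file\d+\.ext)}g;

print "$_\n" for @paths;
```

A single match without /g stops at the first path, so the array form is the one to reach for when more than one is expected.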
Re: Regular Expression Assistance
by Sifmole (Chaplain) on Jun 10, 2002 at 12:42 UTC
    $path =~ m/thanksforhelping([^.]+)\.ext/;
    $whatyouwant = $1;