in reply to Help with a regular expression for file name parsing

There are really two parts to this. The first is to match the three patterns; the second is to strip the unwanted wrapping quotes or backslash characters. I tried to come up with a regex that would do both at once, but either it's impossible or my knowledge of regex isn't up to the task. So I cheated.

use strict;
use warnings;

my $data = join '', <DATA>;
my $file;
while ($data =~ m/\@include (".*?"|'.*?'|(?:[^\s\\]|\\ )+)/g) {
    $file = $1;
    $file =~ s/["'\\]+//g;
    print "$file\n";
}

__DATA__
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?

CAVEAT: Assumes that ", ', and \ will never appear within filenames themselves. If they can, this gets much more complex.
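If quotes or backslashes can appear inside names, one way to extend the pattern is to let each quoted alternative consume backslash-escaped pairs (`\\.`), so that an escaped quote no longer ends the string, and to resolve the escapes only after the match. This is a minimal sketch, assuming that a backslash simply makes the following character literal; `parse_include` is a hypothetical helper name, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file name after @include, or undef.
# Assumption: a backslash makes the next character literal, so \" \' \\
# and an escaped space all stand for the character that follows them.
sub parse_include {
    my ($line) = @_;
    return unless $line =~ m/\@include\s+
        (   "(?:[^"\\]|\\.)*"    # double-quoted; escaped pairs are skipped over
          | '(?:[^'\\]|\\.)*'    # single-quoted, same rule
          | (?:[^\s\\]|\\.)+     # bare name; an escaped space does not end it
        )/xi;
    my $name = $1;
    # Remove the wrapping quotes only, leaving any inside the name.
    $name =~ s/^"(.*)"$/$1/s or $name =~ s/^'(.*)'$/$1/s;
    # Resolve escapes: drop each backslash and keep the character after it.
    $name =~ s/\\(.)/$1/gs;
    return $name;
}

for my $line (
    q{#some "random stuff" @include "some file" did you parse that?},
    q{#more 'random' stuff @include 'another file' you sure?},
    q{#and more random stuff @include yet\ another\ file positive?},
    q{#  @include  "\"another one\""  hmmm...},
) {
    my $name = parse_include($line);
    print "File name: <$name>\n" if defined $name;
}
```

Stripping the outer quotes before resolving escapes matters: for `"\"another one\""` it leaves `\"another one\"`, which then becomes `"another one"`.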

Re^2: Help with a regular expression for file name parsing
by bontchev (Sexton) on Dec 09, 2011 at 08:22 UTC

    Thanks, you've been the most helpful one so far. Sadly, the above solution doesn't solve the problem properly either. However, I managed to combine it with another of the regular expressions that were proposed, plus some code that resolves the escape sequences in the string more reliably, plus a better way of removing the quotes (only from the ends of the string, not from everywhere).

    Here is what I managed to come up with:

    use strict;
    use warnings;
    
    while (my $data = <DATA>)
    {
    	if ($data =~ /\@include/i)
    	{
    		$data =~ m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;
    		my $fname = $1;
    		$fname =~ s/\\([rnt'"\\ ])/"qq|\\$1|"/gee;
    		$fname =~ s/^"(.*)"$/$1/s or
    		$fname =~ s/^'(.*)'$/$1/s;
    		print "File name: <$fname>\n";
    	}
    }
    
    __DATA__
    #some "random stuff" @include 	"some file" did you parse that?
    #more 'random' stuff @include 'another file' you sure?
    #and more random stuff @include yet\ another\ file positive?
    #@Include file
    #	@include		"\"another one\""	hmmm...
    # some stuff

    The "if" is there because, as I mentioned above, I also have to do some other processing of these lines. This code mostly works, although, as you say, it doesn't properly handle file names containing escaped quotes.

    Perhaps I should give up the idea of parsing this in some clever way and just process the part after the "@include" character-by-character?
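A character-by-character scan is workable and makes the escape rules explicit. The following is a minimal sketch under assumed rules (a backslash makes the next character literal, a name is either one quoted string or a run of non-whitespace characters, and an unterminated quote yields undef); `scan_include_arg` is a hypothetical name, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical scanner for the text that follows "@include".
# Returns the file name, or undef if nothing usable is found.
sub scan_include_arg {
    my ($text) = @_;
    my @chars = split //, $text;
    my $i = 0;
    $i++ while $i < @chars && $chars[$i] =~ /\s/;   # skip leading whitespace
    return if $i >= @chars;

    # A leading quote opens a quoted name; otherwise the name is bare.
    my $quote = ($chars[$i] eq '"' || $chars[$i] eq "'") ? $chars[$i++] : undef;
    my $name  = '';
    while ($i < @chars) {
        my $c = $chars[$i++];
        if ($c eq '\\') {                        # escape: take next char verbatim
            return if $i >= @chars;              # trailing lone backslash
            $name .= $chars[$i++];
        }
        elsif (defined $quote && $c eq $quote) { # closing quote ends the name
            return $name;
        }
        elsif (!defined $quote && $c =~ /\s/) {  # whitespace ends a bare name
            return $name;
        }
        else {
            $name .= $c;
        }
    }
    return defined $quote ? undef : $name;      # unterminated quote -> undef
}

for my $line (
    q{#some "random stuff" @include "some file" did you parse that?},
    q{# @include "\"another one\"" hmmm...},
    q{#and more random stuff @include yet\ another\ file positive?},
) {
    next unless $line =~ /\@include(.*)/is;
    my $name = scan_include_arg($1);
    print "File name: <$name>\n" if defined $name;
}
```

Compared with a single regex, the scanner is longer but each rule is one branch, so adding a new escape or quoting rule is a local change.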

      Sigh, the site mangled the code I posted (the square brackets were eaten). :-( I guess I used the wrong tag. Let's try again:

      use strict;
      use warnings;

      while (my $data = <DATA>) {
          if ($data =~ /\@include/i) {
              $data =~ m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;
              my $fname = $1;
              $fname =~ s/\\([rnt'"\\ ])/"qq|\\$1|"/gee;
              $fname =~ s/^"(.*)"$/$1/s or
              $fname =~ s/^'(.*)'$/$1/s;
              print "File name: <$fname>\n";
          }
      }

      __DATA__
      #some "random stuff" @include "some file" did you parse that?
      #more 'random' stuff @include 'another file' you sure?
      #and more random stuff @include yet\ another\ file positive?
      #@Include file
      # @include "\"another one\"" hmmm...
      # some stuff
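The `/gee` substitution above builds a `qq|...|` string and evaluates it as Perl code for every escape it finds. Because the character class limits what reaches the eval, that works, but a lookup table produces the same result for the listed escapes without any eval. A sketch, with `%unescape` and `resolve_escapes` as hypothetical names:

```perl
use strict;
use warnings;

# Same escapes as the /gee version: \r \n \t \' \" \\ and backslash-space.
my %unescape = (
    'r'  => "\r",
    'n'  => "\n",
    't'  => "\t",
    "'"  => "'",
    '"'  => '"',
    '\\' => '\\',
    ' '  => ' ',
);

sub resolve_escapes {
    my ($s) = @_;
    # Only the characters listed in %unescape are treated as escapes;
    # any other backslash sequence is left untouched.
    $s =~ s/\\([rnt'"\\ ])/$unescape{$1}/g;
    return $s;
}

print resolve_escapes(q{yet\ another\ file}), "\n";   # yet another file
print resolve_escapes(q{\"another one\"}), "\n";       # "another one"
```

Note that this only replaces the unescaping step; the escaped-quote problem comes from the matching regex, which still stops at the first `"` after `@include "`.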
        m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;

        Ah. So my regex wasn't so useless to you after all.

