Re: Help with a regular expression for file name parsing
by BrowserUk (Patriarch) on Dec 07, 2011 at 07:11 UTC
|
print $data;;
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
print for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;;
"some file"
'another file'
yet\ another\ file
Spreading that out a bit:
m[
    \@include \s        ## the introducer followed by a space
    (                   ## capture
        '[^']+'         ## a single quoted string with no embedded single quotes
    |                   ## or
        "[^"]+"         ## a double quoted string with no embedded double quotes
    |                   ## or
        .+? (?<!\\)     ## a min length string that ends in a space that isn't escaped
    )
    \s
]gx;;
|
|
Your regular expression works, but the code is rather a muddle. Here's a version that he can use to test with:
$data = join '', <DATA>;
print "$_\n" for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;
__DATA__
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
|
|
When tested with this version, the output is just
1
|
|
Re: Help with a regular expression for file name parsing
by Anonymous Monk on Dec 07, 2011 at 07:19 UTC
|
Surely such a format already has a parser, but anyway, I'm sure this will work (untested):
my $pat = qr~
    \@include
    \s+
    (
        (?: '[^']*' )
    |
        (?: "[^"]*" )
    |
        (?:
            (?:\\.)
        |
            [^\\\s]
        )+
    )
~x;
[^" makes a good search term to find regex for similar formats, like
?node_id=3989;BIT=%5B%5E%22 -> Re^3: More robust link finding than HTML::LinkExtor/HTML::Parser?, Re: skip over an escaped single quote
|
|
#!/usr/bin/perl --
#~ 2011-12-07-04:10:56PDT by Anonymous Monk
#~ perltidy -csc -otr -opr -ce -nibc -i=4
use strict;
use warnings;
use autodie; # dies if open/close... fail
Main( @ARGV );
exit( 0 );
sub Main {
    if ( @_ == 2 ) {
        NotDemoMeaningfulName(@_);
    } else {
        Demo();
        print '#' x 33, "\n", Usage();
    }
} ## end sub Main

sub NotDemoMeaningfulName {
    my ( $inputFile, $outputFile ) = @_;
    open my ($inFh),  '<', $inputFile;
    open my ($outFh), '>', $outputFile;
    while ( defined( my $data = <$inFh> ) ) {
        print $outFh "$_\n"
            for $data =~ m~
                \@include
                \s+
                (
                    (?: '[^']*' )
                |
                    (?: "[^"]*" )
                |
                    (?:
                        (?:\\.)
                    |
                        [^\\\s]
                    )+
                )
            ~xg;
        #~ for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;
        # /\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s+/g
    }
    close $inFh;
    close $outFh;
} ## end sub NotDemoMeaningfulName

sub Usage {
    <<"__USAGE__";
$0
$0 dataFile newDataFile
__USAGE__
} ## end sub Usage

sub Demo {
    my ( $Input, $WantedOutput ) = DemoData();
    NotDemoMeaningfulName( \$Input, \my $Output );
    require Test::More;
    Test::More::is(
        $Output,
        $WantedOutput,
        ' NotDemoMeaningfulName Works Aas Designed'
    );
    Test::More::done_testing();
    print "\n$Output\n";
} ## end sub Demo

sub DemoData {
    #~ http://perlmonks...
    my $One = <<'__One__';
@include test
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
__One__
    #~ http://perlmonks...
    my $Two = <<'__Two__';
test
"some file"
'another file'
yet\ another\ file
__Two__
    return $One, $Two;
} ## end sub DemoData
__END__
$ perl pm.re.942167.pl
ok 1 - NotDemoMeaningfulName Works Aas Designed
1..1
test
"some file"
'another file'
yet\ another\ file
#################################
pm.re.942167.pl
pm.re.942167.pl dataFile newDataFile
|
|
Re: Help with a regular expression for file name parsing
by TJPride (Pilgrim) on Dec 07, 2011 at 14:12 UTC
|
There are really two parts to this. The first is to match the three patterns; the second to eliminate the unwanted wrapper or backslash characters. I tried to figure out a regex that would do both at once, but it's either impossible or my knowledge of regex isn't up to the task. So I cheated.
use strict;
use warnings;
my $data = join '', <DATA>;
my $file;
while ($data =~ m/\@include (".*?"|'.*?'|(?:[^\s\\]|\\ )+)/g) {
    $file = $1;
    $file =~ s/["'\\]+//g;
    print "$file\n";
}
__DATA__
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
CAVEAT: Assumes that ", ', and \ will never appear within filenames themselves. If they can, this gets much more complex.
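One way the caveat above might be relaxed is to let each quoted alternative accept backslash-escaped characters, so that a quote inside the name no longer ends the match. The following is only a sketch: the helper name include_name is made up for illustration, and the decoding rule (a backslash makes the next character literal) is an assumption, not something the original format is known to specify.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the decoded file name
# found on one line, or an empty list / undef when there is no directive.
sub include_name {
    my ($line) = @_;
    return unless $line =~ m{
        \@include \s+
        (?:
            " ( (?: [^"\\]  | \\. )* ) "    # group 1: double quoted, escapes allowed
          | ' ( (?: [^'\\]  | \\. )* ) '    # group 2: single quoted, escapes allowed
          |   ( (?: [^\s\\] | \\. )+ )      # group 3: bare word, escaped spaces allowed
        )
    }xi;

    # Exactly one of the three groups took part in the match.
    my ($name) = grep { defined } $1, $2, $3;

    # Assumed decoding rule: drop each backslash and keep the character after it.
    $name =~ s/\\(.)/$1/gs;
    return $name;
}

# Sample lines taken from the thread.
for my $line (
    q{#some "random stuff" @include "some file" did you parse that?},
    q{#more 'random' stuff @include 'another file' you sure?},
    q{#and more random stuff @include yet\ another\ file positive?},
    q{# @include "\"another one\"" hmmm...},
) {
    my $name = include_name($line);
    print "File name: <$name>\n" if defined $name;
}
```

Capturing the inside of the quotes directly avoids the separate quote-stripping substitution, which is what removed quotes and backslashes from the middle of a name in the version above.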
|
|
Thanks, you've been the most helpful one so far. Sadly, the above solution also doesn't solve the problem properly. However, I managed to combine it with another of the regular expressions that was proposed, plus some code for better resolving the escape sequences in the string, plus a better way of removing the quotes (only from the ends of the string - not from everywhere).
Here is what I managed to come up with:
use strict;
use warnings;
while (my $data = <DATA>)
{
    if ($data =~ /\@include/i)
    {
        $data =~ m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;
        my $fname = $1;
        $fname =~ s/\\([rnt'"\\ ])/"qq|\\$1|"/gee;
        $fname =~ s/^"(.*)"$/$1/s or
            $fname =~ s/^'(.*)'$/$1/s;
        print "File name: <$fname>\n";
    }
}
__DATA__
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
#@Include file
# @include "\"another one\"" hmmm...
# some stuff
The "if" is there because, as I've mentioned above, I have to do some other processing of the lines, too. This code mostly works although, as you say, it doesn't properly handle file names containing escaped quotes.
Perhaps I should give up the idea of parsing this in some clever way and just process the part after the "@include" character-by-character?
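For comparison, a character-by-character pass over the text after "@include" could look roughly like the sketch below. The helper name scan_include_arg and the escape table are assumptions made for illustration: the table follows the \r, \n, \t handling in the code above, and any other escaped character is simply taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the text that follows "@include" one character
# at a time and return the decoded file name. Returns undef when nothing
# follows the directive or when an opening quote is never closed.
sub scan_include_arg {
    my ($rest) = @_;
    $rest =~ s/^\s+//;                      # skip whitespace before the name
    return undef unless length $rest;

    my %escape = ( n => "\n", r => "\r", t => "\t" );
    my @chars  = split //, $rest;
    my $quote  = '';                        # the open quote character, if any
    my $name   = '';

    $quote = shift @chars if $chars[0] eq '"' || $chars[0] eq "'";

    while (@chars) {
        my $c = shift @chars;
        if ( $c eq '\\' && @chars ) {       # escape: decode the next character
            my $next = shift @chars;
            $name .= exists $escape{$next} ? $escape{$next} : $next;
        }
        elsif ( $quote ne '' && $c eq $quote ) {
            return $name;                   # closing quote ends the name
        }
        elsif ( $quote eq '' && $c =~ /\s/ ) {
            return $name;                   # unescaped whitespace ends a bare name
        }
        else {
            $name .= $c;
        }
    }
    return $quote eq '' ? $name : undef;    # undef: the quote was never closed
}

for my $line (
    q{# @include "\"another one\"" hmmm...},
    q{#@Include yet\ another\ file ok?},
) {
    next unless $line =~ /\@include\b(.*)/is;
    my $name = scan_include_arg($1);
    print defined $name ? "File name: <$name>\n" : "Unterminated quote\n";
}
```

The scanner is longer than a single regular expression, but each rule (escape, closing quote, end of a bare name) sits in its own branch, which may make later additions to the format easier to slot in.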
|
|
use strict;
use warnings;
while (my $data = <DATA>)
{
    if ($data =~ /\@include/i)
    {
        $data =~ m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;
        my $fname = $1;
        $fname =~ s/\\([rnt'"\\ ])/"qq|\\$1|"/gee;
        $fname =~ s/^"(.*)"$/$1/s or
            $fname =~ s/^'(.*)'$/$1/s;
        print "File name: <$fname>\n";
    }
}
__DATA__
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?
#@Include file
# @include "\"another one\"" hmmm...
# some stuff
|
|