in reply to Regex match: Ignoring first occurences
G'day cryion,
Welcome to the Monastery.
"But I have no way around using regex at the moment."
Using a regex to parse XML is generally a poor choice. Why do you have no way around this?
On the basis that you must use a regex, there is a distinct disconnect between the code and data you've posted and the regex you say doesn't work.
Applying the regex you've shown (i.e. 'file:(.*?).xml') to your XML data, line by line, captures one piece of data:
/path/to/some/file
Had you used different paths, such that you could see which path was being matched, you'd know that 'file:/path/to/some/file.mxf' ("the very first occurence of the file: string") was not matched at all. Consider this test:
```perl
#!/usr/bin/env perl -l

use strict;
use warnings;

my $re = qr{file:(.*?).xml};

while (<DATA>) {
    print $1 if /$re/;
}

__DATA__
<xml>
    <info>
        <file>file:/path/to/someA/file.mxf</file>
    </info>
    <info>
        <file>file:/path/to/someB/file.xml</file>
    </info>
</xml>
```
Output:
/path/to/someB/file
So, you're matching the right path, but not capturing all of it.
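To see the difference between what was matched and what was captured, here's a small sketch using one line from the data above. `$&` holds the whole match, while `$1` holds only the part inside the parentheses:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# One line taken from the __DATA__ section above
my $line = '<file>file:/path/to/someB/file.xml</file>';

my ($matched, $captured);
if ($line =~ /file:(.*?).xml/) {
    $matched  = $&;    # everything the regex matched
    $captured = $1;    # only what the parentheses captured
}

print "matched:  $matched\n";
print "captured: $captured\n";
```

The whole match ends with '.xml', but that part sits outside the capture group, so it never reaches `$1`.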
A '.' in a regex matches any character (except newline), so you really need '\.xml', not '.xml'. The closing parenthesis also needs to come after '\.xml' to capture the whole pathname.
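As a quick illustration of the unescaped '.' problem, here's a sketch; the filenames are hypothetical and chosen only to show what each pattern accepts:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical filenames, chosen only to show what '.' matches
my @names = qw{file.xml file_xml filexxml};

my @unescaped = grep { /file.xml/  } @names;    # '.' matches any character
my @escaped   = grep { /file\.xml/ } @names;    # '\.' matches a literal dot only

print "unescaped: @unescaped\n";    # all three names
print "escaped:   @escaped\n";      # file.xml only
```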
Making those changes:
```perl
#!/usr/bin/env perl -l

use strict;
use warnings;

my $re = qr{file:(.*?\.xml)};

while (<DATA>) {
    print $1 if /$re/;
}

__DATA__
<xml>
    <info>
        <file>file:/path/to/someA/file.mxf</file>
    </info>
    <info>
        <file>file:/path/to/someB/file.xml</file>
    </info>
</xml>
```
Gives this output:
/path/to/someB/file.xml
Which is what you state you wanted: "the whole path to the xml file".
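One caveat, in case your real data isn't nicely split across lines: the tests above rely on each <file> element being on its own line. The following sketch assumes (hypothetically) that the same XML arrives as a single line. In that case the non-greedy '.*?' starts at the first 'file:' (the .mxf one) and runs across the tags until it finds '.xml'. Restricting the path to '[^<]*' stops it crossing a tag; that assumes a path never contains a literal '<', which holds for well-formed XML text content, where '<' must be escaped.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Assumption: the same XML arrives as a single line (no newlines)
my $xml = '<xml><info><file>file:/path/to/someA/file.mxf</file></info>'
        . '<info><file>file:/path/to/someB/file.xml</file></info></xml>';

# Non-greedy '.*?' still starts at the FIRST 'file:' and runs on
# across the tags until it reaches '.xml'.
my ($spanning) = $xml =~ /file:(.*?\.xml)/;

# '[^<]*' cannot cross a '<', so a match starting at the .mxf entry
# fails and the engine moves on to the next 'file:'.
my @paths = $xml =~ /file:([^<]*\.xml)/g;

print "spanning: $spanning\n";
print "paths:    @paths\n";
```

Here `$spanning` ends up holding everything from the .mxf path through to 'file.xml', tags included, while `@paths` holds just '/path/to/someB/file.xml'.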
— Ken
Re^2: Regex match: Ignoring first occurences
by cryion (Initiate) on Aug 10, 2015 at 14:15 UTC | |
by GotToBTru (Prior) on Aug 10, 2015 at 15:26 UTC |