Re: Regex match last
by ww (Archbishop) on Aug 20, 2012 at 10:38 UTC
C:\>perl -E "my $str='<c:t="AD2343"/><c:p>65677676</c:p>'; if ( $str =~ m|(<c:.*)?(?:[/>]{2})| ) {say $1;}"
<c:t=AD2343
Your code asks the regex engine to match a 'c', a colon, and any number of any characters thereafter (except newlines).
Because you didn't provide the actual code, we can't be sure just what other issues may be in play... such as the previously mentioned use of an alternate regex marker.
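To illustrate why "any number of anything" overshoots, here is a minimal sketch. Since the original code was never posted, the greedy pattern below is only an assumption about what it may have looked like, and the sub names are made up for the example. It contrasts a greedy .* with a negated character class that cannot run past the end of a tag.

```perl
use strict;
use warnings;

# Sample string taken from the thread.
my $str = '<c:t="AD2343"/><c:p>65677676</c:p>';

# Assumed pattern (hypothetical): .* first grabs the rest of the line,
# then backs off only as far as needed to find a final '>'.
sub greedy_match {
    my ($text) = @_;
    return $text =~ m{(<c:.*>)} ? $1 : undef;
}

# Negated class: [^>]* cannot run past the first '>', so the match
# stays inside a single tag.
sub bounded_match {
    my ($text) = @_;
    return $text =~ m{(<c:[^>]*/>)} ? $1 : undef;
}

print "greedy:  ", greedy_match($str),  "\n";
print "bounded: ", bounded_match($str), "\n";
```

On this sample the greedy version returns the whole string, while the bounded version stops at the end of the first self-closing tag.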
Update (based on the redefined problem in Re^2: Regex match last): You'll probably have fewer problems in the long run if you use an HTML parser of one flavor or another, rather than trying to parse HTML with regexen.
It's called parsing whenever you want to do anything with (specific bits of) the data you read :)
Re: Regex match last
by moritz (Cardinal) on Aug 20, 2012 at 10:18 UTC
use 5.010;
use strict;
use warnings;
$_ = q[<c:t="AD2343"/><c:p>65677676</c:p>];
m{<([^>/]+)/>} && say $1
I want to match only <c/>, not </c> or <c>.
Then add a c after the opening < in the regex.
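A minimal sketch of that change, assuming the regex posted above and keeping the c inside the capture group. The sub name first_c_selfclosing is made up for the example. Note that [^>/]+ also rules out tags whose attribute values contain a slash.

```perl
use strict;
use warnings;

# Return the inside of the first self-closing tag whose name starts
# with 'c', or undef if the text contains none.
sub first_c_selfclosing {
    my ($text) = @_;
    return $text =~ m{<(c[^>/]+)/>} ? $1 : undef;
}

for my $sample ('<c:t="AD2343"/><c:p>65677676</c:p>',
                '<c:p>65677676</c:p>',
                '<d:t="x"/>') {
    my $hit = first_c_selfclosing($sample);
    print defined $hit ? "matched: $hit\n" : "no match: $sample\n";
}
```

Only the first sample matches. The second has no self-closing tag, and the third has one that does not start with c.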
Re: Regex match last
by Neighbour (Friar) on Aug 20, 2012 at 10:14 UTC
Try this. It uses @ as an alternate regex delimiter (instead of /) and captures a single self-closing <c:.../> tag per line (the last one, because of the leading greedy .*).
#!/usr/bin/perl
use strict;
use warnings;
my $data = '<c:t="AD2343"/><c:p>65677676</c:p>';
if ($data =~ m@.*(<c:[^>]*/>).*@) {
print("Match: [$1]\n");
}
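The leading .* in that pattern decides which tag is captured when a line holds more than one. The following is a small sketch using a made-up input line and hypothetical sub names; the pattern itself is the one from the code above.

```perl
use strict;
use warnings;

# Hypothetical input with two self-closing <c:.../> tags on one line.
my $line = '<c:a="1"/><c:p>text</c:p><c:b="2"/>';

# With a leading greedy .*, the engine consumes as much as it can first,
# so the capture lands on the last candidate in the line.
sub last_tag {
    my ($text) = @_;
    return $text =~ m@.*(<c:[^>]*/>)@ ? $1 : undef;
}

# Without it, matching proceeds from the left and stops at the first
# candidate.
sub first_tag {
    my ($text) = @_;
    return $text =~ m@(<c:[^>]*/>)@ ? $1 : undef;
}

print "last:  ", last_tag($line),  "\n";
print "first: ", first_tag($line), "\n";
```

If only one self-closing tag can appear per line, both forms give the same answer and the leading .* is unnecessary.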
This does not match the required text in the following code:
<w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>
I want only these to be matched:
1. <w:pStyle w:val="Standard"/>
2. <w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
3. <w:sz w:val="20"/>
4. <w:r w:rsidRPr="00AD741F"/>
#! perl -slw
use strict;
# Print the first self-closing tag found on each line of DATA.
m[(<[^/>]+/>)] and print "'$1'" while <DATA>;
__DATA__
<w:body>
<w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7">
<w:pPr>
<w:pStyle w:val="Standard"/>
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00AD741F"/>
</w:body>
Outputs:
C:\test>junk23
'<w:pStyle w:val="Standard"/>'
'<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>'
'<w:sz w:val="20"/>'
'<w:szCs w:val="20"/>'
'<w:r w:rsidRPr="00AD741F"/>'
BTW: Your sample XML is broken. The second-level tag, <w:p ...>, is never closed, which will break strict XML parsers.
No kidding. Your original post specified <c:.../> and your current 'does not match' code uses <w:.../>. You asked for a tool to turn a cross-head screw into a board, and when given a cross-head screwdriver say 'ok, but I cannot use it to turn this slotted screw'.
In that case, you'll want:
#!/usr/bin/perl
use strict;
use warnings;
my $data = '<w:body><w:p w:rsidR="00A654E7" w:rsidRPr="00AD741F" w:rsidRDefault="00A654E7" w:rsidP="00A654E7"><w:pPr><w:pStyle w:val="Standard"/><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r w:rsidRPr="00AD741F"/></w:body>';
print "Matches found:\n" . join (",\n", $data =~ m@(<[^>]*/>)@g) . "\n";
Output:
Matches found:
<w:pStyle w:val="Standard"/>,
<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorHAnsi"/>,
<w:sz w:val="20"/>,
<w:szCs w:val="20"/>,
<w:r w:rsidRPr="00AD741F"/>
Re: Regex match last
by BillKSmith (Monsignor) on Aug 20, 2012 at 13:08 UTC
Your regex does match your sample string! (But not in the way you want.) That is a very good reason to use a module.
A simple fix is:
use strict;
use warnings;
use English;    # provides $MATCH as a readable alias for $&
my $html = '<c:t="AD2343"/><c:p>65677676</c:p>';
# Print the matched text only when the match succeeds.
print $MATCH, "\n" if $html =~ m{<c:.*/>};
Re: Regex match last
by Anonymous Monk on Aug 20, 2012 at 12:45 UTC
How many times will the substring appear? Not many? Great. Put it in a while-loop with the 'g' modifier and be done.
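A rough sketch of that while-loop, reusing the <[^>]*/> pattern from an earlier reply. The sub name self_closing_tags and the shortened sample string are assumptions made for the example.

```perl
use strict;
use warnings;

# Collect every self-closing tag in $text, scanning left to right.
sub self_closing_tags {
    my ($text) = @_;
    my @tags;
    # In scalar context, /g resumes after the previous match on each pass
    # and returns false once no further match is found.
    while ($text =~ m{(<[^>]*/>)}g) {
        push @tags, $1;
    }
    return @tags;
}

# Shortened sample in the style of the thread's XML.
my $xml = '<w:pPr><w:pStyle w:val="Standard"/><w:sz w:val="20"/></w:pPr>';
print "$_\n" for self_closing_tags($xml);
```

The loop form is handy when each match needs its own processing. When only the list of matches is wanted, the same regex in list context with /g returns them all at once.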