yoda54 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I am a bit confused with multi line matching, how can I get the value of property name "Text" to match since it spans across multiple lines?

Thanks!

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    if (/<cdset id/.../<\/cdset/) {
        if ($_ =~ /<property name="(.*)"\s+value="(.*)"\/\>/ms) {
            print "$1 -> $2\n";
        }
        elsif ($_ =~ m/<property name="(.+)"\s+value=""\/\>/) { # if no value
            print "$1 -> \"\"\n";
        }
    }
}

__DATA__
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<set name="01" id="test" catId="81679" >
  <cdsets>
    <cdset id="cdset" name="CD Compilation">
      <property name="Own" value=""/>
      <property name="Type" value="Record"/>
      <property name="Text" value="Sample text
more sample text
more more same text]."/>
      <property name="Unique" value="yes"/>
    </cdset>
  </cdsets>
</set>

Output:

Own -> 
Type -> Record
Unique -> yes

Replies are listed 'Best First'.
Re: Perl Regex Multiline Matching (XML::Twig)
by beech (Parson) on Oct 14, 2016 at 00:18 UTC

    Hi,

    I am a bit confused with multi line matching, how can I get the value of property name "Text" to match since it spans across multiple lines?

    Use something like XML::Twig; it already knows how to do that :)

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::Twig;

    my $xml = q{<?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <set name="01" id="test" catId="81679" >
      <cdsets>
        <cdset id="cdset" name="CD Compilation">
          <property name="Own" value=""/>
          <property name="Type" value="Record"/>
          <property name="Text" value="Sample text
    more sample text
    more more same text]."/>
          <property name="Unique" value="yes"/>
        </cdset>
      </cdsets>
    </set>
    };

    XML::Twig->new(
        twig_roots => {
            'cdset' => sub {
                print $_->path, ' ';
                printf "id(%s)=%s\n", $_->att('id'), $_->att('name');
            },
            'cdset/property' => sub {
                print $_->path, ' ';
                print $_->att('name'), '=', $_->att('value'), "\n";
            },
        },
    )->parse( $xml );

    __END__
    /set/cdset/property Own=
    /set/cdset/property Type=Record
    /set/cdset/property Text=Sample text more sample text more more same text].
    /set/cdset/property Unique=yes
    /set/cdset id(cdset)=CD Compilation
      Thanks!
Re: Perl Regex Multiline Matching
by choroba (Cardinal) on Oct 14, 2016 at 05:38 UTC
    Note that the XML specification says that newlines in attribute values are normalized to spaces during parsing. And that's indeed what XML::LibXML does:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::LibXML;

    my $dom = 'XML::LibXML'->load_xml(IO => *DATA{IO});
    for my $property ($dom->findnodes('/set/cdsets/cdset/property')) {
        say join ' -> ', @$property{qw{ name value }};
    }

    __DATA__
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <set name="01" id="test" catId="81679" >
      <cdsets>
        <cdset id="cdset" name="CD Compilation">
          <property name="Own" value=""/>
          <property name="Type" value="Record"/>
          <property name="Text" value="Sample text
    more sample text
    more more same text]."/>
          <property name="Unique" value="yes"/>
        </cdset>
      </cdsets>
    </set>
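For anyone who still extracts the value with a regex, the normalization choroba mentions can be approximated by hand. The following is only a sketch using core Perl: the helper name normalize_attribute_value and the sample string are assumptions made for illustration, and it covers just the whitespace part of the rule (line ends, tabs and newlines each become a single space), not entity or character references.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): approximates the
# whitespace step of XML attribute-value normalization.
sub normalize_attribute_value {
    my ($raw) = @_;
    $raw =~ s/\r\n?/\n/g;    # treat CRLF and lone CR as a single line end
    $raw =~ tr/\t\n/  /;     # each tab or newline becomes one space
    return $raw;
}

# Assumed sample: the raw text a regex would capture from the "Text" attribute.
my $captured = "Sample text\nmore sample text\nmore more same text].";

print normalize_attribute_value($captured), "\n";
```

A real parser also expands references such as &amp;amp; inside attribute values, so this helper is a convenience for simple data rather than a replacement for XML::LibXML or XML::Twig.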

Re: Perl Regex Multiline Matching
by morgon (Priest) on Oct 14, 2016 at 00:13 UTC
    Parsing XML with regexes is a punishable offence, but who wants to throw the first parser...

    This

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = join "", <DATA>; # slurp DATA-section

    while ( $data =~ /<property name="(.*?)"\s+value="(.*?)"/gs ) {
        print qq|$1 -> "$2"\n|;
    }

    __DATA__
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <set name="01" id="test" catId="81679" >
      <cdsets>
        <cdset id="cdset" name="CD Compilation">
          <property name="Own" value=""/>
          <property name="Type" value="Record"/>
          <property name="Text" value="Sample text
    more sample text
    more more same text]."/>
          <property name="Unique" value="yes"/>
        </cdset>
      </cdsets>
    </set>
    prints
    Own -> ""
    Type -> "Record"
    Text -> "Sample text
    more sample text
    more more same text]."
    Unique -> "yes"
Re: Perl Regex Multiline Matching
by kcott (Archbishop) on Oct 14, 2016 at 09:21 UTC

    G'day yoda54,

    "I am a bit confused with multi line matching, how can I get the value of property name "Text" to match since it spans across multiple lines?"

    The main problem with your code is that you're not dealing with multiple lines: you're reading one line at a time from the DATA filehandle. This is determined by the input record separator ($/), which is set to newline by default. You'd need to change this; I'd recommend localising the change in an anonymous block.

    Here's the technique:

    #!/usr/bin/env perl

    use 5.014;
    use strict;
    use warnings;

    my $re;
    BEGIN { $re = qr{(?msx: name="([^"]+)" .*? value="([^"]*)" )} }

    {
        local $/ = '/>';

        while (<DATA>) {
            next unless /$re/;
            say 'name=[', $1, ']; value=[', $2 =~ y/ \n/ /rs, ']';
        }
    }

    __DATA__
    <property name="Own" value=""/>
    <property name="Type" value="Record"/>
    <property name="Text" value="Sample text
    more sample text
    more more same text]."/>
    <property name="Unique" value="yes"/>

    Output:

    name=[Own]; value=[]
    name=[Type]; value=[Record]
    name=[Text]; value=[Sample text more sample text more more same text].]
    name=[Unique]; value=[yes]

    — Ken
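The advice to localise the change to $/ can be checked with a small sketch. It assumes an in-memory filehandle and a made-up two-tag sample (neither is from the thread) and shows that records end at '/>' inside the block, while $/ is back to a newline once the block is left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data: the second tag spans two lines.
my $xml = qq{<property name="Own" value=""/>\n<property name="Type"\n value="Record"/>\n};

my @records;
{
    # Inside this block a "line" ends at '/>' rather than at a newline.
    local $/ = '/>';
    open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
    @records = <$fh>;
    close $fh;
}

# Drop the whitespace-only fragment left after the last '/>'.
my @tags = grep { /\S/ } @records;

printf "tags read: %d\n", scalar @tags;
print 'separator restored: ', ( $/ eq "\n" ? 'yes' : 'no' ), "\n";
```

Keeping the assignment inside a bare block means any later readline in the same program still reads newline-terminated lines.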

Re: Perl Regex Multiline Matching
by BrowserUk (Patriarch) on Oct 14, 2016 at 00:13 UTC
    while (<DATA>) { ... } processes the file line by line, and so does the /.../ ... /.../ range test inside it, so how could your regex possibly match across multiple lines?
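The point about line-by-line reading can be made concrete with a short sketch. It assumes a two-tag sample held in a string and a buffering loop that is not from the thread: the same regex is tried once against single lines and once against a buffer that grows until it holds a complete tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample: the "Text" value spans three lines.
my $xml = <<'XML';
<property name="Type" value="Record"/>
<property name="Text" value="Sample text
more sample text
more more same text]."/>
XML

my $re = qr/<property name="([^"]*)"\s+value="([^"]*)"\/>/;

# Line by line: each $_ holds one line, so the three-line tag never matches.
open my $by_line, '<', \$xml or die "Cannot open in-memory file: $!";
my @line_hits;
while (<$by_line>) {
    push @line_hits, $1 if /$re/;
}
close $by_line;

# Buffered: keep appending lines until the buffer contains a complete tag.
open my $buffered, '<', \$xml or die "Cannot open in-memory file: $!";
my @buffer_hits;
my $buffer = '';
while (<$buffered>) {
    $buffer .= $_;
    next unless $buffer =~ /$re/;
    push @buffer_hits, $1;
    $buffer = '';    # start collecting the next tag
}
close $buffered;

print "line by line: @line_hits\n";
print "buffered:     @buffer_hits\n";
```

Only the buffered loop reports the "Text" property. This is a regex workaround for simple, regular input; a real XML parser remains the safer choice.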

      Thanks!