Ananda has asked for the wisdom of the Perl Monks concerning the following question:

Problem Statement:

Given a String:: '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">'

Goals:

a) Check whether the string contains an "item" node.

If it exists:

b) Pick the immediately preceding elements, viz. "description" and "item" (case-insensitive). Condition: the "description" element may or may not exist.

b-1) Move the "<description>.*</description>" content to a variable.

b-2) Capture the attributes of the "item" element in a variable.

I have tried using something like this

"<item .(*)?><description>(.*)?</description><replicant (.*)?>"

as the search pattern, but it didn't succeed.

Please advise.

Thanks in advance

Ananda

Re: some more Regex help reqd
by Joost (Canon) on Jun 27, 2005 at 10:42 UTC
      and smells like homework.


      holli, /regexed monk/
Re: some more Regex help reqd
by rev_1318 (Chaplain) on Jun 27, 2005 at 10:51 UTC
    It looks like XML, but if it is intended as such, it's not valid. I doubt you will succeed in parsing this data. If it is intended to be XML, use something like XML::Parser or another XML-related module. If you really want to use REs, you should begin by reading up on them; see perldoc perlretut and perlre for more information.

    Paul

Re: some more Regex help reqd
by anonymized user 468275 (Curate) on Jun 27, 2005 at 13:56 UTC
    First, express the valid syntax in BNF (see http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html) to avoid missing any rules.

    Based on that, write a parser which follows your rules. A parser loads actual syntax into a parse tree, but in your example a hash will suffice. This is just to start you off...

    my $remains = $string;
    my %parsed  = ();
    while ( $remains =~ /^\s*<(.*)$/ ) {    # for each tag...
        $remains = $1;
        # we expect an identifier
        $remains =~ /^\s*(\w+)(.*)$/ or die 'expected identifier';
        my $kwd = $1;
        $remains = $2;
        if ( $kwd eq 'vertical' ) {
            # attribute values are quoted in the sample, e.g. name="abc"
            $remains =~ /^\s*(\w+)\s*=\s*"?(\w+)"?(.*)$/
                or die 'expected assignment';
            $parsed{$kwd} and die 'duplicate definition';
            $parsed{$kwd}{$1} = $2;
            $remains = $3;    # whatever follows the assignment
            Exhausted( \$remains );
            next;    # next tag
        }
        # if ... next tag processor goes here
    }
    You'll also need a simple routine Exhausted to make sure nothing is left inside a tag and something to deal with the closing of a named tag -- enough said?
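For illustration only, here is one way the Exhausted routine mentioned above might look. The original post does not show it, so everything in it is an assumption: it takes a reference to the unparsed remainder, expects only optional whitespace (and an optional "/") before the closing ">", strips that, and dies if anything else is still inside the tag.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): consume the end of the
# current tag. Dies if anything other than whitespace precedes the '>'.
sub Exhausted {
    my ($remains_ref) = @_;
    $$remains_ref =~ s{^\s*/?>}{}
        or die "unexpected content inside tag: '$$remains_ref'";
    return 1;
}

# Usage: what is left after the attributes of a tag have been parsed.
my $remains = ' ><description>text';
Exhausted( \$remains );
print "$remains\n";    # prints: <description>text
```

Passing a reference lets the routine shorten the caller's string in place, which matches the Exhausted( \$remains ) call in the loop.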

    -S

    One world, one people

Re: some more Regex help reqd
by graff (Chancellor) on Jun 28, 2005 at 02:51 UTC
    Given a String::  <vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">

    ... I have tried using something like this

    <item .(*)?><description>(.*)?</description><replicant (.*)?>
    as search pattern but didnt succeed .
    Well, of course something like that wouldn't succeed: the data sample has the "item" tag after the "description" tag, but the regex you tried has these tags the wrong way around. And why are you looking for a "replicant" tag that doesn't occur in the sample data?

    Try putting <code> ... </code> tags around sample data and code snippets when you post or update a node, and try a solution that makes sense for the given data.
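As a rough sketch of a pattern that matches the order of the tags in the sample data (an optional "description" element immediately followed by "item"), something along these lines might work. The sub name extract_item and the attribute-parsing loop are illustrative assumptions, not anything from the original post, and a regex like this only copes with simple, flat input of the kind shown; for real XML a parser module is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( $description, \%attributes ) if the string
# contains an <item ...> tag, or an empty list if it does not.
# $description is undef when no <description> directly precedes <item>.
sub extract_item {
    my ($string) = @_;
    return unless $string =~ m{
        (?: <description\b[^>]*>                 # optional opening tag
            ( (?: (?! </description> ) . )* )    # content up to the close tag
            </description> \s*
        )?
        <item\b ( [^>]* ) >                      # item tag and its attributes
    }xis;
    my ( $description, $attr_text ) = ( $1, $2 );

    # Split the raw attribute text into name/value pairs (double-quoted values).
    my %attr;
    while ( $attr_text =~ /([\w-]+)\s*=\s*"([^"]*)"/g ) {
        $attr{ lc $1 } = $2;
    }
    return ( $description, \%attr );
}

my $string = '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">';

if ( my ( $description, $attrs ) = extract_item($string) ) {
    print "description: ", ( defined $description ? $description : '(none)' ), "\n";
    print "$_ = $attrs->{$_}\n" for sort keys %$attrs;
}
else {
    print "no item node found\n";
}
```

The /i flag covers the case-insensitive requirement, and making the whole description group optional handles the case where that element is absent.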