some more Regex help reqd

Ananda has asked for the wisdom of the Perl Monks concerning the following question:

Problem Statement:

Given a string: '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">'

Goals:

a) Check whether the string contains an "item" node.

If it exists:

b) Pick the immediately preceding elements, viz. "description" and "item" (case-insensitive). Condition: the "description" element may or may not exist.

b-1) Move the "<description>.*</description>" content into a variable.

b-2) Capture the attributes of the "item" element in a variable.

I have tried using something like this

"<item .(*)?><description>(.*)?</description><replicant (.*)?>"

as the search pattern, but it didn't succeed.

Please advise.

Thanks in advance

Ananda

Re: some more Regex help reqd by Joost (Canon) on Jun 27, 2005 at 10:42 UTC
That seems to be part of an XML file. Use a real XML parser.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re^2: some more Regex help reqd by holli (Abbot) on Jun 27, 2005 at 10:50 UTC
and smells like homework.

holli, /regexed monk/
Re: some more Regex help reqd by rev_1318 (Chaplain) on Jun 27, 2005 at 10:51 UTC
It looks like XML, but if it is intended as such, it's not valid: the `vertical` and `item` elements are never closed. I doubt you will succeed in parsing this data. If it is intended to be XML, use something like XML::Parser or another XML-related module. If you really want to use REs, you should begin by reading up on them; see perldoc perlretut and perlre for more information.

Paul
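The point that the sample is not valid XML can be checked with a quick tag-balance scan. The sketch below uses core Perl only and is a rough regex approximation: the helper name `unclosed_tags` is hypothetical, and it ignores comments, CDATA, processing instructions and `>` characters inside attribute values. Any conforming XML parser would reject the string outright as not well-formed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of start tags that are never closed.
# A rough regex scan only: comments, CDATA, processing instructions and '>'
# inside attribute values are not handled, so this is no substitute for a
# real XML parser's well-formedness check.
sub unclosed_tags {
    my ($text) = @_;
    my @stack;
    while ( $text =~ m{<(/?)(\w[\w.-]*)[^>]*?(/?)>}g ) {
        my ( $closing, $name, $self_closing ) = ( $1, $2, $3 );
        next if $self_closing;    # a tag like <br/> closes itself
        if ($closing) {
            # an end tag must match the most recently opened element
            die "mismatched </$name>\n"
                unless @stack && lc $stack[-1] eq lc $name;
            pop @stack;
        }
        else {
            push @stack, $name;
        }
    }
    return @stack;    # whatever is left was opened but never closed
}

my $string = '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">';

my @unclosed = unclosed_tags($string);
print "never closed: @unclosed\n";    # prints: never closed: vertical item
```

On the question's string the stack ends up holding `vertical` and `item`, which is why the fragment cannot be handed to an XML parser as a complete document without first closing those elements.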
Re: some more Regex help reqd by anonymized user 468275 (Curate) on Jun 27, 2005 at 13:56 UTC
First express the valid syntax in BNF (http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html) so that you don't miss any rules. Based on that, write a parser that follows your rules. A parser normally loads the actual syntax into a parse tree, but for your example a hash will suffice. This is just to start you off:

```perl
my $remains = $string;
my %parsed  = ();
while ( $remains =~ /^\s*<(.*)$/ ) {    # for each tag...
    $remains = $1;
    # we expect an identifier
    ( $remains =~ /^\s*(\w+)(.*)$/ ) or die 'expected identifier';
    my $kwd = $1;
    $remains = $2;
    if ( $kwd eq 'vertical' ) {
        # the value may be double-quoted, as in name="abc"
        ( $remains =~ /^\s*(\w+)\s*=\s*"?(\w+)"?(.*)$/ ) or die 'expected assignment';
        $parsed{ $kwd } and die 'duplicate definition';
        $parsed{ $kwd }{ $1 } = $2;
        $remains = $3;    # whatever follows the assignment
        Exhausted( \$remains );
        next;             # next tag
    }
    # if ( $kwd eq ... ) { ... }    # next tag processor goes here
}
```

You'll also need a simple routine, Exhausted, to make sure nothing is left inside a tag, and something to deal with the closing of a named tag. Enough said?

-S

One world, one people
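The hand-written parser above consumes its input by copying the unmatched remainder into `$remains` after every match. A technique often used for hand-written Perl lexers (swapped in here, not the poster's code) keeps the string intact and steps through it with the `\G` anchor and the `/gc` match flags described in perlop. This is a minimal sketch assuming double-quoted attribute values and no comments, CDATA or self-closing tags; `tokenize` is a hypothetical name, and the check for the tag's closing `>` plays the role of the `Exhausted` routine.

```perl
use strict;
use warnings;

# Hypothetical tokenizer, in the spirit of the hand-written parser above.
# \G anchors each match at pos(), and /gc leaves pos() unchanged when a
# match fails, so each branch can simply try its pattern at the current spot.
# Assumes: double-quoted attribute values; no comments, CDATA,
# processing instructions or self-closing tags.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while ( pos($text) < length $text ) {
        if ( $text =~ m{\G<(\w[\w-]*)}gc ) {            # start tag
            my $name = lc $1;
            my %attr;
            while ( $text =~ m{\G\s+([\w-]+)\s*=\s*"([^"]*)"}gc ) {
                $attr{ lc $1 } = $2;
            }
            # Stands in for the "Exhausted" check: only optional
            # whitespace may remain before the tag's closing '>'.
            $text =~ m{\G\s*>}gc
                or die "unexpected text inside <$name> at offset ", pos($text), "\n";
            push @tokens, [ start => $name, \%attr ];
        }
        elsif ( $text =~ m{\G</(\w[\w-]*)\s*>}gc ) {    # end tag
            push @tokens, [ end => lc $1 ];
        }
        elsif ( $text =~ m{\G([^<]+)}gc ) {              # character data
            push @tokens, [ text => $1 ];
        }
        else {
            die 'syntax error at offset ', pos($text), "\n";
        }
    }
    return @tokens;
}

my $string = '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">';

for my $token ( tokenize($string) ) {
    my ( $type, $value, $attr ) = @$token;
    my $attrs = $attr ? join( ' ', map { "$_=\"$attr->{$_}\"" } sort keys %$attr ) : '';
    print "$type: $value $attrs\n";
}
```

Each token is an array reference: `[start => $name, \%attributes]`, `[end => $name]` or `[text => $characters]`. A parser built from the BNF would then consume this token list rather than raw text, which keeps the grammar rules separate from the character-level matching.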
Re: some more Regex help reqd by graff (Chancellor) on Jun 28, 2005 at 02:51 UTC
> Given a String:: `<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">` ... I have tried using something like this `<item .(*)?><description>(.*)?</description><replicant (.*)?>` as search pattern but didnt succeed .

Well, of course something like that wouldn't succeed: the data sample has the "item" tag *after* the "description" tag, but the regex you tried has these tags the wrong way around. And why are you looking for a "replicant" tag that doesn't occur in the sample data?

Try putting <code> ... </code> tags around sample data and code snippets when you post or update a node, and try a solution that makes sense for the given data.
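Following graff's point, a pattern that lists the tags in the order they actually appear (description first, then item) covers goals a) through b-2). This is a minimal regex sketch, not a general XML solution, and it assumes the description body contains no nested `</description>`, that attribute values are double-quoted and contain no `>`, and that the variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = '<vertical name="abc"><description> Replicant consists of Icon, Title1, Title2, Description, MOre Title, and More Hyperlink (minimum=0, maximum=10)</description><item min-allowed="1" max-allowed="20">';

# Match the tags in the order they occur in the data: an optional
# <description>...</description> immediately followed by the <item ...> tag.
# /i makes tag names case-insensitive, /s lets '.' cross newlines,
# /x allows the layout and comments below.
my $re = qr{
    (?:                                      # optional description element
        <description\s*>
        ( (?: (?!</description\s*>) . )* )   # capture 1: body, up to the first close tag
        </description\s*> \s*
    )?
    <item\b ( [^>]* ) >                      # capture 2: raw attribute text of the item tag
}xsi;

my ( $description, %item_attr );
if ( $string =~ $re ) {                      # goal a): an item tag exists
    $description = $1;                       # goal b-1): stays undef if there was none
    my $attr_text = $2;
    # goal b-2): turn name="value" pairs into a hash
    %item_attr = $attr_text =~ /([\w-]+)\s*=\s*"([^"]*)"/g;
    print 'description: ', ( defined $description ? $description : '(none)' ), "\n";
    print "$_ = $item_attr{$_}\n" for sort keys %item_attr;
}
else {
    print "no item element found\n";
}
```

Because the description group is optional, a string such as `<vertical name="abc"><ITEM min-allowed="2">` still matches, with `$1` left undefined, which covers the "may or may not exist" condition.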