perlsen has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have a file in which each ContentItem tag holds page-number details along with a DistinctiveTitle.
Given a DistinctiveTitle supplied by the user, I need to get the contents of the FirstPageNumber tag in the same ContentItem.
The sample input file is:

<ContentItem>
  <TextItem>
    <TextItemType>03</TextItemType>
    <FirstPageNumber>29</FirstPageNumber>
    <LastPageNumber>56</LastPageNumber>
    <NumberOfPages>28</NumberOfPages>
  </TextItem>
  <ComponentTypeName>Chapter</ComponentTypeName>
  <DistinctiveTitle>The emergence of an Islamic</DistinctiveTitle>
</ContentItem>
<ContentItem>
  <TextItem>
    <TextItemType>03</TextItemType>
    <FirstPageNumber>29</FirstPageNumber>
    <LastPageNumber>34</LastPageNumber>
    <NumberOfPages>6</NumberOfPages>
  </TextItem>
  <ComponentTypeName>Chapter</ComponentTypeName>
  <DistinctiveTitle>The Arab Conquests</DistinctiveTitle>
</ContentItem>

Thanks in advance.

Replies are listed 'Best First'.
Re: Pattern matching question
by gopalr (Priest) on Jan 31, 2005 at 06:30 UTC

    Hi Senthil

    Use a zero-width negative lookahead.

    Usage: perl PerlFileName.pl "<distinctive title>"

    undef $/;                    # slurp mode: read all of DATA in one go
    my $line     = <DATA>;
    my $DisTitle = $ARGV[0];

    # (?:[^<]+|<(?!/?ContentItem>))+ skips text and tags without crossing a
    # ContentItem boundary; \Q...\E makes the title match as literal text.
    my ($FPage) = $line =~ m{
        <ContentItem>
        (?:[^<]+|<(?!/?ContentItem>))+
        <FirstPageNumber>(.+?)</FirstPageNumber>
        (?:[^<]+|<(?!/?ContentItem>))+
        <DistinctiveTitle>\Q$DisTitle\E</DistinctiveTitle>
        (?:[^<]+|<(?!/?ContentItem>))+
        </ContentItem>
    }x;

    print "\n'$FPage'";

    __DATA__
    <ContentItem>
      <TextItem>
        <TextItemType>03</TextItemType>
        <FirstPageNumber>29</FirstPageNumber>
        <LastPageNumber>56</LastPageNumber>
        <NumberOfPages>28</NumberOfPages>
      </TextItem>
      <ComponentTypeName>Chapter</ComponentTypeName>
      <DistinctiveTitle>The emergence of an Islamic</DistinctiveTitle>
    </ContentItem>
    <ContentItem>
      <TextItem>
        <TextItemType>03</TextItemType>
        <FirstPageNumber>29</FirstPageNumber>
        <LastPageNumber>34</LastPageNumber>
        <NumberOfPages>6</NumberOfPages>
      </TextItem>
      <ComponentTypeName>Chapter</ComponentTypeName>
      <DistinctiveTitle>The Arab Conquests</DistinctiveTitle>
    </ContentItem>

    Thanks

    Gopal.R

Re: Pattern matching question
by davido (Cardinal) on Jan 31, 2005 at 05:53 UTC

    What kind of markup is that? Is it XML? (if it is, XML parsers don't seem to like it). Assuming you're actually passing valid XML, this doesn't have to be a "pattern matching question"; you could just do this:

    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;

    my $xml = <<'HERE';
    <ContentItem>
      <TextItem>
        <TextItemType>03</TextItemType>
        <FirstPageNumber>29</FirstPageNumber>
        <LastPageNumber>56</LastPageNumber>
        <NumberOfPages>28</NumberOfPages>
      </TextItem>
      <ComponentTypeName>Chapter</ComponentTypeName>
      <DistinctiveTitle>The emergence of an Islamic</DistinctiveTitle>
    </ContentItem>
    <ContentItem>
      <TextItem>
        <TextItemType>03</TextItemType>
        <FirstPageNumber>29</FirstPageNumber>
        <LastPageNumber>34</LastPageNumber>
        <NumberOfPages>6</NumberOfPages>
      </TextItem>
      <ComponentTypeName>Chapter</ComponentTypeName>
      <DistinctiveTitle>The Arab Conquests</DistinctiveTitle>
    </ContentItem>
    HERE

    my $ref = XMLin( $xml, ForceArray => 1 );
    print Dumper $ref;

    This doesn't run correctly, because XML::Simple doesn't accept the data I'm passing to XMLin() as valid XML (as posted, the sample has two top-level ContentItem elements and no single enclosing root element). But if what you've shown us is just an incomplete excerpt, whereas the file you're working with is complete, this would run fine and would load the XML document into a data structure, which Data::Dumper then dumps.

    And then the next step would be to just figure out which portion of the datastructure contains the data you want. No regexps, no guesswork. :)
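
    A minimal sketch of that next step, assuming the only thing missing from the posted sample is an enclosing root element. The <root> wrapper and the helper name first_page_for_title are made up for illustration; ForceArray and KeyAttr are standard XML::Simple options.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: returns the FirstPageNumber of the first ContentItem
# whose DistinctiveTitle equals $title, or undef if none matches.
sub first_page_for_title {
    my ($fragment, $title) = @_;

    # Assumption: the input is a bare sequence of <ContentItem> elements,
    # so wrap it in a made-up <root> element to get a single document root.
    my $doc = XMLin(
        "<root>$fragment</root>",
        ForceArray => ['ContentItem'],   # always an array ref, even for one item
        KeyAttr    => [],                # do not fold arrays into hashes
    );

    for my $item (@{ $doc->{ContentItem} || [] }) {
        next unless defined $item->{DistinctiveTitle};
        return $item->{TextItem}{FirstPageNumber}
            if $item->{DistinctiveTitle} eq $title;
    }
    return;
}

my $xml = <<'HERE';
<ContentItem>
  <TextItem>
    <FirstPageNumber>29</FirstPageNumber>
    <LastPageNumber>56</LastPageNumber>
  </TextItem>
  <DistinctiveTitle>The emergence of an Islamic</DistinctiveTitle>
</ContentItem>
<ContentItem>
  <TextItem>
    <FirstPageNumber>29</FirstPageNumber>
    <LastPageNumber>34</LastPageNumber>
  </TextItem>
  <DistinctiveTitle>The Arab Conquests</DistinctiveTitle>
</ContentItem>
HERE

my $page = first_page_for_title($xml, 'The Arab Conquests');
print defined $page ? "$page\n" : "not found\n";
```

    The title is compared with eq, so nothing in the user's input is interpreted as a pattern.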


    Dave

Re: Pattern matching question
by jbrugger (Parson) on Jan 31, 2005 at 05:31 UTC
    rtfm i'd say, but here we go again:
    #!/usr/bin/perl -w
    use strict;

    my $test = "<blah>234</blah>";
    my ($output) = $test =~ m/<blah>(.*?)<\/blah>/;
    print $output;
Re: Pattern matching question
by perlsen (Chaplain) on Jan 31, 2005 at 06:22 UTC

    I just want to match the given text in the file using a regular expression,
    which is why I asked the question in that form. Anyway, thanks for your replies.

    thanks

    perlsen
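
For a regex-only route, here is a minimal two-step sketch: first cut the text into ContentItem blocks, then test each block's title and pull out its page number. It assumes ContentItem elements are never nested and that the title must match exactly (apart from surrounding whitespace). The helper name first_page_by_regex is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the FirstPageNumber of the first <ContentItem>
# block whose DistinctiveTitle is exactly $title, or undef if none matches.
sub first_page_by_regex {
    my ($text, $title) = @_;

    # Step 1: walk over the individual <ContentItem> blocks.
    # Assumption: ContentItem elements are never nested.
    while ($text =~ m{<ContentItem>(.*?)</ContentItem>}gs) {
        my $block = $1;

        # Step 2: check the title; \Q...\E treats the user input literally.
        next unless $block =~ m{<DistinctiveTitle>\s*\Q$title\E\s*</DistinctiveTitle>};

        # Step 3: extract the page number from the same block.
        return $1 if $block =~ m{<FirstPageNumber>\s*(\d+)\s*</FirstPageNumber>};
    }
    return;
}

my $sample = <<'HERE';
<ContentItem> <TextItem> <FirstPageNumber>29</FirstPageNumber> </TextItem>
<DistinctiveTitle>The emergence of an Islamic</DistinctiveTitle> </ContentItem>
<ContentItem> <TextItem> <FirstPageNumber>29</FirstPageNumber> </TextItem>
<DistinctiveTitle>The Arab Conquests</DistinctiveTitle> </ContentItem>
HERE

my $page = first_page_by_regex($sample, 'The Arab Conquests');
print defined $page ? "$page\n" : "not found\n";
```

Because each block is matched on its own, the title and the page number are guaranteed to come from the same ContentItem, and the individual patterns stay short.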