Re^2: One more parsing ATOM question

Jenda,

Thanks for your suggestions on the rules. I've gotten to the point where I'm trying to extract values out of the CDATA field. I've tried a lot of different ideas (HTML tables, simple HTML extraction, stripping tags, regexes, etc.), but I think that using the XML::Rules engine would be the most straightforward. I've read your CPAN writeup on ::Rules (are you the author? Very cool) and studied it, but I'm not quite sure how best to proceed.

I can extract the CDATA content and end up with a set of tags and values. Your comment leads me to believe that I can build a hash of the tags and values and then pick out the ones I want. That seems to be exactly what the ::Rules documentation discusses in the section about addresses, streets, Larry Wall, multiple tags and hashrefs, but I don't understand that section. Can you expand on it?

The part of your XML::Rules documentation quoted below seems to be the relevant one.

    our %states = (
        AL => 'Alabama',
        AK => 'Alaska',
        ...
    );
    ...
    state => sub { return 'state' => $states{$_[1]->{_content}}; }

or

    address => sub {
        if (exists $_[1]->{id}) {
            $sthFetchAddress->execute($_[1]->{id});
            my $addr = $sthFetchAddress->fetchrow_hashref();
            $sthFetchAddress->finish();
            return 'address' => $addr;
        } else {
            return 'address' => $_[1];
        }
    }
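To expand on the rules quoted above, here is a minimal sketch of how a rule that returns a name => value pair feeds the parent tag's hash. It assumes XML::Rules (and its XML::Parser dependency) is installed from CPAN. The doc/person/name/state document, the two-entry %states table and the @people array are hypothetical, made up only for this illustration.

```perl
use strict;
use warnings;
use XML::Rules;    # CPAN module, needs XML::Parser

# Hypothetical lookup table, trimmed to two states.
our %states = (
    AL => 'Alabama',
    AK => 'Alaska',
);

my @people;    # one hashref per <person>, filled by the rule below

my $parser = XML::Rules->new(
    rules => [
        # <name>Larry</name> becomes name => 'Larry' in the parent's hash
        name  => sub { return 'name' => $_[1]->{_content} },
        # same idea, but the abbreviation is replaced by the full state name
        state => sub { return 'state' => $states{ $_[1]->{_content} } },
        # $_[1] now holds the pairs returned by the child rules
        person => sub {
            push @people, { name => $_[1]->{name}, state => $_[1]->{state} };
            return;    # nothing needs to be passed up to <doc>
        },
        doc => sub { return },
    ],
);

$parser->parse(<<'XML');
<doc>
  <person><name>Larry</name><state>AL</state></person>
  <person><name>Ann</name><state>AK</state></person>
</doc>
XML

print "$_->{name} lives in $_->{state}\n" for @people;
```

Each rule's return value lands in the parent's $_[1], so by the time the person rule runs it sees a single hash with every child tag you decided to keep. The quoted address rule works the same way, except that the value it returns is a hashref (a database row, or the tag's own attribute hash) rather than a plain string.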

Re^3: One more parsing ATOM question
by Jenda (Abbot) on Jun 14, 2013 at 23:55 UTC

In XML, these two are equivalent: <foo>&lt;bar/&gt;</foo> and <foo><![CDATA[<bar/>]]></foo>. Thus the content of the <summary> tag is the plain string "<p class="quicksummary"><a href="http://earthquake.usgs...", not a set of nested tags. If you want to split that into pieces, you have to pass that string to another HTML or XML parser. It's like a box that, apart from other things, contains another box: after you've opened the outer box, you have to take out the inner box and open it as well.
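A minimal sketch of that "box in a box" approach, assuming XML::Rules (with XML::Parser) is installed and, importantly, that the HTML inside the CDATA happens to be well-formed XML. If it isn't (unclosed tags, &nbsp; and so on), the second pass needs a real HTML parser such as HTML::TreeBuilder instead. The feed, its title, and the span/class markup inside the summary are hypothetical stand-ins, not the actual USGS format.

```perl
use strict;
use warnings;
use XML::Rules;    # CPAN module, needs XML::Parser

# Hypothetical, cut-down feed. The real USGS summaries look different.
my $feed = <<'XML';
<feed>
  <entry>
    <title>M 4.5 - Example Region</title>
    <summary type="html"><![CDATA[<p class="quicksummary"><span class="time">2013-06-14 12:00:00 UTC</span> <span class="depth">10.0 km</span></p>]]></summary>
  </entry>
</feed>
XML

# Outer box: the CDATA section arrives as plain text in {_content}.
my @entries;
my $outer = XML::Rules->new(
    rules => [
        title   => sub { return 'title'   => $_[1]->{_content} },
        summary => sub { return 'summary' => $_[1]->{_content} },
        entry   => sub {
            push @entries, { title => $_[1]->{title}, summary => $_[1]->{summary} };
            return;
        },
        feed => sub { return },
    ],
);
$outer->parse($feed);

# Inner box: a second parser for each summary string.
# Every <span class="x">value</span> becomes x => value in %fields.
my %fields;
my $inner = XML::Rules->new(
    rules => [
        span => sub { $fields{ $_[1]->{class} } = $_[1]->{_content}; return },
        p    => sub { return },
    ],
);

my @quakes;
for my $entry (@entries) {
    %fields = ();    # start each summary with an empty hash
    $inner->parse($entry->{summary});
    push @quakes, { title => $entry->{title}, %fields };
}

print "$_->{title}: depth $_->{depth}\n" for @quakes;
```

The inner parser is built once and reused for every entry, and the summaries are parsed only after the outer pass has finished, so the two parsers never run inside each other.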

Jenda
Enoch was right!
Enjoy the last years of Rome.
