jkeenan1 has asked for the wisdom of the Perl Monks concerning the following question:

For the first time in many years of writing POD, I need to place data in =begin ... =end blocks, then extract it for use in a separate program. I seek the advice of the monks on how to do this most expeditiously.

The files into which I need to place these blocks may already have some POD, but mostly do not. My immediate need is to be able to add chunks of JSON or YAML at various points in the file -- chunks which a special POD-parsing program will extract as a list of JSON/YAML chunks for further processing. Example:

    Normal text paragraph.

    =begin specialdoc

    [Chunk of JSON or YAML]

    =end specialdoc

    More regular text.

    =begin specialdoc

    [2nd Chunk of JSON or YAML]

    =end specialdoc

    Still more regular text.

For my purpose, I only need to extract the JSON/YAML chunks. The regular POD will be handled by Pod::Text or Pod::Perldoc.

I would appreciate any pointers, including links to CPAN modules which extract data from =begin blocks.

Thank you very much.

Jim Keenan

Re: Pod: Need to parse from =begin ... =end blocks
by tobyink (Canon) on May 29, 2013 at 07:32 UTC

    The way I've done it is to subclass Pod::Simple::PullParser, and override get_token like this...

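    use strict;
    use warnings;
    use Data::Dumper ();

    {
        package Local::MyPodParser;
        use base "Pod::Simple::PullParser";

        sub get_token {
            my $self  = shift;
            my $token = $self->SUPER::get_token(@_);

            # do something with the token
            print Data::Dumper::Dumper($token);

            return $token;
        }

        # Pod::Simple::PullParser leaves run() to its subclasses, and
        # parse_file() calls it; here it simply drains the token stream.
        sub run {
            my $self = shift;
            1 while defined $self->get_token;
            return $self;
        }
    }

    my $parser = "Local::MyPodParser"->new;
    $parser->parse_file("somefile.pod");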
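Applied to the original question, the same token stream can be filtered for the contents of =begin specialdoc blocks. What follows is only a minimal sketch, not code from the thread: extract_chunks and the sample document are hypothetical, and it assumes the target has been registered with accept_targets, since Pod::Simple otherwise skips =begin regions it has not been told about. Text inside an accepted region arrives as Data paragraphs between the start and end tokens of a "for" element, so the sketch gathers the text tokens seen between those two.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# Hypothetical helper: return the data found in each =begin $target block.
sub extract_chunks {
    my ($pod, $target) = @_;

    my $parser = Pod::Simple::PullParser->new;
    $parser->accept_targets($target);   # unaccepted targets are skipped
    $parser->set_source(\$pod);         # a scalar ref means "parse this string"

    my (@chunks, @parts);
    my $in_block = 0;
    while (defined(my $token = $parser->get_token)) {
        if ($token->is_start && $token->tagname eq 'for') {
            $in_block = 1;              # =begin regions are reported as "for"
            @parts    = ();
        }
        elsif ($token->is_end && $token->tagname eq 'for') {
            # Assumption: a chunk split over several paragraphs is rejoined
            # with a blank line between the pieces.
            push @chunks, join("\n\n", @parts);
            $in_block = 0;
        }
        elsif ($token->is_text && $in_block) {
            push @parts, $token->text;
        }
    }
    return @chunks;
}

# Sample document built line by line so that the POD commands start in
# column one no matter how this snippet itself is indented.
my $pod = join "\n",
    '=pod', '',
    'Normal text paragraph.', '',
    '=begin specialdoc', '', '{"name":"first"}', '', '=end specialdoc', '',
    'More regular text.', '',
    '=begin specialdoc', '', '{"name":"second"}', '', '=end specialdoc', '',
    'Still more regular text.', '',
    '=cut', '';

print "$_\n" for extract_chunks($pod, 'specialdoc');
```

Each returned chunk could then be handed to a JSON or YAML decoder, while the ordinary POD stays available to Pod::Text or perldoc.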

    Take a look at TOBYINK::Pod::HTML::Helper which subclasses Pod::Simple::HTML (which is itself a subclass of Pod::Simple::PullParser) to process =for highlighter ... sections.

    The one annoyance you might notice is that Pod::Simple::PullParser has a _get_titled_section method that fast-forwards and rewinds the stream of tokens occasionally. The way I've coped with that is to wrap _get_titled_section, setting $self->{_in_get_titled_section} = 1 at the start and delete $self->{_in_get_titled_section} at the end, to act as an indicator of whether we're currently inside that method. Then within my get_token, I only do my processing when we're not inside _get_titled_section.
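The guard described above might be sketched like this. It is an illustration under stated assumptions rather than the code from TOBYINK::Pod::HTML::Helper: the class name Local::GuardedParser and the _heads counter are hypothetical, and it uses local on the hash element in place of an explicit set and delete, which has the same effect but also clears the flag if the inherited method dies.

```perl
use strict;
use warnings;

{
    package Local::GuardedParser;
    use parent 'Pod::Simple::PullParser';

    # Wrap the inherited method. local() removes the flag again when this
    # sub exits, whether it returns normally or dies.
    sub _get_titled_section {
        my $self = shift;
        local $self->{_in_get_titled_section} = 1;
        return $self->SUPER::_get_titled_section(@_);
    }

    sub get_token {
        my $self  = shift;
        my $token = $self->SUPER::get_token(@_);

        # Pass tokens through untouched during the look-ahead; they are
        # pushed back afterwards and will be seen again on the real read.
        return $token if !defined $token || $self->{_in_get_titled_section};

        # Hypothetical processing: count the headings encountered.
        $self->{_heads}++ if $token->is_start && $token->tagname =~ /^head/;
        return $token;
    }
}

my $pod = join "\n",
    '=head1 NAME', '', 'Demo - a sample document', '',
    '=head1 DESCRIPTION', '', 'Some text.', '',
    '=cut', '';

my $parser = Local::GuardedParser->new;
$parser->set_source(\$pod);

my $title = $parser->get_title;       # calls _get_titled_section internally
1 while defined $parser->get_token;   # now read the document for real

print "title: $title\n";
print "headings counted: $parser->{_heads}\n";
```

Without the flag check, the headings read during get_title would be counted once in the look-ahead and again when the pushed-back tokens are read.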

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
      Thanks, Toby. What follows is my first, admittedly crude adaptation of your code to my needs. Suggestions welcome.
      Jim Keenan

        Rather than our %begins, I'd suggest using %{$self->{_begins}}. Using a global hash would probably start to become annoying if you need to parse more than one pod document in a single run of the program.
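A rough sketch of that suggestion follows. The adapted code is not shown in the thread, so the shape of the data is an assumption: here the hash maps each target name to a list of the data paragraphs found in its blocks. Local::BeginCollector, _current_target and pod_with are hypothetical names. Because the hash lives in the parser object, two documents parsed in one run do not see each other's chunks.

```perl
use strict;
use warnings;

{
    package Local::BeginCollector;
    use parent 'Pod::Simple::PullParser';

    sub get_token {
        my $self  = shift;
        my $token = $self->SUPER::get_token(@_);
        return $token unless defined $token;

        if ($token->is_start && $token->tagname eq 'for') {
            $self->{_current_target} = $token->attr('target');
        }
        elsif ($token->is_end && $token->tagname eq 'for') {
            delete $self->{_current_target};
        }
        elsif ($token->is_text && defined $self->{_current_target}) {
            # Per-object storage instead of a package-level "our %begins".
            push @{ $self->{_begins}{ $self->{_current_target} } }, $token->text;
        }
        return $token;
    }

    # Drain the token stream so that get_token sees every token.
    sub run {
        my $self = shift;
        1 while defined $self->get_token;
        return $self;
    }
}

# Hypothetical helper that builds a small document around one data line.
sub pod_with {
    my ($data) = @_;
    return join "\n",
        '=pod', '', 'Some text.', '',
        '=begin specialdoc', '', $data, '', '=end specialdoc', '',
        '=cut', '';
}

my %found;
for my $name (qw(first second)) {
    my $pod    = pod_with(qq({"doc":"$name"}));
    my $parser = Local::BeginCollector->new;   # one parser object per document
    $parser->accept_targets('specialdoc');
    $parser->set_source(\$pod);
    $parser->run;
    $found{$name} = $parser->{_begins}{specialdoc} || [];
    print "$name: @{ $found{$name} }\n";
}
```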

Re: Pod: Need to parse from =begin ... =end blocks
by Anonymous Monk on May 29, 2013 at 03:50 UTC