Segmented text processing (with POD?)

FoxtrotUniform has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I want to process a mostly-plain text document (one of the atxt documents semi-described in Automating a Static Website) in separate segments, doing slightly different things to each: one segment might be "Usenet-format" text, which I want to mark up with HTML::FromText; the next might be Perl code, which I want to mark up as HTML with perltidy; the next might be more text, and so on. So I want to go through all of these segments, process each one appropriately, and end up with one large combined output (which I'll write to an HTML file or something).

One of my goals is to minimise the amount of explicit markup in the source file (which is why I'm not using XML). It strikes me that POD might be useful, if not directly, then as a base for the format I end up with.

So my questions:

Are there any modules on CPAN that might be useful for this task? I've checked a couple of times, but if I were omniscient I'd be at the racetrack, not here. :-)
Is POD suitable for this task? I don't really need a complex (hierarchical) markup language for this -- although I won't complain if I end up with one -- but POD is nice and clean, and has plenty of module support.

-- The hell with paco, vote for Erudil! :wq

Re: Segmented text processing (with POD?) by Corion (Patriarch) on Jul 14, 2002 at 21:02 UTC
Two modules that might do something in the general direction of your wishes (formatting text without special directives or with simple directives) are HTML::FromText and the various Wiki text formatters (have a look at WikiFormat for example). A Wiki formatter in Perl could be Text::WikiFormat by chromatic. I've always liked the way Wikis format their text, but I don't know if the results are good enough for your needs. Matts also has some Wiki modules on the CPAN. My guess is that you'll have to take (at least) one of these modules, rip out the HTML-generating part, and replace it with a generator of your liking (or turn directly to Matts' SAX drivers for his Wiki stuff). `perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web`
Re(2): Segmented text processing (with POD?) by FoxtrotUniform (Prior) on Jul 14, 2002 at 21:09 UTC
I'm already using (a patched version of) HTML::FromText for formatting plain text; sadly, it doesn't DTRT with other content, like Perl code. The Wiki formatters look cool, though; thanks for the pointer. HTML::FromText takes a very free-form approach to marking up text; I'm looking for something just a bit more structured. Perhaps the Wiki formatters will help me out. `-- The hell with paco, vote for Erudil! :wq`
Re: Segmented text processing (with POD?) by domm (Chaplain) on Jul 15, 2002 at 08:06 UTC
At least for the POD part, Pod::POM is quite powerful. Stas Bekman used it for DocSet (the name might change), which is used to create the new mod_perl site. And I wrote a very simple POD2slides translator for doing presentations with HTML (see here; it's in German, so look at the links at the end of the page). `-- #!/usr/bin/perl for(ref bless{},just'another'perl'hacker){s-:+-$"-g&&print$_.$/}`
Re: Segmented text processing (with POD?) by FoxtrotUniform (Prior) on Jul 14, 2002 at 21:17 UTC
Clarification: I'm writing content for a website in mostly-plaintext files. The first paragraph is headers (think email), containing metadata. The rest is content. Right now, all of the content goes through HTML::FromText to turn it into nice pretty web-friendly markup. Unfortunately, it's difficult to do things like post Perl code snippets with this method, because HTML::FromText parses them as if they were plain text. (Yes, there's a document-wide switch to `text2html` to treat indented paragraphs as code; 95% of the time, I don't want that.) What I want is a nice way to say "this chunk of text is plaintext, and should be marked up by HTML::FromText; this chunk is Perl, and should be marked up by perltidy; this chunk is C, and should be marked up by `<foo>`", and so on. I want to do this in as transparent a manner as possible; if I wanted to write my content in some heavyweight markup language, I'd use HTML and skip the middleman. I'm thinking that POD might be a useful way of doing this: I could use one of the many POD modules to chunk my input, and POD has a nice, minimal syntax. Update: Looks like POD's the way to go. Thanks Aristotle and domm! `-- The hell with paco, vote for Erudil! :wq`
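The "this chunk is plaintext, this chunk is Perl" idea above can be sketched as a dispatch table keyed on segment type. This is only a rough illustration: `escape_html`, `%handler_for`, and `render_segments` are made-up names, and the two handlers are stand-ins that merely escape and wrap the text. In real use they would presumably call HTML::FromText and perltidy instead.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the three characters that matter in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# One handler per segment type. These are stand-ins (an assumption of this
# sketch); the real ones would hand the body to HTML::FromText, perltidy, etc.
my %handler_for = (
    text => sub { '<p>' . escape_html($_[0]) . '</p>' },
    perl => sub { '<pre class="perl">' . escape_html($_[0]) . '</pre>' },
);

# Render a list of [type, body] segments in order and join the results.
# Unknown types fall back to the plain-text handler.
sub render_segments {
    my (@segments) = @_;
    return join "\n", map {
        my ($type, $body) = @$_;
        my $handler = $handler_for{$type} || $handler_for{text};
        $handler->($body);
    } @segments;
}

print render_segments(
    [ text => 'Compare two numbers:' ],
    [ perl => 'print "smaller\n" if $x < $y;' ],
), "\n";
```

Adding a new kind of segment (C, say) then means adding one more entry to the table, without touching the loop.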
Re^2: Segmented text processing (with POD?) by Aristotle (Chancellor) on Jul 14, 2002 at 21:40 UTC
POD allows this with the `=begin`/`=end` directives (or `=for`, which covers a single paragraph and takes no `=end`); something like `=begin email Usenet style stuff here =end email =begin perltidy some_code_here(); =end perltidy`, with each directive on its own line and blank lines between paragraphs. See perlpod. I haven't fiddled with Pod::Parser or its siblings enough to know how much effort it would be to actually implement this, though. Makeshifts last the longest.
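For what it's worth, here is a rough sketch of the chunking step, assuming regions are delimited by matching `=begin NAME` / `=end NAME` lines. It is a hand-rolled line scanner rather than one of the POD modules, so it ignores the rest of POD entirely; `split_segments` is a made-up name, and treating everything outside a region as type `text` is an assumption of the sketch.

```perl
use strict;
use warnings;

# Split a document into [type, body] segments. Lines between "=begin NAME"
# and the matching "=end NAME" get type NAME; everything else gets 'text'.
sub split_segments {
    my ($source) = @_;
    my @segments;
    my $type = 'text';
    my @lines;

    # Store the lines collected so far as one segment, skipping empty ones.
    my $flush = sub {
        my $body = join "\n", @lines;
        $body =~ s/\A\s*\n//;    # drop leading blank lines
        $body =~ s/\s+\z//;      # drop trailing whitespace
        push @segments, [ $type, $body ] if length $body;
        @lines = ();
    };

    for my $line (split /\n/, $source) {
        if (my ($name) = $line =~ /^=begin\s+(\S+)/) {
            $flush->();
            $type = $name;
        }
        elsif ($line =~ /^=end\s+(\S+)/ && $1 eq $type) {
            $flush->();
            $type = 'text';
        }
        else {
            push @lines, $line;
        }
    }
    $flush->();
    return @segments;
}

# Build the sample with join so the directives stay visible line by line.
my $doc = join "\n",
    '=begin email', '', 'Usenet style stuff here', '', '=end email', '',
    '=begin perltidy', '', 'some_code_here();', '', '=end perltidy';

for my $segment (split_segments($doc)) {
    my ($type, $body) = @$segment;
    print "[$type] $body\n";
}
```

Each segment could then be handed to whatever formatter matches its type. A real implementation would more likely sit on top of a POD parser, which also handles nesting and the other directives.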