in reply to HTML module help please?

I think the code below will do what you want (semi-tested). But I'd have to argue pretty strongly against the Perl solution. In this case you should probably do it with CSS. If the blockquote is truly a "quote," okay, but if you're only changing the tag for the sake of formatting, you should apply the formatting from outside the markup, in a stylesheet. Both solutions are below.

```perl
use strict;
use HTML::TokeParser;
use CGI qw( blockquote );

my $string     = join '', <DATA>;
my $string_ref = \$string;
my $p          = HTML::TokeParser->new( $string_ref );

my $html;
my $last_end_tag = '';    # name of the most recently closed tag

while ( my $token = $p->get_token ) {
    if ( $token->[0] =~ /[TCD]/ ) {
        # text, comment, or declaration: pass through unchanged
        $html .= $token->[1];
    }
    elsif ( $token->[0] eq 'S' ) {
        if ( $token->[1] eq 'p' and $last_end_tag eq 'h4' ) {
            # first <p> after an </h4>: rebuild it as a blockquote,
            # handing the original attributes (a hashref) to CGI
            $html .= blockquote( $token->[2], $p->get_text('/p') );
            $p->get_token();    # toss the </p>
            $last_end_tag = 'blockquote';
        }
        else {
            $html .= $token->[-1];
        }
    }
    elsif ( $token->[0] eq 'E' ) {
        # if it's a new blockquote, it's closed already
        # (end-tag tokens carry the bare tag name, so compare with 'p')
        $html .= $token->[-1]
            unless $token->[1] eq 'p' and $last_end_tag eq 'h4';
        $last_end_tag = $token->[1];
    }
}

print $html;

__END__
<p>Stand alone p, or, aw, skip it.</p>
<h4 class="title">This is an h4</h4>
<p class="salad" id="taco">A first paragraph.</p>
<p>A follower.</p>
<p>Big finish. Or Finnish?</p>
```

CSS solution.

```css
blockquote {
  /* format definition, whatever you want... */
  margin: 1em 1em 1em 2em;
  font-style: italic;
  font-size: 105%;
}

h4 + p {
  /* format definition identical to your blockquote def */
}
```
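If the two definitions really are meant to be identical, one way to avoid maintaining them twice is to group the selectors. This is only a sketch: the declarations are the placeholder values from the example, not a recommendation. Keep in mind that `h4 + p` matches only a `p` that immediately follows an `h4` under the same parent, and that browsers without support for the adjacent sibling combinator will ignore that selector.

```css
/* One shared rule for real blockquotes and for the first
   paragraph directly after an h4 (values are placeholders). */
blockquote,
h4 + p {
  margin: 1em 1em 1em 2em;
  font-style: italic;
  font-size: 105%;
}
```

With the sample markup from the Perl example, only "A first paragraph." would pick up the shared style; "A follower." and "Big finish." follow a `p`, not an `h4`, so they keep the default paragraph formatting.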

update: fixed grammar flatulence + one more, sigh.

Re^2: HTML module help please?
by tachyon (Chancellor) on Sep 01, 2004 at 03:03 UTC

      Oh, of course, you're right. I've become so accustomed to well-formed XHTML that I forget what the web really looks like :) Mine will fail on any unclosed paras. I tried fixing it, but your approach is much more sound for that. One thing mine does that yours omits is retain the attributes of the original para. But that's easy to update.

      ```perl
      print '<blockquote>';
      # becomes
      print CGI::start_blockquote( $attr );
      ```

        retains the attribute tags for the original para

        Ah but is that a feature or a bug ;-)

        cheers

        tachyon

        Thanks again, YM and tachyon.

        My P tags are all closed, and there are no attributes for me to worry about, but I really appreciate your efforts and your thoroughness.



        ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
        =~y~b-v~a-z~s; print
Re^2: HTML module help please?
by Cody Pendant (Prior) on Sep 01, 2004 at 02:59 UTC
    I'd have to argue pretty strongly against the Perl solution. In this case you should probably do it with CSS.

    Thank you so much for the code.

    And, to reassure you, it's not about the formatting at all. I'm converting HTML into XHTML, and the original is a screenplay. I have a kind of rough schema where character names are H4, stage directions are P, and speeches are BLOCKQUOTE. Ironically, P and BLOCKQUOTE will probably look exactly the same in the browser.

    It might change later, but first I've got to achieve valid XHTML in which the speeches are marked up as a data type distinct from the stage directions.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
    =~y~b-v~a-z~s; print