in reply to read a file and insert closing tags if not present

You are most likely looking for modules like HTML::Tidy, HTML::TreeBuilder or XML::Twig.

If you show us a small sample of the sort of data you have to deal with and the code you have tried we may be able to give more specific answers.


DWIM is Perl's answer to Gödel

Re^2: read a file and insert closing tags if not present
by valavanp (Curate) on Mar 29, 2007 at 07:32 UTC
    Hi grandfather, this is the code which I tried.
    require HTML::TokeParser;
    $p = HTML::TokeParser->new("output.xml") || die "Can't open: $!";
    $p->empty_element_tags(1);
    open(FH, "output.xml");
    print FH $p;
    close FH;
    output.xml
    <greeting class="simple">Hello, world!
    The above is a sample file in which I tried to insert the closing tag for the greeting element. My real file contains 500 lines of tagged text. For example, it has a tag named <to> that is never closed, and I have to insert the closing tag. Thanks for your suggestion.

      HTML::TreeBuilder handles that simple case:

      use strict;
      use warnings;
      use HTML::TreeBuilder;

      my $sgml = <<SGML;
      <greeting class="simple">Hello, world!
      SGML

      my $root = HTML::TreeBuilder->new ();

      $root->ignore_unknown (0);   # keep tags that are not known HTML elements
      $root->parse ($sgml);
      $root->eof ();               # tell the parser that the input is complete
      print $root->guts (0)->as_XML ();

      Prints:

      <greeting class="simple">Hello, world!</greeting>

      although I'd not guarantee it will accept everything a real SGML document may contain.


      DWIM is Perl's answer to Gödel
        Hi grandfather, your solution works for that case. But when I give it input like this, extra tags are inserted. How can I avoid this?
        use strict;
        use warnings;
        use HTML::TreeBuilder;

        my $sgml = <<SGML;
        <html>
        <greeting class="simple">Hello, world!<head>heading</head>
        </html>
        SGML

        my $root = HTML::TreeBuilder->new ();

        $root->ignore_unknown (0);
        $root->parse ($sgml);
        print $root->guts (0)->as_XML ();
        Thanks for your suggestion

      You can sometimes guess, but in general there is no way of knowing where the right place for a missing closing tag is.

      In the example <p> foo <p> bar, you can see where the </p>s should go, because p tags can't be nested. But with <span style="rly">Oh, rly<span style="ya">ya, rly there is no real way of knowing where the </span>s should go, because span tags can legally be nested.

      You'll most likely have to write rules for how (and where) to end each tag, so that you don't mess up the nesting (like finding your whole document inside an <a href="foo"> or something).
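
      A rough sketch of what such rules might look like, using only core Perl. The rule table %no_nest and the sub name close_tags are made up for illustration, and the regexes only cope with simple tags (no comments, CDATA or ">" inside attribute values), so treat it as a starting point rather than a real parser. Tags listed in %no_nest are closed as soon as the same tag opens again; everything else is only closed when a later closing tag or the end of the input forces it, which is exactly the guess described above.

```perl
use strict;
use warnings;

# Hypothetical rule table: tags that may not contain another copy of themselves.
my %no_nest = map { $_ => 1 } qw(p to);

sub close_tags {
    my ($text) = @_;
    my @open;        # stack of currently open tag names
    my $out = '';

    # Split into tags and text runs; the capture group keeps the tags.
    for my $token (split /(<[^>]*>)/, $text) {
        next if $token eq '';

        if ($token =~ m{^</\s*([\w:-]+)\s*>$}) {            # closing tag
            my $name = $1;
            if (grep { $_ eq $name } @open) {
                # Close everything opened since the matching start tag.
                while (@open) {
                    my $top = pop @open;
                    last if $top eq $name;
                    $out .= "</$top>";
                }
                $out .= $token;
            }
            # A stray closing tag with no matching start tag is dropped.
        }
        elsif ($token =~ m{^<([\w:-]+)[^>]*?(/?)>$}) {      # start tag
            my ($name, $self_closing) = ($1, $2);
            if ($no_nest{$name} && grep { $_ eq $name } @open) {
                # Rule: end the previous copy (and anything inside it) first.
                while (@open) {
                    my $top = pop @open;
                    $out .= "</$top>";
                    last if $top eq $name;
                }
            }
            push @open, $name unless $self_closing;
            $out .= $token;
        }
        else {                                              # text, comments, etc.
            $out .= $token;
        }
    }

    # End of input: close whatever is still open, innermost first,
    # and keep any trailing whitespace after the inserted tags.
    my $tail = $out =~ s/(\s+)\z// ? $1 : '';
    $out .= '</' . pop(@open) . '>' while @open;
    return $out . $tail;
}

print close_tags(qq{<greeting class="simple">Hello, world!\n});
```

      With this, <p> foo <p> bar gets a </p> before the second <p>, while the nested span example just gets both </span>s at the very end, which may or may not be what the author meant.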

      @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;