in reply to How to avoid addition of tags by HTML::TreeBuilder

$tree->ignore_unknown(0);
$tree->implicit_tags(0);
$tree->no_expand_entities(1);
$tree->ignore_ignorable_whitespace(0);
$tree->no_space_compacting(1);
$tree->store_comments(1);
$tree->store_pis(1);

Re^2: How to avoid addition of tags by HTML::TreeBuilder
by phoenix007 (Sexton) on Apr 19, 2019 at 07:23 UTC

    Not working: I tried setting the options you provided.

    Output after setting your options:

    <!DOCTYPE html> <html><head></head><body></body> <body> <p>test https://www.google.com</p> </body></html>

    Expected output : (Same as input)

    <!DOCTYPE html> <body> <p>test https://www.google.com</p> </body>

      The expected output is illegal HTML; in fact, so is the TreeBuilder version. HTML5 requires the title element. Getting tools to produce incorrect output is usually outside their scope.

      If you always have the same template but differing bodies, you could just use the tree to print the body content into your template. Otherwise there might be a limited number of cases you could convert into a heuristic tree with matching template pieces to get what you want.
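A minimal sketch of the "print the body content into your template" idea, assuming a fixed template with a made-up %BODY% placeholder. The helper name body_fragment and the template string are hypothetical; the tree is parsed with TreeBuilder's default settings so that a single body element exists, and only that element's children are serialized:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: parse $html and return the serialized children of <body>.
sub body_fragment {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    my $fragment = '';
    my $body = $tree->find_by_tag_name('body');
    if ($body) {
        for my $node ($body->content_list) {
            # Elements are objects; plain text nodes are ordinary strings.
            $fragment .= ref $node
                ? $node->as_HTML('<>&', undef, {})   # {} = always emit end tags
                : encode_entities($node, '<>&');
        }
    }

    $tree->delete;    # release the tree explicitly
    return $fragment;
}

# Assumed template; %BODY% is an illustrative placeholder, not a TreeBuilder feature.
my $template = "<!DOCTYPE html> <body> %BODY% </body>\n";
my $input    = '<!DOCTYPE html> <body> <p>test https://www.google.com</p> </body>';

my $fragment = body_fragment($input);
(my $page = $template) =~ s/%BODY%/$fragment/;
print $page;
```

The wrapper markup comes only from the template, so whatever html/head elements TreeBuilder adds to its tree never reach the output. Whitespace inside the fragment may differ from the input, since the default parser settings compact it.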

        Illegal html? Lol
      That's as good as it gets with TreeBuilder.