Killswit7ch8 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've tried to figure this out on my own but hit a roadblock. I'm reading in an external file, then doing a search and replace. I am also using the following to break lines at 256 characters:

use Text::Wrap qw(wrap $columns $huge);
$columns = 256;
$huge = "die";

Problem I'm having is if there is a tag like below

<test attr1="var" attr2="var" attr3="var" attr4="var" attr5="var" attr6="var" attr7="var">

It may get broken like so:

<test attr1="var" attr2="var" attr3="var"
attr4="var" attr5="var" attr6="var" attr7="var">

I don't want this to happen. How can I avoid this? The program breaks lines at 256 characters. If a break would fall in the middle of a tag, I want it to happen before or after the tag and NOT inside it. How do I do this?

Code below:

#!/usr/local/bin/perl
require 5.000;
use Env;
use Cwd;
use File::Basename;
use Text::Wrap qw(wrap $columns $huge);

$columns = 256;
$huge = "die";

my $infile = $ARGV[0];
open(FILEREAD, "$infile.txt");
open(FILEWRITE, "> $infile.temp");
$i=1;
while (<FILEREAD>) {
    chomp $_;
    $_ =~ s/<p>/\n<p>/ig;
    print FILEWRITE wrap("", "", $_), "\n";
    $i++;
}
close FILEWRITE;
close FILEREAD;

Replies are listed 'Best First'.
Re: tags being broken in the wrong places
by moritz (Cardinal) on Feb 22, 2011 at 15:22 UTC
    I don't want this to happen. How can I avoid this?

    Don't use a module that's meant for wrapping free-form text for wrapping XML.

    I seem to recall that XML::Twig has methods for pretty-printing XML, other XML modules probably do too.

      It has a command line tool (xml_pp) for pretty printing, but it's not a simple library call.
Re: tags being broken in the wrong places
by locked_user sundialsvc4 (Abbot) on Feb 22, 2011 at 15:25 UTC

    Writing reliable code like this can be annoyingly tricky. Assuming that you cannot use, say, an HTML or XML parser (like XML::Simple), you might “reasonably assume” that no tag will exceed, say, 512 characters. You write code that maintains a string-buffer of that (arbitrary, but longer than 256...) length. By examining the first non-empty character, you decide if the next thing in the buffer is (either...) “a tag” or “not a tag.” You either write out an appropriate chunk of the non-tag string, or the complete tag. Then, you remove the characters from the buffer that you have just written out, and then you read enough new characters (if there be any...) to refill the 512-character buffer, and repeat. (If you encounter a tag that exceeds the size of your buffer, well, you’re screwed, but “bits are cheap.”)

    Obviously, if you can simply slurp the whole file into memory ...
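
    If the text does fit in memory (or is handled a line at a time, as in the original script), the tag / not-a-tag distinction can be made with a regex rather than a hand-managed buffer. The following is only a rough sketch, not a parser: it assumes no attribute value contains a literal '>', and that the byte "\x01" never occurs in the input. The names wrap_keeping_tags, _hide_spaces and $PLACEHOLDER are made up for illustration. It hides the whitespace inside each tag so that Text::Wrap sees the tag as one unbreakable word, then puts the whitespace back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns $huge);

$columns = 256;
$huge    = 'die';    # as in the original post: die on a "word" too long to fit

# Stand-in for whitespace inside tags (assumed never to occur in the input).
my $PLACEHOLDER = "\x01";

# Replace every whitespace character inside one tag with the placeholder.
sub _hide_spaces {
    my ($tag) = @_;
    $tag =~ s/\s/$PLACEHOLDER/g;
    return $tag;
}

# Wrap $text with Text::Wrap, but never break inside a <...> tag.
sub wrap_keeping_tags {
    my ($text) = @_;

    # Treat anything from '<' to the next '>' as a tag (a simplification).
    $text =~ s{(<[^>]*>)}{ _hide_spaces($1) }ge;

    my $wrapped = wrap('', '', $text);

    # Restore the hidden whitespace; each hidden character comes back as a space.
    $wrapped =~ s/\Q$PLACEHOLDER\E/ /g;
    return $wrapped;
}

my $line = 'Some text before <test attr1="var" attr2="var" attr3="var"> and some text after.';
print wrap_keeping_tags($line), "\n";
```

    In the original loop this would replace the call wrap("", "", $_). Because $huge is "die", a single tag too long to fit on one line makes wrap die instead of silently splitting the tag, which is the "you're screwed" case above.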

Re: tags being broken in the wrong places
by 7stud (Deacon) on Feb 22, 2011 at 22:38 UTC

    It sounds like you could use a SAX parser to do what you want. A SAX parser will present you with three things (or more if you want):

    • the opening tag
    • the text
    • the closing tag

    After you are presented with the opening or closing tag, you can decide whether the tag should be added to the current content waiting to be printed (which you store in a variable), or whether to print the current content, empty the variable, and then add the tag to it. If adding the tag to the empty variable makes it longer than the max length, then immediately print out the tag.

    See XML::SAX on cpan.
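
    The bookkeeping described above might look roughly like the handler below. This is a sketch under assumptions, not tested against a real parser: the class name TagAwareWrapper is made up, and the callback names and data shapes ($el->{Name}, the Name/Value entries under $el->{Attributes}, $data->{Data}) follow the Perl SAX 2 convention as I understand it. The callbacks are driven by hand here so the example runs without XML::SAX installed. In real use the object would be passed as the Handler to XML::SAX::ParserFactory->parser, and the input would have to be well-formed XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package TagAwareWrapper;    # hypothetical name

sub new {
    my ($class, %args) = @_;
    return bless { max => $args{max} || 256, pending => '', out => [] }, $class;
}

# Emit the pending content as one output line and empty the variable.
sub _flush {
    my ($self) = @_;
    (my $line = $self->{pending}) =~ s/\s+\z//;
    push @{ $self->{out} }, $line if length $line;
    $self->{pending} = '';
}

# Add one indivisible piece (a whole tag, a word, or a single space).
sub _add {
    my ($self, $piece) = @_;
    # Piece does not fit after the pending content: print that content first.
    $self->_flush
        if length($self->{pending})
        && length($self->{pending}) + length($piece) > $self->{max};
    return if $self->{pending} eq '' && $piece eq ' ';    # no leading space
    $self->{pending} .= $piece;
    # A piece longer than the limit on its own is printed immediately.
    $self->_flush if length($self->{pending}) > $self->{max};
}

sub start_element {
    my ($self, $el) = @_;
    my $tag = '<' . $el->{Name};
    # Attribute order and escaping are not preserved in this sketch.
    for my $attr (sort { $a->{Name} cmp $b->{Name} } values %{ $el->{Attributes} || {} }) {
        $tag .= sprintf ' %s="%s"', $attr->{Name}, $attr->{Value};
    }
    $self->_add($tag . '>');
}

sub characters {
    my ($self, $data) = @_;
    # Text may be broken at whitespace; each whitespace run becomes one space.
    for my $piece (grep { length } split /(\s+)/, $data->{Data}) {
        $self->_add($piece =~ /\A\s+\z/ ? ' ' : $piece);
    }
}

sub end_element  { my ($self, $el) = @_; $self->_add('</' . $el->{Name} . '>') }
sub end_document { my ($self) = @_; $self->_flush }
sub lines        { my ($self) = @_; return @{ $self->{out} } }

package main;

# Hand-driven events standing in for what a SAX parser would deliver.
my $h = TagAwareWrapper->new(max => 30);
$h->start_element({
    Name       => 'test',
    Attributes => {
        '{}attr1' => { Name => 'attr1', Value => 'var' },
        '{}attr2' => { Name => 'attr2', Value => 'var' },
    },
});
$h->characters({ Data => 'some text that is long enough to wrap' });
$h->end_element({ Name => 'test' });
$h->end_document;
print "$_\n" for $h->lines;
```

    Note that a break can land between a word and a tag that were not separated by whitespace in the input, so this inserts a newline where none was implied. That matches "break before or after the tag", but it does change the text.\z};
</test>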