
In Re: Wrap while ignoring certain sequences (from CB), I ended up with this code:

    my $len= 79;
    my $esc= '\e';
    my $eseq= qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$eseq)*[^$esc\n]];
    my $nonsp= qr[(?:$eseq)*[^$esc\s]];

    s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
    s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;

But it has two bugs (described later). What I think I should use is:

    s[(?:\G|^)((?:$char){1,$len}(?:$eseq)*)\s][$1\n]gm;
    #    ^^^^                                        ^
    #    vv
    s[(?:\G|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;

You see, the first substitution should only take place immediately after a newline (or at the start of the string). The ^ (together with the /m modifier) takes care of starting right after a newline that was already in the string. The \G should take care of starting right after a newline that was just inserted by the substitution itself (wait for it).
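A cut-down sketch of the proposed first substitution on its own, using test values assumed for illustration ($len = 5, with "~" standing in for the escape character). The first line is a single word too long to wrap, so the \G chain breaks there; ^ under /m then picks up the next existing line, and \G carries on right after the newline that match inserted. Anchoring on \G alone wraps nothing.

```perl
use strict;
use warnings;

# Test values assumed for illustration: 5 columns, "~" as the escape char.
my $len  = 5;
my $esc  = '~';
my $eseq = qr/(?:$esc)[^a-zA-Z]*[a-zA-Z]/;   # an escape sequence, e.g. "~1m"
my $char = qr/(?:$eseq)*[^$esc\n]/;          # one visible char, spaces included

my $in = "abcdefg\nhh ii jj\n";

# Proposed form: start at \G (just after our own newline) or at ^ under /m
# (just after a newline that was already in the string).
(my $both = $in) =~ s/(?:\G|^)((?:$char){1,$len}(?:$eseq)*)\s/$1\n/gm;

# \G alone: the first attempt fails on "abcdefg", and nothing else can match.
(my $g_only = $in) =~ s/\G((?:$char){1,$len}(?:$eseq)*)\s/$1\n/g;

for my $s ($both, $g_only) {
    (my $shown = $s) =~ s/\n/|/g;   # show newlines as "|"
    print "$shown\n";
}
# prints:
# abcdefg|hh ii|jj|
# abcdefg|hh ii jj|
```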

Note that you can use neither ^ nor look-behind assertions to detect text that was inserted after the s///g started, because the later matches are still made against the original string. This isn't documented (that I've seen), but I've tested it (and it makes sense).
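A minimal sketch of that behaviour, using made-up strings: in both cases a later match could only succeed by seeing text that an earlier replacement in the same pass inserted, and it never does.

```perl
use strict;
use warnings;

# ^ under /m: the first match turns "a " into "a\n", but the next attempt
# still sees the original space before "b", so "b " is never a line start.
(my $caret = "a b c") =~ s/^(\w) /$1\n/gm;

# Look-behind: after the first "a" becomes "Xa", (?<=X) at the second "a"
# still looks at the original "a", not at the inserted "X".
(my $behind = "aa") =~ s/(?:^|(?<=X))a/Xa/g;

for my $s ($caret, $behind) {
    (my $shown = $s) =~ s/\n/|/g;   # show newlines as "|"
    print "$shown\n";
}
# prints:
# a|b c
# Xaa
```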

\G should mean "start where I left off last time". However, double-checking the documentation, it appears that this is only supported for m//g, not s///g. Is there a good reason for this that I'm missing? Update: This appears to be just a documentation issue (perhaps even just an issue with my reading of the docs). See my reply below.
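Consistent with that update, a small sketch of \G chaining inside s///g. Each iteration starts where the previous match ended and \G anchors there, so the leading-zeros idiom stops at the first non-zero digit, and `s/\Ga/Xa/g` reaches the second "a" right after rewriting the first.

```perl
use strict;
use warnings;

# Only the run of leading zeros is replaced: the match at "1" fails at \G,
# and the later "0" is not at the end of a previous match.
(my $padded = "000120") =~ s/\G0/ /g;

# The second "a" starts exactly where the first match ended.
(my $chained = "aa") =~ s/\Ga/Xa/g;

print "[$padded]\n";    # [   120]
print "[$chained]\n";   # [XaXa]
```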

So the first substitution in the first block of code above could be better: it wastes time trying to split each short line starting at every whitespace character in the line. The (?<=\s) works by matching the original space that we just turned into a newline.

The second substitution in the first block of code is broken because a single word that spans more than two lines will only be wrapped once. I want to allow substitutions that start either right after I've wrapped a long word or at the start of the next word. But without \G, I don't see any way to allow starting right after a substitution without also allowing a start anywhere (which creates another bug, as discussed in my previous node).
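A sketch of just the second substitution, old and proposed, with test values assumed for illustration ($len = 5, "~" standing in for the escape character). A 12-character word needs three lines; the original splits it once, while the \G version keeps splitting from where the previous split ended.

```perl
use strict;
use warnings;

# Test values assumed for illustration: 5 columns, "~" as the escape char.
my $len   = 5;
my $esc   = '~';
my $eseq  = qr/(?:$esc)[^a-zA-Z]*[a-zA-Z]/;   # an escape sequence, e.g. "~1m"
my $nonsp = qr/(?:$eseq)*[^$esc\s]/;          # one visible non-space char

my $in = "abcdefghijkl\n";                     # one 12-char word

# Original: a split may only start at ^ or after whitespace, so after the
# first split the rest of the word has no legal starting point.
(my $old = $in) =~ s/(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])/$1\n/g;

# Proposed: \G also allows a split right where the previous one ended.
(my $new = $in) =~ s/(?:\G|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])/$1\n/g;

for my $s ($old, $new) {
    (my $shown = $s) =~ s/\n/|/g;   # show newlines as "|"
    print "$shown\n";
}
# prints:
# abcde|fghijkl|
# abcde|fghij|kl|
```

Dropping ^ in the proposed version loses nothing here, assuming pos() is unset on the string: the start of the string is covered by \G on the first attempt, and an existing newline counts as \s for the look-behind.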

Anyone see a way I can work around this missing feature? Should I report this as a "bug"?

BTW, playing with this code is made easier by setting $len much lower and making $esc a regular character.
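Following that tip, a small harness that runs both blocks of substitutions side by side. The width of 5, "~" as the escape character, the sample text, and the helper names wrap_old, wrap_new and show are all hypothetical, chosen only for this sketch. `(?:$esc)` is used instead of `$esc[` so Perl cannot read it as an array element.

```perl
use strict;
use warnings;

# Hypothetical test settings: a tiny width and "~" as the escape character,
# so escape sequences look like "~1m".
my $len   = 5;
my $esc   = '~';
my $eseq  = qr/(?:$esc)[^a-zA-Z]*[a-zA-Z]/;   # one escape sequence
my $char  = qr/(?:$eseq)*[^$esc\n]/;          # one visible char, spaces included
my $nonsp = qr/(?:$eseq)*[^$esc\s]/;          # one visible non-space char

# First block from the post.
sub wrap_old {
    local $_ = shift;
    s/(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s/$1\n/g;
    s/(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])/$1\n/g;
    return $_;
}

# Proposed block from the post.
sub wrap_new {
    local $_ = shift;
    s/(?:\G|^)((?:$char){1,$len}(?:$eseq)*)\s/$1\n/gm;
    s/(?:\G|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])/$1\n/g;
    return $_;
}

# Show newlines as "|" so each result fits on one line.
sub show { (my $s = shift) =~ s/\n/|/g; return $s }

my $text = "ab ~1mcd ef abcdefghijkl\n";
print show(wrap_old($text)), "\n";   # ab ~1mcd|ef|abcde|fghijkl|
print show(wrap_new($text)), "\n";   # ab ~1mcd|ef|abcde|fghij|kl|
```

"ab ~1mcd" stays on one line even though it is eight characters long, because the ~1m sequence does not count toward the five visible columns; only the long word at the end differs between the two blocks.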

- tye

In reply to No \G for s///g ? by tye
