
Hello, everybody!

I've got a data format (German banking, if anybody cares) that specifies a variable number (up to five) of lines of 27 characters each. My upstream (norisbank) gives me this as a space, 27 characters, a space, 27 characters, etc. Well and good, and easy to split, right?

Not so much. My bank will also give me fewer than 27 characters if the line ends with spaces, which makes it impossible to split on fixed widths correctly.
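To make the problem concrete, here is a minimal sketch with made-up data. The two sample lines are hypothetical, and I am assuming the trailing padding is dropped from every short line. It builds the record both as specified and as received, then tries a fixed-width split on each:

```perl
use strict;
use warnings;

# Hypothetical sample lines; real records would come from the bank.
my @wanted = ('INVOICE 1234', 'THANK YOU');

# As specified: a space, then each line padded to 27 characters.
my $padded = join '', map { sprintf ' %-27s', $_ } @wanted;

# As received (assumption): the trailing padding of each line is missing.
my $trimmed = join '', map { " $_" } @wanted;

# A fixed-width split recovers both lines from the padded form ...
my @from_padded = map { s/\s+\z//r } $padded =~ / (.{27})/g;

# ... but finds no complete 27-character chunk in the trimmed form.
my @from_trimmed = $trimmed =~ / (.{27})/g;

print scalar(@from_padded), " lines from padded, ",
      scalar(@from_trimmed), " from trimmed\n";
```

The `/r` flag on the substitution needs perl 5.14 or later.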

I've developed the following algorithm, which mostly works:

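my @lines;
while (length $_) {
  # Every line should begin with a single separator space.
  my $p = substr($_, 0, 1, '');
  while ($p ne ' ') {
    print "Remaining text: $_\n";
    print "p='$p'\n";
    die "line too short, but no previous line?"
      unless @lines and length $lines[-1];
    print "Previous line: $lines[-1]\n";
    # Put back the character just taken, then steal the last
    # character of the previous line and test that instead.
    $_ = $p . $_;
    $p = substr($lines[-1], -1, 1, '');
  }
  # Take the next (up to) 27 characters as a line.
  push @lines, substr($_, 0, 27, '');
}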

That is, if the remaining bit doesn't start with a space, steal the last character of the previous line.

Now, I'm sure there's a better way to implement this, probably with a regex that will just do it all for me in a single line. I haven't been able to find it, however, which is no doubt because of my very poor regex-fu.

/ (.{0,27})/g, for some strange reason, seems to simply not match any chunks of fewer than 27 characters.
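For what it's worth, here is a rough sketch of what I think is happening, plus one possible fix. The record is hypothetical. Since `.` also matches a space, the greedy `.{0,27}` runs straight over the separator and into the next line. Adding a lookahead that insists on a space or end of string after each chunk forces the regex to back off, which is the same "steal from the previous line" idea as the loop above, and so shares its blind spots when a line contains spaces of its own:

```perl
use strict;
use warnings;

# Hypothetical record: the first line arrived without its padding,
# the second line is a full 27 characters.
my $record = ' SHORT LINE ' . ('B' x 27);

# Greedy version: the first chunk swallows the separator and part of
# the next line, and nothing after it starts with a space.
my @greedy = $record =~ / (.{0,27})/g;

# Lookahead version: each chunk must be followed by a space or by the
# end of the string; \G keeps the matches contiguous.
my @lines = $record =~ /\G (.{0,27})(?= |\z)/g;

print "greedy: ", scalar(@greedy), " chunk(s)\n";
print "line: '$_'\n" for @lines;
```

This is a heuristic, not a proof of correctness: if a short line is followed by a line whose own spaces happen to fall in the right place, the regex can still pick the wrong boundary.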

Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

In reply to splitting text into lines -- code -> regex by theorbtwo
