Re: splitting text into lines -- code -> regex

You need to be a little more specific. From my reading, it seems that the 27-character codes never contain spaces. Is that right? If so, then what you want is either split() or the special variable $/ (the input record separator).
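To make the two options concrete, here is a minimal sketch. The sample codes are made up for illustration, and the string is read through an in-memory filehandle only so the example is self-contained; with real input you would read from your own file or STDIN.

```perl
use strict;
use warnings;

# Assumed sample input: space-separated codes with no embedded spaces.
my $text = 'AAA111 BBB222 CCC333';

# Option 1: split the whole string on single spaces.
my @by_split = split / /, $text;

# Option 2: set the input record separator to a space, so that each
# read returns one space-terminated record.
my @by_rs;
{
    open my $fh, '<', \$text or die "Cannot read string: $!";
    local $/ = ' ';
    while (my $code = <$fh>) {
        chomp $code;            # chomp removes a trailing $/ (the space)
        push @by_rs, $code;
    }
}

print "split: @by_split\n";
print "\$/:    @by_rs\n";
```

Both give the same list here. The $/ version has the advantage that it never holds more than one record in memory at a time.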

If spaces can be validly placed within the codes then I would suggest that you split on spaces and then reassemble your lines until you have enough information:

$/ = ' ';
$line = '';
while($chunk = <>)
  {
    if((length($line) + length($chunk)) > 27)
      {
        # Do the line processing
        $line = '';
      }
    $line .= $chunk;
  }
# Process the final line

(Code untested of course)

Re^2: splitting text into lines -- code -> regex by davido (Cardinal) on Jul 12, 2004 at 07:55 UTC
That might present a problem, as some spaces are actual delimiters, and others are going to be literal text. If you treat both the same way (split on plain spaces and reassemble all the "words"), you may end up splitting words that happened to span a 27-char limit.

Dave
Re^3: splitting text into lines -- code -> regex by hawtin (Prior) on Jul 12, 2004 at 21:43 UTC
> That might present a problem, as some spaces are actual delimiters, and others are going to be literal text. If you treat both the same way (split on plain spaces and reassemble all the "words"), you may end up splitting words that happened to span a 27-char limit.

The way the problem was stated it seemed to me that you can guarantee that each 'line' ends in a space. If that is correct then you know each 'line' is the concatenation of a number of 'chunks', that is no 'chunk' can belong to two different lines. Is that not correct? That is not, of course, to say that every space is the delimiter of a 'line' (otherwise it would be simple).

My code did take into account the fact that spaces have two different meanings. While I did forget to allow for adding one for the extra space a working version is quite close to my original code:

use strict;
use warnings;

$/ = ' ';
my $line = '';

print "|012345678901234567890123456|\n";

while(my $chunk = <DATA>)
  {
    if((length($line) + length($chunk)) > 28)
      {
        # Remove the delimiter space and pad out
        chop($line);
        $line .= ' 'x(27-length($line));

        # Do the line processing
        print "|$line|\n";
        $line = '';
      }
    $line .= $chunk;
  }

# Process the final line
chop($line);
$line .= ' 'x(27-length($line));
print "|$line|\n";

__END__
012345678-A1234567 INCL.EUR 3,31 MWST JULI MONATL. GEB HR T-DSL FLAT 01.07.04-3

Gives

|012345678901234567890123456|
|012345678-A1234567 INCL.EUR|
|3,31 MWST JULI MONATL. GEB |
|HR T-DSL FLAT 01.07.04-3   |

If however my assumption about spaces at the end of each line is wrong (for example if there could be words that are longer than 27 characters without a space) then a simple if statement will take care of that, something like:

while(my $chunk = <DATA>)
  {
    if(length($chunk) > 27)
      {
        # Process holdover line
        process_line($line) if($line);
        $line = '';
        while($chunk =~ s/^(.{27})//)
          {
            process_line($1);
          }
        $line = $chunk;
      }
    elsif((length($line) + length($chunk)) > 28)
      {

It is true that this is a more simplistic approach than using a "negative lookahead assertion", but there again I don't know how one of them works :-)
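For anyone who wants to try the long-chunk idea, the fragment above can be fleshed out roughly as follows. This is only a sketch under some assumptions: split_lines and its $width parameter are names invented here (they are not in the thread), the input is taken to be a single line with single-space delimiters, and the lines are returned as a list rather than handed to process_line. It also compares the chunk without its trailing space against the width, which is a small departure from the fragment, so that a word of exactly 27 characters is not cut.

```perl
use strict;
use warnings;

# Hypothetical helper: break $text into lines of at most $width columns,
# breaking at spaces where possible and cutting over-long words.
sub split_lines {
    my ($text, $width) = @_;
    $width ||= 27;
    my @out;

    # Drop one trailing delimiter space, then pad to the full width.
    my $emit = sub {
        my ($l) = @_;
        $l =~ s/ \z//;
        push @out, sprintf '%-*s', $width, $l;
    };

    open my $fh, '<', \$text or die "Cannot read string: $!";
    local $/ = ' ';                     # read space-terminated chunks
    my $line = '';
    while (my $chunk = <$fh>) {
        $chunk =~ s/\n\z//;             # ignore a newline ending the input
        chomp(my $word = $chunk);       # chunk minus its trailing space
        if (length($word) > $width) {
            # Word cannot fit on any line: flush the holdover line,
            # then cut the word into $width-sized pieces.
            $emit->($line) if length $line;
            $emit->($1) while $chunk =~ s/^(.{$width})//;
            $line = $chunk =~ /\S/ ? $chunk : '';
        }
        elsif (length($line) + length($word) > $width) {
            $emit->($line);             # current line is full
            $line = $chunk;
        }
        else {
            $line .= $chunk;
        }
    }
    $emit->($line) if length $line;     # process the final line
    return @out;
}

my $sample = "012345678-A1234567 INCL.EUR 3,31 MWST JULI MONATL. GEB "
           . "HR T-DSL FLAT 01.07.04-3\n";
print "|$_|\n" for split_lines($sample);
```

Returning a list keeps the splitting separate from whatever the per-line processing turns out to be, which also makes the edge cases (a word longer than the width, a final chunk with no trailing space) easier to check in isolation.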


"be consistent"
	PerlMonks