meisterperl has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way in a regular expression to define fixed-width positions for several columns that are read in from a flat text file? Something like {23,15},{28,10} (start reading at position 23, read 15 characters, continue at position 28, read 10 characters, and so on) and drop those values into the $1, $2, etc. variables? Thanks!

Replies are listed 'Best First'.
Re: reading in fixed width
by BrowserUk (Patriarch) on Feb 27, 2005 at 19:19 UTC

    23+15=38 so "continuing on to position 28" doesn't make a lot of sense. Assuming that you meant (23,15) & (48,10), you could use unpack:

    ($1,$2) = unpack 'x23 A15 x10 a10', $var;

    If you insist on using a regex, then with the same change of spec as above, you could use:

    $var =~ m[^.{23}(.{15}).{10}(.{10})];

    Or if you really did mean overlapping bits then you could use substr:

    $1 = substr $var, 23, 15; $2 = substr $var, 28, 10;

    or unpack:

    ($1,$2) = unpack 'x23 a15 X10 a10', $var;

    Or a regex:

    $var =~ m[(?=.{23}(.{15}))(?=.{28}(.{10}))];

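    Of course, using named vars rather than $1 & $2 would make more sense :) In fact $1 and $2 are read-only, so the substr and unpack assignments above only work once you substitute your own variables.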
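
To make the variants above concrete, here is a minimal sketch that uses named variables instead of $1 and $2. The sample record in $var and the names $f1, $r1, $o1 and so on are made up for illustration; the offsets follow the (23,15) & (48,10) reading and the overlapping (23,15) & (28,10) reading discussed above.

```perl
use strict;
use warnings;

# Hypothetical 62-character record: a-z, then A-Z, then 0-9.
my $var = join '', 'a' .. 'z', 'A' .. 'Z', 0 .. 9;

# Non-overlapping fields (23,15) and (48,10).
my ($f1, $f2) = unpack 'x23 a15 x10 a10', $var;
my ($r1, $r2) = $var =~ /^.{23}(.{15}).{10}(.{10})/s;

# Overlapping fields (23,15) and (28,10).
my ($o1, $o2) = unpack 'x23 a15 X10 a10', $var;   # X10 steps back 10 characters
my ($s1, $s2) = (substr($var, 23, 15), substr($var, 28, 10));
my ($l1, $l2) = $var =~ /^(?=.{23}(.{15}))(?=.{28}(.{10}))/s;

print "$f1 $f2\n";   # non-overlapping fields
print "$o1 $o2\n";   # overlapping fields
```

Note that the 'a' format keeps trailing spaces while 'A' strips them, so pick whichever suits the data.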


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: reading in fixed width
by NetWallah (Canon) on Feb 27, 2005 at 19:05 UTC
    For fixed widths, you are probably better off using substr or unpack rather than a regex.

        ..."I don't know what the facts are but somebody's certainly going to sit down with him and find out what he knows that they may not know, and make sure he knows what they know that he may not know, and that's a good thing. I think it's a very constructive exchange," --Donald Rumsfeld

Re: reading in fixed width
by ysth (Canon) on Feb 27, 2005 at 19:35 UTC
    If you truly mean to have overlapping fields, I don't think unpack would work, but substr or a lookahead regex would:

    substr: Note that the second parameter (the offset) is 0-based. A warning will be given if the start position is beyond (but not at!) the end of the string; the result will be shorter than the requested length if $in isn't long enough.

    $one = substr($in, 23, 15); $two = substr($in, 28, 10);
    Beware! If you use substr in an lvalue context, the warning gets promoted to an error:
    $ perl -we'sub foo { print $_[0] } eval { foo(substr "abc", 4, 1); 1} or die "croak: $@"'
    croak: substr outside of string at -e line 1.
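
The edge cases described for substr can be checked with a small sketch. The helper name field and the 25-character sample string are hypothetical; the helper simply returns undef quietly when the start offset lies beyond the end of the string, so no warning is triggered.

```perl
use strict;
use warnings;

# Hypothetical helper: like substr, but returns undef without a warning
# when the start offset is beyond the end of the string.
sub field {
    my ($in, $start, $len) = @_;
    return undef if $start > length $in;
    return substr $in, $start, $len;
}

my $short = 'x' x 25;                 # sample line, 25 characters long

my $whole = field($short, 10, 5);     # fully inside the string
my $part  = field($short, 23, 15);    # only 2 characters are available
my $edge  = field($short, 25, 10);    # start exactly at the end: empty string
my $past  = field($short, 30, 10);    # start beyond the end: undef
```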
    regex: use one (?=) anchored at the beginning per field. The offsets are still 0-based.
    $in =~ /^
        (?=.{23}(.{15}))   # field one
        (?=.{28}(.{10}))   # field two
    /xs or warn "bad input: $in ";
    If columns may be shorter, use .{0,15} and .{0,10} or similar. If a starting column is beyond the end of the string, the regex will fail.
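
As a rough sketch of the shorter-columns variant, the function below wraps the lookahead regex with .{0,15} and .{0,10}. The name overlapping_fields and the sample lines are assumptions for illustration only; the function returns an empty list when the regex fails.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two overlapping fields, or an empty list
# if a field would have to start beyond the end of the line.
sub overlapping_fields {
    my ($in) = @_;
    return unless $in =~ /^
        (?=.{23}(.{0,15}))   # field one: offset 23, up to 15 characters
        (?=.{28}(.{0,10}))   # field two: offset 28, up to 10 characters
    /xs;
    return ($1, $2);
}

# Sample lines of 62, 30 and 25 characters.
my @full  = overlapping_fields(join '', 'a' .. 'z', 'A' .. 'Z', 0 .. 9);
my @short = overlapping_fields(join '', 'a' .. 'z', 'A' .. 'D');
my @bad   = overlapping_fields(join '', 'a' .. 'y');

print scalar(@bad) ? "matched\n" : "too short for field two\n";
```

With the 30-character line both fields come back truncated; with the 25-character line the second lookahead cannot skip 28 characters, so the whole match fails.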
      You can do overlapping fields with unpack:
      my $data = "123";
      my @data = unpack "A2X1A2", $data;
      print "@data";
      This prints:
      12 23
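
Instead of counting backwards with X, unpack's @ format can jump to an absolute offset, which may be easier to generate from a list of columns. This is only a sketch: the @spec list of (offset, width) pairs and the sample line are invented for the example, and it assumes every offset lies within the line.

```perl
use strict;
use warnings;

# Hypothetical column spec: (offset, width) pairs, offsets 0-based.
my @spec = ([23, 15], [28, 10]);

# Build a template such as '@23 a15 @28 a10'; '@N' moves to absolute offset N.
my $template = join ' ', map { sprintf '@%d a%d', @$_ } @spec;

# Sample 62-character line: a-z, then A-Z, then 0-9.
my $line   = join '', 'a' .. 'z', 'A' .. 'Z', 0 .. 9;
my @fields = unpack $template, $line;

print "$template\n";
print "@fields\n";
```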


      holli, /regexed monk/
Re: reading in fixed width
by sh1tn (Priest) on Feb 27, 2005 at 19:15 UTC
    @_ = a..z;
    $_ = join '', @_;
    /
        .{5}     # 5 symbols neglected
        (.{3})   # next 3 captured
        .{5}     # another 5 missed
        (.{3})   # and again 3 captured
    /x;
    print "$1 $2 \n"
    __END__
    STDOUT:
    fgh nop
    where:
    1 a
    2 b
    3 c
    4 d
    5 e
    6 f
    7 g
    8 h
    9 i
    10 j
    11 k
    12 l
    13 m
    14 n
    15 o
    16 p
    17 q
    18 r
    19 s
    20 t
    21 u
    22 v
    23 w
    24 x
    25 y
    26 z