in reply to Re^2: Extracting specific data from fixed-width columns
in thread Extracting specific data from fixed-width columns

From the word choice in the original node, I understood that the OP knows the keys ("variables") and not necessarily the positions.

If the positions are fixed and known, something like this could work:

my @vars=$line=~/.{25}(.{15})/g;

or this, which keeps the keys interleaved with the values:

my @vars=$line=~/(.{25})(.{15})/g;

Not tested (I don't have perl available right now), but both of them should work.
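For what it's worth, here is a minimal sketch of the two regexes run against a made-up line. It assumes every record is a 25-character key followed by a 15-character value; the sample data and the sub names values_only and keys_and_values are only illustrative, not something from the original node.

```perl
use strict;
use warnings;

# Hypothetical sample: two pairs, each a 25-char key plus a 15-char value.
my $line = join '', map { sprintf '%-25s%15s', @$_ } [ alpha => 1 ], [ beta => 22 ];

# Values only: skip each 25-char key, capture the 15-char value.
sub values_only { my ($s) = @_; return $s =~ /.{25}(.{15})/g }

# Keys interleaved with values: capture both fields of every pair.
sub keys_and_values { my ($s) = @_; return $s =~ /(.{25})(.{15})/g }

my @vals  = values_only($line);      # one element per pair, padding kept
my @pairs = keys_and_values($line);  # key, value, key, value, ...

print scalar(@vals), " values, ", scalar(@pairs), " fields\n";
```

Note that the captured fields still carry their space padding, so they would need trimming before being used as hash keys.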

Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re^4: Extracting specific data from fixed-width columns
by BrowserUk (Patriarch) on Jul 04, 2008 at 08:15 UTC

    There are several other reasons for preferring unpack to a regex for fixed-width data.

    1. Strings are typically left-justified and space padded. When used as keys in a hash, 'xxx' won't match 'xxx '.

      The A template will strip the right-padding on the fly.

      Numbers are typically right-justified. Perl will ignore the leading spaces the first time you use one in a numeric context.

    2. unpack is usually much faster than using a regex for this.

      For this application of picking out 6/12 fields from 125/250, it is roughly 10 times faster:

      #! perl -slw
      use strict;
      use Math::Random::MT qw[ rand ];
      use Benchmark qw[ cmpthese ];

      ## 125 records, each a 25-char left-justified key and a 15-char number
      our $data = join '', map {
          sprintf '%-25s%15d', 'X' x int( rand 25 ) , int( rand 2**32 )
      } 1 .. 125;

      ## extract 6 pairs at positions 3rd, 33rd, 50th, 75th, 100th 123rd
      my $pair = 'A25 A15';
      our $tmpl = "x[($pair)2] $pair x[($pair)29] $pair x[($pair)16] $pair"
                . "x[($pair)24] $pair x[($pair)24] $pair x[($pair)22] $pair";

      ## timing run
      cmpthese -3, {
          regex => q[
              our $data;
              my @sixPair = ( $data =~ m[(.{25})(.{15})]g )[
                  4,5, 64,65, 98,99, 148,149, 198,199, 244,245
              ];
          ],
          unpack=> q[
              our( $data, $tmpl );
              my @sixPair = unpack $tmpl, $data
          ],
      };

      ## single run, printing what each method extracts
      cmpthese 1, {
          regex => q[
              our $data;
              my @sixPair = ( $data =~ m[(.{25})(.{15})]g )[
                  4,5, 64,65, 98,99, 148,149, 198,199, 244,245
              ];
              print 'regex ', join '|', @sixPair;
          ],
          unpack=> q[
              our( $data, $tmpl );
              my @sixPair = unpack $tmpl, $data;
              print 'unpack ', join '|', @sixPair;
          ],
      };

      __END__
      C:\test>junk4
                Rate  regex unpack
      regex   3311/s     --   -91%
      unpack 38783/s  1071%     --

      regex XXXXXXXXXXXXXXXXXXXXXX | 189677339| XXXXXXX | 966124187|
       XXXXXXXXXXX | -1269554066| XXXXX | -1916129141|
       XXXXXXXXXXXXXXX | -479254076| XXXXXXXXXXXXXXXXX | 335028423

      unpack XXXXXXXXXXXXXXXXXXXXXX| 189677339| XXXXXXX| 966124187|
       XXXXXXXXXXX| -1269554066| XXXXX| -1916129141|
       XXXXXXXXXXXXXXX| -479254076| XXXXXXXXXXXXXXXXX| 335028423

      Results wrapped manually for posting.
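
    The padding point in item 1 can be seen in a much smaller sketch. It assumes a single made-up record with a left-justified 25-character key and a right-justified 15-character number; the record contents and the %seen hash are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record: left-justified key (25 chars), right-justified number (15 chars).
my $rec = sprintf '%-25s%15d', 'xxx', 42;

# Regex captures keep the padding on both fields.
my ( $rkey, $rval ) = $rec =~ /(.{25})(.{15})/;

# The A template strips trailing spaces from each field while unpacking.
my ( $ukey, $uval ) = unpack 'A25 A15', $rec;

my %seen = ( xxx => 1 );
print "regex key found\n"  if exists $seen{$rkey};   # skipped: key is still padded
print "unpack key found\n" if exists $seen{$ukey};   # runs: key is plain 'xxx'

# The leading spaces on the number are ignored once it is used numerically.
print $uval + 0, "\n";
```

    The regex version would need an explicit trim (for example s/\s+$//) on every key before the lookup works; with unpack that step comes for free.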


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.