There are several other reasons for preferring unpack to a regex for fixed-width data.

  1. Strings are typically left-justified and space padded. When used as keys in a hash, 'xxx' won't match 'xxx '.

    The A template will strip the right-padding on the fly.

    Numbers are typically right-justified. Perl will ignore the leading spaces the first time you use them in a numeric context.

  2. unpack is usually much faster than using a regex for this.

    For this application of picking out 6 pairs (12 fields) from 125 pairs (250 fields), it is roughly 10 times faster:

        #! perl -slw
        use strict;
        use Math::Random::MT qw[ rand ];
        use Benchmark qw[ cmpthese ];

        ## 125 records, each a 25-char left-justified string
        ## followed by a 15-char right-justified number
        our $data = join '', map {
            sprintf '%-25s%15d', 'X' x int( rand 25 ), int( rand 2**32 )
        } 1 .. 125;

        ## extract 6 pairs at positions 3rd, 33rd, 50th, 75th, 100th, 123rd
        my $pair = 'A25 A15';
        our $tmpl = "x[($pair)2] $pair x[($pair)29] $pair x[($pair)16] $pair"
                  . "x[($pair)24] $pair x[($pair)24] $pair x[($pair)22] $pair";

        ## timing run
        cmpthese -3, {
            regex => q[
                our $data;
                my @sixPair = ( $data =~ m[(.{25})(.{15})]g )[
                    4,5, 64,65, 98,99, 148,149, 198,199, 244,245
                ];
            ],
            unpack=> q[
                our( $data, $tmpl );
                my @sixPair = unpack $tmpl, $data
            ],
        };

        ## single run, printing what each approach extracts
        cmpthese 1, {
            regex => q[
                our $data;
                my @sixPair = ( $data =~ m[(.{25})(.{15})]g )[
                    4,5, 64,65, 98,99, 148,149, 198,199, 244,245
                ];
                print 'regex ', join '|', @sixPair;
            ],
            unpack=> q[
                our( $data, $tmpl );
                my @sixPair = unpack $tmpl, $data;
                print 'unpack ', join '|', @sixPair;
            ],
        };

        __END__
        C:\test>junk4
                  Rate  regex unpack
        regex   3311/s     --   -91%
        unpack 38783/s  1071%     --
        regex XXXXXXXXXXXXXXXXXXXXXX | 189677339|
              XXXXXXX | 966124187|
              XXXXXXXXXXX | -1269554066|
              XXXXX | -1916129141|
              XXXXXXXXXXXXXXX | -479254076|
              XXXXXXXXXXXXXXXXX | 335028423
        unpack XXXXXXXXXXXXXXXXXXXXXX| 189677339|
               XXXXXXX| 966124187|
               XXXXXXXXXXX| -1269554066|
               XXXXX| -1916129141|
               XXXXXXXXXXXXXXX| -479254076|
               XXXXXXXXXXXXXXXXX| 335028423

    Results wrapped manually for posting.
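
Both points can be seen on a much smaller record. The following is only a minimal sketch: the 6+4 byte layout, the k1 .. k5 names and the %wanted hash are made up for illustration and are not taken from the original poster's data. It picks the 2nd and 5th pairs with each approach, then uses the extracted names as hash keys and the extracted counts as numbers.

```perl
use strict;
use warnings;

# Hypothetical layout: five pairs of a 6-byte left-justified name
# and a 4-byte right-justified count.
my $pair = 'A6 A4';
my $data = join '', map { sprintf '%-6s%4d', "k$_", $_ * 10 } 1 .. 5;

# Keys as they would normally be written, without padding.
my %wanted = ( k2 => 1, k5 => 1 );

# Regex captures keep the right-padding, so the names do not match the keys.
my @re      = ( $data =~ m[(.{6})(.{4})]g )[ 2,3, 8,9 ];
my $re_hits = grep { exists $wanted{ $_ } } @re[ 0, 2 ];

# Skip 1 pair, take the 2nd, skip 2 more, take the 5th.
# x[...] skips the packed length of the bracketed template;
# A strips trailing spaces, so the names match the keys.
my $tmpl    = "x[($pair)1] $pair x[($pair)2] $pair";
my @un      = unpack $tmpl, $data;
my $un_hits = grep { exists $wanted{ $_ } } @un[ 0, 2 ];

# The counts still carry leading spaces; they are ignored in numeric context.
my $sum = $un[1] + $un[3];

print 'regex : ', join( '|', @re ), " hits=$re_hits\n";
print 'unpack: ', join( '|', @un ), " hits=$un_hits sum=$sum\n";
```

With the regex version you would need an extra pass to trim each name before using it as a key; with unpack the trimming is part of the template.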


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^4: Extracting specific data from fixed-width columns by BrowserUk
in thread Extracting specific data from fixed-width columns by lunabelle22a
