lunabelle22a has asked for the wisdom of the Perl Monks concerning the following question:

To anyone who can help, I have a data file with 6 fixed-width columns. The columns alternate between variables (width=25) and their corresponding values (width=15). There are a total of 125 variables. However, I only need to use 6 of these variables and their values for future plotting purposes. What should I use to locate these specific six variables and extract only them and their values from this rather large data file? Thanks for your help!

Re: Extracting specific data from fixed-width columns
by moritz (Cardinal) on Jul 03, 2008 at 18:11 UTC
    unpack is very good for extracting fixed-width data sets. There's a gentler introduction to unpack and pack in perlpacktut.
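    A minimal sketch of the unpack approach. It assumes every line follows the layout from the question: a 25-character name field followed by a 15-character value field, three pairs per line. The variable names and the wanted_pairs helper are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical variable names to keep; substitute the six you actually need.
my %wanted = map { $_ => 1 } qw(pressure temperature);

# Assumed layout: each line is a run of 25-char name fields, each followed
# by a 15-char value field. The group '(A25 A15)*' repeats for as long as
# the line has data, and 'A' strips trailing spaces from every field.
my $template = '(A25 A15)*';

# Return a hash ref of the wanted name => value pairs found in one line.
sub wanted_pairs {
    my ($line) = @_;
    chomp $line;
    my %pairs = unpack $template, $line;
    my %found;
    for my $name (grep { $wanted{$_} } keys %pairs) {
        # 'A' only strips trailing spaces, so drop any leading padding too
        (my $value = $pairs{$name}) =~ s/^\s+//;
        $found{$name} = $value;
    }
    return \%found;
}

# Demo on one sample line built with sprintf so the field widths are exact.
my $line = sprintf '%-25s%15s' x 3,
    'pressure', '101.3', 'humidity', '40', 'temperature', '21.5';
my $found = wanted_pairs($line);
print "$_ = $found->{$_}\n" for sort keys %$found;
```

    For the real file you would call wanted_pairs on each line inside a while (<$fh>) loop and merge the returned hashes.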
Re: Extracting specific data from fixed-width columns
by BrowserUk (Patriarch) on Jul 03, 2008 at 18:53 UTC

    Something like this should do it (Obviously, the comments should be removed before use!):

    #! perl -slw
    use strict;

    my $pair = 'A25 A15';

    ## Assumes the 6 of 125 you need are the 3rd, 33rd, 50th, 75th, 100th and 123rd
    my $tmplNeeded = join ' ', <<"EOT";
    x[($pair)2]    # skip 2
    $pair          # grab the 3rd
    x[($pair)29]   # skip 29
    $pair          # grab the 33rd
    x[($pair)16]   # skip 16
    $pair          # grab the 50th
    x[($pair)24]   # ...
    $pair
    x[($pair)24]
    $pair
    x[($pair)22]
    $pair
    EOT

    open FILE, '<', ... or die "Cannot open: $!";
    while( <FILE> ) {
        my @required = unpack $tmplNeeded, $_;
        ## Each $pair grabs two fields (name, value), so
        ## $required[ 0 ] and $required[ 1 ] are the 3rd pair,
        ## $required[ 2 ] and $required[ 3 ] the 33rd pair, etc.
    }
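    To see what the x[...] skips do, here is a small self-contained sketch on a hypothetical five-pair record (names var1 to var5). x[TEMPLATE] skips as many bytes as TEMPLATE would consume, and each $pair grab returns two elements, a name and a value:

```perl
use strict;
use warnings;

my $pair = 'A25 A15';

# Hypothetical one-line record of five name/value pairs: var1 => 10 ... var5 => 50.
my $record = join '', map { sprintf '%-25s%15s', "var$_", $_ * 10 } 1 .. 5;

# x[($pair)2] skips two pairs (2 * 40 = 80 bytes); x[$pair] skips one.
my $tmpl = "x[($pair)2] $pair x[$pair] $pair";

# Each $pair grab yields the name, then the value.
my ($name3, $value3, $name5, $value5) = unpack $tmpl, $record;

# Values are right-aligned, and 'A' only strips trailing spaces.
s/^\s+// for $value3, $value5;

print "$name3 = $value3\n";   # var3 = 30
print "$name5 = $value5\n";   # var5 = 50
```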

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting specific data from fixed-width columns
by psini (Deacon) on Jul 03, 2008 at 18:12 UTC

    Say you have read a line into $line; then this code:

    my %vars=$line=~/(.{25})(.{15})/g;

    populates the hash %vars with the variable/value pairs from $line (both the keys and the values keep their padding spaces).
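    Applied to a whole file, the same regex can be combined with a lookup of the wanted names. This sketch assumes the names are known in advance (speed and altitude are hypothetical), reads from an in-memory filehandle so it runs as-is, and trims the padding the regex keeps:

```perl
use strict;
use warnings;

# Hypothetical names of the variables to keep; substitute the six you need.
my %wanted = map { $_ => 1 } qw(speed altitude);

# Collect wanted name => value pairs from every line of a filehandle.
sub collect_wanted {
    my ($fh) = @_;
    my %result;
    while (my $line = <$fh>) {
        chomp $line;
        my %vars = $line =~ /(.{25})(.{15})/g;
        for my $field (keys %vars) {
            # the regex keeps the field padding, so trim names and values
            (my $name  = $field)        =~ s/^\s+|\s+$//g;
            (my $value = $vars{$field}) =~ s/^\s+|\s+$//g;
            $result{$name} = $value if $wanted{$name};
        }
    }
    return \%result;
}

# Hypothetical two-line data file: three 25-char names with 15-char values each.
my $text = join '', map { sprintf('%-25s%15s' x 3, @$_) . "\n" } (
    [ 'speed', '12.5', 'heading', '270', 'altitude', '3000' ],
    [ 'fuel',  '55',   'pitch',   '2',   'roll',     '-1'   ],
);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $found = collect_wanted($fh);
print "$_ = $found->{$_}\n" for sort keys %$found;
```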

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

      I don't remember where or when I saw the benchmark, but this is surprisingly fast (fixed-width regexes; you'd probably want a '\G' anchor in front too, so that each match starts exactly where the previous one ended). I'm sorry I don't have the tuits to do a new chart, and I can't find the old script now, but I remember being shocked at how fast this was, faster than most of the alternatives. Perhaps some kind Monk remembers what I saw or is feeling charitable with Benchmark.
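      For anyone feeling charitable, a starting point could be this sketch, which compares the regex split with an unpack split on a hypothetical 120-character line using the core Benchmark module. It makes no claim about which one wins; the numbers depend on the perl version and the data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical 120-char line: three 25-char names, each with a 15-char value.
my $line = sprintf '%-25s%15s' x 3, 'alpha', 1, 'beta', 2, 'gamma', 3;

# Candidate 1: the fixed-width regex split (padding kept in keys and values).
sub by_regex { my %h = $_[0] =~ /(.{25})(.{15})/g; return \%h }

# Candidate 2: unpack with a repeated group ('A' strips trailing spaces,
# so its keys are not byte-for-byte identical to the regex version's).
sub by_unpack { my %h = unpack '(A25 A15)*', $_[0]; return \%h }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { by_regex($line) },
    unpack => sub { by_unpack($line) },
});
```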

        From the word choice in the original node, I understood that the OP knows the keys ("variables") and not necessarily the positions.

        If the positions are fixed and known, this could work:

        my @vars=$line=~/.{25}(.{15})/g;

        or this, which keeps the keys interleaved with the values:

        my @vars=$line=~/(.{25})(.{15})/g;

        Not tested (I don't have a perl available right now), but both should work.
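        A quick check of what the two patterns return, on a hypothetical three-pair line (alpha, beta and gamma are made-up names):

```perl
use strict;
use warnings;

# Hypothetical sample line: three 25-char names, each followed by a 15-char value.
my $line = sprintf '%-25s%15s' x 3, 'alpha', 1, 'beta', 2, 'gamma', 3;

# Values only: the 25-char name fields are matched but not captured.
my @values = $line =~ /.{25}(.{15})/g;

# Names and values interleaved: name, value, name, value, ...
my @pairs = $line =~ /(.{25})(.{15})/g;

# Both lists keep the field padding, so trim before using the data.
s/^\s+|\s+$//g for @values, @pairs;

print "@values\n";   # 1 2 3
print "@pairs\n";    # alpha 1 beta 2 gamma 3
```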


Re: Extracting specific data from fixed-width columns
by jds17 (Pilgrim) on Jul 03, 2008 at 18:44 UTC
    From your description, I am not 100% sure how your input data is structured, so I assume each variable can appear in any one of the three name columns. A working solution follows. (Sorry if the sample data looks strange: I don't know what your data looks like, so I just used your column widths of 25 and 15. I only print the matches; of course you could instead save them in a hash or pass them straight to a function, depending on what you want to do with the data.)
    use strict;
    use warnings;

    #define a regexp matching the interesting variable names
    #('a' followed by 24 '1's, or 'c' followed by 24 '2's)
    my $interesting_vars = qr/a1{24}|c2{24}/;

    #sample input rows: three 25-char names, each followed by a
    #right-aligned 15-char value
    my @rows = (
        sprintf('%-25s%15s' x 3, 'a' . '1' x 24, 1, 'b' . '8' x 24, 15, 'x' . '2' x 24, 2),
        sprintf('%-25s%15s' x 3, 'd' . '9' x 24, 4, 'b' . '3' x 24, 15, 'c' . '2' x 24, 123),
    );

    for (@rows) {
        #split by variable value pairs
        for (/.{40}/g) {
            #split variable and value
            /(.{25})(.{15})/;
            #since I am doing an additional match, I have to
            #save my submatches
            my $var = $1;
            my $val = $2;
            print "'$var' = '$val'\n" if $var =~ $interesting_vars;
        }
    }
    Output:
    'a111111111111111111111111' = '              1'
    'c222222222222222222222222' = '            123'
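    If you do want to keep the matches, a variation could build the name regex from a list and store the results in a hash. This is a sketch with hypothetical names (temp, pressure and the collect helper are made up); unlike the unanchored pattern above, it anchors the alternation so that a wanted name such as temp does not also match temperature:

```perl
use strict;
use warnings;

# Hypothetical list of wanted variable names (the OP needs six of 125).
my @wanted = qw(temp pressure);

# quotemeta escapes regex metacharacters in the names; the \A ... \s*\z
# anchors require the whole padded 25-char name field to match.
my $alternation = join '|', map { quotemeta($_) } @wanted;
my $interesting_vars = qr/\A(?:$alternation)\s*\z/;

# Save matches in a hash (trimmed name => trimmed value) instead of printing.
sub collect {
    my @rows = @_;
    my %found;
    for my $row (@rows) {
        for my $chunk ($row =~ /.{40}/g) {
            my ($var, $val) = $chunk =~ /(.{25})(.{15})/;
            next unless $var =~ $interesting_vars;
            s/^\s+|\s+$//g for $var, $val;
            $found{$var} = $val;
        }
    }
    return \%found;
}

# Hypothetical sample rows in the 25/15 layout.
my @rows = (
    sprintf('%-25s%15s' x 3, 'temperature', 21, 'temp', 19, 'wind', 5),
    sprintf('%-25s%15s' x 3, 'humidity', 40, 'pressure', 1013, 'rain', 0),
);
my $found = collect(@rows);
print "$_ = $found->{$_}\n" for sort keys %$found;
```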
Re: Extracting specific data from fixed-width columns
by Anonymous Monk on Jul 03, 2008 at 18:14 UTC