in reply to Re: Joining separate data files to make one.
in thread Joining separate data files to make one.

Hi,

Thanks for this, I learned a lot.

Actually it took me nearly all day to figure it out. There are two typos ( [ should be { ). A reply after yours constructed some sample data files and I used them as input to your script. The only drawback with your script is that if one of the files ends before a later file, the "n/a" is not appended to the hash for the file before. Your script handles the situation where files start after the one before.

I at least "understand" your script, but I am having trouble with the other one. From your script I can see how to add varying numbers of fields from each of the files (i.e. 4 from gravity and magnetics, 3 from bathymetry). With the other script, I can't see how to vary the number of fields to read.

Re^3: Joining separate data files to make one.
by BrowserUk (Patriarch) on Oct 07, 2010 at 10:33 UTC
    There are two typos ( [ should be { ).

    Sorry. It was typed directly into the edit box and so was never tested. I apologise for that. I wanted to describe a viable alternative approach to the problem--and I find describing with code far more efficient and clear than using words. I was aware that it wasn't a complete working solution as posted.

    The only drawback with your script is that if one of the files ends before a later file, the "n/a" is not appended to the hash for the file before.

    I would handle that in the output loop. If, when you come to write a record, it is "too short", pad it with the appropriate number of 'n/a's. Of course, as coded with concatenating strings, determining how much to add is a pain.

    You could split on "\t" to get the field count, then pad and rejoin, but that would be a bit silly. Better to build up the records as (a hash of) arrays, pushing the fields as you go, and then just join them at the end. After padding if necessary.

    Something like:

    my %data;

    open FILE, '<', 'gravity' or die;
    while( <FILE> ) {
        my @fields = split ' ', $_;
        ## Quote the slice so the key matches the "date time" keys used below
        $data{ "@fields[ 0, 1 ]" } = \@fields;
    }
    close FILE;

    open FILE, '<', 'magnetics' or die;
    while( <FILE> ) {
        my @fields = split ' ', $_;
        ## Pad the hash if we didn't see this date/time in the gravity file
        $data{ "@fields[ 0, 1 ]" } //= [ @fields[ 0,1 ], ('n/a') x 3 ];
        push @{ $data{ "@fields[ 0, 1 ]" } }, @fields[ 2 .. $#fields ];
    }
    close FILE;

    open FILE, '<', 'bathymetry' or die;
    while( <FILE> ) {
        my @fields = split ' ', $_;
        ## Pad the hash if we've never seen it before
        ## (??? == No of fields added by the magnetics)
        $data{ "@fields[ 0, 1 ]" } //= [ @fields[ 0,1 ], ('n/a') x ( 3 + ??? ) ];
        ## We saw it in gravity, but not magnetics.
        ## (The array also holds the 2 key fields, hence 2 + 3 + ???)
        push @{ $data{ "@fields[ 0, 1 ]" } }, ('n/a') x ???
            if @{ $data{ "@fields[ 0, 1 ]" } } < 2 + 3 + ???;
        push @{ $data{ "@fields[ 0, 1 ]" } }, @fields[ 2 .. $#fields ];
    }
    close FILE;

    for my $key ( sort keys %data ) {
        my $nFields = @{ $data{ $key } };
        ## Pad: ??? === total number of fields
        push @{ $data{ $key } }, ('n/a') x ( ??? - $nFields );
        ## split ' ' dropped the newline, so add one back per record
        print join( "\t", @{ $data{ $key } } ), "\n";
    }
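    For something you can run as-is, here is a minimal sketch of the same idea with the ??? placeholders replaced by a running count of expected fields. The sample lines and the per-source field counts (3, 3 and 2) are invented for illustration, and the data is held in memory rather than read from files; substitute your own counts and file reads. Treat it as one possible arrangement, not a finished solution.

```perl
use strict;
use warnings;

# Hypothetical sample data: each source is [ number of data fields, lines ].
# The field counts (3, 3, 2) and the values are assumptions for illustration.
my @sources = (
    [ 3, [ "2010-10-01 00:00 g1 g2 g3", "2010-10-01 00:01 g4 g5 g6" ] ],  # gravity
    [ 3, [ "2010-10-01 00:01 m1 m2 m3", "2010-10-01 00:02 m4 m5 m6" ] ],  # magnetics
    [ 2, [ "2010-10-01 00:00 b1 b2",    "2010-10-01 00:02 b3 b4"    ] ],  # bathymetry
);

my %data;
my $expected = 2;    # fields a record should hold so far: date + time

for my $source ( @sources ) {
    my( $nFields, $lines ) = @$source;
    for my $line ( @$lines ) {
        my @fields = split ' ', $line;
        # Start a new record if this date/time has not been seen before
        my $rec = $data{ "@fields[ 0, 1 ]" } //= [ @fields[ 0, 1 ] ];
        # Pad for earlier sources that had nothing at this date/time
        push @$rec, ('n/a') x ( $expected - @$rec );
        push @$rec, @fields[ 2 .. $#fields ];
    }
    $expected += $nFields;
}

my @output;
for my $key ( sort keys %data ) {
    my $rec = $data{ $key };
    # Pad records that were missing from the later sources
    push @$rec, ('n/a') x ( $expected - @$rec );
    push @output, join "\t", @$rec;
}
print "$_\n" for @output;
```

    Because each source states how many fields it contributes, changing the number of fields read from a file means changing one number in @sources. The sketch assumes each date/time appears at most once per source.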

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi, once again

      I love your work

      I spent a fair part of the day on this and got it to work just as I wanted. Your contribution was great. The biggest problem I had was extracting the elements from the final hash to output them to the final desired file.

      I think I proved that an infinite number of monkeys typing on an infinite number of typewriters will eventually end up typing the Bible.

      The thing that amazes me is that I could never come up with the solution you did, even reading all the texts and going to courses. Every time I have read about hashes or attended a course, we end up using Barney Rubble, Fred Flintstone, etc as examples and then extracting their surname or a Bedrock phone number.

      One last question. The construct //= I can see what it does, but I cannot find a reference? Presumably the // is a match against null? Is that correct? I asked a couple of people at work who count themselves as good Perl programmers and even they said that they had seen nothing like it.

      Many thanks for your help. You should see the dog's breakfast that it replaced.

      Mike

        One last question. The construct //= I can see what it does, but I cannot find a reference?

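        // is called defined-OR. It essentially says: If the expression to the left is not defined, then evaluate the expression to the right and use that instead; otherwise use the left-hand value. It is not a match against anything. It is documented in perlop under "Logical Defined-Or" and needs Perl 5.10 or later.

        Defined_OR_equals (my name for //=) says: if the variable to the left is undefined, give it the value of the expression to the right. Great for initialising things when they may or may not have already been initialised.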
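        A small sketch of both forms, in case it helps to see them side by side. The %opt hash and its keys are made up for the example.

```perl
use strict;
use warnings;

my %opt = ( depth => undef );    # hypothetical settings hash

my $depth = $opt{depth} // 10;   # left side is undefined, so the right side is used
$opt{label} //= 'n/a';           # no value yet, so the default is assigned
$opt{label} //= 'other';         # already defined, so it is left alone

print "$depth $opt{label}\n";    # prints "10 n/a"
```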

        It is the newer, better version of || and ||= which were used for the same things, but suffered from the flaw that they would overwrite defined but false values, like 0.
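        To see the difference, here is a minimal comparison on a value that is defined but false. The %reading hash is hypothetical; the point is only that 0 is a legitimate value that ||= throws away.

```perl
use strict;
use warnings;

my %reading = ( offset => 0 );   # hypothetical record; 0 is a real value here

my %with_or  = %reading;
my %with_dor = %reading;

$with_or{offset}  ||= 'n/a';     # 0 is false, so ||= replaces it
$with_dor{offset} //= 'n/a';     # 0 is defined, so //= leaves it alone

print "||= gave $with_or{offset}, //= gave $with_dor{offset}\n";
```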

        BTW: I'm with you on damning cutesy examples. They are next to useless. Actually, often worse than useless. Because you see them, they work without causing you to think too much, and leave you thinking you understand. It's not until you come to try and use the construct so demonstrated in a real-world situation, that you suddenly realise that you didn't learn a damn thing from the example.

        And worse, because the example didn't cause you to think about what the construct actually does, it leaves a gaping hole in your mental arsenal of solutions, that causes you to jump through convoluted hoops trying to solve problems another way.

        It's probably my biggest bugbear with the way Perl is used by many people. I came to realise very early on in my Perl journey that every single built-in construct Perl provides is there for a reason: to provide a ready, concise and efficient solution to a certain class of common problems.

        Those people that insist on eschewing certain subsets of the full language on the grounds of spurious ideological philosophies, forever commit themselves to reinventing those eschewed subsets in ever more laborious and convoluted ways.

        Perl 5 ain't perfect by any means; but it's a damn sight closer to being complete than any other language I've used in anger. And I've used quite a few.

