how to place rows information into column (2)

Dear Perl Monks,
My question is a continuation of my earlier post "how to place rows information into column", answered by Ted, Holli & jZed. Thanks a lot for your help.
The sample file I posted earlier is different from the actual one. I just wanted to get a hint from you and learn enough to apply it on my own, but when I later tried it on my real file, I couldn't find a way.
Specifically, I am unable to write a matching pattern for my hash. Please help me with the real file, which looks like:

0.000000e+000 1.502947e+000
1.162272e-012 1.508957e+001
2.324544e-012 1.508948e+000
.....
0.000000e+000 1.502947e+001
1.162272e-012 1.508941e+001
2.324544e-012 1.508940e+000
....
0.000000e+000 1.503947e+000
1.162272e-012 1.504947e+000
2.324544e-012 1.508900e+001
..... so on

It's a 300MB file, and the two values are tab separated. The first column's values repeat after, let's say, 2000 lines. The paired values vary (but could be the same).

I would appreciate precise comments, especially on the matching statement.

Thanks in advance.
Syed.

Re: how to place rows information into column (2) by jZed (Prior) on Feb 03, 2005 at 03:21 UTC
If your data is "tab delimited" and you have 300MB of it, you can use Text::CSV_XS, the fastest method of parsing "delimited" data (which is really "separated" data). Just construct the parser with `sep_char => "\t"` and it will handle your data fine.
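A minimal sketch of this suggestion, assuming Text::CSV_XS is installed from CPAN. The sample string and the in-memory filehandle are stand-ins for the real 300MB file, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Sample rows standing in for the real file
# (assumption: exactly one tab between the two fields).
my $sample = "0.000000e+000\t1.502947e+000\n"
           . "1.162272e-012\t1.508957e+001\n"
           . "2.324544e-012\t1.508948e+000\n";

# sep_char tells the parser to separate fields on a tab instead of a comma.
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "\t", auto_diag => 1 });

# Read from an in-memory handle here; for the real file, open it by name instead.
open my $fh, '<', \$sample or die "Cannot open sample data: $!";

my @pairs;
while (my $row = $csv->getline($fh)) {
    # Each parsed row is an array reference holding the two fields.
    my ($x, $y) = @$row;
    push @pairs, [$x, $y];
    print "X:[$x] Y:[$y]\n";
}
close $fh;
```

Because `getline` reads one record at a time, the whole file never has to be loaded at once; only what you choose to keep (here `@pairs`) stays in memory.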
Re: how to place rows information into column (2) by sh1tn (Priest) on Feb 02, 2005 at 21:00 UTC
```
my $file = '0.000000e+000 1.502947e+000
1.162272e-012 1.508957e+001
2.324544e-012 1.508948e+000';

my $regex = {
    'first'  => qr{^(\S+)},    # anything non-blank from the beginning '^'
    'second' => qr{\s+(\S*)},  # anything non-blank after the first blank '\s+'
};

for (split '\n', $file) {
    /$regex->{first}$regex->{second}/o   # 'o' compiles the regex once
        # and upon success we take the first column and/or the second one
        and print "Key: $1 and Value: $2\n";
    # So hash structure can have $hash{$1} = $2
}
```
Re^2: how to place rows information into column (2) by riz (Initiate) on Feb 02, 2005 at 22:06 UTC
Hi sh1tn, are we assigning the lines '0.000000e+000 1.502947e+000 1.162272e-012 1.508957e+001 2.324544e-012 1.508948e+000' to the variable $file? Would a '...' in the middle mean some 2000 more pairs in sequence? Kindly explain the very first line of your code. Many thanks, riz.
Re^3: how to place rows information into column (2) by sh1tn (Priest) on Feb 02, 2005 at 23:02 UTC
Hi riz, the very first line uses a scalar instead of a filehandle, for simplicity. As far as I understand, what matters more is the way we match (or split) these two columns. Do you really think another 2000 lines make a difference? As far as RAM or performance is concerned, it doesn't matter.
Re: how to place rows information into column (2) by osunderdog (Deacon) on Feb 02, 2005 at 21:50 UTC
I wouldn't try to pattern match the scientific float... you could, but I think it's over-kill. If the data is tab delimited, then you can use that to distinguish the two values:

```
use strict;

while (<DATA>) {
    # Get rid of trailing \n on line.
    chomp;
    # divide the line into two items based on tab
    my ($x, $y) = split("\t");
    print "X:[$x] Y: [$y]\n";
}

##OUTPUT:
# X:[0.000000e+000] Y: [1.502947e+000]
# X:[1.162272e-012] Y: [1.508957e+001]
# X:[2.324544e-012] Y: [1.508948e+000]
# X:[0.000000e+000] Y: [1.502947e+001]
# X:[1.162272e-012] Y: [1.508941e+001]
# X:[2.324544e-012] Y: [1.508940e+000]
# X:[0.000000e+000] Y: [1.503947e+000]
# X:[1.162272e-012] Y: [1.504947e+000]
# X:[2.324544e-012] Y: [1.508900e+001]

##NOTE in data the two fields are tab delimited.
__DATA__
0.000000e+000	1.502947e+000
1.162272e-012	1.508957e+001
2.324544e-012	1.508948e+000
0.000000e+000	1.502947e+001
1.162272e-012	1.508941e+001
2.324544e-012	1.508940e+000
0.000000e+000	1.503947e+000
1.162272e-012	1.504947e+000
2.324544e-012	1.508900e+001
```

*"Look, Shiny Things!"* is not a better business strategy than compatibility and reuse.
Re^2: how to place rows information into column (2) by sh1tn (Priest) on Feb 02, 2005 at 23:20 UTC
"I wouldn't try to pattern match the scientific float... you could, but I think it's over-kill."
Do you really have any idea what `/^(\S+)\s+(\S*)/o` does? `/^(\S+)\s+(\S*)/o and print FH "K: ", $1, "V: ", $2, "\n" while <DATA>;` With <DATA> containing over 22000 lines, this takes less than a second on my old home machine (CPU under 1800 MHz).
Re^3: how to place rows information into column (2) by osunderdog (Deacon) on Feb 02, 2005 at 23:35 UTC
`/^(\S+)\s+(\S*)/o` Anchored at the start of the line, it matches one or more non-whitespace characters, followed by one or more whitespace characters, followed by 0 or more non-whitespace characters. The `o` indicates 'compile pattern only once'. From `perldoc perlop`: `If you want such a pattern to be compiled only once, add a "/o" after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won’t change over the life of the script.` The parens capture those non-whitespace matches into the variables `$1` and `$2` for use within the scope.
*"Look, Shiny Things!"* is not a better business strategy than compatibility and reuse.