accessing specific data in a file

phoenixQueen has asked for the wisdom of the Perl Monks concerning the following question:

Re: accessing specific data in a file by njcodewarrior (Pilgrim) on Mar 25, 2006 at 15:45 UTC
Your question is a little vague, but I think I understand what you're asking. I'm assuming these are latitude/longitude pairs. If your data file is called 'test.vrml' and has the following contents:

```
coord Coordinate { 12 23 45, 14 16 65 }
coord Coordinate { 13 34 57, 14 17 55 }
coord Coordinate { 14 33 34, 14 15 40 }
```

then you could use the following to pull out the coordinates:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = './test.vrml';
open my $FH, '<', $filename or die "Can't open $filename for reading: $!";
my @data = <$FH>;    # Read the contents of the file into an array
close $FH;

my @coordinates;
foreach my $line (@data) {
    # Capture the two comma-separated halves between "{ " and " }"
    if ( my ( $lat, $lon ) = ( $line =~ m/\{\s(.*?), (.*?)\s\}/ ) ) {
        push @coordinates, "$lat $lon";
    }
}

print "COORDINATES: $_\n" foreach @coordinates;
```

This outputs the following:

```
COORDINATES: 12 23 45 14 16 65
COORDINATES: 13 34 57 14 17 55
COORDINATES: 14 33 34 14 15 40
```

which could then be written to the output file 'test.cpp' like this:

```perl
my $output_filename = './test.cpp';
open my $OUTPUT, '>', $output_filename
    or die "Can't open $output_filename for writing: $!";
foreach (@coordinates) {
    print {$OUTPUT} "$_\n";    # Print each coordinate pair to the file
}
close $OUTPUT;
```

--njcodewarrior
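The `if ( my ($lat, $lon) = ( $line =~ ... ) )` idiom works because a match in list context returns its captures (or an empty list on failure), and a list assignment in boolean context evaluates to the number of values on the right-hand side. Here is a minimal self-contained sketch of that behaviour, assuming the same `{ a b c, d e f }` line format; the helper name `parse_pair` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the two comma-separated halves between
# "{ " and " }", or an empty list if the line doesn't match.
sub parse_pair {
    my ($line) = @_;
    # Called in list context, the match returns the captures, or () on failure.
    return $line =~ m/\{\s(.*?),\s(.*?)\s\}/;
}

for my $line ( 'coord Coordinate { 12 23 45, 14 16 65 }', 'no braces here' ) {
    # A list assignment in boolean context yields the number of values
    # on the right-hand side: 2 on success, 0 on failure.
    if ( my ( $first, $second ) = parse_pair($line) ) {
        print "matched: [$first] [$second]\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

The same test also guards against using stale `$1`/`$2` values from an earlier successful match, since the body only runs when this match succeeded.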
Re: accessing specific data in a file by johngg (Canon) on Mar 25, 2006 at 23:36 UTC
> ... I read the file into an array @lines then did $lines = "@lines" to get a big string

Two things spring to mind about the method you have described here for getting your file data into one long string.

Firstly, you can achieve the same result in one fell swoop. By undefining the `$/` variable (the input record separator, a newline by default) the read consumes the whole file in one go. Make the change `local` to a small code block to avoid affecting other I/O.

```perl
my $lines;
{
    local $/ = undef;
    $lines = <INPUT>;
}
```

Secondly, there is a potential flaw in your `$lines = "@lines";` because interpolating an array in double quotes puts a separator (the value of `$"`, a single space by default) between each element, whereas printing the list unquoted doesn't. E.g.

```perl
my @names = ( "bill", "fred", "joe" );
print "@names\n";
print @names, "\n";
```

produces

```
bill fred joe
billfredjoe
```

For your problem you should have done `$lines = join '', @lines;` to avoid introducing spaces into the string that weren't in the file.

Cheers,

JohnGG
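Both points can be checked side by side with a small sketch that reads from an in-memory filehandle (opened on a scalar reference, so no real file is needed; the sample text is made up). It compares slurping with `local $/`, `join '', @lines`, double-quote interpolation, and plain `scalar @lines`, which gives only the element count rather than the contents:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";    # made-up sample content

# 1. Slurp: with $/ undefined, a single readline returns the whole file.
open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
my $slurped = do { local $/; <$fh> };
close $fh;

# 2. Read line by line into an array, then rebuild the string.
open $fh, '<', \$text or die "Can't open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

my $joined       = join '', @lines;    # identical to the original text
my $interpolated = "@lines";           # elements separated by $" (a space)
my $count        = @lines;             # scalar context: number of elements

print "slurped eq text:      ", ( $slurped eq $text      ? 'yes' : 'no' ), "\n";
print "joined eq text:       ", ( $joined eq $text       ? 'yes' : 'no' ), "\n";
print "interpolated eq text: ", ( $interpolated eq $text ? 'yes' : 'no' ), "\n";
print "count:                $count\n";
```

The interpolated version differs from the original only by the extra space after each embedded newline, which is easy to miss when eyeballing output.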
Re: accessing specific data in a file by davidrw (Prior) on Mar 25, 2006 at 15:29 UTC
Can you give fuller examples (use `<readmore></readmore>` tags if necessary) of the input data and desired output?
Re: accessing specific data in a file by graff (Chancellor) on Mar 26, 2006 at 01:18 UTC
If the distribution of white-space (including newlines) is as chaotic as your sample suggests, something like this might be useful: slurp the whole file into a single scalar, as suggested in an earlier reply, then do a while loop over a regex match:

```perl
my $indata;
open( I, "<", "vrml.file" ) or die "vrml.file: $!";
{
    local $/;
    $indata = <I>;
}
close I;

while ( $indata =~ /\[                # match open sq-bracket
                    ((?:\s\d+){3})    # match 3 numerics
                    ,                 # match comma
                    ((?:\s\d+){3})    # match 3 more numerics
                    \]                # match close sq-bracket
                   /gx )
{
    my ( $x, $y ) = ( $1, $2 );
    s/^\s+// for ( $x, $y );
    my @x = split " ", $x;
    my @y = split " ", $y;
    # @x and @y each contain 3 numeric values
    my $axis = "x";
    for my $aref ( \@x, \@y ) {
        print scalar @$aref, " elements in $axis array: @$aref\n";
        $axis++;
    }
}
```

The "x" modifier on the regex allows me to break up and comment the regex components for legibility. The "g" modifier makes the match repeat throughout the data: for every matching segment in the string (i.e. in the content of the file as a whole), I get one iteration of the while loop, with $1 and $2 set to the first and second sets of three numbers.
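How `m//g` drives that `while` loop can be seen on a tiny made-up string: in scalar context each successful match resumes where the previous one stopped (the offset is available via `pos`), and the loop ends when no further match is found. This sketch reuses the same integer-only pattern and is not tailored to the VRML format itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '[ 1 2 3, 4 5 6] junk [ 7 8 9, 10 11 12]';    # made-up sample

my @pairs;
while ( $str =~ / \[ ((?:\s\d+){3}) , ((?:\s\d+){3}) \] /gx ) {
    # pos() is the offset just past the end of this match; the next
    # iteration starts searching from there.
    printf "matched '%s' / '%s', next search starts at offset %d\n",
        $1, $2, pos($str);
    push @pairs, [ $1, $2 ];
}
print scalar(@pairs), " bracketed groups found\n";
```

Note that each capture keeps its leading space, which is why the original loop strips leading white-space before splitting.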
Re^2: accessing specific data in a file by phoenixQueen (Initiate) on Mar 26, 2006 at 14:14 UTC
Hi guys, thanks heaps for all your replies. They helped me understand more of regexps in Perl, but I'm not quite there yet :(

Graff, you're right in saying the sample has a lot of white-space; I copied and pasted the format as is. So the while loop looks like the way to go.

The Coordinate field in VRML has thousands of those points, where each set of 3 represents (x,y,z) coordinates. A snippet of the code looks like:

```
geometry IndexedFaceSet {
  coord Coordinate {
    point [ 265.185 -166.225 -510.375, 264.529 -166.901 -513.43, 269.321 -166.425 -510.918,
            271.021 -166.956 -513.279, 271.223 -167.21 -514.637, 272.77 -166.984 -514.019,
            270.767 -167.555 -516.859, 268.668 -167.344 -515.143, 272.884 -167.285 -515.945,
            266.267 -167.539 -516.835, 267.193 -167.766 -518.544, 272.686 -167.438 -517.418,
            269.214 -167.83 -519.399, 274.761 -166.996 -515.928, 275.801 -166.946 -519.035,
            264.372 -167.451 -517.076, 266.21 -167.785 -519.148, 263.801 -167.367 -516.587,
            269.266 -167.919 -521.463, 271.322 -167.821 -522.477, 266.656 -168.197 -528.597,
            269.644 -168.342 -527.906, 267.007 -167.981 -524.961, 264.07 -167.493 -517.639,
            263.244 -167.65 -520.707, 267.726 -167.91 -521.09, 264.468 -167.739 -523.493]
  }
```

This gives the coordinates for the points defining one block. The model has a number of blocks (around 30), each with a Coordinate field, and I need to read all the points from all the blocks.

I tried playing around with the few suggestions provided, but I still don't have the right regexp to match the format I am seeking. What does the colon (:) mean in `((?:\s\d+){3})`?

I tried something like

```perl
while ( $array =~ /\[                 #open sq bracket
                   ((?:\s\d+){3})     #match 3 numerics
                   ,
                   ((?:\s*\d+){3})
                   \]
                  /gx )
{
    print "Grr: get inside this loop";
}
```

on a test sample where there were only 2 sets of (x,y,z) coordinates within square brackets. The print statement is not getting executed.

Thanks again for your precious help!! :)

Phoenix.
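On the colon question: `?:` right after an opening parenthesis makes the group non-capturing, so it only groups (here, so that `{3}` repeats the whole `\s\d+` unit) without setting `$1`, `$2`, etc. The sketch below, using one triplet from the sample above and a made-up `' 1 2 3'` string, also suggests why the test loop never runs: `\d+` does not match `.` or `-`, so `(?:\s\d+){3}` cannot match `265.185 -166.225 -510.375`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One (x,y,z) triplet from the sample, with a leading space as in the data.
my $triplet = ' 265.185 -166.225 -510.375';

# Capturing inner group: $2 only remembers the LAST repetition of {3}.
' 1 2 3' =~ /^((\s\d+){3})$/ or die "no match";
my ( $outer, $inner ) = ( $1, $2 );    # ' 1 2 3' and ' 3'

# Non-capturing inner group (?:...): it only groups, so there is no $2.
' 1 2 3' =~ /^((?:\s\d+){3})$/ or die "no match";
my $whole      = $1;                           # ' 1 2 3'
my $has_second = defined $2 ? 'yes' : 'no';    # 'no'

# \d+ cannot match '.' or '-', so this pattern fails on the VRML numbers...
my $digits_only = ( $triplet =~ /^(?:\s\d+){3}$/ ) ? 1 : 0;

# ...while a character class that also allows '-' and '.' succeeds.
my $with_class = ( $triplet =~ /^(?:\s[-.\d]+){3}$/ ) ? 1 : 0;

print "outer: '$outer', inner (last repetition): '$inner'\n";
print "whole: '$whole', second group captured: $has_second\n";
print "\\d only matches: $digits_only, [-.\\d] matches: $with_class\n";
```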
Re^3: accessing specific data in a file by graff (Chancellor) on Mar 26, 2006 at 19:20 UTC
Okay -- the data sample in the OP had only integer values, so I didn't try to accommodate decimal points or minus signs. To handle those, just expand a bit on "\d" in the regex:

```perl
/\[                     #open sq bracket
 ((?:\s[-.\d]+){3})     #match 3 numerics, incl. decimal point and/or minus sign
 ,
 ((?:\s[-.\d]+){3})
 \]
/gx
```

The square brackets (not escaped by backslash) define a character class, consisting of the dash, the period and any digit. Note that the dash needs to be first or last inside the brackets in order to be treated as a literal dash; if it were in the middle between two other characters, it would define a range of characters (e.g. `a-z`, which matches all lower-case letters). You can definitely benefit from reading the perlre and perlretut man pages. Please do that.

Update: Actually, looking at your more detailed data sample, it looks like the regex given above won't work -- there are a lot more sets of numerics between a given pair of square brackets. I'm afraid I don't understand how that many sets of values can constitute a single "Coordinate". If you need more help, you first need to be clearer about how the data are supposed to be interpreted, and what you really intend to do with the sets of values that are given to you that way.
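Regarding the update: if, as the later sample suggests, each `point [ ... ]` list holds many comma-separated `x y z` triplets, one option (a sketch, not graff's code) is to capture the whole bracketed list, then split it on commas and each piece on white-space. This assumes triplets are always comma-separated and that no `]` appears inside a point list; the helper name `parse_points` and the sample text are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of blocks, each an array ref of
# [x, y, z] array refs, one block per "point [ ... ]" list in $text.
sub parse_points {
    my ($text) = @_;
    my @blocks;
    while ( $text =~ /point\s*\[(.*?)\]/gs ) {    # /s: '.' also matches newlines
        my $body = $1;
        my @triplets;
        for my $chunk ( split /,/, $body ) {
            my @xyz = split ' ', $chunk;          # ' ' splits on any white-space run
            next unless @xyz;                     # skip empty chunks (e.g. trailing comma)
            die "Expected 3 values, got '@xyz'\n" unless @xyz == 3;
            push @triplets, [@xyz];
        }
        push @blocks, \@triplets;
    }
    return @blocks;
}

my $sample = <<'VRML';
coord Coordinate { point [ 265.185 -166.225 -510.375,
    264.529 -166.901 -513.43, 269.321
  -166.425 -510.918] }
coord Coordinate { point [ 1 2 3, 4 5 6 ] }
VRML

my @blocks = parse_points($sample);
for my $i ( 0 .. $#blocks ) {
    print "block $i: ", scalar @{ $blocks[$i] }, " points\n";
    print "  (@$_)\n" for @{ $blocks[$i] };
}
```

Splitting on commas rather than matching a fixed number of triplets means the chaotic line breaks inside a triplet don't matter, and the `die` flags any list whose values don't come in threes.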
Re: accessing specific data in a file by TedPride (Priest) on Mar 26, 2006 at 20:31 UTC
Simpler is usually better, especially if you're just running a script for personal use and don't care much about efficiency.

```perl
use strict;
use warnings;
use Data::Dumper;

my ( %coords, @keys, @p, $data, $key, $points, $i );

$data = join '', <DATA>;    # slurp everything after __DATA__

# For each geometry block, capture its type (e.g. IndexedFaceSet) and the
# text between "point [" and the next "]".
while ( $data =~ /geometry (\w+).*?point \[(.*?)\]/gs ) {
    $key    = $1;
    $points = $2;
    @p      = $points =~ /(-?\d+\.\d+)/g;    # all decimal numbers in the list

    # Group the flat list of numbers into [x, y, z] triplets
    my @points;
    for ( $i = 0; $i < $#p; $i += 3 ) {
        push @points, [ $p[$i], $p[ $i + 1 ], $p[ $i + 2 ] ];
    }

    # Several blocks can share a type, so keep a list of blocks per key
    # instead of overwriting the previous one.
    push @{ $coords{$key} }, \@points;
    push @keys, $key;
}

print Dumper( \@keys, \%coords );

__DATA__
geometry IndexedFaceSet {
  coord Coordinate {
    point [ 265.185 -166.225 -510.375, 264.529 -166.901 -513.43, 269.321 -166.425 -510.918,
            271.021 -166.956 -513.279, 271.223 -167.21 -514.637, 272.77 -166.984 -514.019,
            270.767 -167.555 -516.859, 268.668 -167.344 -515.143, 272.884 -167.285 -515.945,
            266.267 -167.539 -516.835, 267.193 -167.766 -518.544, 272.686 -167.438 -517.418,
            269.214 -167.83 -519.399, 274.761 -166.996 -515.928, 275.801 -166.946 -519.035,
            264.372 -167.451 -517.076, 266.21 -167.785 -519.148, 263.801 -167.367 -516.587,
            269.266 -167.919 -521.463, 271.322 -167.821 -522.477, 266.656 -168.197 -528.597,
            269.644 -168.342 -527.906, 267.007 -167.981 -524.961, 264.07 -167.493 -517.639,
            263.244 -167.65 -520.707, 267.726 -167.91 -521.09, 264.468 -167.739 -523.493]
  }
```
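As an alternative to the C-style `for` loop that groups the flat number list into triplets, `splice` can peel three values at a time off the front of a copy of the array. A small sketch with made-up numbers; the helper name `triplets` is hypothetical, and it chooses to die on a leftover partial triplet rather than pad it with `undef`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group a flat list into [x, y, z] array refs.
# A leftover partial triplet is reported rather than silently kept.
sub triplets {
    my @p = @_;    # work on a copy; splice empties it
    my @points;
    while ( my @xyz = splice @p, 0, 3 ) {
        die "Incomplete triplet: @xyz\n" unless @xyz == 3;
        push @points, [@xyz];    # copy into a fresh anonymous array
    }
    return @points;
}

my @flat = ( 265.185, -166.225, -510.375, 264.529, -166.901, -513.43 );
print "(@$_)\n" for triplets(@flat);
```

The loop stops when `splice` returns an empty list, so there is no index arithmetic to get wrong.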
Re^2: accessing specific data in a file (Working script) by phoenixQueen (Initiate) on Mar 31, 2006 at 05:23 UTC
Hi guys, thanks so much for your help. I wouldn't have been able to move on to the next step within the deadline without it. I'm sure what I wrote is not the best way to get the job done, but it got it done, and I can move on to coding my model in C++. ;) hehe Actually, I got addicted to regexps in Perl and was playing around for a while; then the time issue cropped up, so I just got functional Perl working and stopped trying to do funky Perl. Well, in case it helps anyone in the future, here's the script I used: Read more... (8 kB)