in reply to accessing specific data in a file

If the distribution of white-space (including new-lines) is as chaotic as your sample suggests, something like this might be useful: slurp the whole file into a single scalar, as suggested in an earlier reply, then do a while loop over a regex match:
my $indata;
open( my $fh, "<", "vrml.file" ) or die "vrml.file: $!";
{
    local $/;          # undefine the input record separator to slurp the whole file
    $indata = <$fh>;
}
close $fh;

while ( $indata =~ /\[                # match open sq-bracket
                    ((?:\s*\d+){3})   # match 3 numerics
                    ,                 # match comma
                    ((?:\s*\d+){3})   # match 3 more numerics
                    \]                # match close sq-bracket
                   /gx )
{
    my ( $x, $y ) = ( $1, $2 );
    s/^\s+// for ( $x, $y );
    my @x = split " ", $x;
    my @y = split " ", $y;
    # @x and @y each contain 3 numeric values
    my $axis = "x";
    for my $aref ( \@x, \@y ) {
        print scalar @$aref, " elements in $axis array: @$aref\n";
        $axis++;
    }
}
The "x" modifier on the regex allows me to break up and comment the regex components for legibility. The "g" modifier repeats the match throughout the data, so for every matching bracketed segment in the string (i.e. in the content of the file as a whole), I get one iteration of the while loop, with $1 and $2 set to the first and second sets of three numbers.
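
To see the "g" and "x" behaviour without a file, here is a minimal sketch that runs the same pattern over an in-memory string. The sample data and the @pairs array are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data: two bracketed segments with irregular whitespace.
my $indata = "[ 1 2\n3,4   5 6] junk [7 8 9,\n10 11 12]";

my @pairs;    # one [ first_triple, second_triple ] entry per match
while ( $indata =~ /\[                # open sq-bracket
                    ((?:\s*\d+){3})   # 3 numerics, first capture group
                    ,                 # comma
                    ((?:\s*\d+){3})   # 3 more numerics, second capture group
                    \]                # close sq-bracket
                   /gx )
{
    # Copy the captures before doing anything else with them.
    my ( $first, $second ) = ( $1, $2 );

    # split " " skips leading whitespace and splits on runs of whitespace.
    push @pairs, [ [ split " ", $first ], [ split " ", $second ] ];
}

for my $i ( 0 .. $#pairs ) {
    printf "match %d: (%s) and (%s)\n",
        $i + 1, "@{ $pairs[$i][0] }", "@{ $pairs[$i][1] }";
}
```

Each pass through the loop resumes the search where the previous match ended, so the text between the two bracketed segments is simply skipped.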

Re^2: accessing specific data in a file
by phoenixQueen (Initiate) on Mar 26, 2006 at 14:14 UTC
    Hi Guys, thanks heaps for all your replies. That helped me understand more of regexps in Perl, but I'm not quite there yet :(
    Graff, you're right in saying the sample has a lot of whitespace. I copied and pasted the format as is, so the while loop looks like the way to go. The Coordinate field in VRML has thousands of those points, where each set of 3 represents (x,y,z) coordinates. A snippet of the code looks like
    geometry IndexedFaceSet {
      coord Coordinate {
        point [
          265.185 -166.225 -510.375, 264.529 -166.901 -513.43, 269.321 -166.425 -510.918,
          271.021 -166.956 -513.279, 271.223 -167.21 -514.637, 272.77 -166.984 -514.019,
          270.767 -167.555 -516.859, 268.668 -167.344 -515.143, 272.884 -167.285 -515.945,
          266.267 -167.539 -516.835, 267.193 -167.766 -518.544, 272.686 -167.438 -517.418,
          269.214 -167.83 -519.399, 274.761 -166.996 -515.928, 275.801 -166.946 -519.035,
          264.372 -167.451 -517.076, 266.21 -167.785 -519.148, 263.801 -167.367 -516.587,
          269.266 -167.919 -521.463, 271.322 -167.821 -522.477, 266.656 -168.197 -528.597,
          269.644 -168.342 -527.906, 267.007 -167.981 -524.961, 264.07 -167.493 -517.639,
          263.244 -167.65 -520.707, 267.726 -167.91 -521.09, 264.468 -167.739 -523.493]
      }
    This would give the coordinates for points defining one block.
    The model has a number (around 30) of blocks within it, each with a Coordinate field, and I need to read all the points from all the blocks. I tried playing around with the few suggestions provided, but still do not have the right regexp to match the format I am seeking. What does the colon (:) mean in ((?:\s*\d+){3})? I tried something like
    while ( $array =~ /\[                # open sq bracket
                       ((?:\s*\d+){3})   # match 3 numerics
                       ,
                       ((?:\s*\d+){3})
                       \]
                      /gx )
    {
        print "Grr: get inside this loop";
    }
    on a test sample where there were only 2 sets of (x,y,z) coordinates within sq brackets. The print statement is not getting executed. Thanks again for your precious help!! :) Phoenix.
      Okay -- the data sample in the OP had only integer values, so I didn't try to accommodate decimal points or minus signs. To handle that, just expand a bit on "\d" in the regex:
      /\[                      # open sq bracket
       ((?:\s*[-.\d]+){3})     # match 3 numerics, incl. decimal point and minus sign
       ,
       ((?:\s*[-.\d]+){3})
       \]
      /gx
      The square brackets (not escaped by backslash) define a character class, consisting of the dash, the period and any digit. Note that the dash needs to be first or last inside the brackets in order to be treated as a literal dash; if it were in the middle between two other characters, it would define a range of characters (e.g. a-z, which matches all lower-case letters).
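
      As a small illustration of the character-class rules, and of the (?: ... ) grouping asked about above, here is a sketch with made-up test strings. Note that [-.\d]+ is deliberately loose: it also accepts strings such as "12-3" that are not valid numbers.

      ```perl
      use strict;
      use warnings;

      # A dash placed first (or last) in a character class is a literal dash.
      my @literal = grep { /^[-.\d]+$/ } ( "-166.225", "264.529", "abc", "12-3" );
      # "abc" is rejected; "12-3" is accepted because the class is loose.

      # A dash between two characters defines a range instead.
      my @range = grep { /^[a-z]+$/ } ( "abc", "a-z", "ABC" );
      # Only "abc" passes: the class [a-z] contains no literal dash.

      # (?: ... ) groups without capturing, so the outer parentheses form the
      # only capture group and receive the whole run of three numbers.
      my ($triple) = " 265.185 -166.225 -510.375" =~ /((?:\s*[-.\d]+){3})/;

      print "literal-dash class accepted: @literal\n";
      print "range class accepted: @range\n";
      print "captured triple:$triple\n";
      ```

      Without the ?: the inner parentheses would become a capture group of their own, holding only the last of the three repetitions.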

      You can definitely benefit from reading the perlre and perlretut man pages. Please do that.

      update: Actually, looking at your more detailed data sample, it looks like the regex given above won't work -- there are a lot more sets of numerics between a given pair of square brackets. I'm afraid I don't understand how that many sets of values can constitute a single "Coordinate". If you need more help, you first need to be more clear about how the data are supposed to be interpreted, and what you really intend to do with the sets of values that are given to you that way.
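
      If the interpretation given earlier in the thread holds, i.e. every comma-separated set of three values inside "point [ ... ]" is one (x,y,z) point, then one possible approach is to capture each bracketed block as a whole and split it afterwards. This is only a sketch under that assumption; the shortened sample data and the names $vrml, @blocks and @points are made up for illustration.

      ```perl
      use strict;
      use warnings;

      # Shortened sample in the shape of the posted snippet (two blocks).
      my $vrml = join "\n",
          'geometry IndexedFaceSet { coord Coordinate { point [',
          '  265.185 -166.225 -510.375, 264.529 -166.901',
          '  -513.43,',
          '  269.321 -166.425 -510.918] } }',
          'geometry IndexedFaceSet { coord Coordinate { point [ 1 2 3, 4 5 6 ] } }';

      my @blocks;    # one array ref of [x, y, z] points per "point [ ... ]" block
      while ( $vrml =~ /point \s* \[ ( [^\]]* ) \]/gx ) {
          my $body = $1;    # everything between the square brackets
          my @points;
          for my $chunk ( split /,/, $body ) {
              my @xyz = split " ", $chunk;
              next unless @xyz;    # tolerate a trailing comma
              die "expected 3 values, got: @xyz\n" unless @xyz == 3;
              push @points, [@xyz];
          }
          push @blocks, \@points;
      }

      for my $i ( 0 .. $#blocks ) {
          printf "block %d: %d points\n", $i + 1, scalar @{ $blocks[$i] };
          print "  (@$_)\n" for @{ $blocks[$i] };
      }
      ```

      Splitting on commas first and on whitespace second means the pattern does not have to know how many points a block contains, and new-lines in the middle of a point do no harm.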