phoenixQueen has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
I'm very new to Perl.
I am trying to use it to convert data in a VRML file and store it in a cpp file. The VRML file has coordinate values within brackets that I want to access. They look something like this:
coord Coordinate { point [ 12 23 45, 14 16 65] }
the file contains several instances of the above structure but with different values.

I know I need to do a regexp match. I read the file into an array @lines, then did $lines = "@lines" to get one big string. I then tried the match $lines =~ /{.*}/g, but it doesn't seem to work.

Can someone please give me a clue as to what regexp I could use?

Once I have that regexp, I want to store those coordinate values in a variable so I can write them to a cpp file. How can I do that?

thanks heaps for any help.

P.

Replies are listed 'Best First'.
Re: accessing specific data in a file
by njcodewarrior (Pilgrim) on Mar 25, 2006 at 15:45 UTC

    Your question is a little vague, but I think I understand what you're asking.
    I'm assuming these are latitude/longitude pairs. If your data file is called 'test.vrml' and has the following contents:

    coord Coordinate { 12 23 45, 14 16 65 }
    coord Coordinate { 13 34 57, 14 17 55 }
    coord Coordinate { 14 33 34, 14 15 40 }

    Then you could use the following to pull out the coordinates:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $filename = './test.vrml';
    open my $FH, '<', $filename
        or die "Can't open $filename for reading: $!";
    my @data = <$FH>;    # Read the contents of the file into an array
    close $FH;

    my @coordinates;
    foreach my $line ( @data ) {
        if ( my ($lat, $lon) = ( $line =~ m/\{\s*(.*), (.*)\s*\}/ ) ) {
            push @coordinates, "$lat $lon";
        }
    }

    print "COORDINATES: $_\n" foreach ( @coordinates );

    This outputs the following:

    COORDINATES: 12 23 45 14 16 65
    COORDINATES: 13 34 57 14 17 55
    COORDINATES: 14 33 34 14 15 40

    Which could then be written to the output file 'test.cpp' like this:

    my $output_filename = './test.cpp';
    open my $OUTPUT, '>', $output_filename
        or die "Can't open $output_filename for writing: $!";
    foreach ( @coordinates ) {
        print { $OUTPUT } "$_\n";    # Print to file
    }
    close $OUTPUT;
    --njcodewarrior
Re: accessing specific data in a file
by johngg (Canon) on Mar 25, 2006 at 23:36 UTC
    ... I read the file into an array @lines then did $lines = "@lines" to get a big string

    Two things spring to mind about the method you have described here to get your file data into a long string. Firstly, you can achieve the same result in one fell swoop. By unsetting the $/ variable (the input record separator, a newline by default on *nix) the read consumes the whole file in one go. Make the change local to a small code block to avoid affecting other I/O.

    my $lines;
    {
        local $/ = undef;
        $lines = <INPUT>;
    }

    Secondly, there is a potential flaw in your $lines = "@lines"; because interpolating a list in double quotes puts a space character between each element, whereas using the list outside quotes does not. E.g.

    @names = ("bill", "fred", "joe");
    print "@names\n";
    print @names, "\n";

    produces

    bill fred joe
    billfredjoe

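    For your problem you should have done $lines = join '', @lines; to avoid introducing spaces into the string that weren't in the file. (A plain $lines = @lines; would not do it, because an array assigned to a scalar yields its element count.)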
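A minimal sketch of how the three forms behave, in case it helps to see them side by side. The contents of @lines are made up for illustration; in the real script they would come from reading the file.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines as they would come from <INPUT>
my @lines = ("point [ 12 23 45,\n", "14 16 65 ]\n");

my $interpolated = "@lines";         # elements joined with a space (the default list separator)
my $joined       = join '', @lines;  # elements concatenated, nothing added
my $count        = @lines;           # scalar context: the number of elements

print "interpolated: [$interpolated]\n";
print "joined:       [$joined]\n";
print "count:        $count\n";
```

Only the join form reproduces the file contents exactly; the interpolated form gains a space at the start of every line after the first.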

    Cheers,

    JohnGG

Re: accessing specific data in a file
by davidrw (Prior) on Mar 25, 2006 at 15:29 UTC
    Can you give fuller examples (use <readmore></readmore> tags if necessary) of the input data and desired output?
Re: accessing specific data in a file
by graff (Chancellor) on Mar 26, 2006 at 01:18 UTC
    If the distribution of white-space (including new-lines) is as chaotic as your sample suggests, something like this might be useful: slurp the whole file into a single scalar, as suggested in an earlier reply, then do a while loop over a regex match:
    my $indata;
    open( I, "<", "vrml.file" ) or die "vrml.file: $!";
    {
        local $/;
        $indata = <I>;
    }
    while ( $indata =~ /\[               # match open sq-bracket
                        ((?:\s*\d+){3})  # match 3 numerics
                        ,                # match comma
                        ((?:\s*\d+){3})  # match 3 more numerics
                        \]               # match close sq-bracket
                       /gx )
    {
        my ( $x, $y ) = ( $1, $2 );
        s/^\s+// for ( $x, $y );
        my @x = split " ", $x;
        my @y = split " ", $y;
        # @x and @y each contain 3 numeric values
        my $axis = "x";
        for my $aref ( \@x, \@y ) {
            print scalar @$aref, " elements in $axis array: @$aref\n";
            $axis++;
        }
    }
    The "x" modifier on the regex allows me to break up and comment the regex components for legibility. The "g" modifier will repeat the match throughout the data, and for every matching segment of data in the string (i.e. in the content of the file as a whole), I get one iteration of the while loop, with $1 and $2 set to the first and second sets of three numbers.
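As a rough illustration of the while loop with the "g" and "x" modifiers, here is a small self-contained sketch. The input string and the names $text and @pairs are invented for the example; the pattern is the same shape as the one above.

```perl
use strict;
use warnings;

# Hypothetical input: two bracketed pairs of triples in one string
my $text = "point [ 12 23 45, 14 16 65] point [ 1 2 3, 4 5 6]";

my @pairs;
while ( $text =~ / \[               # opening square bracket
                   ((?:\s*\d+){3})  # first triple, first capture group
                   ,                # separating comma
                   ((?:\s*\d+){3})  # second triple, second capture group
                   \]               # closing square bracket
                 /gx )
{
    my ( $first, $second ) = ( $1, $2 );
    s/^\s+// for ( $first, $second );    # trim leading whitespace
    push @pairs, [ $first, $second ];    # one entry per loop iteration
}

print scalar(@pairs), " matches\n";
print "$_->[0] | $_->[1]\n" for @pairs;
```

Each pass through the loop resumes matching where the previous match ended, so every bracketed pair in the string is visited once. One thing to watch with commented patterns: the comments must not contain the regex delimiter (here a forward slash), because Perl finds the end of the pattern before it looks at the comments.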
      Hi Guys, thanks heaps for all your replies. that helped me understand more of reg exp in perl, but I'm not quite there yet :(
      Graff, you're right in saying the sample has a lot of whitespaces. I copied and pasted the format as is. so the while loop looks like it's the way to go. the Coordinate field in vrml has thousands of those points where each set of 3 represents (x,y,z) coordinates. A snippet of the code looks like
      geometry IndexedFaceSet {
        coord Coordinate {
          point [
            265.185 -166.225 -510.375, 264.529 -166.901 -513.43, 269.321 -166.425 -510.918,
            271.021 -166.956 -513.279, 271.223 -167.21 -514.637, 272.77 -166.984 -514.019,
            270.767 -167.555 -516.859, 268.668 -167.344 -515.143, 272.884 -167.285 -515.945,
            266.267 -167.539 -516.835, 267.193 -167.766 -518.544, 272.686 -167.438 -517.418,
            269.214 -167.83 -519.399, 274.761 -166.996 -515.928, 275.801 -166.946 -519.035,
            264.372 -167.451 -517.076, 266.21 -167.785 -519.148, 263.801 -167.367 -516.587,
            269.266 -167.919 -521.463, 271.322 -167.821 -522.477, 266.656 -168.197 -528.597,
            269.644 -168.342 -527.906, 267.007 -167.981 -524.961, 264.07 -167.493 -517.639,
            263.244 -167.65 -520.707, 267.726 -167.91 -521.09, 264.468 -167.739 -523.493] }
      This would give the coordinates for points defining one block.
      The model has a number (around 30) of blocks within it, each with a Coordinate field, and I need to read all the points from all the blocks. I tried playing around with the suggestions provided, but I still don't have the right regexp to match the format I am seeking. What does the colon (:) mean in ((?:\s*\d+){3})? I tried something like
      while ( $array =~ /\[               #open sq bracket
                         ((?:\s*\d+){3})  #match 3 numerics
                         ,
                         ((?:\s*\d+){3})
                         \]
                        /gx )
      {
          print "Grr: get inside this loop";
      }
      on a test sample where there were only 2 sets of (x,y,z) coordinates within sq brackets. The print statement is not getting executed. Thanks again for your precious help!! :) Phoenix.
        Okay -- the data sample in the OP had only integer values, so I didn't try to accommodate decimal points or minus signs. To handle that, just expand a bit on "\d" in the regex:
        /\[                     #open sq bracket
         ((?:\s*[-.\d]+){3})    #match 3 numerics, incl. decimal point or minus sign
         ,
         ((?:\s*[-.\d]+){3})
         \]
        /gx
        The square brackets (not escaped by backslash) define a character class, consisting of the dash, the period and any digit. Note that the dash needs to be first or last inside the brackets in order to be treated as a literal dash; if it were in the middle between two other characters, it would define a range of characters (e.g. a-z, which matches all lower-case letters).
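A short sketch of the dash position rule, which also touches on the colon question from the post above: in (?: ... ) the "?:" makes the parentheses group without capturing, so only the outer parentheses fill a capture variable. The sample strings and variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Dash first in the class: a literal dash, a period, or a digit
my @numbers = "265.185 -166.225 -510.375" =~ /([-.\d]+)/g;

# Dash between two characters: the range a through z
my @words = "abc-def" =~ /([a-z]+)/g;

# (?: ... ) groups without capturing, so the repeated inner group
# adds no captures and the outer parentheses hold the whole triple
my ($triple) = "[ 1 2 3 ]" =~ /\[((?:\s*\d+){3})/;

print "numbers: @numbers\n";
print "words:   @words\n";
print "triple:  [$triple]\n";
```

In the first match the dash is kept as part of each number; in the second it is not in the class at all, so it splits the string into two words.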

        You can definitely benefit from reading the perlre and perlretut man pages. Please do that.

        update: Actually, looking at your more detailed data sample, it looks like the regex given above won't work -- there are a lot more sets of numerics between a given pair of square brackets. I'm afraid I don't understand how that many sets of values can constitute a single "Coordinate". If you need more help, you first need to be more clear about how the data are supposed to be interpreted, and what you really intend to do with the sets of values that are given to you that way.

Re: accessing specific data in a file
by TedPride (Priest) on Mar 26, 2006 at 20:31 UTC
    Simpler is usually better, especially if you're just running a script for personal use and don't care much about efficiency.
    use strict;
    use warnings;
    use Data::Dumper;

    my (%coords, @keys, @p, $data, $key, $points, $i);
    $data = join '', <DATA>;

    while ($data =~ /geometry (\w+).*?point \[(.*?)\]/gs) {
        $key = $1;
        $points = $2;
        @p = $points =~ /(-?\d+\.\d+)/g;
        my @points;
        for ($i = 0; $i < $#p; $i += 3) {
            push @points, [$p[$i], $p[$i+1], $p[$i+2]];
        }
        $coords{$key} = \@points;
        push @keys, $key;
    }

    print Dumper(\@keys, \%coords);

    __DATA__
    geometry IndexedFaceSet {
      coord Coordinate {
        point [
          265.185 -166.225 -510.375, 264.529 -166.901 -513.43, 269.321 -166.425 -510.918,
          271.021 -166.956 -513.279, 271.223 -167.21 -514.637, 272.77 -166.984 -514.019,
          270.767 -167.555 -516.859, 268.668 -167.344 -515.143, 272.884 -167.285 -515.945,
          266.267 -167.539 -516.835, 267.193 -167.766 -518.544, 272.686 -167.438 -517.418,
          269.214 -167.83 -519.399, 274.761 -166.996 -515.928, 275.801 -166.946 -519.035,
          264.372 -167.451 -517.076, 266.21 -167.785 -519.148, 263.801 -167.367 -516.587,
          269.266 -167.919 -521.463, 271.322 -167.821 -522.477, 266.656 -168.197 -528.597,
          269.644 -168.342 -527.906, 267.007 -167.981 -524.961, 264.07 -167.493 -517.639,
          263.244 -167.65 -520.707, 267.726 -167.91 -521.09, 264.468 -167.739 -523.493] }
      Hi Guys, thanks so much for your help. I wouldn't have been able to move to the next step within the deadline without it. I'm sure what I wrote is not the best way to get the job done, but it got it done, and I can move on to code my model in cpp. ;) hehe, actually I got addicted to regexp in Perl and was playing around for a while. Then the time issue cropped up and I just got functional Perl working, and stopped trying to do funky Perl. Well, if it could help anyone in the future at all, here's the script I used: