Re: Re: Re: Regex "(un)Knowledge"

I forgot to mention a little "big" detail! For this code to work, we need to read the file with undef $/!!

Example:

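  undef $/;    # slurp mode: <F> now returns the whole file as one string
  open(F, '<', $file) || die "Cannot open $file: $!";

  # In list context, m//g returns every capture group of every match;
  # groups that did not take part are undef, so grep drops them.
  my @comentarios=grep defined, <F> =~ m{
     ( /\* .*? \*/) | ( \/\/[^\n]*)           # groups 1 and 2: block and line comments
     | " (?: [^"\\]* | \\. )* "               # double-quoted string, matched but not captured
     | ' (?: [^'\\]* | \\. )* '               # single-quoted string, matched but not captured
     | . [^/"']*                              # anything else
   }xgs;
  close(F);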

Thank you all once again!!


Re: Re: Re: Re: Regex "(un)Knowledge" by BrowserUk (Patriarch) on Jul 15, 2003 at 18:30 UTC
So, you are slurping the whole file as a scalar and matching against that. The problem with that is that you have effectively told perl there are no such things as lines, so $. isn't going to help. It will always be 1.

The only thought that comes to mind is to build an index that relates newlines to byte positions. Then use @+ (see perlvar) to get the byte positions at which each element of your array of captures was found, and look these up in the index to convert the byte positions back to line numbers. Not nice, but it would work.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
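For what it's worth, here is a minimal sketch of that index idea, not a tested solution. It assumes the file is already slurped into $text (a made-up sample stands in for it here), and the names @line_start, line_of and @spans are invented for illustration. Note that @- and @+ only describe the most recent successful match, so the sketch walks the matches one at a time with a scalar-context //g loop instead of collecting everything in one list-context match. For brevity it only looks for /* ... */ comments.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for a file slurped with $/ undefined.
my $text = "int a; /* first\n   comment */\nint b;\n/* second */\n";

# Index: the offset at which each line starts (line 1 starts at offset 0).
my @line_start = (0);
push @line_start, pos($text) while $text =~ /\n/g;

# Binary search for the last line that starts at or before $offset.
sub line_of {
    my ($offset) = @_;
    my ($lo, $hi) = (0, $#line_start);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($line_start[$mid] <= $offset) { $lo = $mid }
        else                              { $hi = $mid - 1 }
    }
    return $lo + 1;    # 1-based, like $.
}

my @spans;    # each entry: [first line, last line, comment text]
while ($text =~ m{/\* .*? \*/}xgs) {
    # $-[0] is the offset where this match starts, $+[0] the offset just past its end.
    my ($from, $to) = ($-[0], $+[0] - 1);
    push @spans, [ line_of($from), line_of($to), substr($text, $from, $+[0] - $from) ];
}

printf "lines %d-%d: %s\n", @$_ for @spans;
```

The index costs one extra pass over the text, but after that each lookup is a binary search, which should matter only for large files with many matches.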
Re: Re: Re: Re: Regex "(un)Knowledge" by jmanning2k (Pilgrim) on Jul 15, 2003 at 18:39 UTC
I guessed you were using undef $/. I found a working solution, though incomplete. I'm not sure what the last few lines of your regex do; I'm guessing it also finds all quoted strings in the text. You can modify this to work for that case too.

  my %hash;
  my $match;
  my $line;
  while(<F>) {    # reads line by line, so $/ must be left at its default
      if(m{/\*} .. m{\*/}) {
          ## single line
          if(m{(/\*.*?\*/)}) {
              $match = $1;
              $line = $.;
              $hash{$line} = $match;
              #print "Line $line: Got match '$match'\n";
          } else {
              ## multi-line
              if( m{(/\*.*)} ) {
                  ## Initial line '/*'
                  $match = $1 . "\n";
                  $line = $.; # record this line number
              } elsif( m{(.*\*/)} ) {
                  ## Final line '*/'
                  $match .= $1;
                  $hash{$line} = $match;    # keyed by the line where the comment started
                  #print "Line $line: Got match '$match'\n";
                  $match = undef;
                  $line = undef;
              } else {
                  # We are between lines, and have no /* or */
                  $match .= $_;
              }
          }
      } elsif ( m{(//.*)\Z} ) {
          $match = $1;
          $line = $.;
          $hash{$line} = $match;
          #print "Line $line: Got match '$match'\n";
      }
  }

So, this stores the starting line number and comment string in a hash. Hopefully this gives you an idea of how to process several lines.

It's certainly not as nice as the quick and simple grep solution, but grep aggregates all the results together, so you can't get line numbers out of the results. I did see this in the output of perldoc -f grep: 'grep returns aliases into the original list', so perhaps you can somehow map the results back into the original array, but I have no idea how.

My ideal solution would be some variant of the grep statement you have, perhaps with a map statement instead of grep, and some nice (linenumber, string) pair returned. Can't seem to find anything that works though.

Hope this helps,
~J
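One possible way to get (linenumber, string) pairs while keeping the original slurp-and-match regex is sketched below. It is an illustration under assumptions rather than a drop-in answer: comments_with_lines is a made-up helper name, the sample text is invented, and instead of map it uses a scalar-context //g loop that keeps a running newline count up to the start of each comment.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [line, comment] pairs,
# where line is the line on which the comment starts.
sub comments_with_lines {
    my ($text) = @_;
    my ($line, $scanned, @pairs) = (1, 0);
    while ($text =~ m{
          ( /\* .*? \*/ ) | ( //[^\n]* )      # groups 1 and 2: block and line comments
        | " (?: [^"\\]* | \\. )* "            # double-quoted string, skipped
        | ' (?: [^'\\]* | \\. )* '            # single-quoted string, skipped
        | . [^/"']*                           # any other text, skipped
      }xgs) {
        my $comment = defined $1 ? $1 : $2;
        next unless defined $comment;
        my $start = $-[0];                    # offset where this match begins
        # Count the newlines between the previous comment start and this one.
        my $gap = substr($text, $scanned, $start - $scanned);
        $line += ($gap =~ tr/\n//);
        $scanned = $start;
        push @pairs, [ $line, $comment ];
    }
    return @pairs;
}

# Made-up input: the "//" inside the string literal must not count as a comment.
my @result = comments_with_lines(qq{int a; /* one */\nchar *s = "// no";\nint b; // two\n});
printf "line %d: %s\n", @$_ for @result;
```

Because each newline is counted only once, this avoids building a separate index, at the price of having to process the matches in order.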