nofernandes has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have the following code, which gives me an array containing, in this case, all the strings starting with /* and ending with */ and all the strings starting with // and ending with \n!

    my @com = grep defined, <F> =~ m{
          ( /\* .*? \*/)
        | ( \/\/[^\n]*)
        | " (?: [^"\\]* | \\. )* "
        | ' (?: [^'\\]* | \\. )* '
        | . [^/"']*
    }xgs;
    close(F);

This code works fine! But instead of getting an array with the strings, what I want is a hash with the line number where the string was found as the key and the string itself as the value!

I can do this easily by comparing the returned array with the array of the file's lines, but the problem is that with very large files the comparison makes the program too slow!!!

Does anyone have any suggestions?
Thanks in advance!
Nuno :)

Re: Regex "(un)Knowledge" (loop)
by tye (Sage) on Jul 15, 2003 at 18:02 UTC

    First, let us do something a bit easier. We'll use the character offset instead of the line number as key to the hash:

    my %hash;
    my $re= qr{
          ( /\* .*? \*/)
        | ( \/\/[^\n]*)
        | " (?: [^"\\]* | \\. )* "
        | ' (?: [^'\\]* | \\. )* '
        | . [^/"']*
    }xs;
    while( /$re/g ) {
        $hash{pos($_)}= $1;
    }
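A minimal sketch of this first step, wrapped in a standalone sub so it can be tried on a string instead of `$_`. Two details are my own assumptions rather than tye's code: the key is `$-[0]` (the offset where the match starts) instead of `pos()` (the offset just after it ends), and the value is taken from `$1` or `$2`, because `//` comments are captured by the second group. Matches that are strings or ordinary code are skipped instead of being stored as undef. The name `comments_by_offset` is hypothetical.

```perl
use strict;
use warnings;

# Same pattern as in the thread, split over several lines with /x.
my $re = qr{
      ( /\* .*? \*/ )             # group 1: a slash-star comment, may span lines
    | ( \/\/ [^\n]* )             # group 2: a double-slash comment to end of line
    | " (?: [^"\\]* | \\. )* "    # double-quoted string, not captured
    | ' (?: [^'\\]* | \\. )* '    # single-quoted literal, not captured
    | . [^/"']*                   # any other code, not captured
}xs;

# Hypothetical helper: map the start offset of each comment to its text.
sub comments_by_offset {
    my ($text) = @_;
    my %hash;
    while ( $text =~ /$re/g ) {
        # Strings and ordinary code leave both groups undefined; skip them.
        my $comment = defined $1 ? $1 : $2;
        next unless defined $comment;
        # $-[0] is where the match starts; pos() would give where it ends.
        $hash{ $-[0] } = $comment;
    }
    return %hash;
}

my %h = comments_by_offset(qq{int a; /* hi */ b = "//no"; // yes\n});
print "$_ => $h{$_}\n" for sort { $a <=> $b } keys %h;
# prints:
# 7 => /* hi */
# 28 => // yes
```

Note that the `//` inside the string literal is not reported, because the string alternative consumes it first.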
    Then there are several ways to convert character offsets into line numbers. If none of your patterns spanned lines, then I'd probably update the regex to match newlines separately so I could increment a line number count in the same loop. But /* */ can span lines so I think I'd instead do a merge-sort-ish thing similar to:
    my @nl;
    while( /\n/g ) {
        push @nl, pos($_);
    }
    my $ln= 1;
    while( /$re/g ) {
        $ln++ while $nl[$ln-1] < pos($_);
        $hash{$ln}= $1;
    }
    Except I think there is probably at least one off-by-one error in that code. For example, pos($_) might need to be replaced with something from @- or @+ in one or both of those places.
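A minimal sketch of the merge approach with those issues addressed, under my own assumptions. It keys on `$-[0]` (the start of the match), so a multi-line `/* */` comment is recorded at the line where it begins rather than where it ends. It also stops advancing once it has passed the last newline: past the end of `@nl`, `$nl[$ln-1]` is undef, which compares as 0, so without a bounds check `$ln` would be incremented forever on the last match. Because the hash is keyed by line, a second comment on the same line overwrites the first. `comments_by_line` is a hypothetical name.

```perl
use strict;
use warnings;

# Same pattern as in the thread, split over several lines with /x.
my $re = qr{
      ( /\* .*? \*/ )             # group 1: a slash-star comment, may span lines
    | ( \/\/ [^\n]* )             # group 2: a double-slash comment to end of line
    | " (?: [^"\\]* | \\. )* "    # double-quoted string, not captured
    | ' (?: [^'\\]* | \\. )* '    # single-quoted literal, not captured
    | . [^/"']*                   # any other code, not captured
}xs;

# Hypothetical helper: map line number => comment text.
sub comments_by_line {
    my ($text) = @_;

    # Offsets of every newline, in increasing order.
    my @nl;
    while ( $text =~ /\n/g ) {
        push @nl, $-[0];
    }

    my %hash;
    my $ln = 1;    # 1 + number of newlines passed so far
    while ( $text =~ /$re/g ) {
        my $comment = defined $1 ? $1 : $2;
        next unless defined $comment;

        # Walk past the newlines that come before the start of this comment.
        # Both sequences are sorted, so this is one merge-style pass overall.
        # The bounds check stops after the last newline.
        $ln++ while $ln - 1 < @nl && $nl[ $ln - 1 ] < $-[0];

        # A later comment on the same line overwrites an earlier one.
        $hash{$ln} = $comment;
    }
    return %hash;
}

my %h = comments_by_line(
    "class A {\n  /* multi\n     line */\n  int x; // trailing\n}\n");
print "Line $_: $h{$_}\n" for sort { $a <=> $b } keys %h;
```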

    I hope it gives you an idea where to start to get what you are looking for.

                    - tye
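tye's other idea, counting lines inside the same loop, can also cope with multi-line comments. This variant is my own substitution, not what tye wrote: instead of matching newlines separately, it counts the newlines inside each match with `tr/\n//`. It relies on the last alternative `. [^/"']*` being able to match any single character, so consecutive matches cover the whole text with no gaps and the running count stays exact. It uses `substr` with `@-` and `@+` rather than `$&`, which in Perls of that era slowed down every regex in the program. `comments_by_line_one_pass` is a hypothetical name.

```perl
use strict;
use warnings;

# Same pattern as in the thread, split over several lines with /x.
my $re = qr{
      ( /\* .*? \*/ )             # group 1: a slash-star comment, may span lines
    | ( \/\/ [^\n]* )             # group 2: a double-slash comment to end of line
    | " (?: [^"\\]* | \\. )* "    # double-quoted string, not captured
    | ' (?: [^'\\]* | \\. )* '    # single-quoted literal, not captured
    | . [^/"']*                   # any other code, not captured
}xs;

# Hypothetical helper: map line number => comment text in a single pass.
sub comments_by_line_one_pass {
    my ($text) = @_;
    my %hash;
    my $ln = 1;    # line on which the next match starts
    while ( $text =~ /$re/g ) {
        my $comment = defined $1 ? $1 : $2;
        # A comment is recorded under the line where it starts.
        $hash{$ln} = $comment if defined $comment;

        # Count the newlines inside this match, comment or not. Every
        # character belongs to exactly one match, so this keeps $ln in step.
        my $chunk = substr( $text, $-[0], $+[0] - $-[0] );
        $ln += ( $chunk =~ tr/\n// );
    }
    return %hash;
}

my %h = comments_by_line_one_pass(
    "class A {\n  /* multi\n     line */\n  int x; // trailing\n}\n");
print "Line $_: $h{$_}\n" for sort { $a <=> $b } keys %h;
```

Compared with the merge version, this avoids building `@nl`, at the cost of copying each match once to count its newlines.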

      But in that case, how do I read the file!!?

      I have to read the entire content of the file at once in order to catch multi-line comments!!

        I think you misunderstood something.

        You can/should read the file all at once with my approach (I already assumed you were doing this based on comments elsewhere in the thread and because, as you note above, you need to match multi-line comments).

        Just use your existing file-reading code. I only rewrote the code you provided, so you only need to replace that part of your code (except that I didn't bother to repeat the close statement). (Update:) Nor did I repeat the file-reading code (my mind simply blocked out that part of your code).

                        - tye
Re: Regex "(un)Knowledge"
by BrowserUk (Patriarch) on Jul 15, 2003 at 17:26 UTC

    What version of Perl are you using? I ask because I can't get your code to match anything longer than one line.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      Adding the m option to the regex (m{}xgsm) might help here, but I'm having the same problem...
      None of the solutions I've come up with so far will match multi-line comments.
      However, if the original doesn't either, and this isn't a requirement, then perhaps that's not a problem.

        I forgot to mention a little "big" detail! For this code to work, we need to read the file with $/ undefined (undef $/)!!

        Example:

        undef $/;
        open(F,"$file") || die "$!";
        my @comentarios=grep defined, <F> =~ m{
              ( /\* .*? \*/)
            | ( \/\/[^\n]*)
            | " (?: [^"\\]* | \\. )* "
            | ' (?: [^'\\]* | \\. )* '
            | . [^/"']*
        }xgs;
        close(F);

        Thank you all once again!!
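Why slurping matters here: `=~` puts `<F>` in scalar context, and with the default `$/` (a newline) that reads only the first line of the file, so no `/* */` comment spanning lines can ever match. The `/m` flag would not help, since it only changes what `^` and `$` match and this pattern uses neither. A minimal sketch, assuming a file path is passed in; it uses `local $/` instead of a bare `undef $/`, so `$/` is restored afterwards and later line-by-line reads keep working. `slurp` and `comments_in_file` are hypothetical names; the three-argument `open` with a lexical handle should also work on Perl 5.6.1.

```perl
use strict;
use warnings;

# Same pattern as in the thread, split over several lines with /x.
my $re = qr{
      ( /\* .*? \*/ )             # group 1: a slash-star comment, may span lines
    | ( \/\/ [^\n]* )             # group 2: a double-slash comment to end of line
    | " (?: [^"\\]* | \\. )* "    # double-quoted string, not captured
    | ' (?: [^'\\]* | \\. )* '    # single-quoted literal, not captured
    | . [^/"']*                   # any other code, not captured
}xs;

# Hypothetical helper: read a whole file into one string. "local $/" leaves
# $/ undefined only inside the do-block.
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    my $text = do { local $/; <$fh> };
    close($fh);
    return $text;
}

# Hypothetical helper: the original one-liner, run against the slurped text.
# In list context m//g returns both groups for every match; grep drops the
# undefined ones, leaving only the comments in file order.
sub comments_in_file {
    my ($path) = @_;
    my $text = slurp($path);
    return grep { defined } $text =~ /$re/g;
}

# Example (file name from the thread):
# my @comentarios = comments_in_file('Finger.java');
```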

      My version is Perl 5.6.1!!
      What doesn't work?
      How are you running the code??
      Thanks!
Re: Regex "(un)Knowledge"
by nofernandes (Beadle) on Jul 15, 2003 at 19:06 UTC

    Hmm, I see... but I can't make this run!! I'm a bit of a newbie to the Perl language!!!!!

    Can you explain me why does my code don't work!??

    $file="Finger.java";
    open(F,"$file");
    undef $/;
    my %hash;
    my $re= qr{ ( /\* .*? \*/) | ( \/\/[^\n]*) | " (?: [^"\\]* | \\. )* " | ' (?: [^'\\]* | \\. )* ' | . [^/"']* }xs;
    while( /$re/g ) { $hash{pos($_)}= $1; }
    my @nl;
    while( /\n/g ) { push @nl, pos($_); }
    my $ln= 1;
    while( /$re/g ) {
        $ln++ while $nl[$ln-1] < pos($_);
        $hash{$ln}= $1;
    }
    @keys=sort {$a<=>$b} (keys %hash);
    foreach $key (@keys) {
        $value=$hash{$key};
        $hash_ordenada{$key}=$value;
        print "Line: $key\t$value\n";
    }

    Thank you very much!

    nofernandes!
      Can you explain me why does my code don't work!??

      You! will! get! much! better! explanations! if! you! tell! us! what! don't work!?? means!!!!!!
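For what it's worth, a few separate problems in the program above would keep it from working. First, the file is opened but never read: `while( /$re/g )` matches against `$_`, which is never assigned, so there is nothing to match and nothing is printed. Second, both of tye's loops were pasted in, so `%hash` would mix character offsets and line numbers as keys. Third, `$1` only holds `/* */` comments; `//` comments land in `$2`, and every non-comment match stores undef. Fourth, once the file is actually read, `$ln++ while $nl[$ln-1] < pos($_)` never stops after the last newline, because `$nl[...]` becomes undef (0 in numeric context). A minimal sketch fixing those four points, assuming the file name is given on the command line; `report_comments` and `comments.pl` are hypothetical names.

```perl
use strict;
use warnings;

# Same pattern as in the thread, split over several lines with /x.
my $re = qr{
      ( /\* .*? \*/ )             # group 1: a slash-star comment, may span lines
    | ( \/\/ [^\n]* )             # group 2: a double-slash comment to end of line
    | " (?: [^"\\]* | \\. )* "    # double-quoted string, not captured
    | ' (?: [^'\\]* | \\. )* '    # single-quoted literal, not captured
    | . [^/"']*                   # any other code, not captured
}xs;

# Hypothetical helper: returns one "Line: N<TAB>comment" string per comment,
# sorted by line number.
sub report_comments {
    my ($file) = @_;

    # Problem 1: read the whole file into a variable; local restores $/ afterwards.
    open( my $fh, '<', $file ) or die "Can't open $file: $!";
    my $text = do { local $/; <$fh> };
    close($fh);

    # Offsets of every newline, in increasing order.
    my @nl;
    push @nl, $-[0] while $text =~ /\n/g;

    # Problem 2: a single loop, so the hash is keyed by line numbers only.
    my %hash;
    my $ln = 1;
    while ( $text =~ /$re/g ) {
        # Problem 3: double-slash comments are in group 2; skip non-comments.
        my $comment = defined $1 ? $1 : $2;
        next unless defined $comment;

        # Problem 4: stop after the last newline instead of comparing with undef.
        $ln++ while $ln - 1 < @nl && $nl[ $ln - 1 ] < $-[0];
        $hash{$ln} = $comment;
    }

    return map { "Line: $_\t$hash{$_}\n" } sort { $a <=> $b } keys %hash;
}

# Run as a script, e.g.: perl comments.pl Finger.java
print report_comments( $ARGV[0] ) if @ARGV;
```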