Search Engine Output needs sorting

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, and thanks for taking the time. I am new to Perl and am trying to write a search engine. Now that I have the output printing, I would like to sort it a little better. The output currently looks like:

NSUBJ

_nsubj_ of move is: ***** it *****

MATCH #1 Sent. 60 The type of body cavity an animal has strongly influences how --it-- can **move** .

_nsubj_ of move is: ***** animals *****

MATCH #1 Sent. 88

These --animals-- **move** slowly or not at all .

MATCH #2 Sent. 89

Bilateral symmetry is a common characteristic of --animals-- that **move** freely through their environments .

I want to print the "_nsubj_ of move is: ***** animals *****" block first because it has more matches. How would you go about doing this?

Here is the print part of the code:

 ## EDIT: Now the EVEN number is gramfunc, ODD number is sentence
my @allgramfunc; ## list of unique grammar func
my @allmatches; ## use for headings and all matches (sentences) under that heading as one scalar


my @sortedallgramfunc; ## What order for capital heading? => alphabetical
my @sortedheadmatches; ## What order for dependency heading? => frequency Depend on firstword?
my @sortedfirstmatches; ## To keep order of sentences with headmatches
my @sortedsecondmatches; ## To keep order of sentences with headmatches

my %seenmatches = ();
my %seens = ();
#my @pluralfirstmatches = @firstmatches;
#my @pluralsecondmatches = @secondmatches;
#CREATE an array of all the grammar functions:
for (my $j=0; $j <= @grammatches; $j++) { ## Could be normal $j++ if use another variable instead of @matches for both
    if ( defined( $grammatches[$j] )) { ## Just to avoid error message
        push (@allgramfunc, "$grammatches[$j]") unless ($seengramfunc{ $grammatches[$j] }++);
    }
}
#SORT overheadings by alphabetical
@sortedallgramfunc = sort { lc($a) cmp lc($b) } @allgramfunc;
#PRINT all the sentences which are related to searchkey by the same gramfunc
foreach my $sortedallgramfunc (@sortedallgramfunc) {
    print ("\n",uc $sortedallgramfunc,"\n\n");# Which gramfunc is being shown?
    
    #@sortedheadmatches = sort @headmatches;
    for (my $l=0; $l <= @headmatches; $l++) {
        if (defined( $headmatches[$l] ) and $headmatches[$l] =~ /$sortedallgramfunc/) {
            
            #2#$pluralfirstmatches[$l] =~ s/$firstmatches[$l]$pluralsuffix/$firstmatches[$l]/ig;
            unless ($seenmatches{ $headmatches[$l] }++) {
                print $headmatches[$l];
                my $count = 1;
                for (my $m=0; $m <= @sentmatches; $m++) {
                    if ( defined( $sentmatches[$m]) and $sentmatches[$m] =~ /\s\S\S$firstmatches[$l]\S\S\s/ and $sentmatches[$m] =~ /\s\S\S$secondmatches[$l]\S\S\s/) { ##We know $l and $m are matching
                    #5# Try sorting by creating array that includes header and all sentences as a scalar, then by size, maybe join until hit _dobj sort length $a cmp length $b maybe "length first then alphabetical" => or $a cmp $b
                        print "MATCH #$count $sentmatches[$m]"; # unless $seens{ $sentmatches[$m] }++);
                        $count++;
                    }
                }
            }
        }
    }
}

The way I thought to go about it is shown in the code comment marked #5#. Thanks again! This is my first time using this site, so if this format isn't good, please let me know!

Re: Search Engine Output needs sorting by jwkrahn (Abbot) on May 28, 2011 at 06:30 UTC
`for (my $j=0; $j <= @grammatches; $j++) { ## Could be normal $j++ if use another variable instead of @matches for both`
`...`
`for (my $l=0; $l <= @headmatches; $l++) {`
`...`
`for (my $m=0; $m <= @sentmatches; $m++) {`

You have an off-by-one error in your for loops: you are trying to access one element past the end of the arrays. Those are usually written as:

`for my $j ( 0 .. $#grammatches ) { ## Could be normal $j++ if use another variable instead of @matches for both`
`...`
`for my $l ( 0 .. $#headmatches ) {`
`...`
`for my $m ( 0 .. $#sentmatches ) {`
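To make the off-by-one concrete, here is a minimal sketch. The sample data and the helper names `indices_with_le` and `indices_with_range` are made up for illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original program.
my @words = qw(it animals they);

# Indices visited by the C-style loop from the question.
# In a numeric comparison @array is evaluated in scalar context (its
# element count, here 3), so the loop also visits index 3.
sub indices_with_le {
    my @array = @_;
    my @seen;
    for (my $i = 0; $i <= @array; $i++) {
        push @seen, $i;
    }
    return @seen;
}

# Indices visited by the range form; $#array is the last valid index.
sub indices_with_range {
    my @array = @_;
    my @seen;
    for my $i (0 .. $#array) {
        push @seen, $i;
    }
    return @seen;
}

print "<= \@array : @{[ indices_with_le(@words) ]}\n";     # 0 1 2 3
print "0..\$#array: @{[ indices_with_range(@words) ]}\n";  # 0 1 2
```

The extra index holds no element, so reading it yields `undef`. A `defined` check on each element hides the resulting "uninitialized" warning, but the loop still runs one time too many.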
Re^2: Search Engine Output needs sorting by Anonymous Monk on May 28, 2011 at 13:04 UTC
Thanks for the insightful tip! However, I'm not sure what that changes, since it was working before. Did I overcompensate by using the 'defined' function (which I added when I saw the uninitialized error)?

I haven't encountered the $# before; does @ not work? Thanks again!
Re^3: Search Engine Output needs sorting by johngg (Canon) on May 28, 2011 at 13:17 UTC
Using `@array` in scalar context gives the number of elements in the array, whereas `$#array` gives the index of the last element.

`knoppix@Microknoppix:~$ perl -E '`
`> @array = ( 1 .. 5 );`
`> say qq{@array};`
`> say scalar @array;`
`> say $#array;'`
`1 2 3 4 5`
`5`
`4`
`knoppix@Microknoppix:~$`

I hope this is helpful.

Cheers,

JohnGG
Re^4: Search Engine Output needs sorting by Anonymous Monk on May 28, 2011 at 14:04 UTC
Re^4: Search Engine Output needs sorting by Anonymous Monk on May 28, 2011 at 14:12 UTC
Re^5: Search Engine Output needs sorting by johngg (Canon) on May 28, 2011 at 14:44 UTC
Re^2: Search Engine Output needs sorting by Anonymous Monk on May 28, 2011 at 18:09 UTC
Wait! Is that true for all for loops in Perl? You can never use the C-style method? I always have to use foreach or that .. operator instead?
Re^3: Search Engine Output needs sorting by MidLifeXis (Monsignor) on May 31, 2011 at 13:28 UTC
"You can never use the C-style method?"

I would not say never use it -- there are instances where it might make better sense^. However, for simple `(i=0;i<max;i++)`-type loops, Perl has a different method of specifying it: `for my $i (0..$max-1)`.

This is a question of style and understanding. Just as not knowing idiomatic expressions (or using them incorrectly) can tag someone as "not from 'round here", using the C-style for loop in the general case may cause some to question how well you know Perl -- right or wrong.

On the other hand (and this may be heresy), if you are in a C shop, and your coding standards are written from a C point of view, I suppose that could be an argument for using the C-style for loop. Not necessarily a good argument, but an argument.

^ - perhaps a time that you would want to use a C-style loop is if the increment is something other than 1, or if the exit test is something other than a simple comparison. However, this could possibly be written better in a different construct (`while`, for example).

--MidLifeXis
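The two loop styles can be put side by side in a short sketch. The sub names `count_up` and `every_other` are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Simple counting loop: the range form covers 0 .. $max-1.
sub count_up {
    my ($max) = @_;
    my @out;
    for my $i (0 .. $max - 1) {
        push @out, $i;
    }
    return @out;
}

# A step other than 1 is one case where the C-style form reads naturally.
sub every_other {
    my ($max) = @_;
    my @out;
    for (my $i = 0; $i < $max; $i += 2) {
        push @out, $i;
    }
    return @out;
}

print join(' ', count_up(5)), "\n";     # 0 1 2 3 4
print join(' ', every_other(5)), "\n";  # 0 2 4
```

Both forms are valid Perl; the range form simply leaves no room for a `<` versus `<=` slip in the exit test.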
Re: Search Engine Output needs sorting by Anonymous Monk on May 28, 2011 at 13:31 UTC
Okay, I have solved the frequency sorting by changing the prints to pushes, then joining each header with all of its sentences into one array element, then sorting by length:

`#5# ....`
`                        push (@allsents, "MATCH #$count $sentmatches[$m]") unless $seens{ $sentmatches[$m] }++;`
`                        $count++;`
`                    }`
`                }`
`                push (@sepmatches, @allsents);`
`                ##$allmatches[0] is header, [1] is all sentences etc. EVEN - header, ODD - sent`
`                ## Now join headers and sentences (0 and 1 etc.)`
`                foreach (@sepmatches) { ##Try Use natatime where n=2`
`                    $joinmatches = join ('', @sepmatches);`
`                }`
`                push (@allmatches, $joinmatches);`
`                @sortedallmatches = sort {length $b <=> length $a} @allmatches;`
`            }`
`        }`
`    }`
`    print @sortedallmatches;`
`}`

The messy part (though it's all pretty messy) is that I had to call the @allmatches after the first for loop so it would not print every result every time, and then call @sepmatches after the second for loop. Thanks for the help, PM. I hope to join the ranks when I know a little more Perl!
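One detail worth checking when sorting by length: lengths are numbers, so they need the numeric `<=>` operator, whereas `cmp` compares them as strings. The sketch below uses made-up filler strings and hypothetical sub names to show the difference.

```perl
use strict;
use warnings;

# Hypothetical blocks standing in for "header plus sentences" strings.
my @blocks = (
    'x' x 99,     # length 99
    'y' x 100,    # length 100
    'z' x 5,      # length 5
);

# Longest first, comparing the lengths as numbers.
sub by_length_desc {
    return sort { length($b) <=> length($a) } @_;
}

# Same sort with cmp: the lengths are compared as strings,
# so "99" sorts above "100".
sub by_length_desc_cmp {
    return sort { length($b) cmp length($a) } @_;
}

print join(' ', map { length($_) } by_length_desc(@blocks)), "\n";      # 100 99 5
print join(' ', map { length($_) } by_length_desc_cmp(@blocks)), "\n";  # 99 5 100
```

Length is also only a proxy for the number of matches: a block with one very long sentence can outrank a block with several short ones.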
Re^2: Search Engine Output needs sorting by Anonymous Monk on May 29, 2011 at 08:09 UTC
Improved to a Schwartzian transform:

`print`
`    map  { $_->{DATA} }`
`    sort { $b->{COUNT} <=> $a->{COUNT} || $a->{DATA} cmp $b->{DATA} }`
`    map  { +{ COUNT => s/(MATCH\s#\d+)/$1/g, DATA => $_, } }`
`    @allmatches;`
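For readers new to the idiom, here is a self-contained sketch of the same decorate, sort, undecorate pattern. The sample strings are shortened stand-ins modelled on the output in the question, the sub name `sort_by_match_count` is hypothetical, and the count is taken with a list-context match rather than `s///g`, which is a variation on the reply and not its exact code.

```perl
use strict;
use warnings;

# Each string is one header plus all of its MATCH lines (shortened samples).
my @allmatches = (
    "_nsubj_ of move is: it\nMATCH #1 Sent. 60\n",
    "_nsubj_ of move is: animals\nMATCH #1 Sent. 88\nMATCH #2 Sent. 89\n",
);

# Schwartzian transform, read bottom to top:
#   1. decorate each block with its number of MATCH lines,
#   2. sort by that count, highest first, ties broken alphabetically,
#   3. undecorate, keeping only the original string.
sub sort_by_match_count {
    my @blocks = @_;
    return map  { $_->{DATA} }
           sort { $b->{COUNT} <=> $a->{COUNT} || $a->{DATA} cmp $b->{DATA} }
           map  { +{ COUNT => scalar( () = /MATCH\s#\d+/g ), DATA => $_ } }
           @blocks;
}

# The "animals" block has two matches, so it is printed first.
print sort_by_match_count(@allmatches);
```

Counting the matches once per block and sorting on the stored count avoids recounting inside every comparison, which is the point of the transform.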