Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, and thanks for taking the time. I am new to Perl and am trying to write a search engine. Now that I have the output printing, I would just like to sort it a little better. The output currently looks like:

NSUBJ

_nsubj_ of move is: ***** it *****

MATCH #1 Sent. 60

The type of body cavity an animal has strongly influences how --it-- can **move** .

_nsubj_ of move is: ***** animals *****

MATCH #1 Sent. 88

These --animals-- **move** slowly or not at all .

MATCH #2 Sent. 89

Bilateral symmetry is a common characteristic of --animals-- that **move** freely through their environments .

I want to print the "_nsubj_ of move is: ***** animals *****" group first because it has more matches. How would you go about doing this?

Here is the print part of the code:

## EDIT: Now the EVEN number is gramfunc, ODD number is sentence
my @allgramfunc;          ## list of unique grammar func
my @allmatches;           ##use for headings and all matches (sentences) under that heading as one scalar
my @sortedallgramfunc;    ## What order for capital heading? => alphabetical
my @sortedheadmatches;    ## What order for dependency heading? => frequency Depend on firstword?
my @sortedfirstmatches;   ## To keep order of sentences with headmatches
my @sortedsecondmatches;  ## To keep order of sentences with headmatches
my %seenmatches = ();
my %seens = ();
#my @pluralfirstmatches = @firstmatches;
#my @pluralsecondmatches = @secondmatches;

#CREATE an array of all the grammar functions:
for (my $j=0; $j <= @grammatches; $j++) { ## Could be normal $j++ if use another variable instead of @matches for both
    if ( defined( $grammatches[$j] )) { ## Just to avoid error message
        push (@allgramfunc, "$grammatches[$j]") unless ($seengramfunc{ $grammatches[$j] }++);
    }
}

#SORT overheadings by alphabetical
@sortedallgramfunc = sort { lc($a) cmp lc($b) } @allgramfunc;

#PRINT all the sentences which are related to searchkey by the same gramfunc
foreach my $sortedallgramfunc (@sortedallgramfunc) {
    print ("\n",uc $sortedallgramfunc,"\n\n");# Which gramfunc is being shown?
    #@sortedheadmatches = sort @headmatches;
    for (my $l=0; $l <= @headmatches; $l++) {
        if (defined( $headmatches[$l] ) and $headmatches[$l] =~ /$sortedallgramfunc/) {
            #2#$pluralfirstmatches[$l] =~ s/$firstmatches[$l]$pluralsuffix/$firstmatches[$l]/ig;
            unless ($seenmatches{ $headmatches[$l] }++) {
                print $headmatches[$l];
                my $count = 1;
                for (my $m=0; $m <= @sentmatches; $m++) {
                    if ( defined( $sentmatches[$m]) and $sentmatches[$m] =~ /\s\S\S$firstmatches[$l]\S\S\s/ and $sentmatches[$m] =~ /\s\S\S$secondmatches[$l]\S\S\s/) { ##We know $l and $m are matching
                        #5# Try sorting by creating array that includes header and all sentences as a scalar, then by size, maybe join until hit _dobj sort length $a cmp length $b maybe "length first then alphabetical" => or $a cmp $b
                        print "MATCH #$count $sentmatches[$m]"; # unless $seens{ $sentmatches[$m] }++);
                        $count++;
                    }
                }
            }
        }
    }
}
The way I thought to go about it is shown in the code comment marked #5#. Thanks again! This is my first time using this site, so if this format isn't good, please let me know!
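One possible way to get the heading with the most matches printed first is sketched below. It assumes the sentences can be collected into a hash of arrays keyed by heading before anything is printed. The hash %matches_for and the sub headings_by_frequency are hypothetical names used only for illustration; they are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical sample data: each heading maps to the sentences found for it.
my %matches_for = (
    "_nsubj_ of move is: ***** it *****" => [
        "Sent. 60 The type of body cavity an animal has strongly influences how --it-- can **move** .",
    ],
    "_nsubj_ of move is: ***** animals *****" => [
        "Sent. 88 These --animals-- **move** slowly or not at all .",
        "Sent. 89 Bilateral symmetry is a common characteristic of --animals-- that **move** freely through their environments .",
    ],
);

# Return the headings ordered by number of sentences (most first),
# falling back to alphabetical order when two headings tie.
sub headings_by_frequency {
    my ($matches_for) = @_;
    return sort {
        @{ $matches_for->{$b} } <=> @{ $matches_for->{$a} }
            or $a cmp $b
    } keys %$matches_for;
}

# Print each heading followed by its numbered matches.
for my $heading ( headings_by_frequency( \%matches_for ) ) {
    print "$heading\n";
    my $count = 1;
    print "MATCH #", $count++, " $_\n" for @{ $matches_for{$heading} };
}
```

Collecting first and printing afterwards keeps the match count available as the array size, so no string lengths need to be compared.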

Replies are listed 'Best First'.
Re: Search Engine Output needs sorting
by jwkrahn (Abbot) on May 28, 2011 at 06:30 UTC
    for (my $j=0; $j <= @grammatches; $j++) { ## Could be normal $j++ if use another variable instead of @matches for both
    ...
    for (my $l=0; $l <= @headmatches; $l++) {
    ...
    for (my $m=0; $m <= @sentmatches; $m++) {

    You have an off-by-one error in your for loops: you are trying to access one element past the end of the arrays. Those are usually written as:

    for my $j ( 0 .. $#grammatches ) { ## Could be normal $j++ if use another variable instead of @matches for both
    ...
    for my $l ( 0 .. $#headmatches ) {
    ...
    for my $m ( 0 .. $#sentmatches ) {
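To make the difference concrete, here is a minimal sketch that records which indices each loop form visits. The three-element @grammatches is made-up sample data, and the @visited_c and @visited_range arrays exist only for the demonstration.

```perl
use strict;
use warnings;

my @grammatches = qw(nsubj dobj prep);   # hypothetical sample data

# C-style loop with "<=": in a numeric comparison the array yields its
# length (3), so the loop also visits index 3, which does not exist.
my @visited_c;
for ( my $j = 0; $j <= @grammatches; $j++ ) {
    push @visited_c, $j;
}

# Range loop up to the last index ($#grammatches is 2 here).
my @visited_range;
for my $j ( 0 .. $#grammatches ) {
    push @visited_range, $j;
}

print "C-style with <=: @visited_c\n";        # 0 1 2 3
print "Range to last index: @visited_range\n"; # 0 1 2
```

In the original script the extra iteration reads an undefined element, which is why a defined() test was needed to silence the warning.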

      Thanks for the insightful tip! However, I'm not sure what that changes, since it was working before. Did I overcompensate by using the 'defined' function (which I added when I saw the uninitialized error)?

      I haven't encountered the $# before, does @ not work? Thanks again!

        Using @array in scalar context gives the number of elements in the array whereas $#array gives the index of the last element.

        knoppix@Microknoppix:~$ perl -E '
        > @array = ( 1 .. 5 );
        > say qq{@array};
        > say scalar @array;
        > say $#array;'
        1 2 3 4 5
        5
        4
        knoppix@Microknoppix:~$

        I hope this is helpful.

        Cheers,

        JohnGG

      Wait! Is that true for all for loops in Perl? You can never use the C-style method? I always have to use foreach or that .. operator instead?

        You can never use the C-style method?

        I would not say never use -- there are instances where it might make better sense*. However, for simple (i=0;i<max;i++)-type loops, Perl has a different method of specifying it: for my $i (0..$max-1).

        This is a question of style and understanding. Just as not knowing idiomatic expressions (or using them incorrectly) can tag someone as "not from 'round here", using the C-style for loop in the general case may cause some to question how well you know Perl -- right or wrong.

        On the other hand (and this may be heresy), if you are in a C shop, and your coding standards are written from a C point of view, I suppose that could be an argument for using the C-style for loop. Not necessarily a good argument, but an argument.

        * - perhaps a time that you would want to use a C-style loop is if the increment is something other than 1, or if the exit test is something other than a simple comparison. However, this could possibly be written better with a different construct (while, for example).

        --MidLifeXis
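As an illustration of the footnote, here is a small sketch of a loop whose increment is 2, written once as a C-style for loop and once as a while loop. The @pairs array, with headers at even indices and sentences at odd indices, is hypothetical sample data modelled loosely on the layout described in the original post.

```perl
use strict;
use warnings;

# Hypothetical data: even indices hold headers, odd indices hold sentences.
my @pairs = ( 'header A', 'sentences A', 'header B', 'sentences B' );

# C-style loop with a step of 2: collects only the headers.
my @headers_c;
for ( my $i = 0; $i < @pairs; $i += 2 ) {
    push @headers_c, $pairs[$i];
}

# The same traversal written as a while loop.
my @headers_while;
my $i = 0;
while ( $i < @pairs ) {
    push @headers_while, $pairs[$i];
    $i += 2;
}

print join( ', ', @headers_c ),     "\n";
print join( ', ', @headers_while ), "\n";
```

Both forms visit indices 0 and 2 only; which one reads better is largely a matter of taste.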

Re: Search Engine Output needs sorting
by Anonymous Monk on May 28, 2011 at 13:31 UTC

    Okay, I have solved the frequency sorting by changing the prints to pushes, then joining each header and all of its sentences into one array element, then sorting that array by length:

    #5# ....
                        push (@allsents, "MATCH #$count $sentmatches[$m]") unless $seens{ $sentmatches[$m] }++;
                        $count++;
                    }
                }
                push (@sepmatches, @allsents);
                ##$allmatches[0] is header, [1] is all sentences etc. EVEN - header, ODD - sent
                ## Now join headers and sentences (0 and 1 etc.)
                foreach (@sepmatches) { ##Try Use natatime where n=2
                    $joinmatches = join ('', @sepmatches);
                }
                push (@allmatches, $joinmatches);
                @sortedallmatches = sort {length $b cmp length $a} @allmatches;
            }
        }
    }
    print @sortedallmatches;
}

    The messy part (though it's all pretty messy) is that I had to call the @allmatches after the first for loop so it would not print every result every time. Then call @sepmatches after the second for loop
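One thing worth checking in the length sort above: cmp compares its operands as strings, so lengths with different digit counts may not order numerically. The sketch below uses made-up blocks of 9, 10 and 100 characters to show the difference between cmp and <=>. Note also that text length is only a proxy for the number of matches.

```perl
use strict;
use warnings;

# Hypothetical joined blocks of 9, 10 and 100 characters.
my @allmatches = ( 'x' x 9, 'x' x 10, 'x' x 100 );

# cmp compares the lengths as strings, so "9" sorts above "100" and "10".
my @lengths_cmp = map { length($_) }
                  sort { length($b) cmp length($a) } @allmatches;

# <=> compares the lengths as numbers, giving a true longest-first order.
my @lengths_num = map { length($_) }
                  sort { length($b) <=> length($a) } @allmatches;

print "cmp: @lengths_cmp\n";   # 9 100 10
print "<=>: @lengths_num\n";   # 100 10 9
```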

    Thanks for the help PM, I hope to join the ranks when I know a little more Perl!

      Improved to a Schwartzian transform:
      print
          map  { $_->{DATA} }
          sort { $b->{COUNT} <=> $a->{COUNT} || $a->{DATA} cmp $b->{DATA} }
          map  { +{ COUNT => s/(MATCH\s#\d+)/$1/g, DATA => $_, } }
          @allmatches;
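For comparison, here is a sketch of the same map/sort/map idea that counts the MATCH markers with a pattern match in list context instead of a substitution, so the data is never written to. The sample blocks and the sub name sort_by_match_count are hypothetical; the snippet assumes each element of the list is one header joined with its sentences.

```perl
use strict;
use warnings;

# Hypothetical blocks: one header followed by its numbered matches.
my @allmatches = (
    "header it\nMATCH #1 Sent. 60\n",
    "header animals\nMATCH #1 Sent. 88\nMATCH #2 Sent. 89\n",
);

# Decorate each block with its match count, sort by count (highest first,
# ties broken alphabetically), then strip the decoration again.
sub sort_by_match_count {
    my @blocks = @_;
    return
        map  { $_->[1] }
        sort { $b->[0] <=> $a->[0] or $a->[1] cmp $b->[1] }
        map  { [ scalar( () = /MATCH\s#\d+/g ), $_ ] }
        @blocks;
}

print sort_by_match_count(@allmatches);
```

The `() = /.../g` assignment forces list context on the match, and scalar() then yields the number of matches found.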