in reply to Re: Re: Getting unique data from search.
in thread Getting unique data from search.

You didn't follow his advice; he said "change the print statement." Not "add this at the bottom." The code should look like this:
use strict;
use File::Find;

my %seen;    # titles already printed

sub wanted {
    if ( /\.html$/ ) {
        open F, "< $File::Find::name" or die "read $File::Find::name: $!\n";
        local $_;
        while (<F>) {
            if ( /<title>(.*?)<\/title>/si ) {
                # print each title only the first time it is seen
                print "Title = $1\n" unless $seen{$1}++;
            }
        }
        close F;
    }
}

find( \&wanted, "/unixpath/webfiles" );

jdporter
...porque es dificil estar guapo y blanco.

Re: Re: Re: Re: Getting unique data from search.
by Anonymous Monk on Nov 14, 2002 at 19:41 UTC
    thanks, but not sure what and how this works???
    print "Title = $1\n" unless $seen{$1}++;
    Can someone explain it to me?
      Sure. We have this little section:
      if ( /<title>(.*?)<\/title>/si ) {
          print "Title = $1\n" unless $seen{$1}++;
      }
      The first line uses a regex to find things enclosed by <title> tags. Because of the grouping parentheses, whatever is matched gets magically assigned to the special $1 variable. (Note, that's the numeral one, not a lower-case ell.) (If you had more paren groups, what they matched would be assigned to $2, $3, etc.)
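
To see $1 and $2 side by side, here is a minimal sketch. The link_parts helper and the sample link are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the href and link text of the first <a> tag.
sub link_parts {
    my ($line) = @_;
    # Two pairs of parentheses, so two capture variables get set.
    if ( $line =~ /<a href="(.*?)">(.*?)<\/a>/i ) {
        return ( $1, $2 );    # $1 = first group, $2 = second group
    }
    return;                   # no match: empty list
}

my ( $url, $text ) = link_parts('<a href="index.html">Home</a>');
print "URL  = $url\n";
print "Text = $text\n";
```

The groups are numbered by the position of their opening parenthesis, left to right.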

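      The stuff inside the if block is really just a fancy (compact) way of writing

      if ( ! $seen{$1} ) {
          print "Title = $1\n";
      }
      $seen{$1}++;

      The post-increment hands the old count to unless before bumping it, so the print only fires the first time a given title turns up.

      I hope that's clearer now.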
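
If you want to watch the %seen trick on its own, away from the file handling, something like this should do. The list of titles is invented for the example.

```perl
use strict;
use warnings;

my %seen;

# Made-up titles, with a couple of repeats.
my @titles = ( 'Home', 'About', 'Home', 'Contact', 'About' );

my @unique;
for my $title (@titles) {
    # $seen{$title}++ yields the old count (false the first time)
    # and then bumps it, so only the first occurrence gets through.
    push @unique, $title unless $seen{$title}++;
}

print "Title = $_\n" for @unique;
```

As a side effect, %seen ends up holding how many times each title appeared, which can be handy if you later want counts as well.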

      Footnote: Why the /s and /i modifiers on the regex? Well, the /i is so we can match <title>, <TITLE>, or any other variant of case. The /s is so the dot in the pattern can match linebreak characters, in case someone cleverly wrote something like

      <title>This is a
      very Long
      Title</title>
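
One caveat worth sketching: /s only helps when the string being matched actually contains the line breaks. A while (<F>) loop hands the regex one line at a time, so for a title that spans lines you would need to read the whole file into one string first (for example with local $/;). The snippet below uses a made-up $html string to show the difference; it is an illustration, not a change to the script above.

```perl
use strict;
use warnings;

# Made-up document with the title split across lines.
my $html = "<html><head><TITLE>This is a\nvery Long\nTitle</TITLE></head></html>\n";

# Line by line: no single line holds both tags, so nothing matches.
my $line_hits = grep { /<title>(.*?)<\/title>/si } split /^/, $html;

# Whole string at once: /s lets the dot cross newlines, /i matches <TITLE>.
my $title;
if ( $html =~ /<title>(.*?)<\/title>/si ) {
    ( $title = $1 ) =~ s/\s+/ /g;    # collapse the line breaks to single spaces
}

print "line-by-line matches: $line_hits\n";
print "Title = $title\n" if defined $title;
```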

      jdporter
      ...porque es dificil estar guapo y blanco.