in reply to Re: Getting unique data from search.
in thread Getting unique data from search.

This is still not working. Any suggestions?
use File::Find;

sub wanted {
    if ( $_ =~ /\.html?$/ ) {
        my $name = $File::Find::name;
        open( F, $name ) or die "$!: $name\n";
        while ( $line = <F> ) {
            if ( $line =~ /<title>(.+)<\/title>/i ) {
                print "Title = $1\n";
            }
        }
        close F;
    }
}

find( \&wanted, "/unixpath/webfiles" );
print "title = $1\n" unless $seen{$1}++;

Re: Re: Re: Getting unique data from search.
by jdporter (Paladin) on Nov 14, 2002 at 16:30 UTC
    You didn't follow his advice; he said "change the print statement." Not "add this at the bottom." The code should look like this:
    use strict;
    use File::Find;

    my %seen;    # titles already printed

    sub wanted {
        if ( /\.html$/ ) {
            open F, "< $File::Find::name" or die "read $File::Find::name: $!\n";
            local $_;
            while (<F>) {
                if ( /<title>(.*?)<\/title>/si ) {
                    # print each distinct title only once
                    print "Title = $1\n" unless $seen{$1}++;
                }
            }
            close F;
        }
    }

    find( \&wanted, "/unixpath/webfiles" );

    jdporter
    ...porque es dificil estar guapo y blanco.

      Thanks, but I'm not sure what this does or how it works:
      print "Title = $1\n" unless $seen{$1}++;
      Can someone explain it to me?
        Sure. We have this little section:
        if ( /<title>(.*?)<\/title>/si ) {
            print "Title = $1\n" unless $seen{$1}++;
        }
        The first line uses a regex to find things enclosed by <title> tags. Because of the grouping parentheses, whatever is matched gets magically assigned to the special $1 variable. (Note, that's the numeral one, not a lower-case ell.) (If you had more paren groups, what they matched would be assigned to $2, $3, etc.)
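To make the $1 / $2 business concrete, here is a minimal sketch with two paren groups. The helper name title_and_heading and the sample HTML string are made up for illustration; they are not part of the script in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the two paren groups captured,
# or an empty list when the pattern does not match.
sub title_and_heading {
    my ($html) = @_;
    if ( $html =~ m{<title>(.*?)</title>\s*<h1>(.*?)</h1>}i ) {
        return ( $1, $2 );    # first group lands in $1, second in $2
    }
    return;
}

# Made-up sample input; /i lets the upper-case TITLE tag match too.
my ( $title, $heading ) =
    title_and_heading('<TITLE>My Page</TITLE> <h1>Welcome</h1>');
print "title = $title, heading = $heading\n";
```

Using m{...} instead of /.../ is only a convenience here, so the slash in the closing tag does not need a backslash.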

        The stuff inside the if block is really just a fancy (compact) way of writing

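        if ( ! $seen{$1} ) {
            print "Title = $1\n";
        }
        $seen{$1}++;
        The test has to come before the increment: $seen{$1}++ hands back the old count and only then adds one, so it is false the first time a title shows up and true every time after that. Is that clearer now, I hope?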
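If you want to play with the idiom away from File::Find, something like the following sketch might help. The helper unique_titles and the list of titles are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the first occurrence of each title, in order.
sub unique_titles {
    my @titles = @_;
    my %seen;
    my @unique;
    for my $title (@titles) {
        # $seen{$title}++ returns the OLD count (0, which is false, the
        # first time) and only then bumps it, so the push runs once per title.
        push @unique, $title unless $seen{$title}++;
    }
    return @unique;
}

# Made-up input with repeats; prints Home, About, Contact, one per line.
print "Title = $_\n"
    for unique_titles( 'Home', 'About', 'Home', 'Contact', 'About' );
```

As a side effect, %seen ends up holding how many times each title occurred, which can be handy if you later want a count instead of a plain list.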

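        Footnote: Why the /s and /i modifiers on the regex? Well, the /i is so we can match <title>, <TITLE>, or any other variant of case. The /s is so the dot in the pattern can match linebreak characters. That only helps when the whole title sits in the string being matched; the loop above reads one line at a time, so a title split across lines would also need the file read in one piece. The case I have in mind is someone cleverly writing something like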

        <title>This is a very
        Long Title</title>
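A rough sketch of /s at work, assuming the whole document is already in one string (for a file, that could mean reading it in one piece, for instance with a localized $/). The helper extract_title and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls the first <title> out of a whole document
# held in one string. /s lets the dot cross line breaks; /i ignores case.
sub extract_title {
    my ($html) = @_;
    return unless $html =~ m{<title>(.*?)</title>}si;
    my $title = $1;
    $title =~ s/\s+/ /g;    # fold the line breaks into single spaces
    return $title;
}

# Made-up sample input with the title split over two lines.
my $html = "<html><head><TITLE>This is a very\nLong Title</TITLE></head></html>";
print "Title = ", extract_title($html), "\n";
```

Without /s the dot stops at the newline and the match fails; without the whitespace cleanup the captured title would keep its embedded line break.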

        jdporter
        ...porque es dificil estar guapo y blanco.