Hello,

I would like to count how often certain keywords occur in a text file, sample.txt. For example, I define the main words as "Steve Jobs" and "Executive", and I would like to count how many times "stock option" and "package" occur within 10 words of "Steve Jobs" or "Executive" in the sample text below. The result I expect is 4.

Sample text)

Stock option is the most popular compensation policy in the world these days. Steve Jobs also received huge amount of stock options, and the stock option was exercised before the fiscal year. Different from his compensation package, the other executives received less amount of stock options.

To get the result, I used the code below and ran it with this command:

perl code.pl sample.txt "Steve Jobs" "Executive" 10 "stock option" "package"

However, I get an error message: "Use of uninitialized value $distance in numeric le (<=) at line..."

Could you please give me some advice on how to get the result I want? I am attaching the sample text and the code that I used. The sample text contains three different articles, separated by "Document ", so I expect to get a result for each of the three articles.

I am looking forward to your responses. I hope you all have a great weekend! Thank you in advance.

PERL code)


use strict;
use warnings;

my ($filename, @mainword, $distance, @search) = @ARGV;

my $content;
open my $fh, '<', $filename or die $!;
local $/ = undef;
$content = <$fh>;
close $fh;

my @docs = split 'Document ', $content;
foreach my $doc ( @docs ) {

    my $count = 0;

    my $mainword = '(' . (join '|', map { "\Q$_\E" } @mainword) . ')';
    my $search = '(' . (join '|', map { "\Q$_\E" } @search) . ')';


    for (my $dist = 0; $dist <= $distance; $dist++) {
        while ( $doc =~ /
            (?:^|\W)                        
            $search                        
            (?=                           
                (?:\W++\w++){$dist}       
                \W++\Q$mainword\E         
            )
            /ixsg
        )
        {
            print " found [$1] at ", $-[1], "\n";

            $count++;
        }

        while ( $doc =~ /
            (?:^|\W)
            \Q$mainword\E
            (?=
                (?:\W++\w++){$dist}
                \W++$search
            )
            /ixsg
        )
        {
            print "-found [$1] at ", $-[1], "\n";
            $count++;
        }
    }

    print "match: $count\n";
}
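
Two things in the code above seem to explain the symptoms. First, in the list assignment from @ARGV, the array @mainword absorbs every remaining argument, so $distance and @search are left empty; that is where the "uninitialized value $distance" warning comes from. Second, $mainword is wrapped in \Q...\E inside the regexes, which also quotes the parentheses and | that were just built, so the alternation can never match. Below is a minimal sketch of one way around both, not a definitive implementation. It swaps the per-distance lookahead regexes for a simpler technique: find every main-word and search-term match, then count the words between each pair. The sub names (parse_args, terms_regex, find_spans, count_near) are hypothetical. It assumes the distance is the first all-digit argument, that "within 10 words" means at most 10 words in between, and that a term may carry a trailing suffix (so "Executive" also matches "executives").

```perl
use strict;
use warnings;

# Split the arguments around the first all-digit argument, which is
# ASSUMED to be the distance. Arguments before it are main words,
# arguments after it are search terms.
sub parse_args {
    my @args     = @_;
    my $filename = shift @args;
    my (@mainword, @search, $distance);
    for my $arg (@args) {
        if    (!defined $distance && $arg =~ /\A\d+\z/) { $distance = $arg }
        elsif (!defined $distance)                      { push @mainword, $arg }
        else                                            { push @search, $arg }
    }
    die "usage: $0 FILE MAINWORD... DISTANCE SEARCH...\n"
        unless defined $filename && defined $distance && @mainword && @search;
    return ($filename, \@mainword, $distance, \@search);
}

# Build one case-insensitive pattern for a list of terms. Each word is
# quoted individually; the alternation itself is NOT quoted. The trailing
# \w* is an assumption so that plurals such as "options" still match.
sub terms_regex {
    my @alts;
    for my $term (@_) {
        my @words = map { quotemeta($_) } split ' ', $term;
        push @alts, join('\W+', @words);
    }
    my $alt = join '|', @alts;
    return qr/\b(?:$alt)\w*/i;
}

# Return [start, end] character offsets for every match of $re in $text.
sub find_spans {
    my ($text, $re) = @_;
    my @spans;
    while ($text =~ /$re/g) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

# Count (main word, search term) pairs separated by at most $distance words.
sub count_near {
    my ($text, $mainword, $distance, $search) = @_;
    my @main = find_spans($text, terms_regex(@$mainword));
    my @hits = find_spans($text, terms_regex(@$search));
    my $count = 0;
    for my $m (@main) {
        for my $h (@hits) {
            my ($first, $second) = $m->[0] <= $h->[0] ? ($m, $h) : ($h, $m);
            next if $second->[0] < $first->[1];    # skip overlapping matches
            my $between = substr $text, $first->[1], $second->[0] - $first->[1];
            my $words = () = $between =~ /\w+/g;   # words between the two matches
            $count++ if $words <= $distance;
        }
    }
    return $count;
}

# Demo on the sample text from the question (inline instead of a file).
my $sample = 'Stock option is the most popular compensation policy in the '
           . 'world these days. Steve Jobs also received huge amount of stock '
           . 'options, and the stock option was exercised before the fiscal '
           . 'year. Different from his compensation package, the other '
           . 'executives received less amount of stock options.';

my ($file, $mainword, $distance, $search) = parse_args(
    'sample.txt', 'Steve Jobs', 'Executive', 10, 'stock option', 'package');

print "match: ", count_near($sample, $mainword, $distance, $search), "\n";
# prints "match: 4"
```

To use this on the real file, pass @ARGV to parse_args, read the file as before, and call count_near once per document. Note that splitting on 'Document ' returns an empty first element when the file starts with that marker, so it may be worth skipping empty documents. A search term that sits near two main words is counted once per pair, which is also how the original nested loops would behave.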

In reply to Counting the keywords in the text file by moviesigh
