GHMON has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to count how many times each word occurs in my file, but the number I get is only the total of all words!

#Hi Codder
use warnings ;
use strict ;
#use DBI ;
use utf8 ;
use Encode ;

my $numlinep = 0 ;
my $traincountp = 0 ;
my $pt = '/root/Positive.txt' ;
my $ptt = '/root/positivetrain1.txt' ;
my $pwt = '/root/Positive2.txt' ;
my $ntw = '/root/Negative2.txt' ;

open (my $in , "<:encoding(utf8)" , "$pt") or die "$pt: $!" ;
while (my $line = <$in>) {
    $numlinep++ ;
}
close $in ;

open ($in , "<:encoding(utf8)" , "$pt") or die "$pt: $!" ;
while (my $linep = <$in>) {
    my $inp ;
    if ($traincountp <= (0.7*$numlinep)){
        $traincountp++ ;
        open ($inp , ">>" , "$ptt") or die "$ptt: $!" ;
        print $inp $linep , "\n" ;
    }
}
close $in ;

my $numlinet = 0 ;
my $traincountn = 0 ;
my $nt = '/root/Negative.txt' ;
my $ntt = '/root/negativetrain1.txt' ;

open (my $it , "<:encoding(utf8)" , "$nt") or die "$nt: $!" ;
while (my $line = <$it>) {
    $numlinet++ ;
}
close $it ;

open ($it , "<:encoding(utf8)" , "$nt") or die "$nt: $!" ;
while (my $linen = <$it>) {
    my $itn ;
    if ($traincountn <= (0.7*$numlinet)){
        $traincountn++ ;
        open ($itn , ">>" , "$ntt") or die "$ntt: $!" ;
        print $itn $linen , "\n" ;
    }
}
close $it ;

my $numlinepw = 0 ;
my %countp = () ;
open (my $inw , "<:encoding(utf8)" , "$ptt") or die "$ptt: $!" ;
open (my $inwp , "<:encoding(utf8)" , "$pwt") or die "$pwt: $!" ;
while (<$inw>) {
    my @pwords ;
    my @ptw ;
    my $elementp ;
    my $countp ;
    @pwords = split (/\n/ , $inwp) ;
    push @ptw , @pwords ;
    foreach my $elementp (@ptw) {
        $countp{$elementp}++ ;
    }
    while ( ( my $kp , my $vp) = each %countp ) {
        open (my $hashp , ">>" , 'wordsbagp.txt') ;
        print "$vp => $kp" , "\n" ;
        print $hashp "$vp = $kp\n" ;
        #print "$kp => $vp\n" ;
        #print "$kp" , "\n" ;
        #print "$vp" , "\n" ;
        #print "$kp" , "\n" , "$vp" , "\n" ;
    }
    $numlinepw++ ;
}
#print "$numlinepw" , "\n" ;

my $numlinenw = 0 ;
my %countn = () ;
open (my $itw , "<:encoding(utf8)" , "$ntt") or die "$ntt: $!" ;
open (my $nwt , "<:encoding(utf8)" , "$ntw" ) or die "$ntw: $!" ;
while (<$itw>) {
    $numlinenw++ ;
    my @nwords ;
    my @ntw ;
    my $elementn ;
    my $countn ;
    @nwords = split (/\n/ , $nwt) ;
    push @ntw , @nwords ;
    foreach my $elementn (@ntw) {
        $countn{$elementn}++ ;
    }
    while ( ( my $kn , my $vn ) = each %countn ) {
        open (my $hashn , '>>' , 'wordsbagn.txt') or die $! ;
        print "$vn => $kn" , "\n" ;
        print $hashn "$vn = $kn\n" ;
        #print "$kn => $vn\n" ;
        #print "$kn" , "\n" ;
        #print "$vn" , "\n" ;
        #print "$kn" , "\n" , "$vn" , "\n" ;
    }
}
print 'Finish First Section' , "\n" ;

Re: problem count the number of words
by haukex (Archbishop) on Dec 26, 2018 at 11:46 UTC

    Sorry, but I don't understand the question.

    • You haven't provided any sample input data, or the expected output for that input. See also "How do I post a question effectively?" and "I know what I mean. Why don't you?"
    • Please use consistent indentation in your source code. See perltidy to help with that.
    • Please provide a Short, Self-Contained, Correct Example. For example, the code from open (my $in ... up to close $it ; doesn't seem to have anything to do with counting words. The last two blocks of code, beginning with my $numlinepw and my $numlinenw, seem to be pretty much identical except for the variable names; for the purposes of asking this question one of them can be removed, and in the final code they should probably be refactored into a subroutine.

    Having said all that, this looks suspicious to me: @nwords = split( /\n/, $nwt ); - @nwords will contain lines, not words, unless of course your input file only has one word per line - but again, you haven't shown it, so we don't know. (Update: And it's not just that - $nwt is a filehandle, as per jwkrahn's post).
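
To illustrate the difference between lines and words, here is a minimal sketch. It assumes that words are separated by whitespace, and it uses an in-memory filehandle with made-up sample text in place of the real input file, so $text, $fh and %count are illustrative names only:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $text = "good bad good\nbad ugly\n";
open(my $fh, '<', \$text) or die "in-memory file: $!";

my %count;
while (my $line = <$fh>) {    # read lines FROM the handle; never split the handle itself
    chomp $line;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @words = split ' ', $line;
    $count{$_}++ for @words;
}
close $fh;

# Print one "word => count" pair per line, sorted by word.
print "$_ => $count{$_}\n" for sort keys %count;
```

Each line is read from the handle first and only then split, so the hash ends up with per-word counts rather than per-line counts.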

Re: problem count the number of words -- oneliner
by Discipulus (Canon) on Dec 26, 2018 at 14:15 UTC
    Hello,

    Given what the wise haukex already said, and considering only your title, a simple one-liner can do the task (pay attention to the Windows double quotes; on a Unix shell use single quotes instead):

    perl -lne "$count+=split}{print $count" /path/file1 /path/file2

    Using Deparse can give you a working starting point to work with:

    perl -MO=Deparse -lne "$count+=split}{print $count" /path/file1 /path/file2
    BEGIN { $/ = "\n"; $\ = "\n"; }
    LINE: while (defined($_ = readline ARGV)) {
        chomp $_;
        $count += split(' ', $_, 0);
    }
    {
        print $count;
    }
    -e syntax OK

    That can be translated as: for each file in the input (see ARGV in perldoc), read it line by line, chomp each line, split the line at whitespace and add the resulting word count to $count. When all file processing is finished, print the value of $count.
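
The same steps can also be written out as a small script. This is only a sketch: count_words is a hypothetical helper name, and the sample text is read through an in-memory filehandle so the example runs without any files. With real files you would open each path from @ARGV instead.

```perl
use strict;
use warnings;

# Count whitespace-separated words read from an already opened filehandle.
sub count_words {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # Split into an array first; an array in numeric context gives its size.
        my @words = split ' ', $line;
        $count += @words;
    }
    return $count;
}

# Hypothetical sample input, including an empty line and leading spaces.
my $sample = "one two  three\n\n  four five\n";
open(my $fh, '<', \$sample) or die "in-memory file: $!";
my $total = count_words($fh);
close $fh;

print "$total\n";
```

Splitting into a named array avoids relying on the behaviour of split in scalar context, which the one-liner uses for brevity.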

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Here's an alternative:
      perl -ne 's/\W+/$i++/eg;END{print "$i\n"}' f1 f2
        Hi kschwab,

        Are you sure? This one gives the same result as wc.

        perl -ne 's/\S+/$i++/egi;END{print "$i\n"}' f1 f2
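
The two patterns count different things: \W+ matches runs of separator characters, while \S+ matches runs of non-whitespace, which is how wc -w defines a word. A small sketch, using a made-up sample line that contains punctuation and an apostrophe, shows where the counts diverge:

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen to contain punctuation and an apostrophe.
my $line = "Hello, world! It's 42.\n";

# Number of runs of non-word characters (what s/\W+/.../g visits).
my $non_word_runs = () = $line =~ /\W+/g;

# Number of runs of non-whitespace characters (wc -w style words).
my $non_space_runs = () = $line =~ /\S+/g;

print "\\W+ runs: $non_word_runs\n";
print "\\S+ runs: $non_space_runs\n";
```

On this line the apostrophe in "It's" is a separate \W+ run, so the first count comes out higher than the number of whitespace-separated words.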
Re: problem count the number of words
by jwkrahn (Abbot) on Dec 26, 2018 at 19:33 UTC
    open (my $inwp , "<:encoding(utf8)" , "$pwt") or die "$pwt: $!" ;
    while (<$inw>) {
        my @pwords ;
        my @ptw ;
        my $elementp ;
        my $countp ;
        @pwords = split (/\n/ , $inwp) ;

    The variable $inwp is a FILEHANDLE and does not contain any data from the file $pwt.

    open (my $nwt , "<:encoding(utf8)" , "$ntw" ) or die "$ntw: $!" ;
    while (<$itw>) {
        $numlinenw++ ;
        my @nwords ;
        my @ntw ;
        my $elementn ;
        my $countn ;
        @nwords = split (/\n/ , $nwt) ;

    The same thing with the variable $nwt.
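
To get the data, the handle has to be read. A minimal sketch, assuming the word list holds one word per line; the in-memory string $wordlist is a made-up stand-in for a file such as $pwt:

```perl
use strict;
use warnings;

# Hypothetical word list, one word per line, standing in for the real file.
my $wordlist = "good\nnice\nhappy\n";
open(my $fh, '<', \$wordlist) or die "in-memory file: $!";

# Reading the handle in list context returns all lines at once;
# chomp then removes the trailing newline from every element.
chomp(my @words = <$fh>);
close $fh;

print scalar(@words), " words: @words\n";
```

After this, @words holds the words themselves, and no split on newlines is needed.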

      Hi bro

      I have two files that contain a lot of sentences, and two files that contain negative words and positive words. I want to count how often these two types of words are used in the sentences. My sentences are in $pt = '/root/Positive.txt' and $nt = '/root/Negative.txt', and my words are in $pwt = '/root/Positive2.txt' and $ntw = '/root/Negative2.txt'. I want to count the number of these words in the two sentence files with this code.

        Here's an example of counting defined sets of "words" (which can be tricky to define) based on the technique described in the Building Regex Alternations Dynamically article by haukex. If you can figure out how to get the contents of your positive and negative word data files into the corresponding arrays (and if my notion of what you want is anywhere near what you actually want), you may be on your way.

        Note that the code is set up for case-insensitive matching and counting: the negative word "fourscore" matches "FoUrScOrE" in the example sentence, and so on. Note, again, that the concept of a "word" can be slippery, so the use of the  \b boundary assertion, among other details, may not be appropriate.

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my @positive = qw(nation conceived liberty created equal foo);
         my @negative = qw(fourscore SEVEN fOrTh fathers continent bar);
         ;;
         my $sentence =
           'FoUrScOrE and seven years ago ' .
           'our fathers brought forth, on this continent, ' .
           'a new nation, conceived in liberty, and dedicated ' .
           'to the proposition that all men are created equal. ' .
           'Repeat seven nation fathers nation.'
           ;
         ;;
         my %pos = map { lc($_) => 0 } @positive;
         my $rx_pos = make_regex(\%pos);
         print 'for debug: positive rx: ', $rx_pos;
         ;;
         my %neg = map { lc($_) => 0 } @negative;
         my $rx_neg = make_regex(\%neg);
         print 'for debug: negative rx: ', $rx_neg;
         ;;
         my %other;
         my $rx_undefined = qr{ (?! $rx_pos | $rx_neg) }xms;
         my $rx_word = qr{ \b [[:alpha:]]+ \b }xms;
         ;;
         ++$pos  { lc $_ } for $sentence =~ m{ $rx_pos }xmsg;
         ++$neg  { lc $_ } for $sentence =~ m{ $rx_neg }xmsg;
         ++$other{ lc $_ } for $sentence =~ m{ $rx_undefined $rx_word }xmsg;
         ;;
         dd \%pos;
         dd \%neg;
         dd \%other;
         ;;
         ;;
         sub make_regex {
           my ($hr_wordlist) = @_;
           ;;
           my ($rx) =
             map qr{ (?i) \b (?: $_) \b }xms,
             join '|',
             map quotemeta,
             reverse sort
             keys %$hr_wordlist
             ;
           ;;
           return $rx;
           }
        "
        for debug: positive rx: (?msx-i: (?i) \b (?: nation|liberty|foo|equal|created|conceived) \b )
        for debug: negative rx: (?msx-i: (?i) \b (?: seven|fourscore|forth|fathers|continent|bar) \b )
        { conceived => 1, created => 1, equal => 1, foo => 0, liberty => 1, nation => 3 }
        { bar => 0, continent => 1, fathers => 2, forth => 1, fourscore => 1, seven => 2 }
        {
          a => 1,
          ago => 1,
          all => 1,
          "and" => 2,
          are => 1,
          brought => 1,
          dedicated => 1,
          in => 1,
          men => 1,
          new => 1,
          on => 1,
          our => 1,
          proposition => 1,
          repeat => 1,
          that => 1,
          the => 1,
          this => 1,
          to => 1,
          years => 1,
        }

        Update: In the  make_regex() function, the lines
            reverse sort
            map  quotemeta,
        were swapped (now fixed); they should be
            map  quotemeta,
            reverse sort
        i.e., sort-ing, either lexically or by length, should be done on the raw strings before the quotemeta step.
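
As a sketch of the by-length variant mentioned in the update, the version of make_regex below sorts the raw words longest first and only then applies quotemeta. It takes a plain list instead of a hash reference, and the sample words and sentence are made up for illustration:

```perl
use strict;
use warnings;

# Build a case-insensitive alternation from a list of words.
# Sorting happens on the raw strings, longest first, before quotemeta is applied.
sub make_regex {
    my @words = @_;
    my $alternation =
        join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) or $a cmp $b }
        @words;
    return qr{ \b (?: $alternation ) \b }xmsi;
}

# Hypothetical word list; "a.m" contains a regex metacharacter on purpose.
my $rx    = make_regex(qw(fore forecast a.m));
my @found = 'The FORECAST for 9 a.m is fore!' =~ m{ ($rx) }xmsg;

print "@found\n";
```

Sorting after quotemeta would compare the escaped strings, so a word such as "a.m" would be measured with its backslash included.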


        Give a man a fish:  <%-{-{-{-<