problem count the number of words

GHMON has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to count how many times each word occurs in my file, but the count I get is only the sum of all the words!

#Hi Codder
use warnings ;
use strict ;
#use DBI ;
use utf8 ;
use Encode ;
my $numlinep = 0 ;
my $traincountp = 0 ;
my $pt = '/root/Positive.txt' ;
my $ptt = '/root/positivetrain1.txt' ;
my $pwt = '/root/Positive2.txt' ;
my $ntw = '/root/Negative2.txt' ;

open (my $in , "<:encoding(utf8)" , "$pt") or die "$pt: $!" ;
while (my $line = <$in>) {
      
            $numlinep++ ;
}
close $in ;
open ($in , "<:encoding(utf8)" , "$pt") or die "$pt: $!" ;
while (my $linep = <$in>) {
      my $inp ;
      if ($traincountp <= (0.7*$numlinep)){
        $traincountp++ ;
        open ($inp , ">>" , "$ptt") or die "$ptt: $!" ;
        print $inp $linep , "\n" ;
      }
}
close $in ;
my $numlinet = 0 ;
my $traincountn = 0 ;
my $nt = '/root/Negative.txt' ;
my $ntt = '/root/negativetrain1.txt' ;
open (my $it , "<:encoding(utf8)" , "$nt") or die "$nt: $!" ;
while (my $line = <$it>) {
      
            $numlinet++ ;
}
close $it ;
open ($it , "<:encoding(utf8)" , "$nt") or die "$nt: $!" ;
while (my $linen = <$it>) {
      my $itn ;
      if ($traincountn <= (0.7*$numlinet)){
        $traincountn++ ;
        open ($itn , ">>" , "$ntt") or die "$ntt: $!" ;
        print $itn $linen , "\n" ;
      }
}
close $it ;


my $numlinepw = 0 ;
my %countp = () ;

open (my $inw , "<:encoding(utf8)" , "$ptt") or die "$ptt: $!" ;
open (my $inwp , "<:encoding(utf8)" , "$pwt") or die "$pwt: $!" ;
while (<$inw>) {

my @pwords ;
my @ptw ;
my $elementp ;
my $countp ;

  @pwords = split (/\n/ , $inwp) ;
  push @ptw , @pwords ;

foreach my $elementp (@ptw) {
    $countp{$elementp}++ ;
    }

  while ( ( my $kp , my $vp) = each %countp ) {
    open (my $hashp , ">>" , 'wordsbagp.txt') ;
    print "$vp => $kp" , "\n" ;
    print $hashp "$vp = $kp\n" ;
    #print "$kp => $vp\n" ;
    #print "$kp" , "\n" ;
    #print "$vp" , "\n" ;
    #print "$kp" , "\n" , "$vp" , "\n" ;
    
    }

   $numlinepw++ ;
} 

#print "$numlinepw" , "\n" ;

my $numlinenw = 0 ;
my %countn = () ;

open (my $itw , "<:encoding(utf8)" , "$ntt") or die "$ntt: $!" ;
open (my $nwt , "<:encoding(utf8)" , "$ntw" ) or die "$ntw: $!" ;

while (<$itw>) {

  $numlinenw++ ;

my @nwords ;
my @ntw ;
my $elementn ;
my $countn ;


@nwords = split (/\n/ , $nwt) ;
  push @ntw , @nwords ;

foreach my $elementn (@ntw) {
    $countn{$elementn}++ ;
    }

  while ( ( my $kn , my $vn ) = each %countn ) {
    open (my $hashn , '>>' , 'wordsbagn.txt') or die $! ;
    print "$vn => $kn" , "\n" ;
    print $hashn "$vn = $kn\n" ;

    #print "$kn => $vn\n" ;
    #print "$kn" , "\n" ;
    #print "$vn" , "\n" ;
    #print "$kn" , "\n" , "$vn" , "\n" ;
    }
}

print 'Finish First Section' , "\n" ;

Re: problem count the number of words by haukex (Archbishop) on Dec 26, 2018 at 11:46 UTC
Sorry, but I don't understand the question. You haven't provided any sample input data, or the expected output for that input. See also How do I post a question effectively? and I know what I mean. Why don't you?

Please use consistent indentation in your source code; see perltidy to help with that.

Please provide a Short, Self-Contained, Correct Example. For example, the code from `open (my $in ...` up to `close $it ;` doesn't seem to have anything to do with counting words, and the last two blocks of code beginning with `my $numlinepw` and `my $numlinenw` seem to be pretty much identical except for the variable names. For the purposes of asking this question one of them can be removed, and in the final code they should probably be refactored into a `sub`routine.

Having said all that, this looks suspicious to me: `@nwords = split( /\n/, $nwt );`. `@nwords` will contain lines, not words, unless of course your input file only has one word per line; but again, you haven't shown it, so we don't know. (Update: And it's not just that: `$nwt` is a filehandle, as per jwkrahn's post.)
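To make the lines-versus-words point concrete, here is a small sketch comparing `split /\n/` with `split ' '` on the same text. The sample string is made up for illustration; on it the script prints "2 lines, 5 words".

```perl
use strict;
use warnings;

# Made-up stand-in for a file's contents: two lines, five words.
my $text = "good happy day\nbad sad\n";

# Splitting on newlines gives one element per LINE.
my @lines = split /\n/, $text;

# split ' ' (a single-space string, not / /) splits on runs of
# whitespace and skips leading whitespace: one element per WORD.
my @words = split ' ', $text;

printf "%d lines, %d words\n", scalar @lines, scalar @words;
```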
Re: problem count the number of words -- oneliner by Discipulus (Canon) on Dec 26, 2018 at 14:15 UTC
Hello, given what wise haukex already said, and considering only your title, a simple one-liner can do the task (pay attention to the Windows-style double quotes): `perl -lne "$count+=split}{print $count" /path/file1 /path/file2`

Using `Deparse` can give you a working starting point to work with:

perl -MO=Deparse -lne "$count+=split}{print $count" /path/file1 /path/file2
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
    chomp $_;
    $count += split(' ', $_, 0);
}
{
    print $count;
}
-e syntax OK

That can be read as: for each file in the input (see `ARGV` in perlvar), read it line by line, chomp each line, split the line on whitespace, and add the resulting word count to `$count`. When all files have been processed, print the value of `$count` (the `}{` closes the implicit `while` loop early, so the `print` runs only once, at the end).

L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
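For readers who prefer a script to a one-liner, the deparsed loop above can be written long-hand roughly as follows. This is a sketch, not Discipulus's code: the sub name `count_words_in_files` is hypothetical, and it opens each file explicitly instead of relying on the magic `ARGV` handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Long-hand version of: perl -lne "$count+=split}{print $count" file1 file2
# Counts whitespace-separated words across all the given files.
sub count_words_in_files {
    my @files = @_;
    my $count = 0;
    for my $file (@files) {
        open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
        while (my $line = <$fh>) {
            # split ' ' splits on runs of whitespace; leading and trailing
            # whitespace (including the newline) produce no fields, so no
            # chomp is needed here.
            my @fields = split ' ', $line;
            $count += @fields;    # array in numeric context = element count
        }
        close $fh;
    }
    return $count;
}

# Run as: perl count_words.pl /path/file1 /path/file2
print count_words_in_files(@ARGV), "\n" if @ARGV;
```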
Re^2: problem count the number of words -- oneliner by kschwab (Vicar) on Dec 26, 2018 at 16:40 UTC
Here's an alternative: `perl -ne 's/\W+/$i++/eg;END{print "$i\n"}' f1 f2`
Re^3: problem count the number of words -- oneliner by pme (Monsignor) on Dec 26, 2018 at 17:16 UTC
Hi kschwab, are you sure? This one gives the same result as `wc -w`: `perl -ne 's/\S+/$i++/egi;END{print "$i\n"}' f1 f2`
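The two one-liners count different things: `\W+` counts runs of non-word characters (the separators), while `\S+` counts runs of non-whitespace characters, which is what `wc -w` treats as words. A small sketch (the sub names are hypothetical) shows where they diverge: for "hello world\n" both give 2, for "hello world" without a trailing newline `\W+` gives 1, and for "don't stop\n" the apostrophe makes `\W+` give 3 while `\S+` gives 2.

```perl
use strict;
use warnings;

# kschwab's idea: count runs of NON-word characters (\W+), i.e. separators.
sub count_nonword_runs {
    my ($text) = @_;
    my $n = () = $text =~ /\W+/g;   # list-assign the matches, keep the count
    return $n;
}

# pme's idea: count runs of NON-whitespace characters (\S+), i.e. what
# wc -w treats as words.
sub count_nonspace_runs {
    my ($text) = @_;
    my $n = () = $text =~ /\S+/g;
    return $n;
}

for my $text ("hello world\n", "hello world", "don't stop\n") {
    (my $shown = $text) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-15s \\W+: %d  \\S+: %d\n",
        "'$shown'", count_nonword_runs($text), count_nonspace_runs($text);
}
```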
Re^4: problem count the number of words -- oneliner by kschwab (Vicar) on Dec 26, 2018 at 17:38 UTC
Re^5: problem count the number of words -- oneliner by pme (Monsignor) on Dec 26, 2018 at 21:00 UTC
Re: problem count the number of words by jwkrahn (Abbot) on Dec 26, 2018 at 19:33 UTC
open (my $inwp , "<:encoding(utf8)" , "$pwt") or die "$pwt: $!" ;
while (<$inw>) {
my @pwords ;
my @ptw ;
my $elementp ;
my $countp ;
@pwords = split (/\n/ , $inwp) ;

The variable `$inwp` is a FILEHANDLE and does not contain any data from the file `$pwt`.

open (my $nwt , "<:encoding(utf8)" , "$ntw" ) or die "$ntw: $!" ;
while (<$itw>) {
$numlinenw++ ;
my @nwords ;
my @ntw ;
my $elementn ;
my $countn ;
@nwords = split (/\n/ , $nwt) ;

The same applies to the variable `$nwt`.
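The fix in both cases is to read from the handle rather than split the handle itself. A minimal sketch, assuming the word files hold one word per line (the poster never showed them); `read_word_list` is a hypothetical name.

```perl
use strict;
use warnings;

# Read a word-list file into an array of words. Assumption: one word
# per line. The filehandle holds no data itself; <$fh> reads from it.
sub read_word_list {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "$path: $!";
    my @words = <$fh>;               # list context: one element per line
    close $fh;
    s/^\s+|\s+$//g for @words;       # trim spaces, "\n" and any "\r"
    return grep { length } @words;   # drop blank lines
}

# Hypothetical usage with the poster's files:
#   my @positive = read_word_list('/root/Positive2.txt');
#   my @negative = read_word_list('/root/Negative2.txt');
```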
Re^2: problem count the number of words by GHMON (Novice) on Dec 27, 2018 at 17:19 UTC
Hi bro, I have 2 files that contain a lot of sentences, and 2 more files that contain negative words and positive words. I want to count how many times these 2 types of words are used in the sentences. My sentence files are `$pt = '/root/Positive.txt'` and `$nt = '/root/Negative.txt'`, and my word files are `$pwt = '/root/Positive2.txt'` and `$ntw = '/root/Negative2.txt'`. Now I want to count the number of these words in the 2 sentence files with this code.
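Putting the earlier replies together, one possible sketch: read a word list into a hash, split the sentences into words, and count only the words found in the hash. This is not the poster's code; the sub names, one word per line in the word files, case-insensitive matching, and the "letters with optional internal apostrophes" definition of a word are all assumptions. It also follows haukex's advice of one `sub` used for both the positive and negative cases instead of two copies of the same block.

```perl
use strict;
use warnings;

# Read a file into a list of lines, trimming surrounding whitespace
# (this also removes "\n" and any "\r" from CRLF files).
sub read_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "$path: $!";
    my @lines = <$fh>;
    close $fh;
    s/^\s+|\s+\z//g for @lines;
    return @lines;
}

# Turn a word list into a lookup hash. Lowercasing makes matching
# case-insensitive (an assumption about what is wanted).
sub make_word_set {
    my %set = map { lc($_) => 1 } grep { length } @_;
    return \%set;
}

# Count how often each listed word occurs in the sentences. A "word" is
# assumed to be a run of letters (\p{L}, so non-ASCII letters work too),
# optionally with internal apostrophes.
sub count_listed_words {
    my ($set, @sentences) = @_;
    my %count;
    for my $sentence (@sentences) {
        for my $token ($sentence =~ /(\p{L}+(?:'\p{L}+)*)/g) {
            my $word = lc $token;
            $count{$word}++ if $set->{$word};
        }
    }
    return \%count;
}

# Run as, e.g.: perl count_words.pl /root/Positive.txt /root/Positive2.txt
if (@ARGV == 2) {
    my ($sentence_file, $word_file) = @ARGV;
    binmode STDOUT, ':encoding(UTF-8)';
    my $set   = make_word_set(read_lines($word_file));
    my $hits  = count_listed_words($set, read_lines($sentence_file));
    my $total = 0;
    for my $word (sort keys %$hits) {
        print "$hits->{$word}\t$word\n";
        $total += $hits->{$word};
    }
    print "total\t$total\n";
}
```

Calling it once with the positive files and once with the negative files gives per-word counts plus a total for each type.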
Re^3: problem count the number of words (updated) by AnomalousMonk (Archbishop) on Dec 27, 2018 at 20:16 UTC
Here's an example of counting defined sets of "words" (which can be tricky to define) based on the technique described in the Building Regex Alternations Dynamically article by haukex. If you can figure out how to get the contents of your positive and negative word data files into the corresponding arrays (and if my notion of what you want is anywhere near what you actually want), you may be on your way. Note that the code is set up for case-insensitive matching and counting: the negative word `"fourscore"` matches `"FoUrScOrE"` in the example sentence, and so on. Note, again, that the concept of a "word" can be slippery, so the use of the `\b` boundary assertion, among other details, may not be appropriate.

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my @positive = qw(nation conceived liberty created equal foo);
 my @negative = qw(fourscore SEVEN fOrTh fathers continent bar);
 ;;
 my $sentence =
     'FoUrScOrE and seven years ago '
   . 'our fathers brought forth, on this continent, '
   . 'a new nation, conceived in liberty, and dedicated '
   . 'to the proposition that all men are created equal. '
   . 'Repeat seven nation fathers nation.'
   ;
 ;;
 my %pos = map { lc($_) => 0 } @positive;
 my $rx_pos = make_regex(\%pos);
 print 'for debug: positive rx: ', $rx_pos;
 ;;
 my %neg = map { lc($_) => 0 } @negative;
 my $rx_neg = make_regex(\%neg);
 print 'for debug: negative rx: ', $rx_neg;
 ;;
 my %other;
 my $rx_undefined = qr{ (?! $rx_pos | $rx_neg) }xms;
 my $rx_word = qr{ \b [[:alpha:]]+ \b }xms;
 ;;
 ++$pos { lc $_ } for $sentence =~ m{ $rx_pos }xmsg;
 ++$neg { lc $_ } for $sentence =~ m{ $rx_neg }xmsg;
 ++$other{ lc $_ } for $sentence =~ m{ $rx_undefined $rx_word }xmsg;
 ;;
 dd \%pos;
 dd \%neg;
 dd \%other;
 ;;
 ;;
 sub make_regex {
   my ($hr_wordlist) = @_;
   ;;
   my ($rx) =
     map qr{ (?i) \b (?: $_) \b }xms,
     join '|',
     map quotemeta,
     reverse sort
     keys %$hr_wordlist
     ;
   ;;
   return $rx;
   }
"
for debug: positive rx: (?msx-i: (?i) \b (?: nation|liberty|foo|equal|created|conceived) \b )
for debug: negative rx: (?msx-i: (?i) \b (?: seven|fourscore|forth|fathers|continent|bar) \b )
{ conceived => 1, created => 1, equal => 1, foo => 0, liberty => 1, nation => 3 }
{ bar => 0, continent => 1, fathers => 2, forth => 1, fourscore => 1, seven => 2 }
{ a => 1, ago => 1, all => 1, "and" => 2, are => 1, brought => 1, dedicated => 1, in => 1, men => 1, new => 1, on => 1, our => 1, proposition => 1, repeat => 1, that => 1, the => 1, this => 1, to => 1, years => 1, }

Update: In the `make_regex()` function, the lines `reverse sort` and `map quotemeta,` were swapped (fixed); they should be `map quotemeta,` followed by `reverse sort`. Because the chain is evaluated from right to left, this means sorting, either lexically or by length, is done on the raw strings before the quotemeta step.

Give a man a fish: <%-{-{-{-<
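If the one-liner is hard to follow, here is a sketch of the same regex-alternation idea as a plain script. It is a simplified variant, not AnomalousMonk's code: this `make_regex` takes a list instead of a hash reference and lowercases the words itself, the "other" words are not tallied, and `tally` is a hypothetical helper. With the sample data below it should report 2 positive and 3 negative matches.

```perl
use strict;
use warnings;

# Build one case-insensitive alternation from a word list: sort the raw
# strings first, then quotemeta them (the order from the Update above).
sub make_regex {
    my @words = map { lc } @_;
    my $alt = join '|', map { quotemeta } reverse sort @words;
    return qr{ \b (?: $alt ) \b }xi;
}

# Tally each match of $rx in the sentences, keyed by the lowercased word.
# $rx has no capture groups, so m//g in list context returns whole matches.
sub tally {
    my ($rx, @sentences) = @_;
    my %count;
    for my $sentence (@sentences) {
        $count{ lc $_ }++ for $sentence =~ /$rx/g;
    }
    return \%count;
}

my @positive  = qw(nation liberty equal);
my @negative  = qw(fourscore seven fathers);
my @sentences = (
    'FoUrScOrE and seven years ago our fathers brought forth',
    'a new nation, conceived in liberty',
);

for my $pair ([positive => \@positive], [negative => \@negative]) {
    my ($label, $words) = @$pair;
    my $count = tally(make_regex(@$words), @sentences);
    my $total = 0;
    $total += $_ for values %$count;
    print "$label: $total\n";
}
```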
Re^4: problem count the number of words by GHMON (Novice) on Dec 28, 2018 at 09:37 UTC
Re^5: problem count the number of words by poj (Abbot) on Jan 01, 2019 at 17:06 UTC