choroba has pointed out bugs in your regular expression (see Metacharacters in perlre). To avoid this sort of mistake, rather than typing all of that in by hand, consider building the regular expression from a list of words, perhaps like this:

# stopword removal
my @words = qw(
    untuk dari di yang dan ini itu atau pada ke adalah setelah selalu
    daripada dengan dalam akan juga tidak karena tersebut ada bisa sebagai
    sudah saat oleh harus menjadi secara last modified lebih hanya para
    telah seperti sementara kepada namun sangat lalu belum bagi tak kalau
    bahwa tetapi dapat antara banyak kembali saja atas hingga melalui
    terjadi tapi sampai tentang sama agar memang lagi selama mencapai terus
    yakni the terhadap ketika merupakan sehingga sebuah jika bukan jadi
    sejumlah sejak perlu mulai jelas pun masih mengatakan menurut sekitar
    lain melakukan baru beberapa hal
);
# \Q...\E quotes any metacharacters; \b keeps matches to whole words
my $regex = join '|', map qr/\b\Q$_\E\b/, @words;
$kata =~ s/$regex//g;
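To see the idea in isolation, here is a minimal sketch that can be run on its own. It assumes a five-word subset of the list, and the helper name remove_stopwords is hypothetical (it does not appear in your script); the regex is built the same way as above.

```perl
use strict;
use warnings;

# Assumption: a short subset of the stopword list, for illustration only.
my @words = qw( untuk dari di yang dan );

# One alternative per word; \Q...\E quotes metacharacters and \b
# restricts each alternative to whole words.
my $regex = join '|', map qr/\b\Q$_\E\b/, @words;

# Hypothetical helper: returns a copy of the text with stopwords removed.
sub remove_stopwords {
    my ($text) = @_;
    $text =~ s/$regex//g;
    return $text;
}

print remove_stopwords('buku dari toko di kota'), "\n";
```

Note that only the words themselves are deleted, so the spaces that surrounded them stay behind. If you later split on whitespace to count words, that should not matter.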

Other changes you might consider include:

  1. strict and warnings are good. See Use strict warnings and diagnostics or die.
  2. A more natural way of expressing $#ARGV + 1 != 1 might be @ARGV != 1
  3. Your second $kata =~ tr/[A-Z]/[a-z]/; is unnecessary, since you already lower-cased everything when building %freq.
  4. You have a whole bunch of substitutions for removing characters. Looking at them, I wonder if you really mean what you have written. For example, do you really want to remove the three-character sequence "`”, or do you mean to remove any occurrence of these three characters? (The escape before " is unnecessary.) I think you would probably get your desired result by replacing $kata =~ s/\d+//g;, $kata =~ s/[!.,()*]|\"`”//g; and $kata =~ s/-+//g; with the single substitution $kata =~ s/[\d!.,()*"`”\-+]//g;
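Points 1, 2 and 4 can be sketched together as follows. This is only an illustration under a few assumptions: the helper names has_one_arg and clean_word are hypothetical, the curly quote is written as \x{201D} so the example does not depend on the source file's encoding, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper for point 2: an array in numeric context yields
# its element count, so no $#array + 1 arithmetic is needed.
sub has_one_arg {
    my @args = @_;
    return @args == 1;
}

# Hypothetical helper for point 4: lower-case a word, then strip every
# unwanted character with a single character class.
sub clean_word {
    my ($kata) = @_;
    $kata =~ tr/A-Z/a-z/;    # tr takes ranges directly; no brackets needed
    # Inside a class, + is a literal plus sign (not a quantifier), so drop
    # it from the class if plus signs should be kept.
    $kata =~ s/[\d!.,()*"`\x{201D}\-+]//g;
    return $kata;
}

print clean_word('Rp.10,000 (Baru)!'), "\n";
```

In a script, the argument check would then read something like die "usage: ..." unless @ARGV == 1; near the top.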

Update: Corrected oversight in replacement RE in 4. Thanks choroba.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


In reply to Re: how to create word list from input text file by kennethk
in thread read whole file in a directory by ask91
