in reply to Exact string matching

I have written a script which will find how many times a given word is repeated
Please show us your code. This is not a code writing service. Please read How do I post a question effectively?

Re^2: Exact string matching
by Anonymous Monk on Oct 16, 2011 at 11:08 UTC

    Dear Monk, I'm so sorry for being very immature; this is my first time posting a question, so please forgive my immaturity.

    open(HD, "file") or die("Cant open");
    $text = <HD>;
    $text =~ s/ //g;
    chomp $text;
    $pattern = "word";
    $offset = 0;
    $pos = index $text, $pattern, $offset;
    while ($pos != -1) {
        print "Found $pattern at $pos\n";
        $offset = $pos + 1;
        $pos = index($text, $pattern, $offset);
    }

      Looking at what you are trying to achieve, here is the code:

      use Data::Dumper;
      open (HAN, 'employee.pm');
      my $cont = <HAN>;   # assume $cont = 'package Employee df df';
      my %hash = ();
      while ( $cont =~ m/(\w+)/g ) {
          $hash{$1}++;
      }
      print Dumper(\%hash);

      --------- output
      $VAR1 = {
                'Employee' => 1,
                'df' => 2,
                'package' => 1
              };

      It prints how many times each word occurred.

        Dear Ram, Thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined spacer such as white space. What if the file contains only strings without any spacer (a jumble of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear). example: $text = 'howdoidoit' and the answer should be like, for substring of length 3 => how = 1 ; owd = 1 ; wdo = 1 ; doi = 2 ; oid = 1 ; ido = 1 ; oit = 1 ;

Re^3: Exact string matching
by Anonymous Monk on Oct 16, 2011 at 12:52 UTC
    (I got a bit carried away... sorry for my poor formatting earlier.)
    
    Dear Ram, Thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined spacer such as white space. What if the file contains only strings without any spacer (a jumble of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear).
    
    example: 
    
    $text = 'howdoidoit' 
    
    and the answer should be like 
    
    For substring of length 3 
    
    how - 1 
    owd - 1 
    wdo - 1 
    doi - 2 
    oid - 1 
    ido - 1 
    oit - 1
    

      The fastest way to split a long string into overlapping n-tuples is to use unpack:

      $text = 'howdoidoit';;

      print for unpack '(a3X2)*', $text;;
      how owd wdo doi oid ido doi oit it it

      print for unpack '(a4X3)*', $text;;
      howd owdo wdoi doid oido idoi doit oit oit oit

      You have to discard the last n-1 results but that is very quick and simple to do.
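
      A minimal sketch of that trimming step for a fixed n of 3, assuming the string is at least n characters long. The names @grams and %count are illustrative only and are not from the post above.

      ```perl
      use strict;
      use warnings;

      my $text = 'howdoidoit';
      my $n    = 3;

      # Overlapping windows: read 3 chars (a3), then step back 2 (X2).
      my @grams = unpack '(a3X2)*', $text;

      # The last $n-1 entries are short tail fragments; shrink the array
      # to drop them (assumes length($text) >= $n).
      $#grams -= $n - 1;

      # Tally each remaining trigram.
      my %count;
      $count{$_}++ for @grams;

      print "$_ - $count{$_}\n" for sort keys %count;
      ```

      For 'howdoidoit' this leaves eight trigrams, with 'doi' counted twice.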



        ... and this approach can easily be extended to make n dynamic by interpolating the width counts in the unpack template string.

        >perl -wMstrict -le
        "my $text = 'howdoidoit';
        ;;
        my $n = 3;
        my $back = $n - 1;
        ;;
        my @unpacked = unpack qq{(a$n X$back)*}, $text;
        my %count;
        $count{$_}++ for @unpacked[0 .. $#unpacked - $back];
        ;;
        use Data::Dumper;
        print Dumper \%count;
        "
        $VAR1 = {
                  'wdo' => 1,
                  'ido' => 1,
                  'owd' => 1,
                  'how' => 1,
                  'oid' => 1,
                  'oit' => 1,
                  'doi' => 2
                };

      Try this:

      foreach ($cont =~ m/([a-z]{3})/g) {
          $hash{$_}++;
      }

      What do you mean by linear time? Lastly, you need to modify the pattern depending on what you want; please work on it.

        Note that your approach will only return non-overlapping trigrams:

        > perl -wle "print for 'howdoyoudo' =~ /([a-z]{3})/g"
        how
        doy
        oud

        I would advise the original poster to really work on the question and maybe search CPAN for Ngrams or Trigrams.
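
        One way to get overlapping matches from a regex is to put the capture inside a zero-width lookahead, so the match position advances one character at a time instead of jumping past each trigram. A minimal sketch, assuming lowercase ASCII input; the sub name count_ngrams is hypothetical and does not come from any CPAN module:

        ```perl
        use strict;
        use warnings;

        # Count overlapping n-grams of lowercase letters in $text.
        sub count_ngrams {
            my ($text, $n) = @_;
            my %count;
            # The lookahead consumes nothing, so the regex engine moves on
            # by one character per match and every window of $n letters
            # is captured in $1.
            $count{$1}++ while $text =~ /(?=([a-z]{$n}))/g;
            return \%count;
        }

        my $count = count_ngrams('howdoidoit', 3);
        print "$_ - $count->{$_}\n" for sort keys %$count;
        ```

        This still does one pass per value of n, so it does not by itself answer the request for all substring lengths in linear time.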