Re: Exact string matching

Re^2: Exact string matching by Anonymous Monk on Oct 16, 2011 at 11:08 UTC
Dear Monk, I'm sorry for being so inexperienced; this is my first time posting a question, so please forgive me. `open(my $fh, '<', 'file') or die "Can't open file: $!"; my $text = <$fh>; chomp $text; $text =~ s/ //g; my $pattern = "word"; my $offset = 0; my $pos = index($text, $pattern, $offset); while ($pos != -1) { print "Found $pattern at $pos\n"; $offset = $pos + 1; $pos = index($text, $pattern, $offset); }`
Re^3: Exact string matching by ramprasad27 (Sexton) on Oct 16, 2011 at 11:41 UTC
Looking at what you are trying to achieve, here is the code: `use Data::Dumper; open(my $han, '<', 'employee.pm') or die "Can't open employee.pm: $!"; my $cont = <$han>; # assume $cont = 'package Employee df df'; my %hash = (); while ( $cont =~ m/(\w+)/g ) { $hash{$1}++; } print Dumper(\%hash); --------- output $VAR1 = { 'Employee' => 1, 'df' => 2, 'package' => 1 };` It prints how many times each word occurred.
Re^4: Exact string matching by Anonymous Monk on Oct 16, 2011 at 12:40 UTC
Dear Ram, thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined spacer such as white space. What if the file contains only strings without any spacer (a chunk of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear). Example: $text = 'howdoidoit' and the answer should be like, for substrings of length 3 => how = 1 ; owd = 1 ; wdo = 1 ; doi = 2 ; oid = 1 ; ido = 1 ; oit = 1 ;
Re^3: Exact string matching by Anonymous Monk on Oct 16, 2011 at 12:52 UTC
(I was a bit carried away... sorry for my poor formatting earlier.) Dear Ram, thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined spacer such as white space. What if the file contains only strings without any spacer (a chunk of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear). Example: $text = 'howdoidoit' and the answer should be like:
For substrings of length 3
how - 1
owd - 1
wdo - 1
doi - 2
oid - 1
ido - 1
oit - 1
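One way to read the request above: for a fixed length n, slide a window of n characters across the text one position at a time and tally each window. The following is only a minimal sketch under that assumption; the helper name `count_substrings` is hypothetical and does not come from the thread. It makes one `substr` call per starting position, so for a fixed n the work grows linearly with the length of the text. Repeating it for every possible n would not be linear overall.

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping substring of length $n in $text.
sub count_substrings {
    my ($text, $n) = @_;
    my %count;
    # The last window starts at length($text) - $n.
    # If the text is shorter than $n, the range is empty and nothing is counted.
    for my $i (0 .. length($text) - $n) {
        $count{ substr($text, $i, $n) }++;
    }
    return \%count;
}

my $count = count_substrings('howdoidoit', 3);
print "$_ - $count->{$_}\n" for sort keys %$count;
```

Because the counts live in a hash, the output order is whatever `sort keys` gives, not the order in which the substrings appear in the text.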
Re^4: Exact string matching by BrowserUk (Patriarch) on Oct 16, 2011 at 20:05 UTC
The fastest way to n-tuple long strings is using unpack: `$text = 'howdoidoit';; print for unpack '(a3X2)*', $text;; how owd wdo doi oid ido doi oit it it print for unpack '(a4X3)*', $text;; howd owdo wdoi doid oido idoi doit oit oit oit` You have to discard the last n-1 results, but that is very quick and simple to do.
Re^5: Exact string matching by AnomalousMonk (Archbishop) on Oct 16, 2011 at 23:29 UTC
... and this approach can easily be extended to make n dynamic by interpolating the width counts in the unpack template string. `>perl -wMstrict -le "my $text = 'howdoidoit'; ;; my $n = 3; my $back = $n - 1; ;; my @unpacked = unpack qq{(a$n X$back)*}, $text; my %count; $count{$_}++ for @unpacked[0 .. $#unpacked - $back]; ;; use Data::Dumper; print Dumper \%count; " $VAR1 = { 'wdo' => 1, 'ido' => 1, 'owd' => 1, 'how' => 1, 'oid' => 1, 'oit' => 1, 'doi' => 2 };`
Re^4: Exact string matching by ramprasad27 (Sexton) on Oct 16, 2011 at 16:27 UTC
Try this: `foreach ($cont =~ m/([a-z]{3})/g ){ $hash{$_}++; }` What do you mean by linear time? Lastly, you will need to modify the pattern depending on what you want; please work on it.
Re^5: Exact string matching by Corion (Patriarch) on Oct 16, 2011 at 16:30 UTC
Note that your approach will only return non-overlapping trigrams: `> perl -wle "print for 'howdoyoudo' =~ /([a-z]{3})/g" how doy oud` I would advise the original poster to really work on the question and maybe search CPAN for Ngrams or Trigrams.
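If the regex route is preferred, one common idiom for getting overlapping matches is to put the capture inside a zero-width lookahead, so that each match consumes nothing and the engine moves forward one character at a time. This is a sketch rather than a recommendation from the thread: it assumes lowercase ASCII input, as in the pattern above, and the sub name `overlapping_ngrams` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return all overlapping runs of $n lowercase letters.
sub overlapping_ngrams {
    my ($text, $n) = @_;
    # (?=...) matches without consuming any characters, so after each
    # (empty) match the search resumes one character further along.
    return $text =~ /(?=([a-z]{$n}))/g;
}

my %count;
$count{$_}++ for overlapping_ngrams('howdoyoudo', 3);
print "$_ - $count{$_}\n" for sort keys %count;
```

Unlike the unpack approach, this produces no short trailing pieces to discard, since the lookahead only succeeds where a full n characters remain.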