Re^2: Exact string matching
by Anonymous Monk on Oct 16, 2011 at 11:08 UTC
|
Dear Monk, I'm sorry for my inexperience. This is my first time posting a question, so please forgive me.
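use strict;
use warnings;

open(my $fh, '<', 'file') or die "Can't open 'file': $!";
my $text = <$fh>;    # reads only the first line of the file
close $fh;
chomp $text;
$text =~ s/ //g;     # remove all spaces

my $pattern = 'word';
my $pos = index($text, $pattern);
while ($pos != -1) {
    print "Found $pattern at $pos\n";
    # step forward one character so overlapping matches are also found
    $pos = index($text, $pattern, $pos + 1);
}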
|
|
Looking at what you are trying to achieve, here is the code:
use strict;
use warnings;
use Data::Dumper;
open(my $han, '<', 'employee.pm') or die "Can't open employee.pm: $!";
my $cont = <$han>;    # assume $cont = 'package Employee df df';
my %hash;
# count every run of word characters
while ( $cont =~ m/(\w+)/g ) {
    $hash{$1}++;
}
print Dumper(\%hash);
Output:
$VAR1 = {
'Employee' => 1,
'df' => 2,
'package' => 1 };
It prints how many times each word occurred.
|
|
Dear Ram, thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined delimiter such as whitespace. What if the file contains only strings without any delimiter (a run of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear).
example:
$text = 'howdoidoit'
and the answer should be like,
for substring of length 3 =>
how = 1 ;
owd = 1 ;
wdo = 1 ;
doi = 2 ;
oid = 1 ;
ido = 1 ;
oit = 1 ;
Re^3: Exact string matching
by Anonymous Monk on Oct 16, 2011 at 12:52 UTC
|
(I got a bit carried away... sorry for my poor formatting earlier.)
Dear Ram, thank you very much for your kind assistance, but this works only if the words in the file are separated by a defined delimiter such as whitespace. What if the file contains only strings without any delimiter (a run of characters, or a sequence of characters to be precise)? That's where I am stuck. I need to find the number of occurrences of all possible substrings, and that too in linear time (sorry that I was not clear).
example:
$text = 'howdoidoit'
and the answer should be like
For substring of length 3
how - 1
owd - 1
wdo - 1
doi - 2
oid - 1
ido - 1
oit - 1
|
|
$text = 'howdoidoit';;
print for unpack '(a3X2)*', $text;;
how
owd
wdo
doi
oid
ido
doi
oit
it
it
print for unpack '(a4X3)*', $text;;
howd
owdo
wdoi
doid
oido
idoi
doit
oit
oit
oit
You have to discard the last n-1 results but that is very quick and simple to do.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|
|
>perl -wMstrict -le
"my $text = 'howdoidoit';
;;
my $n = 3;
my $back = $n - 1;
;;
my @unpacked = unpack qq{(a$n X$back)*}, $text;
my %count;
$count{$_}++ for @unpacked[0 .. $#unpacked - $back];
;;
use Data::Dumper;
print Dumper \%count;
"
$VAR1 = {
'wdo' => 1,
'ido' => 1,
'owd' => 1,
'how' => 1,
'oid' => 1,
'oit' => 1,
'doi' => 2
};
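For comparison, the same counts can be produced with a plain substr loop, which never generates the short trailing pieces and so needs no slice to discard them. This is only a sketch: count_substrings is a name made up for this illustration, not something from the thread or from CPAN. For a fixed length n it does one hash update per starting position, so the number of updates grows linearly with the length of the text (each update still costs time proportional to n).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count every overlapping
# substring of length $n in $text using a sliding window.
sub count_substrings {
    my ($text, $n) = @_;
    my %count;
    # The last valid start is length - n, so no short tail pieces appear.
    # If the text is shorter than $n the range is empty and nothing is counted.
    for my $start (0 .. length($text) - $n) {
        $count{ substr($text, $start, $n) }++;
    }
    return \%count;
}

my $count = count_substrings('howdoidoit', 3);
printf "%s = %d\n", $_, $count->{$_} for sort keys %$count;
```

For 'howdoidoit' and n = 3 this gives seven distinct keys, with doi counted twice and every other substring once.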
|
|
foreach ($cont =~ m/([a-z]{3})/g ){
$hash{$_}++;
}
What do you mean by linear time?
Lastly, you need to modify the pattern depending on what you want; please work on it.
|
|
> perl -wle "print for 'howdoyoudo' =~ /([a-z]{3})/g"
how
doy
oud
I would advise the original poster to really work on the question and maybe search CPAN for Ngrams or Trigrams.
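The one-liner above skips overlapping matches because each match consumes its three characters before the next attempt begins. One way around that, sketched here under the assumption that the text is lowercase a-z only (as in the pattern being discussed), is to put the capture inside a zero-width lookahead, so the regex engine moves forward one character at a time:

```perl
use strict;
use warnings;

my $text = 'howdoidoit';
my %count;

# (?=...) matches without consuming any characters, so the next attempt
# starts one position later and every overlapping trigram is captured.
$count{$_}++ for $text =~ /(?=([a-z]{3}))/g;

print "$_ = $count{$_}\n" for sort keys %count;
```

Changing the 3 in the pattern to another length gives the counts for substrings of that length; the character class would need widening for text that is not plain lowercase letters.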