in reply to Re: Tips on how to perform this regex query
in thread Tips on how to perform this regex query

my $lenSmall = length $smallstring; for my $o ( 0 .. $lenBig - 1 ) { my $masked = substr( $bigstring, $o, $lenSmall ) ^ $smallstring; my $matched = $masked =~ tr[\0][]; if( ( $matched / $lenSmall ) > 0.90 ) { $masked =~ tr[\1-\255][ ]; $masked =~ tr[\0][*]; printf "%.2f%% match at offset %u\n", $matched / $lenSmall * 1 +00, $o; print substr $bigstring, $o, $lenSmall; print $smallstring; print $masked; } }
98.05% match at offset 1071 YQVARNDGQGKAAATFMHISYNNFITEVNNLNKRMGDLRDINGEAGTWVRLLNGSGSADGGFTDHYTLLQ +MGADRKHELGSMDLFTGVMATYTDTDASADLYSGKTKSWGGGFYASGLFRSGAYFDVIAKYIHNENKYD +LNFAGAGKQNFRSHSLYAGAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNWNDSGMDVSMRRNSVNPL +VGRTGVVSGKTFSGKDWSLTARAGLHYEFDLTDSADVHLKDAAGEHQINGRKDSRMLYGVGLNARFGDN +TRLGLEVERSAFGKYNTDDAINANIRYSF GTMARNDGQGKAAATFMHISYNNFITEVDNLNKRMGDLRDINGEAGTWVRLLNGSGSADGGFTDHYTLLQ +MGADRKHELGSMDLFTGVMATYTDTDASADLYSGKTKSWGGGFYASGLFRSGAYFDVIAKYIHNENKYD +LNFAGAGKQNFRSHSLYAGAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNWNDSGMDVSMRRNSVNPL +VGRTGVVSGKTFSGKDWSLTARAGLHYEFDLTDSADVHLKDAAGEHQINGRKDSRMLYGVGLNARFGDN +TRLGLEVERSAFGKYNTDDAINANIRYSFLE ************************* ***************************************** +********************************************************************* +********************************************************************* +********************************************************************* +*****************************

Replies are listed 'Best First'.
Re^3: Tips on how to perform this regex query
by BrowserUk (Patriarch) on Jan 12, 2014 at 00:25 UTC

    c:\test\anonymonks.pl Global symbol "$lenBig" requires explicit package name at C:\test\1070 +240.pl line 9. Execution of C:\test\1070240.pl aborted due to compilation errors.
    Correct for that and compare:
    my $bigstring="MNRIYSLRYSA..."; my $smallstring="GTMARNDGQGKAA..."; my $lenBig = length $bigstring; my $lenSmall = length $smallstring; for my $o ( 0 .. ( $lenBig - $lenSmall + 1 ) ) { my $masked = substr( $bigstring, $o, $lenSmall ) ^ $smallstring; my $matched = $masked =~ tr[\0][]; if( ( $matched / $lenSmall ) > 0.095 ) { $masked =~ tr[\1-\255][ ]; $masked =~ tr[\0][*]; printf "%.2f%% match at offset %u\n", $matched / $lenSmall * 1 +00, $o; print substr $bigstring, $o, $lenSmall; print $smallstring; print $masked; } } ]; $otherstring = q[ my $bigstring="MNRIYSLRYSA..."; my $smallstring="GTMARNDGQGKAA..."; my $lenBig = length $bigstring; my $lenSmall = length $smallstring; for my $o ( 0 .. $lenBig - 1 ) { my $masked = substr( $bigstring, $o, $lenSmall ) ^ $smallstring; my $matched = $masked =~ tr[\0][]; if( ( $matched / $lenSmall ) > 0.90 ) { $masked =~ tr[\1-\255][ ]; $masked =~ tr[\0][*]; printf "%.2f%% match at offset %u\n", $matched / $lenSmall * 1 +00, $o; print substr $bigstring, $o, $lenSmall; print $smallstring; print $masked; } } ]; print compare( $firststring, $otherstring ); __END__ 99.4% plagerised

    Original thought is a rare commodity.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Original though is a rare commodity.
      s/though/thought/
      plagerised
      patched!
        patched!

        Patch rejected!

        Error: $lenBig + $lenSmall > $lenBig

        A more correct patch would be:

        for my $o ( 0 .. ( $lenBig - $lenSmall + 2 ) ) {

        But it still misses Found 'RNDGQGKAAATFMHISYNNFITEV'(4:24) at 1076


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Tips on how to perform this regex query
by Anonymous Monk on Jan 12, 2014 at 00:07 UTC
    corrected and improved
    #! perl -slw use strict; my $bigstring="MNRIYSLRYSAVARGFIAVSEFARKCVHKSVRRLCFPVLLLIPVLFSAGSLAGTV +NNELGYQLFRDFAENKGMFRPGATNIAIYNKQGEFVGTLDKAAMPDFSAVDSEIGVATLINPQYIASVK +HNGGYTNVSFGDGENRYNIVDRNNAPSLDFHAPRLDKLVTEVAPTAVTAQGAVAGAYLDKERYPVFYRL +GSGTQYIKDSNGQLTKMGGAYSWLTGGTVGSLSSYQNGEMISTSSGLVFDYKLNGAMPIYGEAGDSGSP +LFAFDTVQNKWVLVGVLTAGNGAGGRGNNWAVIPLDFIGQKFNEDNDAPVTFRTSEGGALEWSFNSSTG +AGALTQGTTTYAMHGQQGNDLNAGKNLIFQGQNGQINLKDSVSQGAGSLTFRDNYTVTTSNGSTWTGAG +IVVDNGVSVNWQVNGVKGDNLHKIGEGTLTVQGTGINEGGLKVGDGKVVLNQQADNKGQVQAFSSVNIA +SGRPTVVLTDERQVNPDTVSWGYRGGTLDVNGNSLTFHQLKAADYGAVLANNVDKRATITLDYALRADK +VALNGWSESGKGTAGNLYKYNNPYTNTTDYFILKQSTYGYFPTDQSSNATWEFVGHSQGDAQKLVADRF +NTAGYLFHGQLKGNLNVDNRLPEGVTGALVMDGAADISGTFTQENGRLTLQGHPVIHAYNTQSVADKLA +ASGDHSVLTQPTSFSQEDWENRSFTFDRLSLKNTDFGLGRNATLNTTIQADNSSVTLGDSRVFIDKNDG +QGTAFTLEEGTSVATKDADKSVFNGTVNLDNQSVLNINDIFNGGIQANNSTVNISSDSAVLGNSTLTST +ALNLNKGANALASQSFVSDGPVNISDATLSLNSRPDEVSHTLLPVYDYAGSWNLKGDDARLNVGPYSML +SGNINVQDKGTVTLGGEGELSPDLTLQNQMLYSLFNGYRNIWSGSLNAPDATVSMTDTQWSMNGNSTAG +NMKLNRTIVGFNGGTSPFTTLTTDNLDAVQSAFVMRTDLNKADKLVINKSATGHDNSIWVNFLKKPSNK +DTLDIPLVSAPEATADNLFRASTRVVGFSDVTPILSVRKEDGKKEWVLDGYQVARNDGQGKAAATFMHI +SYNNFITEVNNLNKRMGDLRDINGEAGTWVRLLNGSGSADGGFTDHYTLLQMGADRKHELGSMDLFTGV +MATYTDTDASADLYSGKTKSWGGGFYASGLFRSGAYFDVIAKYIHNENKYDLNFAGAGKQNFRSHSLYA +GAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNWNDSGMDVSMRRNSVNPLVGRTGVVSGKTFSGKDWS +LTARAGLHYEFDLTDSADVHLKDAAGEHQINGRKDSRMLYGVGLNARFGDNTRLGLEVERSAFGKYNTD +DAINANIRYSF"; my $smallstring="GTMARNDGQGKAAATFMHISYNNFITEVDNLNKRMGDLRDINGEAGTWVRLLN +GSGSADGGFTDHYTLLQMGADRKHELGSMDLFTGVMATYTDTDASADLYSGKTKSWGGGFYASGLFRSG +AYFDVIAKYIHNENKYDLNFAGAGKQNFRSHSLYAGAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNW +NDSGMDVSMRRNSVNPLVGRTGVVSGKTFSGKDWSLTARAGLHYEFDLTDSADVHLKDAAGEHQINGRK +DSRMLYGVGLNARFGDNTRLGLEVERSAFGKYNTDDAINANIRYSFLE"; my $lenBig = length $bigstring; my $lenSmall = length $smallstring; my $threshold = 0.9; for my $o ( 0 .. $lenBig - $threshold*$lenSmall ) { my $masked = substr( $bigstring, $o, $lenSmall ) ^ $smallstring; my $matched = $masked =~ tr[\0][]; if( ( $matched / $lenSmall ) > $threshold ) { $masked =~ tr[\1-\255][ ]; $masked =~ tr[\0][*]; printf "%.2f%% match at offset %u\n", $matched / $lenSmall * 1 +00, $o; print substr $bigstring, $o, $lenSmall; print $smallstring; print $masked; } }
    98.05% match at offset 1071 YQVARNDGQGKAAATFMHISYNNFITEVNNLNKRMGDLRDINGEAGTWVRLLNGSGSADGGFTDHYTLLQ +MGADRKHELGSMDLFTGVMATYTDTDASADLYSGKTKSWGGGFYASGLFRSGAYFDVIAKYIHNENKYD +LNFAGAGKQNFRSHSLYAGAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNWNDSGMDVSMRRNSVNPL +VGRTGVVSGKTFSGKDWSLTARAGLHYEFDLTDSADVHLKDAAGEHQINGRKDSRMLYGVGLNARFGDN +TRLGLEVERSAFGKYNTDDAINANIRYSF GTMARNDGQGKAAATFMHISYNNFITEVDNLNKRMGDLRDINGEAGTWVRLLNGSGSADGGFTDHYTLLQ +MGADRKHELGSMDLFTGVMATYTDTDASADLYSGKTKSWGGGFYASGLFRSGAYFDVIAKYIHNENKYD +LNFAGAGKQNFRSHSLYAGAEVGYRYHLTDTTFVEPQAELVWGRLQGQTFNWNDSGMDVSMRRNSVNPL +VGRTGVVSGKTFSGKDWSLTARAGLHYEFDLTDSADVHLKDAAGEHQINGRKDSRMLYGVGLNARFGDN +TRLGLEVERSAFGKYNTDDAINANIRYSFLE ************************* ***************************************** +********************************************************************* +********************************************************************* +********************************************************************* +*****************************
      Better
      for my $o ( ($threshold-1)*$lenSmall .. $lenBig - $threshold*$lenSmall + ) {
      But substr needs to handle negative offsets then.

      Left as exercise!