cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks,

I am having trouble using 'eq' to compare two strings that should be identical. I am pulling one string from my database (it was previously retrieved) and the other from the web. It works most of the time, but seems to fail in cases where, among other things, the string contains parentheses (). quotemeta doesn't seem to help.

The failed and successful strings (i.e. headlines) are listed below the code.

Here is the relevant code:

$header =~ s/^\s+//;        # remove leading spaces
$header =~ s/\s+$//;        # remove trailing spaces
$header_in_db =~ s/^\s+//;  # remove leading spaces
$header_in_db =~ s/\s+$//;  # remove trailing spaces

if ($header eq $header_in_db) {
    $Already_in_DB = 1;
    print "<b>ALREADY IN DATABASE</b><br>";
}

Failed matches:

Genetic Variants of Wnt Transcription Factor TCF-4 (TCF7L2) Putative Promoter Region Are Associated with Small Intestinal Crohn's Disease

Conserved Protective Mechanisms in Radiation and Genetically Attenuated uis3(-) and uis4(-) Plasmodium Sporozoites

MaxCyte Introduces the GT(TM) Flow Transfection System for Application with Autologous and Allogeneic Stem Cell Therapies

Update of Long-Term Data on Brain Cancer Patients Receiving DCVax(R)-Brain Continues to Show Striking Improvements in Delay of Disease and Survival

Dengue Virus Type 2 Infections of Aedes aegypti Are Modulated by the Mosquito's RNA Interference Pathway

Relation of DNA Methylation of 5′-CpG Island of ACSL3 to Transplacental Exposure to Airborne Polycyclic Aromatic Hydrocarbons and Childhood Asthma

Successful Matches:

Wayne County Partners With TechTown to Launch Global Stem Cell 'Innovation and Commercialization Lab'

Genes That Control Body's Salt Levels Are Identified

Push is on to tailor cancer care to tumor's genes

Eye Movement: Involuntary Maybe, But Certainly Not Random

Top 5 Fast-Growth Stocks for Feb. 16

Discovery of a Novel Activator of KCNQ1-KCNE1 K+ Channel Complexes

Most Humbly,

Chris Herold

Re: 'eq' matching not comprehensive
by almut (Canon) on Feb 17, 2009 at 19:57 UTC

    Use Devel::Peek to dump $header and $header_in_db in order to find out what those strings really are (encoding-wise, etc.)...

    use Devel::Peek;

    Dump $header;
    Dump $header_in_db;

    if ($header eq $header_in_db) {
        ...

    (look at "PV = ..." in the output)
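
    If the Devel::Peek output is hard to read, a rough complement is to print each string's length, UTF-8 flag and code points side by side. This is only a sketch: the helper name describe and the sample strings are made up for illustration and are not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a string as length, UTF-8 flag and code points.
sub describe {
    my ($s) = @_;
    my $flag = utf8::is_utf8($s) ? 1 : 0;
    my $hex  = join ' ', map { sprintf '%02X', ord($_) } split //, $s;
    return sprintf 'len=%d utf8=%d [%s]', length($s), $flag, $hex;
}

# Sample data (an assumption): the same text once as decoded characters and
# once as raw UTF-8 bytes, which is one way two "identical" headlines can differ.
my $header       = "5\x{2032}-CpG";       # 6 characters, PRIME is U+2032
my $header_in_db = "5\xE2\x80\xB2-CpG";   # 8 bytes, PRIME stored as E2 80 B2

print describe($header), "\n";
print describe($header_in_db), "\n";
print $header eq $header_in_db ? "equal\n" : "different\n";
```

    If the two summaries differ in length or in any code point, eq is right to report a mismatch, however similar the strings look in a browser.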

Re: 'eq' matching not comprehensive
by Fletch (Bishop) on Feb 17, 2009 at 19:35 UTC

    Just a WAG, but judging by the funky characters in your sample title starting with "Relation ...", I'd bet there are character set encoding issues. Your DB may be returning one encoding while the source you're comparing against uses a different one. quotemeta doesn't help because eq has nothing to do with metacharacters or regular expressions; it simply compares the two strings character by character.
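
    One way to act on that guess, sketched under assumptions: decode both byte strings into Perl character strings before comparing. The encodings used here (UTF-8 for the web page, Windows-1252 for the database) and the sample bytes are placeholders, not something the original post states; substitute whatever your sources really deliver.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample data: the same headline fragment as raw bytes in two encodings.
my $header       = "Crohn\xE2\x80\x99s Disease";   # UTF-8 bytes from the web
my $header_in_db = "Crohn\x92s Disease";           # Windows-1252 bytes from the DB

print "raw bytes: ", ($header eq $header_in_db ? "equal" : "different"), "\n";

# Decode each side with the encoding it actually uses (placeholders here).
my $web_chars = decode('UTF-8',  $header);
my $db_chars  = decode('cp1252', $header_in_db);

print "decoded:   ", ($web_chars eq $db_chars ? "equal" : "different"), "\n";
```

    It is usually cleaner to do this decoding once, at the point where the data enters the program, so that every later comparison works on character data.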

    The cake is a lie.

Re: 'eq' matching not comprehensive
by CountZero (Bishop) on Feb 17, 2009 at 22:09 UTC

    Did you check the encoding of the webpage and the encoding used in the database? If they are not the same, both strings may look the same on screen but actually be quite different at the byte level.

    A cheap trick is to use Text::Unidecode to transliterate both the database-string and the web-string to US-ASCII and then compare these. That should get you out of trouble with most "funny" characters and their internal representations.
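
    A minimal sketch of that trick, assuming Text::Unidecode is installed from CPAN and that both strings have already been decoded into character strings (unidecode works on characters, not raw bytes). The sample headlines are invented for illustration.

```perl
use strict;
use warnings;
use Text::Unidecode;   # CPAN module; exports unidecode() by default

# Assumed sample data: both strings are decoded character strings that differ
# only in a typographic apostrophe (U+2019) versus a plain ASCII one.
my $header       = "Mosquito\x{2019}s RNA Interference Pathway";
my $header_in_db = "Mosquito's RNA Interference Pathway";

# Transliterate both sides to US-ASCII, then compare the results.
my $web_ascii = unidecode($header);
my $db_ascii  = unidecode($header_in_db);

print $web_ascii eq $db_ascii ? "match\n" : "no match\n";
```

    Keep in mind that transliteration is lossy, so two headlines that really do differ only in their non-ASCII characters could end up comparing as equal.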

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James