I have used encode_entities($text) from HTML::Entities to convert all the quotes, ampersands, and the like in my text so that I can 'insert' the text into MySQL.

Now my problem is how to search on this text (or a subset of it) with all the HTML numbers in it. In other words, how can I search on "Apples & Oranges" when the actual text now in the table is "Apples &amp; Oranges"?

I thought maybe I should just remove the HTML numbers (such as &#8220; and &amp;) and settle for that.

I feel a little sheepish asking this, since it seems like something I should know. This does seem like a problem many people have, though, and there is probably some consistent way to address it. Unfortunately I'm not familiar with such an approach.

Re^3: regexp question
by ikegami (Patriarch) on Jan 29, 2011 at 04:19 UTC

    how can I search on "Apples & Oranges" when

    You really can't. "A" could be stored as any of

    • A
    • &#65;
    • &#065;
    • &#0065;
    • ...
    • &#x41;
    • &#x041;
    • &#x0041;
    • ...

    You'd need to decode the text to search it.
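
    To illustrate the point, here is a minimal sketch using decode_entities from HTML::Entities. The sample strings are made up for the example; they are simply a few of the many ways the same text could have been entity-encoded.

    ```perl
    use strict;
    use warnings;
    use HTML::Entities qw(decode_entities);

    # Hypothetical stored values: different encodings of the same text.
    my @stored = (
        'Apples &amp; Oranges',
        'Apples &#38; Oranges',
        'Apples &#x26; Oranges',
        '&#65;pples &amp; Oranges',
    );

    # In non-void context decode_entities returns a decoded copy,
    # so @stored itself is left untouched.
    my @decoded = map { decode_entities($_) } @stored;

    # Every variant decodes to the same plain string.
    print "$_\n" for @decoded;
    ```

    Only after decoding do the four values compare equal, which is why a search against the encoded form is unreliable.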

    the actual text now in the table is "Apples &amp; Oranges"?

    Why?

    (How did you even reach this point from asking how to decode HTML text into plain text?)

      Hello ikegami:

      Greetings. I reached this point as follows: I translate the text using HTML::Entities, then store it in my RDBMS table with the newly generated HTML numbers (e.g., &#8220;); the text will not 'insert' otherwise (I have observed that single and double quotes are especially disliked). Once the text has been translated by encode_entities(), MySQL behaves well and accepts it for insertion.

      Now I need to search on this text using 'select' and 'like' in the SQL.

      Overnight I decided on an approach: encode all query criteria using a similar Java encode class, then search. What do you think? In this way one is always comparing apples with apples, or so it seems to me.

      My special thanks for being able to chat, so to speak, with you and Anonymous Monk on this subject.

        Yes, definitely use placeholders. There's some documentation on the DBI page, but it's really quite simple:

        # BAD!
        $dbh->do("
            INSERT INTO Table ( a, b, c )
            VALUES ( '$fields[0]', '$fields[1]', '$fields[2]' )
        ");

        and

        # BAD!
        my $sth = $dbh->prepare("
            INSERT INTO Table ( a, b, c )
            VALUES ( '$fields[0]', '$fields[1]', '$fields[2]' )
        ");
        $sth->execute();

        become

        $dbh->do(
            "
                INSERT INTO Table ( a, b, c )
                VALUES ( ?, ?, ? )
            ",
            undef,
            @fields,
        );

        or

        my $sth = $dbh->prepare("
            INSERT INTO Table ( a, b, c )
            VALUES ( ?, ?, ? )
        ");
        $sth->execute(@fields);
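
        The same idea covers the search side. What follows is only a sketch under stated assumptions: it uses an in-memory SQLite database (DBD::SQLite) so that it is self-contained, and the table name notes and column name body are hypothetical. With MySQL, the connect() arguments would differ, but the placeholder usage should look the same.

        ```perl
        use strict;
        use warnings;
        use DBI;

        # Assumption: in-memory SQLite keeps the sketch self-contained.
        my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                               { RaiseError => 1, AutoCommit => 1 });

        # Hypothetical table and column names.
        $dbh->do('CREATE TABLE notes (body TEXT)');

        # Store the text as-is; the driver handles quotes and ampersands,
        # so no entity encoding is needed before the insert.
        my $sth = $dbh->prepare('INSERT INTO notes (body) VALUES (?)');
        $sth->execute($_) for (
            q{Apples & Oranges},
            q{She said "it's fine"},
        );

        # Search with LIKE; the pattern is bound through a placeholder too.
        my $term = 'Apples & Or';
        my $rows = $dbh->selectcol_arrayref(
            'SELECT body FROM notes WHERE body LIKE ?',
            undef,
            "%$term%",
        );
        print "$_\n" for @$rows;
        ```

        Because the stored text is the plain text, the search term needs no encoding either. Note that % and _ inside $term would still act as LIKE wildcards.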