in reply to regexp question

Maybe you would prefer the HTML::Entities module.

Re^2: regexp question
by Nodonomy (Novice) on Jan 29, 2011 at 03:12 UTC

    I have used HTML::Entities' encode_entities($text) to convert all the quotes, ampersands, and the like in my text so that I can 'insert' the text into MySQL.

    Now my problem is how to search on this text (or a subset of it) with all the HTML numbers in it. In other words, how can I search on "Apples & Oranges" when the actual text now in the table is "Apples &amp; Oranges"?

    I thought maybe I should just remove the HTML numbers (such as &#8220; and &amp;) and settle for that.

    I feel a little sheepish asking this, since it seems like something I should know. Still, this seems like a problem many people have, and there are probably some consistent ways to address it. Unfortunately, I'm not familiar with any such approach.

      how can I search on "Apples & Oranges" when

      You really can't. "A" could be stored as any of

      • A
      • &#65;
      • &#065;
      • &#0065;
      • ...
      • &#x41;
      • &#x041;
      • &#x0041;
      • ...

      You'd need to decode the text to search it.
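A minimal sketch of the decode-then-search idea, assuming the stored text was produced by encode_entities. The variant list and the `$stored` value are made up for illustration; `decode_entities` is the real HTML::Entities function.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Several spellings of the same character, as they might appear in stored text.
my @variants = ('A', '&#65;', '&#065;', '&#x41;', '&#x041;');

# After decoding, every variant is the plain character "A".
my @decoded = map { decode_entities($_) } @variants;

# A stored, entity-encoded value (hypothetical row content).
my $stored = 'Apples &amp; Oranges';
my $plain  = decode_entities($stored);

# Search the decoded text with the plain-text query.
my $found = index($plain, 'Apples & Oranges') >= 0 ? 1 : 0;
print "decoded: $plain, match: $found\n";
```

Decoding first means the query never has to guess which of the many equivalent encodings ended up in the table.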

      the actual text now in the table is "Apples &amp; Oranges"?

      Why?

      (How did you even reach this point from asking how to decode HTML text into plain text?)

        Hello Ikegami:

        Greetings. Here is how I reached this point: I translate the text using HTML::Entities, then store it, with the newly generated HTML entities (e.g., &#8220;), in my RDBMS table. The text will not 'insert' otherwise (I have observed that single and double quotes are especially disliked). Once translated by encode_entities(), MySQL behaves well, accepting the text for insertion.

        Now I need to search on this text using 'select' and 'like' in the SQL.

        Overnight I decided on an approach: encode all query criteria using a similar Java encode class, then search. What do you think? In this way one is always comparing apples with apples, or so it seems to me.

        My special thanks for being able to chat, so to speak, with you and Anonymous Monk on this subject.
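For comparison, here is a hedged sketch of an alternative that is not proposed anywhere in this thread: passing the raw text to MySQL through DBI placeholders, so that quotes never appear in the SQL string itself and no entity encoding is needed for the insert. The table name `articles`, the column name `body`, and the helper names are hypothetical; `$dbh` is assumed to be a handle the caller obtained from `DBI->connect`, and `do` and `selectall_arrayref` are standard DBI methods.

```perl
use strict;
use warnings;

# Hypothetical table and column names, for illustration only.
my $TABLE  = 'articles';
my $COLUMN = 'body';

# Insert raw text; the driver handles quoting of the bound value.
sub insert_text {
    my ($dbh, $text) = @_;
    return $dbh->do("INSERT INTO $TABLE ($COLUMN) VALUES (?)", undef, $text);
}

# Escape LIKE wildcards so the fragment is matched literally
# (backslash is MySQL's default LIKE escape character).
sub like_pattern {
    my ($fragment) = @_;
    (my $escaped = $fragment) =~ s/([\\%_])/\\$1/g;
    return "%$escaped%";
}

# Return all rows whose text contains the fragment.
sub search_text {
    my ($dbh, $fragment) = @_;
    return $dbh->selectall_arrayref(
        "SELECT $COLUMN FROM $TABLE WHERE $COLUMN LIKE ?",
        undef, like_pattern($fragment),
    );
}
```

With this approach the table would hold "Apples & Oranges" exactly as typed, so a search for that string compares like with like without encoding the query.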

Re^2: regexp question
by Anonymous Monk on Jan 29, 2011 at 02:56 UTC
    Hey, that module uses regex :p