davies has asked for the wisdom of the Perl Monks concerning the following question:

The basic task I wish to accomplish is relatively simple. I want to execute a database command line UPDATE bugs_fulltext SET comments = $newtext WHERE comments = $oldtext. My instinctive approach was to use placeholders. The problem I am hitting is that the new and old text may contain line breaks. I can't use an index on the table as the table is part of BugZilla and the definition has no index, hence the WHERE comments = $oldtext clause (I have other clauses to deal with duplicate comments, but that's not relevant to my problem). Unfortunately, the database may deliver line breaks as 0D or 0D0A but insists that they be entered as \n so that it can enter them in the local format. The problem is that Perl, of course, has its own setting for \n that need not be the same. Indeed, Perl's \n and the database's \n are different "\n"s. So what I need to do is to replace either 0D or 0D0A with \n.

Doing this in a simple regex seems to work. Unfortunately, the text could legitimately contain xkcd://Bobby Tables, so I still need to pass it through the placeholder. This converts my \n to \\n, which means that the WHERE clause does not match. Bind does the same thing. I then tried using split on the string to get an array of individual lines, intending to pass each line separately through db->quote and join them with \ns. Unfortunately, the regex that worked for s///g did not work in split (the exact regex was chr(13)chr(10)?). However, db->quote, getting a single line, converted the 0D0A sequences to \r\n, which might give me some leverage. If I pass the text through db->quote and then s/\\r\\n/\\n/g, I think I will get what I want. Unless, of course, the character before the R is an escaped backslash, i.e. the character group is \\r\n, when I don't want to substitute. Unless THAT is preceded by ANOTHER backslash, when I DO want to substitute. Unless (repeat ad infinitum).

So, I have two questions. Is there a better approach, given that I don't know what the localisation of Perl(\n) and database(\n) will be? Is there a way of writing s/\\r\\n/\\n/g unless preceded by an even number of backslashes, excluding the one given?

Regards,

John Davies

Replies are listed 'Best First'.
Re: Line separators when passing multi-line fields to a database
by moritz (Cardinal) on Mar 21, 2011 at 13:02 UTC
    My instinctive approach was to use placeholders.

    And that's the only correct approach.

    Doing this in a simple regex seems to work

    So, do it, and then pass the result to ->execute to bind to a placeholder.

      But passing \n through a placeholder converts it to \\n, which does not match the WHERE clause.

      Regards,

      John Davies

        If you execute:
        $sth = $dbh->prepare('UPDATE sometable SET value1=? WHERE value2=?'); $sth->execute('something with \n in it', 'something else which inciden +tally has a \n in it');
        I'm quite sure that the SQL will receive the strings exactly as you supplied it (Otherwise database-users around the world will have to start worrying quite a bit).
        In this case, using single-quotes, the \n will not be interpreted and supplied to the database as-is. If you want (perl) to interpret the \n and send a singular newline (0x0A), use double-quotes in the execute-statement.

        However (using double-quotes), when you use \n in the WHERE-clause it will only match LF's (0x0A) and not CRLF (0x0D0A), should those be present. If you also want to match those, you should do this:

        $sth = $dbh->prepare('UPDATE sometable SET value1=? WHERE value2=? OR +value2=?'); $sth->execute("something with \n in it", "something else which inciden +tally has a \n in it", "something else which incidentally has \r\n in + it");
        Not if your database and driver support placeholders it doesn't. Placeholders pass the data to the db completely unchanged. They do not escape any characters1. Passing a newline character through a placeholder results in the database receiving a newline character, not the string '\\n' (or the string '\n', for that matter).

        Are you perhaps searching for the single-quoted string '\n' (as opposed to the double-quoted string "\n", which is a single newline character, not the two characters \ and n) when putting it into the WHERE clause?


        1 Some drivers may fall back to escaping in order to emulate placeholders for database engines which don't support them - but, in that case, the exact same escaping will be done in the WHERE clause, so it won't affect you anyhow.

        You need to tell Perl to interpolate the string; double quotes should do it. The database engine will receive a string containing a literal beeline character.
Re: Line separators when passing multi-line fields to a database ("\n")
by tye (Sage) on Mar 21, 2011 at 15:15 UTC
    The problem is that Perl, of course, has its own setting for \n that need not be the same. Indeed, Perl's \n and the database's \n are different "\n"s. So what I need to do is to replace either 0D or 0D0A with \n.

    You seem to think that Perl's "\n" can sometimes be "\x0d\x0a". That is not the case. Perl's "\n" is always "\x0a" except for non-ASCII systems (and the one case of the "near ASCII" old Macs). In particular, there is no Perl where 2==length("\n"). Though, if you have read perlport, you can be forgiven for not being the first person to be hopelessly confused by the horrid wording there that is almost correct but also terribly misleading (and also doesn't recognize old Macs as a one-off case of brokenness while also ignoring the possibility of non-ASCII Perl).

    - tye        

      Yes, I did believe that about Perl, and it looks from what you say as though I had got confused somewhere, as I don't remember reading perlport. But it's not there that I'm having a problem. It's with the text from the database that may contain either one or two characters that I need to convert to \n (that's the easy bit) and then pass unescaped back to the database. That's the bit where I'm stuck right now. But thank you for the correction, as I'm pretty sure I would otherwise get bitten some time in the future.

      Regards,

      John Davies

        '\n' eq '\\n'; they are both two characters, a backslash (\) followed by the letter en (n). When you use placeholders, passing '\n' in as a value acts the same as putting a literal '\\n' into the SQL (at least for SQLs where '\\' is how you get a single backslash character).

        "\n" is a newline. When you use placeholders, passing "\n" in as a value acts the same as either putting '\n' (on some databases) or '
        '
        (a literal newline between single quotes) into the SQL.

        It sounds like all you need is s/\r\n/\n/g (unless you need to run your script on old Macs, in which case you want something ugly like:

        my( $cr, $lf )= ( "\r", "\n" ); ( $cr, $lf )= ( "\n", "\r" ) if "\r" eq "\x0a"; # On an old Mac ... s/$cr$lf/$lf/g;

        or you can just ignore the currently even rarer possibility of non-ASCII Perl and do s/\x0d\x0a/\x0a/g, but I refuse to go back to the "hard code character encodings" school of programming even if the current thinking is that Unicode will be the one ring to rule them all forever... eventually).

        - tye