in thread How to open SQL 2005 errorlog?

Ahh... the first two bytes (ÿþ = 0xFF 0xFE) are most likely a BOM (Byte Order Mark), indicating that the file is UTF-16LE or UCS-2LE encoded (LE = little-endian). The distinction between UTF-16 and UCS-2 is probably irrelevant in your case; what matters is that (at least) two bytes are used to encode each character.
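To see where those two bytes come from, here is a small illustration using the core Encode module (the string "Hi" is just made-up sample text). U+FEFF is the BOM character, and encoding it as UTF-16LE yields exactly FF FE, after which every ASCII character takes two bytes, low byte first.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FEFF is the byte order mark; encoded as UTF-16LE it becomes FF FE.
my $text  = "\x{FEFF}Hi";
my $bytes = encode('UTF-16LE', $text);

# Dump each byte as hex: every character occupies two bytes, low byte first.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
# prints: FF FE 48 00 69 00
```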

In order to properly read such files, you need to open them like this:

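unless (open LOG, "<:encoding(UCS-2LE)", $sqlErrorlog) { ...

or

unless (open LOG, "<:encoding(UTF-16)", $sqlErrorlog) { ...

(this requires Perl 5.8 or later, which introduced the PerlIO encoding layers). Note that Encode treats a bare "UCS-2" as big-endian (UCS-2BE), so the LE suffix matters here. Also, only the endianness-neutral UTF-16 decoder consumes the BOM; with UCS-2LE the first line will start with a U+FEFF character that you may want to strip.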
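A self-contained sketch of the UTF-16 approach above. Since the real errorlog isn't available here, it first writes a small made-up sample file (BOM plus UTF-16LE text) to a temporary location standing in for `$sqlErrorlog`, then reads it back line by line with the `:encoding(UTF-16)` layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Build a sample file that mimics the SQL 2005 errorlog layout:
# an FF FE byte order mark followed by UTF-16LE text (contents are made up).
my ($tmp, $sqlErrorlog) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('UTF-16LE', "\x{FEFF}line one\nline two\n");
close $tmp;

# Read it back. The UTF-16 decoder consumes the BOM and uses it
# to determine the byte order of the rest of the file.
my @lines;
open(my $log, '<:encoding(UTF-16)', $sqlErrorlog)
    or die "Cannot open $sqlErrorlog: $!";
while (my $line = <$log>) {
    chomp $line;
    push @lines, $line;
}
close $log;

print "$_\n" for @lines;
```

A lexical filehandle (`my $log`) is used instead of the bareword `LOG`; both work with the same three-argument `open`.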

When debug-printing your $_, you should then see the line content as expected.

Just in case you're still having problems (in particular with line endings), you might want to see this post of mine, which describes the problem and a workaround. Good luck.
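For reference, a commonly used layer stack for UTF-16 files with CRLF line endings is sketched below; it may or may not match the workaround in the linked post. The idea: `:raw` removes any platform CRLF translation underneath the decoder (where it would mangle the two-byte characters), `:encoding(UTF-16)` decodes, and `:crlf` pushed on top translates the decoded "\r\n" into "\n". The sample file is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Sample UTF-16LE file with Windows-style CRLF line endings (made-up content).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('UTF-16LE', "\x{FEFF}first\r\nsecond\r\n");
close $tmp;

# :raw below the decoder, :crlf above it, so CRLF handling
# happens on characters rather than on raw UTF-16 bytes.
open(my $fh, '<:raw:encoding(UTF-16):crlf', $path)
    or die "Cannot open $path: $!";
my @lines = <$fh>;
close $fh;

chomp @lines;    # only "\n" is left to remove; no stray "\r"
print "[$_]\n" for @lines;
```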

Re^4: How to open SQL 2005 errorlog?
by jc7 (Initiate) on Jun 07, 2007 at 17:57 UTC
    WOW, you are GREAT! UTF-16 works for the SQL 2005 error log! Thank you soooooooo much!!

    Could you help me with one more thing? How should I change this script so that it can open the error logs of both SQL 2000 and SQL 2005? UTF-16 does not work with the SQL 2000 log.

    I am new to Perl. I really appreciate your help!

      I would just switch the open mode depending on whether it's a 2005 or pre-2005 logfile. In the simplest case, look for "2005" in the pathname:

      my $openmode = $sqlErrorlog =~ /2005/ ? "<:encoding(UTF-16)" : "<";
      unless (open LOG, $openmode, $sqlErrorlog) { ...

      (As a refinement, you could also check whether the year is >= 2005, presuming they switched to Unicode with SQL Server 2005 and that future versions will follow the same naming scheme, i.e. 2007, 2009, ... though that might be too many presumptions :)
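A sketch of that year-based refinement, with the presumptions from the paragraph above made explicit. `open_mode_for` is a hypothetical helper name, the example paths are invented, and it assumes the installation path contains a four-digit version year (the first four-digit run is taken, which is exactly the kind of fragile assumption discussed next).

```perl
use strict;
use warnings;

# Hypothetical helper: pick the open mode from the first four-digit
# number in the pathname, assuming SQL Server 2005 and later write
# UTF-16 logs and the path contains the version year.
sub open_mode_for {
    my ($path) = @_;
    my ($year) = $path =~ /(\d{4})/;
    return (defined $year && $year >= 2005) ? '<:encoding(UTF-16)' : '<';
}

# Made-up example paths.
for my $p ('C:\\SQL2005\\LOG\\ERRORLOG',
           'C:\\SQL2000\\LOG\\ERRORLOG',
           'D:\\logs\\ERRORLOG') {
    printf "%-28s %s\n", $p, open_mode_for($p);
}
```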

      However, that approach is kinda cheating, and not generally recommended practice: if you or someone else changes the pathname some time in the future, it's very unlikely anyone will remember this curious dependency buried deep down in the code...

      So another way would be to test for the existence of the BOM and, if it's there, set UTF-16 mode. Actually (as I only recently learned myself) there's already a module for this: File::BOM, which handles the issue even more generally (see its docs for examples).
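The BOM test itself can also be done with core Perl, without File::BOM. This is a minimal sketch, not File::BOM's API: `open_mode_by_bom` is a hypothetical helper that reads the first two bytes in raw mode and picks UTF-16 if they are FF FE, otherwise the plain `<` mode. It assumes, as this thread suggests, that 2005 logs start with FF FE and older logs have no BOM; the two sample files are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Hypothetical helper: choose the open mode by sniffing the first two bytes.
# FF FE (UTF-16LE BOM) => UTF-16; anything else => default mode.
sub open_mode_by_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 2);
    close $fh;
    return (defined $got && $got == 2 && $head eq "\xFF\xFE")
        ? '<:encoding(UTF-16)'
        : '<';
}

# Made-up sample files: one UTF-16LE with BOM, one plain ASCII.
my ($fh16, $log2005) = tempfile(UNLINK => 1);
binmode $fh16;
print {$fh16} encode('UTF-16LE', "\x{FEFF}2005 entry\n");
close $fh16;

my ($fh8, $log2000) = tempfile(UNLINK => 1);
print {$fh8} "2000 entry\n";
close $fh8;

for my $path ($log2005, $log2000) {
    my $mode = open_mode_by_bom($path);
    open(my $log, $mode, $path) or die "Cannot open $path: $!";
    my $first = <$log>;
    close $log;
    chomp $first;
    print "$mode => $first\n";
}
```

Reopening the file after sniffing keeps things simple; the UTF-16 decoder then consumes the BOM itself on the second open.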

        I did some tests. In our case it is hard to tell from the pathname whether a log is from SQL 2005 or pre-2005. The best way would be to read the file's BOM before scanning. Do you know how to use the File::BOM module? Please advise if you do. Thanks and have a great weekend!