Hello, I have a question. To summarize: I have written a script that opens UTF-16 .xml files, strips out the XML tags, and places the remaining text into an @array for parsing. The XML files come from a backup software tool, which writes its XML log files in both UTF-16 and UTF-8. I need to skip the UTF-8 files, but both kinds sit in the same directory and use the same filename convention. I can already manipulate the UTF-16 XML data as I want. However, when I process the source directory, the UTF-8 XML files are there too, and my open call fails with a :BOM error when it tries to open them. In the Chatterbox I asked how I can discover the encoding of a file before opening the filehandle. I have tried next unless '-B $file'; but this does not work. I was pointed to a module (File::BOM), but this module is not supported on the Intel platform (Perl 5.6.4, ActiveState).
if( $logfile =~ /.+\.xml/){
    next unless '-B $logfile';
    open(XMLFILE, '<:encoding(utf16)', $logfile) or die "Can't Open:$!";
    while(<XMLFILE>) {
        $_ =~ s/^.*(<.*>)//g;
        $_ =~ s/\r//g;
        $_ =~ s/^\s//g;
        push @txtfile,$_;
        close(XMLFILE);
    }# While XML loop
}#if XML loop
print @txtfile;#for debug only
My attempt, '-B $logfile';, evidently does not tell me the difference between UTF-16 and UTF-8. Since I only want to process the UTF-16 .xml files, I need help with the syntax to identify the UTF-8 files and skip them. I know that I could use XML::Simple or XML::Parser, but I am attempting to use regexes to accomplish this. This if statement will basically be adding functionality to an existing script without writing a whole new one.

Thank you for any help that you may provide...

DBrock...
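
For reference, a minimal sketch of one way to make the check described above, assuming the tool's UTF-16 files begin with a byte order mark (the :BOM error suggests the UTF-16 layer expects one). Two notes on the original test: '-B $logfile' in single quotes is just a non-empty string, so it is always true, and even unquoted, -B only guesses "binary vs. text" and says nothing about which encoding is in use. The helper name sniff_bom is hypothetical, not part of the original script or of any module; it uses only core Perl, so it does not depend on File::BOM.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): read the first few
# raw bytes of a file and report which byte order mark, if any, it has.
sub sniff_bom {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);                         # raw bytes, no CRLF translation
    my $got = read($fh, my $head, 3);     # a BOM is at most 3 bytes here
    close($fh);

    return 'unknown' unless $got;         # read error or empty file
    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;
    return 'unknown';                     # no BOM found
}

# Assumed usage: file names are passed on the command line.
my @txtfile;
for my $logfile (@ARGV) {
    next unless $logfile =~ /\.xml$/i;
    # Skip everything that does not positively look like UTF-16.
    next unless sniff_bom($logfile) =~ /^UTF-16/;

    open(my $xml, '<:encoding(UTF-16)', $logfile)
        or die "Can't open $logfile: $!";
    while (my $line = <$xml>) {
        push @txtfile, $line;             # tag stripping would go here
    }
    close($xml);                          # close after the loop, not inside it
}
print @txtfile;
```

Because the test is positive for UTF-16 rather than negative for UTF-8, a UTF-8 file written without a BOM is skipped as well. One caveat: a UTF-32LE file also starts with the bytes FF FE, so this sketch would misreport it as UTF-16LE.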

In reply to how do I check encoding before opening FILEHANDLE by dbrock
