Hello. I have a question; to summarize: I have written a script that opens UTF-16 .xml files, strips out the XML tags, and places the remaining text into an @array for parsing. The XML files come from a backup software tool, which writes its XML log files in both UTF-16 and UTF-8 (I need to skip the UTF-8 files). Both kinds of file sit in the same directory and use the same filename convention. I can already manipulate the UTF-16 XML data as I desire; however, when I process the source directory, the UTF-8 XML files are there too, and my open call fails with a :BOM error when I attempt to open them.
In the Chatterbox I asked how I can discover the encoding of a file before opening the filehandle. I have tried this using
next unless '-B $file'; however, this does not work. I was pointed to a module (File::BOM); however, this module is not supported on the Intel platform (perl 5.6.4 activestate).
if ( $logfile =~ /.+\.xml/ ) {
    next unless '-B $logfile';    # the test in question: it does not skip the UTF-8 files
    open(XMLFILE, '<:encoding(utf16)', $logfile) or die "Can't Open:$!";
    while (<XMLFILE>) {
        $_ =~ s/^.*(<.*>)//g;
        $_ =~ s/\r//g;
        $_ =~ s/^\s//g;
        push @txtfile, $_;
    }    # while XML loop
    close(XMLFILE);    # close once all lines have been read
}    # if XML
print @txtfile;    # for debug only
My attempt:
'-B $logfile' evidently does not tell me the difference between UTF-16 and UTF-8. Since I only want to process the UTF-16 .xml files, I need help with the syntax to identify the UTF-8 files and skip them. I know that I could use XML::Simple or XML::Parser, but I am attempting to use regexes to accomplish this. This if statement will basically be adding functionality to an existing script without writing a whole new one.
Thank you for any help that you may provide...
DBrock...
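One possible way to tell the two kinds of file apart before opening them with an encoding layer is to read the first few raw bytes and look for a byte order mark. The following is a minimal sketch, not a tested fix for the script above: bom_type is a hypothetical helper name, and it assumes the UTF-16 logs always begin with a BOM (FF FE or FE FF). A UTF-8 file either begins with EF BB BF or has no BOM at all, so in both cases it would not be reported as UTF-16.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a label
# for the byte order mark at the start of $path, or 'none' if there is none.
sub bom_type {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);            # raw bytes: no CRLF translation, no encoding layer
    my $head = '';
    read($fh, $head, 3);     # a very short file simply yields fewer bytes
    close($fh);

    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;
    return 'none';
}
```

Inside the directory loop, the quoted '-B $logfile' test could then be swapped for something like next unless bom_type($logfile) =~ /^UTF-16/; so that only files carrying a UTF-16 BOM reach the open with '<:encoding(utf16)'. Note that this checks only the BOM, so a UTF-32LE file (which also starts with FF FE) would be mislabeled; that should not matter if the tool only writes UTF-16 and UTF-8.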