Hello,
I am trying to convert an XML file encoded in UTF-16 into a flat ASCII text file by stripping out the XML markup with regexes.
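The conversion described here is really two separate steps. First, decode the UTF-16 bytes into Perl characters. Second, encode those characters as ASCII. Below is a minimal sketch using the core Encode module on an in-memory string. The sample text is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up sample bytes: a byte order mark plus some text, encoded as
# UTF-16LE (each ASCII character is paired with a NUL byte).
my $bytes = encode('UTF-16LE', "\x{FEFF}Job caf\x{E9}");

# Step 1: 'UTF-16' reads the BOM to choose the byte order, strips it,
# and returns a string of characters.
my $text = decode('UTF-16', $bytes);

# Step 2: encode as ASCII; with the default CHECK value, characters that
# have no ASCII equivalent are replaced by '?'.
my $ascii = encode('ascii', $text);

print "$ascii\n";    # Job caf?
```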
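$file = "c:\\temp\\1.xml";
$out  = "c:\\temp\\output.txt";
open(FH, $file) or die "Cannot Open $file :$!";
open(OUT, ">$out") or die "Cannot Open $out :$!";
while (<FH>) {
    s/^.*(<.*>)//g;        # remove everything up to the last tag on the line
    s/(?<=\w) (?=\w)//g;   # remove single spaces between letters
    s/\n\n/\n/g;           # collapse double line breaks
    s/  / /g;              # collapse double spaces
    print OUT $_;
}
close FH;
close OUT;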
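Here is a sketch of the same loop with the decoding moved into open(). It assumes the file begins with a byte order mark, because ':encoding(UTF-16)' needs one to choose the byte order (use ':encoding(UTF-16LE)' if there is none). It also assumes every tag opens and closes on a single line. The helper name xml_utf16_to_ascii and the temporary sample file are hypothetical and stand in for the c:\temp paths. A regex is not an XML parser, so for anything beyond scraping a log like this one, a module such as XML::LibXML or XML::Twig is the sturdier choice.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the original post): read a UTF-16 XML file,
# strip the tags and write the remaining text as plain ASCII, one line each.
sub xml_utf16_to_ascii {
    my ($in_path, $out_path) = @_;

    # :raw removes the default layers (CRLF translation on Windows) so they
    # never touch the undecoded bytes; :encoding(UTF-16) reads the BOM and
    # decodes the bytes into characters.
    open(my $in, '<:raw:encoding(UTF-16)', $in_path)
        or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path)
        or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        $line =~ s/<[^>]*>//g;        # drop tags that open and close on this line
        $line =~ s/[^\x00-\x7F]/?/g;  # replace any non-ASCII character with '?'
        $line =~ s/^\s+|\s+$//g;      # trim, including the trailing "\r\n"
        $line =~ s/\s+/ /g;           # collapse inner runs of whitespace
        next if $line eq '';          # skip lines that held only markup
        print {$out} "$line\n";
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage: build a small UTF-16LE sample (BOM and CRLF line ends) in a
# temporary directory instead of the c:\temp paths from the post.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = File::Spec->catfile($dir, '1.xml');
my $out_path = File::Spec->catfile($dir, 'output.txt');

open(my $fh, '>:raw:encoding(UTF-16LE)', $in_path) or die "Cannot write sample: $!";
print {$fh} "\x{FEFF}<log>\r\n<server>Job server: SERVERNAME</server>\r\n"
          . "\r\n<type>Job type:   Backup</type>\r\n</log>\r\n";
close $fh or die "Cannot close sample: $!";

xml_utf16_to_ascii($in_path, $out_path);

open(my $check, '<', $out_path) or die "Cannot read result: $!";
my $result = do { local $/; <$check> };
close $check;

# Two lines: "Job server: SERVERNAME" and "Job type: Backup"
print $result;
```

Replacing non-ASCII characters with '?' is only one option. Transliterating or dropping them may suit the flat file better, depending on what reads it.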

I have successfully stripped out all of the XML tags, but the double spacing and double line breaks are still in the output.
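Two things in the loop explain why the gaps and blank lines survive. First, the gaps between letters are NUL bytes rather than spaces, so the lookaround substitution `s/(?<=\w) (?=\w)//g` never matches them. Second, `while (<FH>)` hands over one line at a time, so `$_` never contains two newlines for `s/\n\n/\n/g` to find. Here is a small demonstration on in-memory data. The sample strings are made up:

```perl
use strict;
use warnings;

# 1) In UTF-16LE each ASCII character is stored as two bytes, the second
#    being NUL ("\0"). A viewer may show the NULs as gaps, but they are not
#    spaces, so a regex that removes spaces leaves them in place.
my $raw = "J\0o\0b\0";
(my $cleaned = $raw) =~ s/(?<=\w) (?=\w)//g;
my $status = $cleaned eq $raw ? 'unchanged' : 'changed';
print "$status\n";    # unchanged

# 2) Reading line by line, each $line holds at most one "\n", so
#    s/\n\n/\n/g can never match. Skipping empty lines works instead.
my $data = "first\n\nsecond\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @kept;
while (my $line = <$fh>) {
    next unless $line =~ /\S/;    # drop lines with no visible characters
    push @kept, $line;
}
close $fh;
printf "%d lines kept\n", scalar @kept;    # 2 lines kept
```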
--------Example of the output TEXT---------
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
J o b s e r v e r : S E R V E R N A M E
J o b n a m e : S E R V E R N A M E - I n c
J o b s t a r t e d : M o n d a y , D e c e m b e r 2 7 , 2 0 0 4 a t 2 : 5 3 : 3 8 P M
J o b t y p e : B a c k u p
J o b L o g : B E X 0 0 1 6 4 . x m l
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
D r i v e a n d m e d i a i n f o r m a t i o n f r o m m e d i a m o u n t :
R o b o t i c L i b r a r y N a m e : C O M P A Q 1
D r i v e N a m e : C O M P A Q 1
S l o t : 1
M e d i a L a b e l : D S W 0 0 0
M e d i a G U I D : { 4 3 1 B 0 3 D E - 1 C 4 9 - 1 1 D 4 - B 2 1 C - 0 0 5 0 8 B C A 3 A 6 8 }
O v e r w r i t e P r o t e c t e d U n t i l : 1 / 3 0 / 2 0 0 5 3 : 1 4 : 4 1 A M
A p p e n d a b l e U n t i l : 1 2 / 3 1 / 9 9 9 9 1 2 : 0 0 : 0 0 A M
T a r g e t e d M e d i a S e t N a m e : D a i l y
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
J o b O p e r a t i o n - B a c k u p
M e d i a o p e r a t i o n - a p p e n d .
H a r d w a r e c o m p r e s s i o n e n a b l e d .
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
----------------------End output Example----------
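Letters separated by gaps, as in the output above, are the typical sign of UTF-16 text read as raw bytes. A quick way to confirm the encoding is to check the first bytes of the file for a byte order mark. The helper names below are hypothetical. A file without a BOM simply reports 'unknown', and UTF-32 BOMs are not distinguished:

```perl
use strict;
use warnings;

# Hypothetical helper: name the encoding suggested by a byte order mark
# at the start of a string of raw bytes.
sub bom_encoding {
    my ($bytes) = @_;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'unknown';
}

# Hypothetical helper: read the first few bytes of a file without any
# decoding and pass them to bom_encoding().
sub file_bom_encoding {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 4);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return bom_encoding($head);
}

print bom_encoding("\xFF\xFEJ\x00o\x00b\x00"), "\n";    # UTF-16LE
print bom_encoding("<?xml"), "\n";                        # unknown
```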

If anyone could tell me what I'm doing wrong, I would be able to continue with my script upgrade.
Thank you,
DBrock

In reply to Decoding UTF-16 to ASCII by dbrock
