I have more than 5000 text files, generated on Windows from PDF files, that I need to process on a Mac OS X machine. I run dos2unix on all of them to fix the line endings and to convert the encoding from UTF-16LE to UTF-8.
In 4949 cases everything goes fine, but dos2unix skips 320 files, reporting that they are binary files.
This is consistent with the output of file -c, which reports data for the 320 skipped files and text for the others. However, on visual inspection they look like plain text ...
How can I repair the 320? At first I suspected the BOM, but it is also present in the files that convert without problems.
Furthermore, both the data and the text files start with:
0000000 ff fe 3d 00 20 00 70 00 61 00 67 00 65 00 20 00
0000010 31 00 20 00 3d 00 0a 00 0d 00 0d 00 0a 00
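For reference, the dump decodes as UTF-16LE to a BOM followed by "= page 1 =" and the sequence LF CR CR LF, so the shared header itself looks like ordinary text. The sketch below (Python 3) decodes those bytes and scans a buffer for characters that text detectors commonly reject. It rests on the unconfirmed assumption that the skipped files contain such a character somewhere past the header. The helper names `decode_utf16le` and `find_suspect_chars` are hypothetical, and the check is not a reimplementation of what dos2unix or file actually do.

```python
import codecs
import unicodedata

# First 30 bytes shared by both groups, copied from the hex dump above.
HEADER = bytes.fromhex(
    "ff fe 3d 00 20 00 70 00 61 00 67 00 65 00 20 00 "
    "31 00 20 00 3d 00 0a 00 0d 00 0d 00 0a 00"
)

# Control characters that are normal in text files and should not be flagged.
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f"}


def decode_utf16le(data: bytes) -> str:
    """Decode UTF-16LE bytes, dropping a leading BOM if present.

    Invalid sequences (lone surrogates, a trailing odd byte) become U+FFFD
    instead of raising, so they can be located afterwards.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le", errors="replace")


def find_suspect_chars(data: bytes, limit: int = 10):
    """Return up to `limit` tuples (char_index, code point, Unicode category)
    for characters that plain text rarely contains: control characters other
    than tab/LF/CR/FF, unassigned code points, and U+FFFD replacements.
    """
    hits = []
    for index, ch in enumerate(decode_utf16le(data)):
        category = unicodedata.category(ch)
        is_bad_control = category == "Cc" and ch not in ALLOWED_CONTROLS
        if is_bad_control or category == "Cn" or ch == "\ufffd":
            hits.append((index, f"U+{ord(ch):04X}", category))
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    print(repr(decode_utf16le(HEADER)))
    print(find_suspect_chars(HEADER))
```

Running `find_suspect_chars(open(path, "rb").read())` on one skipped file and one converted file (with `path` standing in for the real file name) would show whether the two groups differ in this respect. An empty list for both would rule this hypothesis out.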
I know this has nothing strictly to do with Perl, but I am really lost here ...
Any hint? Thanks in advance.