-p prints to STDOUT, not STDIN.
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Ah. Thanks for that. I've changed that script to:
perl -i -pe 'BEGIN{binmode STDOUT,":encoding(utf16le)"; undef $/;} s/\xFF\xFE/\xFF\xFEFilename:$ARGV\n/' `find . -name '*.TXT'`
However, I have since discovered that simply writing my new code over the BOM of the original data solved my problem:
perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`
I'm not sure if this is because my source data might actually be mixed utf8 and utf16le.
In summary, then: I don't think I've actually solved outputting in UTF-16LE, but I have solved my issue for now. Thanks very much for your reply, and to the many other posts on this site which helped me understand this. Any other tips welcome.
If you ran something like this:
perl -i -pe '(code that runs but does the wrong thing)' `find . -name '*.TXT'`
I hope you had a backup copy of those text files, so that you could start over with the original data. If you can't restore the versions of the files as they were before that command line was run, well... you've got a much harder problem now. For one thing, if the files had been pure UTF-16 before you ran that command, then they probably had a mix of utf8 (ASCII) and UTF-16 content after you ran it -- and other things may have gone wrong as well.
If you can restore the original files, and if that version of the data was all pure UTF-16LE, then something like the following would do what you want:
perl -i.bak -M'open IO => ":encoding(UTF-16LE)"' -pe 'BEGIN{undef $/} s/(?<=\x{feff})/Filename: $ARGV\n/' `find . -name '*.TXT'`
Note the following:
- The "-i" option includes a file extension to be added to the name of the original file, so that it will not be overwritten by the new version. (That is, an original file X.TXT is renamed to "X.TXT.bak" before the new version of X.TXT is created.)
- The "-M" option invokes the "open" pragma, to set all IO handles to UTF-16LE encoding (for all files on the command line, text is converted from UTF-16LE to perl-internal utf8 on input, and back to UTF-16LE on output).
- The s/// operator is used with a look-behind assertion for the BOM, so that the BOM is preserved and new text is added immediately after it.
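If it helps to see the decode / substitute / encode round trip from those notes outside of a one-liner, here is a minimal sketch that works on an in-memory string. The helper name add_filename_line and the sample data are made up for illustration. It relies on the explicit "UTF-16LE" encoding leaving the BOM in place as the character U+FEFF, which is what lets the look-behind match.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not from the thread): insert a "Filename:" line
# immediately after the BOM of a UTF-16LE byte string.
sub add_filename_line {
    my ($bytes, $name) = @_;

    # Bytes -> Perl characters. With the explicit "UTF-16LE" encoding the
    # BOM is not stripped; it arrives as the character U+FEFF.
    my $text = decode('UTF-16LE', $bytes);

    # Zero-width look-behind: match the position right after the BOM,
    # so the BOM itself is preserved and the new text follows it.
    $text =~ s/(?<=\x{feff})/Filename: $name\n/;

    # Characters -> UTF-16LE bytes again.
    return encode('UTF-16LE', $text);
}

# Sample data: a BOM (FF FE) followed by "hello\n" in UTF-16LE.
my $original = "\xFF\xFE" . encode('UTF-16LE', "hello\n");
my $updated  = add_filename_line($original, 'X.TXT');

# Hex dump of the result, for eyeballing the byte layout.
print join(' ', map { sprintf '%02X', ord } split //, $updated), "\n";
```

If the input has no BOM, the substitution simply does not match and the bytes come back unchanged.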
But again, if you now have to work from corrupted versions of the files (because you don't have a restorable backup of the originals), then there's rather more work you have to do (and it'll need a fair bit more perl code -- you're not likely to solve it with one-liners on the command line).
A couple other points: (1) Since you are using the "find" command, I would expect that you also have access to "xargs", so that you could use a pipeline command (which tends to be preferable in many situations), like this:
find . -name '*.TXT' | xargs perl -i.bak ...
(2) If this is something you do repeatedly (e.g. at regular intervals on new sets of text files), why not save the perl code as a script? (Typing the name of a script file would be less troublesome than re-typing or copy-pasting the perl code itself.)
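As a rough sketch of what such a script might look like (the script name add_filename.pl and the sub name prepend_filename are my own invention, and this assumes the files really are pure UTF-16LE with a BOM):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical script (call it add_filename.pl): for each file named on
# the command line, insert a "Filename: <path>" line right after the BOM.
# Assumes every input file is pure UTF-16LE.
sub prepend_filename {
    my ($path) = @_;

    # Read the whole file, decoding UTF-16LE into Perl characters.
    open my $in, '<:raw:encoding(UTF-16LE)', $path
        or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;
    $text = '' unless defined $text;

    # Insert the new line after the BOM; the look-behind keeps the BOM.
    my $changed = $text =~ s/(?<=\x{feff})/Filename: $path\n/;
    return 0 unless $changed;    # no BOM found, leave the file alone

    # Keep the original as "<path>.bak", like -i.bak does.
    rename $path, "$path.bak"
        or die "Cannot rename $path to $path.bak: $!";

    # Write the new version, encoding back to UTF-16LE.
    open my $out, '>:raw:encoding(UTF-16LE)', $path
        or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Error closing $path: $!";
    return 1;
}

for my $path (@ARGV) {
    prepend_filename($path)
        or warn "No BOM found in $path; left unchanged\n";
}
```

It could then be run as: find . -name '*.TXT' | xargs perl add_filename.pl (file names containing spaces would need find -print0 with xargs -0 instead).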