PERL Command line to batch add filename to start of file in UTF-16le

irenabyss has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I have a great batch perl command line script: http://www.unix.com/members/302084659.html by k_manimuthu that does almost exactly what I need running in OSX terminal:

perl -i -pe 'BEGIN{undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`

However, the files I'm picking up are UTF-16LE, and the filename text being added to the front of each file is output as UTF-8. I have tried various things to print the filename in UTF-16LE; the closest that actually runs is:

perl -i -pe 'BEGIN{binmode STDIN,":encoding(utf16le)"; undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`

However, this still seems to output UTF-8. Can anyone improve k_manimuthu's little gem of a script? Thanks


Re: PERL Command line to batch add filename to start of file in UTF-16le by choroba (Cardinal) on Dec 04, 2015 at 22:33 UTC
-p prints to STDOUT, not STDIN.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re^2: PERL Command line to batch add filename to start of file in UTF-16le by irenabyss (Initiate) on Dec 05, 2015 at 03:28 UTC
Ah. Thanks for that. I've changed that script to:

perl -i -pe 'BEGIN{binmode STDOUT,":encoding(utf16le)"; undef $/;} s/\xFF\xFE/\xFF\xFEFilename:$ARGV\n/' `find . -name '*.TXT'`

However, I have since discovered that simply writing my new text over the BOM of the original data solved my problem:

perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`

I'm not sure if this is because my source data might actually be mixed utf8 and utf16le. In summary, then: I don't think I've actually solved outputting in UTF-16LE, but I have solved my issue for the present. Thanks very much for your reply, and to the many other posts on this site which helped me comprehend this. Any other tips welcome.
Re^3: PERL Command line to batch add filename to start of file in UTF-16le by graff (Chancellor) on Dec 06, 2015 at 00:10 UTC
If you ran something like this:

perl -i -pe '(code that runs but does the wrong thing)' `find . -name '*.TXT'`

I hope you had a backup copy of those text files, so that you could start over with the original data. If you can't restore the versions of the files as they were before that command line was run, well... you've got a much harder problem now. For one thing, if the files had been pure UTF-16 before you ran that command, then they probably had a mix of utf8 (ASCII) and UTF-16 content after you ran it -- and other things may have gone wrong as well.

If you can restore the original files, and if that version of the data was all pure UTF-16LE, then something like the following would do what you want:

perl -i.bak -M'open IO => ":encoding(UTF-16LE)"' -pe 'BEGIN{undef $/} s/(?<=\x{feff})/Filename: $ARGV\n/' `find . -name '*.TXT'`

Note the following:

The "-i" option includes a file extension to be added to the name of the original file, so that it will not be overwritten by the new version. (That is, an original file X.TXT is renamed to "X.TXT.bak" before the new version of X.TXT is created.)

The "-M" option invokes the "open" pragma, to set all IO handles to UTF-16LE encoding (for all files on the command line, text is converted from UTF-16LE to perl-internal utf8 on input, and back to UTF-16LE on output).

The s/// operator is used with a look-behind assertion for the BOM, so that the BOM is preserved and new text is added immediately after it.

But again, if you now have to work from corrupted versions of the files (because you don't have a restorable backup of the originals), then there's rather more work you have to do (and it'll need a fair bit more perl code -- you're not likely to solve it with one-liners on the command line).
A couple of other points:

(1) Since you are using the "find" command, I would expect that you also have access to "xargs", so that you could use a pipeline command (which tends to be preferable in many situations), like this:

find . -name '*.TXT' | xargs perl -i.bak ...

(2) If this is something you do repeatedly (e.g. at regular intervals on new sets of text files), why not save the perl code as a script? (Typing the name of a script file would be less troublesome than re-typing or copy-pasting the perl code itself.)