moesplace has asked for the wisdom of the Perl Monks concerning the following question:

Gurus: I have 310,000+ files that contain some non-printing characters ("hex codes"). I'd like to strip these out, and am trying to use Perl's tr operator. The characters are interspersed at arbitrary points within the lines, and some files don't contain any at all.

Please help.

```perl
use strict;
use warnings;

# Declare package names
my $work_file;
my @files = <E:\\triggers\\1_MERGE_3326\\tests\\*.hl7>;
#$work_file = pop;  # If work_file comes from command line, uncomment
my $inPath  = "E:\\triggers\\1_MERGE_3326\\pass";
my $outPath = "E:\\triggers\\1_MERGE_3326\\converted";

foreach $work_file (@files) {
    open (WORK, "<$work_file") or die "Couldn't open $work_file.";            # Open the working file
    open (my $file, ">$outPath\\$work_file") or die "Couldn't open $work_file."; # open an output file
    while (<WORK>) {
        chomp;
        $_ =~ tr/\000-\011\013\014\016-\037//d;
        print $file map {"$_\n"} ($_);
    } # end while
    close WORK;
    close $file;
} # end foreach
```

I keep getting "Couldn't open E:\triggers\1_MERGE_3326\tests\20131209_180245424_R01.hl7. at E:\triggers\1_MERGE_3326\remove_chars_dir.pl line 16."

This file is the very first one in my directory listing. I'm also logged in to an account with administrative privileges, so it shouldn't be a permissions issue.

Does a file have to exist before it can be opened? If so, how do I create one?

Forgive the noob questions, please.

Re: Removing non-printing (hex codes) from text files
by ww (Archbishop) on Feb 06, 2014 at 20:32 UTC
    "Does a file have to exist ...?"

    The Zen answer: "Can you open a paper bag if you don't have a bag?"

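    Actually, Perl is somewhat less rigid about that: opening a file for writing (>) creates it if it doesn't already exist, while opening it for reading (<) fails if it doesn't. The details are in your terminal: perldoc -f open.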
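    A minimal sketch of that difference, assuming a scratch directory from File::Temp and a hypothetical file name (new_output.hl7): the read-mode open of the missing file fails and sets $!, while the write-mode open creates it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch directory (an assumption for the demo) so no real files are touched.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'new_output.hl7');   # hypothetical name

# Reading requires the file to exist already; this open fails and sets $!.
my $read_ok = open(my $in, '<', $path);
print "read open: ", ($read_ok ? "ok" : "failed: $!"), "\n";

# Writing creates the file if it is missing (and truncates it if it exists).
open(my $out, '>', $path) or die "Couldn't create $path: $!";
print {$out} "hello\n";
close($out) or die "Couldn't close $path: $!";

print "exists after '>' open: ", (-e $path ? "yes" : "no"), "\n";
```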

    As to your line 26 (the tr/// line) and your title: you seem to have an odd spec. You talk about hex but present octal values...

    and those appear to be:

        tr/// range:      \000-\011   \013   \014   \016-\037
        read as octal:    NUL-TAB     VT     FF     SO-US
        read as hex:      NUL-DC1     DC3    DC4    SYN-'7'

    Are you sure you've identified the chars you wish to remove correctly?
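    One way to check is to run the tr/// set over all 128 ASCII characters and list what it deletes. This sketch (an illustration, not part of the original code) also writes the same set with \x hex escapes, to show that octal and hex are just two spellings of the same characters.

```perl
use strict;
use warnings;

# Every 7-bit character, \000 through \177.
my $all = join '', map { chr($_) } 0 .. 127;

# The set from the question, written with octal escapes...
(my $octal = $all) =~ tr/\000-\011\013\014\016-\037//d;

# ...and the same set written with hex escapes.
(my $hex = $all) =~ tr/\x00-\x09\x0B\x0C\x0E-\x1F//d;

# List every deleted character in both notations.
for my $code (0 .. 127) {
    next if index($octal, chr $code) >= 0;    # still present, so not deleted
    printf "deleted: octal \\%03o = hex \\x%02X\n", $code, $code;
}
print "octal and hex sets agree: ", ($octal eq $hex ? "yes" : "no"), "\n";
```

    LF (\012) and CR (\015) are the only control characters below \040 that survive, which is why the ranges skip them.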

    Come, let us reason together: Spirit of the Monastery
    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
Re: Removing non-printing (hex codes) from text files
by toolic (Bishop) on Feb 06, 2014 at 17:33 UTC
    Is this line 16?
```perl
open (WORK, "<$work_file") or die "Couldn't open $work_file."; # Open the working file
```

    If so, maybe it's a backslash thing (sorry... I don't do Windows). Try this simplified code:

```perl
use strict;
use warnings;
my $work_file = 'E:\triggers\1_MERGE_3326\tests\20131209_180245424_R01.hl7';
open (WORK, "<$work_file") or die "Couldn't open $work_file : $!";
```

    Tip #7 from the Basic debugging checklist ... $!
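    Applying the same $! tip to both open calls, with different messages for input and output, shows which open is actually failing. The sketch below is hypothetical: it builds a throwaway tests/converted layout under File::Temp and joins the paths with / so it also runs outside Windows. Because $work_file is already a full path, the output path ends up with a second full path tacked on, so the output open fails while the input open succeeds.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical layout mirroring the question: an input dir and an output dir.
my $base    = tempdir(CLEANUP => 1);
my $inDir   = File::Spec->catdir($base, 'tests');
my $outPath = File::Spec->catdir($base, 'converted');
for my $d ($inDir, $outPath) {
    mkdir $d or die "Couldn't mkdir $d: $!";
}

# A sample input file; like the glob in the question, this is a full path.
my $work_file = File::Spec->catfile($inDir, 'sample.hl7');
open(my $mk, '>', $work_file) or die "Couldn't create $work_file: $!";
close $mk;

# Input open: its own message, including $!.
open(my $in, '<', $work_file)
    or die "Couldn't open input $work_file: $!";

# Output open built the way the question builds it: output dir + full path.
my $bad = "$outPath/$work_file";
my $ok  = open(my $out, '>', $bad);
if ($ok) { print "opened $bad\n"; }
else     { print "Couldn't open output $bad: $!\n"; }
```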

Re: Removing non-printing (hex codes) from text files
by oiskuu (Hermit) on Feb 06, 2014 at 22:54 UTC

    Do you know what encoding these files are using? There are tools to convert from one encoding to another.

    If you really need to filter unprintable characters, line by line, then

```perl
$\ = "\n";
while (<>) {
    s/[^[:print:]]//g;
    print;
}
```
    does this, but also converts to Unix line endings...
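    To see where the line-ending change comes from, here is the same substitution applied to one line (the sample string is an assumption for illustration): CR and LF are not [:print:] characters, so both are stripped along with the other control characters, and the loop's $\ = "\n" then appends a bare newline.

```perl
use strict;
use warnings;

# Hypothetical input line: CRLF ending plus two embedded control characters.
my $line = "abc\x01def\x0Cghi\r\n";

# The [:print:] filter from the reply, applied to that one line.
(my $clean = $line) =~ s/[^[:print:]]//g;

# Only "abcdefghi" is left; setting $\ re-adds a plain "\n" when printing.
local $\ = "\n";
print $clean;
```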

    tr/// will be faster, however. Characters \177-\377 aren't printable ASCII (\177 is DEL, and \200-\377 lie outside ASCII altogether), so you may want to filter those, too. There's no need to chomp the line ends only to add them back later.

```perl
while (<WORK>) {
    tr/\000-\011\013\014\016-\037\177-\377//d;
    print $file $_;
}
```

    Even better is to specify the good characters and remove everything else. Consider using fixed-length input (to avoid arbitrarily large line buffers). Finally, you probably want raw binmode to avoid problems with ^Z and so on. See the open pragma (perldoc open).

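```perl
use open IO => ':raw';   # raw binary I/O, as suggested above
$/ = \4096;              # read fixed 4096-byte records instead of lines
# ...
while (<WORK>) {
    tr/\012\015\040-\176//cd;   # delete everything except LF, CR and printable ASCII
    print $file $_;
}
```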
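    Putting those pieces together, a minimal sketch of the whitelist approach for one input/output pair might look like this. Assumptions: the helper name strip_file and the sample file contents are hypothetical, and explicit '<:raw'/'>:raw' layers on three-argument open stand in for the use open pragma. tr/// returns the number of characters it deleted, so the helper can also report which files were already clean.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: copy $in_path to $out_path, keeping only LF, CR and
# printable ASCII (\040-\176). Returns the number of bytes removed.
sub strip_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<:raw', $in_path)  or die "Couldn't open input $in_path: $!";
    open(my $out, '>:raw', $out_path) or die "Couldn't open output $out_path: $!";
    local $/ = \4096;    # fixed 4096-byte records, not lines
    my $removed = 0;
    while (my $chunk = <$in>) {
        $removed += ($chunk =~ tr/\012\015\040-\176//cd);
        print {$out} $chunk;
    }
    close($in);
    close($out) or die "Couldn't close $out_path: $!";
    return $removed;
}

# Usage on a throwaway file with a NUL, a US (\x1F) and a \xFF byte in it.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'in.hl7');
my $dst = File::Spec->catfile($dir, 'out.hl7');
open(my $fh, '>:raw', $src) or die "Couldn't create $src: $!";
print {$fh} "ab\x00c\r\nd\x1Fe\xFF\n";
close($fh) or die "Couldn't close $src: $!";

my $n = strip_file($src, $dst);
print "removed $n bytes\n";
```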

Re: Removing non-printing (hex codes) from text files
by ww (Archbishop) on Feb 07, 2014 at 13:53 UTC
Re: Removing non-printing (hex codes) from text files
by moesplace (Novice) on Feb 06, 2014 at 18:09 UTC


      Slowly figuring this out. $work_file already holds a full pathname (the glob returns full paths), and I'm prepending $outPath to it, so the output path comes out as E:\triggers\...E:\triggers.

      How do I get just the filename, without the path information, into a variable?
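      One way to do this, sketched with the question's Windows-style paths: take only the file name from $work_file and join it to $outPath. On Windows, File::Basename's basename($work_file) and File::Spec->catfile($outPath, $name) do this directly; the sketch calls File::Spec::Win32 explicitly only so that it gives the same answer when run on a non-Windows machine.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Paths in the question's layout (single quotes, so '\\' is one backslash).
my $outPath   = 'E:\\triggers\\1_MERGE_3326\\converted';
my $work_file = 'E:\\triggers\\1_MERGE_3326\\tests\\20131209_180245424_R01.hl7';

# splitpath returns (volume, directories, file); keep only the file name.
my (undef, undef, $name) = File::Spec::Win32->splitpath($work_file);

# Build the output path from the output directory and the bare file name.
my $out_file = File::Spec::Win32->catfile($outPath, $name);
print "$out_file\n";
```

      In the question's loop, opening $out_file instead of "$outPath\\$work_file" would keep each output file inside the converted directory.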