raj8 has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I am trying to open a list of files and insert them into a database, but the process fails because some of the file names contain special characters such as semicolons, apostrophes, dollar signs, carets, etc. How can I strip these special characters out without damaging the file information? This is what I have so far:
open(INFILE, "$command |");
print "$command -report\n";
while (<INFILE>) {
    $files = @f[15];
    print OUTFILE "$SQL_insert ('$files');\n";

Re: Replacing charecters in files
by tachyon (Chancellor) on Sep 03, 2003 at 04:08 UTC

    rename the files?

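    my @files = glob("*");
    for my $filename (@files) {
        # Replace every character that is not alphanumeric or a dot.
        (my $newname = $filename) =~ s/[^A-Za-z0-9.]/_/g;
        next if $newname eq $filename;    # nothing to change
        if ( -e $newname ) {
            warn "$newname already exists, skipping rename on $filename\n";
        }
        else {
            rename $filename, $newname
                or warn "Could not rename $filename: $!\n";
        }
    }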

    If you want to change what is actually in the files, you can use an in-place edit with a suitable regex:

    perl -pi.bak -e 's/[^\w.\t \n]//g' <files>

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Replacing charecters in files
by Abigail-II (Bishop) on Sep 03, 2003 at 08:31 UTC
    I hope I understood your problem right. I think your problem is that $command may contain characters that are special for the shell, and they cause the open to fail.

    Answers to these problems can be found in man perlipc and perldoc -f exec. What you need to do is a "safe pipe open", opening a pipe that doesn't involve the shell. The way to do this is forking your process, with a pipe from the child to the parent, and then doing an exec to $command in the child.

    Forking a child and opening a pipe between them can be done in a single Perl command:

    my $pid = open my $kid => "-|";

    This forks the program, returning the child PID in the parent, while opening a pipe from the child to the parent. If the fork fails, $pid is undefined.

    The next tricky thing is the exec. If we did a simple exec $command, Perl would call the shell whenever $command contains special characters, and that is what we are trying to avoid. If the command had arguments, we could supply exec with a list (of more than one element) and exec would avoid calling the shell, but we don't have that option. There is another way to make exec avoid the shell: give it a block as its first argument (the indirect object form). The block names the program that is actually run, and the first element of the list that follows is what the program sees as its own name, so we supply the command in both places. With the command stored in $file, this gives us:

    exec {$file} $file or die "exec() failed: $!\n";

    A complete program that does a safe pipe open:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $file = '....';   # Command with special characters.

    my $pid = open my $kid => "-|";
    die "fork() failed: $!\n" unless defined $pid;

    unless ($pid) {
        exec {$file} $file or die "exec() failed: $!\n";
    }

    while (<$kid>) {
        print;
    }

    __END__
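
    When the command does take arguments, Perl 5.8 and later offer a shorter route on systems with a real fork: the list form of open, which hands the program and its arguments to the system separately so that no shell parses them. The following is only a sketch of that alternative, not the fork/exec technique above; the file name is hypothetical and echo merely stands in for the real command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name full of shell metacharacters.
my $name = q{it's; a $weird ^name};

# List form of a pipe open: 'echo' is the program, $name its only argument.
# Nothing is passed through a shell, so the metacharacters stay literal.
open my $kid, '-|', 'echo', $name
    or die "Cannot start echo: $!\n";

my $line = <$kid>;
close $kid or warn "echo exited with status $?\n";

print $line;
```

    Note that this only helps when there is at least one argument after the program name; a lone command string in the three-argument form can still be handed to the shell.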

    Abigail

Re: Replacing charecters in files
by esh (Pilgrim) on Sep 03, 2003 at 05:04 UTC

    I don't think you've provided quite enough information to get a complete answer. Two aspects of your question seem vague to me:

    1. You provide some samples of "special characters" but end it with "etc..." Knowing exactly what you consider to be "acceptable characters" and what you consider to be "special characters" may change the answer a bit.

    2. You say you want to "strip special characters" without "damaging the file information". I would need to know what the resulting output is going to be used for in order to determine if the information is damaged in the stripping process.

    On the second point, it may help to provide both a description of what the information is going to be used for and some examples of what your input and desired output should be.

    If deleting the special characters damages the information, you may want to encode them or escape them, but the way to do this is highly context dependent.

    Without additional information, all I can offer is to add one line to your sample code:

    open(INFILE, "$command |");
    print "$command -report\n";
    while (<INFILE>) {
        $files = @f[15];
        # Delete special characters like ; ' $ ^
        $files =~ tr/;'$^//d;
        print OUTFILE "$SQL_insert ('$files');\n";
        ...
    }

    -- Eric Hammond

Re: Replacing charecters in files
by bart (Canon) on Sep 03, 2003 at 10:08 UTC
    You probably don't want to strip them, but escape them, because, if you strip them, you do damage the information. Anyway, whatever you do, you should do it by modifying $file before inserting it in that string.

    You can escape apostrophes (') and backslashes (\) this way:

    $file =~ s/([\\'])/\\$1/g;
    If you do want to strip some characters, like semicolons and quotes, in the most straightforward manner, you can do:
    $file =~ tr/";//d;
    But likely, you may be wanting a smarter way of processing the data, and use some clever s/// trick. However, I have no idea on what is a generally acceptable format for all cases.
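
    To make the difference between the two operations concrete, here is a small sketch that applies both to the same string. The file name is made up purely to exercise each character class.

```perl
use strict;
use warnings;

# Hypothetical file name containing ', ", ; and a backslash.
my $file = q{Bob's "draft"; v2\final.txt};

# Escaping keeps every character and prefixes ' and \ with a backslash.
(my $escaped = $file) =~ s/([\\'])/\\$1/g;

# Stripping deletes " and ; outright, so that information is lost.
(my $stripped = $file) =~ tr/";//d;

print "original: $file\n";
print "escaped:  $escaped\n";
print "stripped: $stripped\n";
```

    The escaped form is Bob\'s "draft"; v2\\final.txt and the stripped form is Bob's draft v2\final.txt. Whether backslash escaping is the right choice depends on the database: some expect an apostrophe to be doubled instead.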

    p.s. Please don't use @f[15]; use $f[15] instead. If you ran this script with warnings enabled, you'd get a warning about it. Perhaps it works, but my rule of thumb is that you should use the "@" syntax only in list context, and here it is used in scalar context. Perl may disagree and never like array slices (because that's what you used) of just one item, so that's where we disagree. :)

Re: Replacing charecters in files
by williamp (Pilgrim) on Sep 03, 2003 at 05:58 UTC
    If the files are from a non-*nix platform, blank spaces will probably be a problem as well: $file =~ s/ /_/g;

      If the files are from a *nix platform blank spaces will still be a problem.

      --
      TTTATCGGTCGTTATATAGATGTTTGCA

Re: Replacing charecters in files
by Hutta (Scribe) on Sep 03, 2003 at 12:50 UTC
    You should also look into using the DBI module to communicate with the database, which would avoid having to do any shell escapes on your data at all. You'd still have to quote out database-special chars, but the DBI module provides a quote() method that handles it for you.
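
    As a rough illustration of that suggestion, the sketch below inserts awkward file names through DBI without stripping anything. It assumes DBD::SQLite is installed and uses an in-memory database with a hypothetical table called files; the original poster's real driver, connection string and schema are unknown.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available. An in-memory database keeps
# the sketch self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE files (name TEXT)');    # hypothetical table

my @names = (q{it's; a $weird ^name}, q{plain.txt});

# Placeholders: the driver takes care of quoting each value.
my $sth = $dbh->prepare('INSERT INTO files (name) VALUES (?)');
$sth->execute($_) for @names;

# quote() returns a literal that is safe to interpolate into SQL text,
# which suits a script that writes INSERT statements to a file.
print 'INSERT INTO files (name) VALUES (', $dbh->quote($names[0]), ");\n";

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM files');
print "rows inserted: $count\n";

$dbh->disconnect;
```

    Placeholders are usually the simpler choice when the script talks to the database directly; quote() fits better when the SQL has to be written out as text, as in the original code.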