Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

What am I doing wrong?

I have a flat file that is | delimited:

    % cat flatfile.pipes
    COL1 | someother data|122343221|blahbalhbalh

Other times I get flat files that are : delimited:

    % cat flatfile.colons
    COL1 : someother data:122343221:blahbalhbalh

so ...

I wrote a perl script that does something like this ...

    % cat splitit.pl
    #!/usr/bin/perl
    use Getopt::Std;
    getopts( 'c:' );
    while(<>) {
        chomp;
        (@columns) = split /$opt_c/;
    }
    foreach $col (@columns) {
        print "[$col]\n";
    }

so ... when I do ...

% ./splitit.pl -c':' < flatfile.colons

... I get output ...

    [COL1 ]
    [ someother data]
    [122343221]
    [blahbalhbalh]

... which is good. But when I do ...

% ./splitit.pl -c'|' < flatfile.pipes

... I get output ...

    [C]
    [O]
    [L]
    [1]
    [ ]
    [ ]
    [ ]
    [ ]
    [|]
    [ ]
    [s]

... you get the idea I hope because I sure don't! :)

Thanks

Edit ar0n -- fixed formatting

Replies are listed 'Best First'.
Re: split $c
by japhy (Canon) on Dec 07, 2001 at 04:38 UTC
    The reason is that "|" is a regex metacharacter (it means alternation). Do:
    $opt_c = quotemeta $opt_c;
    before you use it in a regex.
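A minimal sketch of the quotemeta approach, applied to the sample line from the question. The variable `$delim` is a hypothetical stand-in for whatever value `getopts` stores for `-c`; it is not part of the original script.

```perl
use strict;
use warnings;

# Assumption: $delim stands in for the value that -c would supply.
my $delim = '|';

# quotemeta backslash-escapes every non-word character, so '|' becomes '\|'.
my $escaped = quotemeta $delim;

my $line    = 'COL1 | someother data|122343221|blahbalhbalh';
my @columns = split /$escaped/, $line;

# Prints four bracketed fields, one per line.
print "[$_]\n" for @columns;
```

Escaping once, before the loop, keeps the regex inside the loop unchanged.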

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Thanks!:)
Re: split $c
by belg4mit (Prior) on Dec 07, 2001 at 04:40 UTC
    % ./splitit.pl -c'\|' < flatfile.pipes
    You need to escape the pipe, because the regexp engine is interpreting it as "split on null or null". Splitting on null (the empty pattern) splits between each character of the string.
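To see the difference side by side, here is a small sketch that splits the same made-up string (`ab|c`, chosen only for illustration) with and without the backslash.

```perl
use strict;
use warnings;

my $line = 'ab|c';

# Unescaped: /|/ means "empty OR empty", which matches the empty string
# between every pair of characters.
my @unescaped = split /|/, $line;

# Escaped: /\|/ matches only a literal pipe.
my @escaped = split /\|/, $line;

print join(',', map { "[$_]" } @unescaped), "\n";   # [a],[b],[|],[c]
print join(',', map { "[$_]" } @escaped), "\n";     # [ab],[c]
```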

    --
    perl -p -e "s/(?:\w);([st])/'\$1/mg"

Re: split $c
by Masem (Monsignor) on Dec 07, 2001 at 04:40 UTC
    Another option, which avoids the use of Getopt::Std, works if ':' doesn't appear in the fields of the '|'-delimited files and '|' doesn't appear in the fields of the ':'-delimited files:
    @columns = split /\:|\|/;
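The same idea can also be written with a character class, where neither delimiter needs a backslash. This is a sketch only; `split_either` is a hypothetical helper name, and the two input lines are the samples from the question.

```perl
use strict;
use warnings;

# Split on either delimiter. Inside [...] both ':' and '|' are literal.
sub split_either {
    my ($line) = @_;
    return split /[:|]/, $line;
}

my @from_pipe  = split_either('COL1 | someother data|122343221|blahbalhbalh');
my @from_colon = split_either('COL1 : someother data:122343221:blahbalhbalh');

print "[$_]\n" for @from_pipe;
print "[$_]\n" for @from_colon;
```

The caveat stated above still applies: a field such as a time value containing ':' inside a '|'-delimited file would be split in the wrong place.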

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

Re: split $c
by hopes (Friar) on Dec 07, 2001 at 04:44 UTC
    Hello: You have to escape the '|' because it is interpreted in the split as 'OR'.
    Try this code to see what I mean:
    use strict;
    my $a='hello|world';
    my $c='|';
    my @letras = split /$c/,$a;
    for (@letras) { print "[$_]"; }
    Change the third line so that the split works:
    my $c='\|';
    Hope this helps

    Hopes
    $_=$,=q,\,@4O,,s,^$,$\,,s,s,^,b9,s, $_^=q,$\^-]!,,print
(tye)Re: split $c
by tye (Sage) on Dec 07, 2001 at 19:56 UTC

    Although this gives the same effect as some of the other answers in the thread, my preferred solution is to always use \Q and \E around a string in a regex where you want the string to match literally rather than be interpreted as a regex. This means I'd rewrite your code as:

    (@columns) = split /\Q$opt_c\E/;
    I find that this rule is easy to apply in lots of situations, sometimes easier than using quotemeta and I think it makes the code easier to understand. If I were to use quotemeta in production code, I'd have to write:
    my $delim= getDelimiter();
    my $delimRegex= quotemeta($delim);
    to properly document what was going on.
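As a rough illustration of the \Q...\E rule, the sketch below wraps the split in a hypothetical helper, `split_literal`, and tries it with several delimiters that are regex metacharacters. The sample lines are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on a delimiter that is taken literally.
sub split_literal {
    my ($delim, $line) = @_;
    chomp $line;
    # \Q...\E quotes any regex metacharacters in the interpolated $delim.
    return split /\Q$delim\E/, $line;
}

# Each case is [ delimiter, input line ]; '|' and '.' are both metacharacters.
for my $case ( [ '|', "a|b|c\n" ], [ ':', "a:b:c\n" ], [ '.', "a.b.c\n" ] ) {
    my ($delim, $line) = @$case;
    print join('', map { "[$_]" } split_literal($delim, $line)), "\n";
}
```

Because the quoting sits right next to the interpolated variable, no separate "already escaped" variable is needed.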

    All that said, I suggest you look into Text::xSV as it handles these types of "comma-separated values" file formats more robustly than a simple split does.

            - tye (but my friends call me "Tye")
Re: split $c
by dru145 (Friar) on Dec 07, 2001 at 21:49 UTC
    Please don't put everything in code tags. It's difficult to distinguish between your question and the snippets of code.