meisterperl has asked for the wisdom of the Perl Monks concerning the following question:

I have tried (.*)\| and several variations of it in a regex to grab data between pipe delimiters. A sample source row of data is:
Johnathan Company| 2.33 |-2.5|Jeremiah|-5
There should be a way to grab the first part of the data into $1, the second part of the delimited data into $2, and so on.
I even tried (.*\|)(.*\|) etc. I would like to get all of these with one expression, but I can't seem to tweak it right. The data is completely variable and inconsistent; the only consistency is the pipes. As another option, is there a way to read a specified number of characters into a variable, then, from that point, read more into another variable, and so on? If you know of any good examples, please point me to them. Thanks!

---Thanks for the help; I completely overlooked the ? (non-greedy) modifier.

Re: Grabbing anything between pipe delimiters
by Tanktalus (Canon) on Feb 25, 2005 at 19:34 UTC

    To solve your problem without answering the question:

    my @fields = split /\|/, $line;
    Using regexps is possible, but, given your requirements, sounds like the wrong solution. Something like this would work, but I'd still suggest split.
    my @a = $line =~ /([^\|]+)\|?/g;
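
For reference, a minimal sketch of the split approach applied to the sample row from the question. The whitespace trimming and the -1 limit are additions for illustration, not something the original post asked for.

```perl
use strict;
use warnings;

my $line = 'Johnathan Company| 2.33 |-2.5|Jeremiah|-5';

# Split on a literal pipe; the -1 limit keeps trailing empty fields.
my @fields = split /\|/, $line, -1;

# Optionally strip the whitespace around each field (e.g. " 2.33 ").
s/^\s+|\s+$//g for @fields;

print join(',', map { "[$_]" } @fields), "\n";
# prints: [Johnathan Company],[2.33],[-2.5],[Jeremiah],[-5]
```

Without the -1 limit, split silently drops empty fields at the end of the line, which may or may not be what you want for this kind of data.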

Re: Grabbing anything between pipe delimiters
by dragonchild (Archbishop) on Feb 25, 2005 at 19:30 UTC
    .* is greedy. You want .*?

    Alternately, you might want to look at split() instead of a regex.
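
A small sketch of the difference, using the sample row from the question (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = 'Johnathan Company| 2.33 |-2.5|Jeremiah|-5';

# Greedy: .* runs to the end of the line, then backtracks to the LAST pipe.
my ($greedy) = $line =~ /(.*)\|/;

# Non-greedy: .*? stops at the FIRST pipe it reaches.
my ($lazy) = $line =~ /(.*?)\|/;

print "greedy: [$greedy]\n";   # greedy: [Johnathan Company| 2.33 |-2.5|Jeremiah]
print "lazy:   [$lazy]\n";     # lazy:   [Johnathan Company]
```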

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Grabbing anything between pipe delimiters
by RazorbladeBidet (Friar) on Feb 25, 2005 at 19:34 UTC
    You can try any of these:

    my @results = $string =~ /(.*?)\|/g;
    my @results = $string =~ /([^\|]*)\|/g;
    my @results = split /\|/, $string;
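
One thing worth checking before picking between these: both regex forms require a pipe after each captured field, so on a row with no trailing pipe they appear to stop one field short. A quick sketch against the sample row from the question (array names are made up for the comparison):

```perl
use strict;
use warnings;

my $string = 'Johnathan Company| 2.33 |-2.5|Jeremiah|-5';

# Each regex needs a pipe AFTER the captured text, so "-5" is never matched.
my @lazy    = $string =~ /(.*?)\|/g;
my @negated = $string =~ /([^|]*)\|/g;

# split treats the pipe purely as a separator, so the last field survives.
my @split   = split /\|/, $string;

printf "%-8s %d fields\n", 'lazy',    scalar @lazy;      # lazy     4 fields
printf "%-8s %d fields\n", 'negated', scalar @negated;   # negated  4 fields
printf "%-8s %d fields\n", 'split',   scalar @split;     # split    5 fields
```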

    --------------
    It's sad that a family can be torn apart by such a simple thing as a pack of wild dogs
Re: Grabbing anything between pipe delimiters
by JediWizard (Deacon) on Feb 25, 2005 at 19:42 UTC

    You could also use Text::CSV_XS. It is designed for comma-separated data, but you can set the separator character to whatever you want. That would even handle data like:

    some field|another field|"field with | in it"|more fields
    May the Force be with you
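
A minimal sketch of that suggestion, assuming Text::CSV_XS is installed from CPAN (it is not a core module). The sep_char option sets the separator; the input is the example row above.

```perl
use strict;
use warnings;
use Text::CSV_XS;   # CPAN module, not part of core Perl

my $line = 'some field|another field|"field with | in it"|more fields';

# sep_char changes the separator from the default comma to a pipe.
my $csv = Text::CSV_XS->new({ sep_char => '|', binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag;

# The quoted third field keeps its embedded pipe.
my @fields = $csv->fields;
print "$_\n" for @fields;
```

A plain split on the same row would cut the quoted field in two, which is the case this module is meant to handle.
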
      Text::xSV is better as it handles more edge cases.


Re: Grabbing anything between pipe delimiters
by jZed (Prior) on Feb 25, 2005 at 19:38 UTC
    You might want to check out AnyData or DBD::AnyData which handle pipe-separated files with either a tied-hash or a DBI/SQL interface. BTW, they aren't delimiters - delimiters go around things; they are separators - separators separate things the way the pipes separate the fields in your example.
Re: Grabbing anything between pipe delimiters
by sh1tn (Priest) on Feb 25, 2005 at 19:48 UTC
    An excerpt from perldoc:
    An example:

    $s1 = $s2 = "I am very very cold";
    $s1 =~ s/ve.*y //;      # I am cold
    $s2 =~ s/ve.*?y //;     # I am very cold

    perldoc -q regex
    perldoc perlre


Re: Grabbing anything between pipe delimiters
by brian_d_foy (Abbot) on Feb 25, 2005 at 21:00 UTC

    Your regular expressions are probably matching too much, and I'm guessing that you are seeing some pipe characters in the stuff in $1 and so on. You want to make the quantifiers non-greedy so they stop matching when they find the thing that comes after them.

    @matches = m/(.*?)\|/g;

    But you might want split()

    @matches = split m/\|/;

    As for reading characters into a variable, if you are reading from a filehandle, read() may be what you are after. If you are getting the characters from another scalar, you probably want substr() and index(). See the perlfunc page for the details.
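
A rough sketch of both options. The in-memory filehandle and the character counts (9 and 8) are assumptions picked to fit the sample row; a real file and your own lengths would go in their place.

```perl
use strict;
use warnings;

my $data = 'Johnathan Company| 2.33 |-2.5|Jeremiah|-5';

# read(): pull a fixed number of characters from a filehandle.
# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
read($fh, my $first, 9);    # the first 9 characters
read($fh, my $next,  8);    # the 8 characters after that
close $fh;

# substr() and index(): cut a scalar at the first pipe.
my $pipe = index $data, '|';          # index returns -1 if there is no pipe
die "No pipe found\n" if $pipe < 0;
my $name = substr $data, 0, $pipe;    # text before the pipe
my $rest = substr $data, $pipe + 1;   # text after the pipe

print "[$first] [$next]\n";   # [Johnathan] [ Company]
print "[$name]\n";            # [Johnathan Company]
print "[$rest]\n";            # [ 2.33 |-2.5|Jeremiah|-5]
```

Each read() call continues from where the previous one stopped, which covers the "then from that point, read more" part of the question.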

    Good luck!

    --
    brian d foy <bdfoy@cpan.org>
Re: Grabbing anything between pipe delimiters
by Marcelo (Initiate) on Feb 25, 2005 at 21:32 UTC
    What about
    $s = "Johnathan Company| 2.33 |-2.5|Jeremiah|-5";
    @a = split /\|/, $s;
    # Use $a[0], etc.