in reply to Re: Re: More Variable length regex issues
in thread More Variable length regex issues

You don't mention why split is not an option. I am guessing it is because you aren't just trying to split a string on some delimiter, you are trying to learn regexes, and this is a problem you feel comfortable with. There is nothing wrong with that, but realize that what pzbagel said was your answer ... your values are stored in the array. What you are trying to do - match an arbitrary number of items and populate $1 through $N inside the match operator - just doesn't make sense to me. I mean, that's what the g modifier is for ... match all occurrences, no matter how many you find.
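
The difference between a repeated capture group and a global match can be sketched in a few lines of Python, whose re module follows Perl's capture semantics on this point. The sample string is only an illustration. A capturing group inside a repetition keeps just its last match, while a global match collects every occurrence:

```python
import re

mystr = 'foo,bar,moo,cow'  # sample input, assumed for illustration

# A capturing group inside a repetition is overwritten on every pass,
# so only the last item survives in group 1.
repeated = re.match(r'(?:(\w+),?)+$', mystr)
last_only = repeated.groups()

# A global match (Perl's /g, Python's findall) returns every occurrence.
all_items = re.findall(r'(\w+),?', mystr)

print(last_only)   # ('cow',)
print(all_items)   # ['foo', 'bar', 'moo', 'cow']
```

This is why there is no $1 through $N for a variable number of items: the group number is fixed by the pattern, not by how many times the group matched.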

You hint at Java and Python, but you don't specify what language you are really trying to solve this problem in. If i had to guess, i would say you are using PHP or some Java library that modeled itself on Perl's regexes. Can't help you with the Java stuff, but if it's PHP you are using, then try preg_match_all(). It is like preg_match() with Perl's g match modifier, but its usage is a bit tricky:
<?php
$mystr = 'foo,bar,moo,cow';
preg_match_all('/(\w+)\,?/', $mystr, $matches);
?>
<ul>
<?php foreach ($matches[1] as $match) { ?>
  <li><?=$match?></li>
<?php } ?>
</ul>
If you are using Python, then you can use the exact same regex with Python's re.findall():
#!/usr/bin/python
from re import findall
mystr = 'foo,bar,moo,cow'
values = findall(r'(\w+)\,?', mystr)
for val in values:
    print(val)
Hope this helps, i feel kinda dirty now ... PHP and Python at a Perl site! ;)

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Re: (jeffa) 3Re: More Variable length regex issues
by dextius (Monk) on Jun 09, 2003 at 03:47 UTC

    Sheesh, -1 reputation abounds on this one..

    Ok, I'm parsing USMTF, it has repeatable fields all through it. I can use split, but I have non-repeatable fields that require more delicate processing..

    Doing a split will ignore much of the value of parsing using a regex. The problem is, the target text I am matching against is not fixed length, not even fixed patterns. The last few elements can be repeated infinitely.. I guess I'll go back to Mastering Regular Expressions (2nd edition) and see if I missed something..

    Thank you for your time..

    ps. I am forced to use Cold Fusion MX, which is powered by JRun, which uses the oro module. I prefer Perl for solving problems, but I have to take care of this first..

      You keep on using that word 'split' ... i do not think you know what it means. ;) Consider the following:
      my $str = 'foo,bar,moo,cow';
      my @value = $str =~ m/(\w+)\,?/g;
      print "@value\n";
      @value = split(',',$str);
      print "@value\n";
      They both achieve the same results, and guess which one is easier to understand?

      You say you have non-repeatable fields, but how does using a regex make this easier than split? What do you think split uses to split? A regex! Besides, oro has a family of split functions. You could always do a series of splits if multiple delimiters are used:

      my $str = 'a,b,c:d,e,f:g,h,i';
      my @part = split(':',$str);
      foreach my $part (@part) {
          my @subpart = split(',',$part);
          print "@subpart\n";
      }
      The split functions found in the org.apache.oro package can do this, you just have to jump through more hoops. ;) Not that it matters, but one of my beefs about Java is not being able to process lists easily like you can in Perl:
      print $_,$/ for map split(',',$_), split(':', $str);
      Best of luck.

      jeffa

        I am not clearly explaining this issue.. Your examples are not exactly detailing my criteria because I am not fully explaining my problem, I apologize.

        I have a string of characters that use the same delimiter. Some of the fields are mandatory, some are optional, and some may be repeated infinitely. I want to extract those values AND validate the fields all at once within a single regular expression. I want these values to be available to me afterward. A simple example..

        use Data::Dumper;
        my $foo = "one,123,a s d f,a,b,c,d,e,f,g,h";
        my @bar = $foo =~ /^([a-z]{3}),([0-9]{3}),([a-z\s]{1,7}),(?:([a-z]),|([a-z]$)){1,}/;
        print Dumper(\@bar);

        Consider everything after the 3rd element to repeat, possibly to infinity, but we need to make sure they are single characters, otherwise I want the entire regex to fail immediately.

        Again, thank you for your time, you have spent more than enough time working with me, and I very much so appreciate it..
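
One possible way to get both validation and every value is sketched below in Python, on the assumption that the engine keeps only the last capture of a repeated group (Perl and Python both behave this way). The idea is to validate the whole record with one anchored pattern that captures the repeatable tail as a single group, then split that already-validated group. The field rules are copied from the example above; the name parse_record and its return shape are hypothetical.

```python
import re

# Hypothetical helper: the field rules come from the example in the post,
# the function name and return shape are assumptions for illustration.
RECORD = re.compile(
    r'^([a-z]{3}),'          # mandatory: three lowercase letters
    r'([0-9]{3}),'           # mandatory: three digits
    r'([a-z\s]{1,7}),'       # mandatory: letters/whitespace, 1 to 7 chars
    r'([a-z](?:,[a-z])*)$'   # repeatable tail: single letters, any count
)

def parse_record(line):
    """Return the list of fields, or None if the line fails validation."""
    m = RECORD.match(line)
    if m is None:
        return None
    head = list(m.groups()[:3])
    # The tail has already been validated, so a plain split is safe here.
    tail = m.group(4).split(',')
    return head + tail

print(parse_record("one,123,a s d f,a,b,c,d,e,f,g,h"))
print(parse_record("one,123,a s d f,a,bb,c"))  # None: 'bb' is not a single letter
```

The pattern sticks to basic Perl 5 syntax, so it should port to other Perl-style engines, though that is untested here.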