regex - problem with the loop I believe or maybe the regex itself ?

trummelbummel has asked for the wisdom of the Perl Monks concerning the following question:

Re: regex - problem with the loop I believe or maybe the regex itself ? by AnomalousMonk (Archbishop) on Feb 12, 2014 at 16:36 UTC
I don't entirely understand the OP, but will something like this serve?

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'workset((ab;joiret;garg)) c wasdobao; erhgahufdgah; c workset((adsghlia) c aghaoeriarg;oi c aasdfgohaerg c workset(empty) c ah;sorguiaerg c aoi;hgruio;ghaer c playA c dIonly';
     print qq{'$s'};
     ;;
     my $rev_dIonly  = qr{ ylnoId }xms;
     my $rev_workset = qr{ \(\( teskrow }xms;
     ;;
     my $rs = reverse $s;
     my ($capture) = $rs =~ m{ ($rev_dIonly .*? $rev_workset) }xms;
     print qq{'$capture'};
     $capture = reverse $capture;
     print qq{'$capture'};
    "
    'workset((ab;joiret;garg)) c wasdobao; erhgahufdgah; c workset((adsghlia) c aghaoeriarg;oi c aasdfgohaerg c workset(empty) c ah;sorguiaerg c aoi;hgruio;ghaer c playA c dIonly'
    'ylnoId c Ayalp c reahg;oiurgh;ioa c greaiugros;ha c )ytpme(teskrow c greahogfdsaa c io;graireoahga c )ailhgsda((teskrow'
    'workset((adsghlia) c aghaoeriarg;oi c aasdfgohaerg c workset(empty) c ah;sorguiaerg c aoi;hgruio;ghaer c playA c dIonly'

See perlre, perlretut, perlrequick.

(And the proper closing tag for a code block opened with `<c>` is a `</c>` tag: note the `/` forward-slash.)
Re^2: regex - problem with the loop I believe or maybe the regex itself ? by trummelbummel (Initiate) on Feb 12, 2014 at 18:06 UTC
Thank you, but unfortunately that does not help.

Firstly, one question: if I put the string into an array, split on a space delimiter, will reversing it reverse the characters themselves as well? I thought this only happens with a string, not with a list. Hence, if I print the array it is fine; we do not have to reverse to search for e.g. ylnoId.

Secondly, this was just an example of the string, maybe a bad one. The point is that I do not want to change much of the script, as it seems to work fine, except that I am not able to parse the part in which the if statement is used. The right string is fed into the loop, but then the if statement does not execute. If you use print Dumper \@reversedarray; it works fine until I add the if statement.

If you could help me with that I would be more than grateful. If you have any further questions, please do not hesitate to ask!
Re^3: regex - problem with the loop I believe or maybe the regex itself ? by shmem (Chancellor) on Feb 12, 2014 at 18:18 UTC
> So firstly, one question, if I put the string into an array separated by a delimiter space, will reversing it reverse the characters itself as well? i thought this is only happening if it is a string, but not with a list?

See reverse. In scalar context, reverse reverses a string; in list context, reverse reverses a list. If you want both, use

    @yra = reverse map { scalar reverse $_ } @ary;

update: if you want to match parens, don't escape the backslash.

    - if ($str =~ /\bdIonly:\b(.*?)\bworkset\b\\(\\(/g);
    + if ($str =~ /\bdIonly:\b(.*?)\bworkset\b\(\(/g);

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
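To make the two contexts concrete, here is a minimal sketch. The contents of `@ary` are made-up sample data, not the OP's input.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the thread.
my @ary = ('abc', 'def', 'ghi');

# List context: the element order is reversed, each element is left untouched.
my @list_reversed = reverse @ary;

# Scalar context: the characters of the string are reversed.
my $string_reversed = scalar reverse 'abc';

# Both: reverse the characters of every element, then the element order.
my @both = reverse map { scalar reverse $_ } @ary;

print "@list_reversed\n";     # ghi def abc
print "$string_reversed\n";   # cba
print "@both\n";              # ihg fed cba
```

So an array produced by `split` and then passed to `reverse` only changes the order of its elements; a word such as dIonly stays spelled the same way.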
Re^3: regex - problem with the loop I believe or maybe the regex itself ? by Kenosis (Priest) on Feb 12, 2014 at 18:15 UTC
Can you share an actual line or two that you want to parse, and show what you want extracted? I'm not sure that I understand the need to `reverse` the components of each line.
Re^4: regex - problem with the loop I believe or maybe the regex itself ? by Anonymous Monk on Feb 12, 2014 at 20:24 UTC
Re^5: regex - problem with the loop I believe or maybe the regex itself ? by AnomalousMonk (Archbishop) on Feb 12, 2014 at 20:41 UTC
Re^5: regex - problem with the loop I believe or maybe the regex itself ? by Kenosis (Priest) on Feb 12, 2014 at 20:31 UTC
Re^4: regex - problem with the loop I believe or maybe the regex itself ? by Anonymous Monk on Feb 12, 2014 at 21:27 UTC
Re: regex - problem with the loop I believe or maybe the regex itself ? by kcott (Archbishop) on Feb 13, 2014 at 06:48 UTC
G'day trummelbummel,

Welcome to the monastery.

It's much easier for us to help if you provide a short, representative example of your initial data and a clear indication of your expected output. A written description of the data is rarely, if ever, useful. The data should be shown in `<code>...</code>` tags. I realise this is your first post: please just keep this in mind for future reference.

I'm wondering if your question falls into the "XY Problem" category: you've focussed more on a specific solution rather than on the actual problem. Your problem seems to be that you have a text file and you need to extract some data from it. Your current solution involves manipulating the data with `split()`, `reverse()` and `join()`: you've said this is necessary; I'm not convinced that it is.

If you'd care to consider another solution, take a look at the following (then the Notes at the end).

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $sep = '-' x 60 . "\n";
    my ($start, $end) = ('workset((', 'dIonly');

    my $all_to_end_incl;

    {
        local $/ = $end;
        $all_to_end_incl = <DATA>;
    }

    print $sep, "Up to and including first '$end':\n", $all_to_end_incl;

    my $start_end_incl = substr $all_to_end_incl, index $all_to_end_incl, $start;

    print $sep, "From '$start' to '$end' inclusive:\n", $start_end_incl;

    my $start_end_excl = substr $start_end_incl, length($start), -length($end);

    print $sep, "From '$start' to '$end' exclusive:\n", $start_end_excl;

    my $paren_group_re;
    $paren_group_re = qr{
        \(
        (?:
            (?> [^()]+ )
            |
            (??{ $paren_group_re })
        )*
        \)
    }x;

    my $workset_re = qr{
        (
            workset\(\(
            (?:
                [^(]+
                (??{ $paren_group_re })
                [, ]*
            )*
            \)\)
        )
    }x;

    $start_end_excl =~ $workset_re;

    print $sep, "Wanted extract:\n", $1;

    __DATA__
    ... other data before 'workset' found ...
    workset(( RiskCA(cA, 3) RiskCB(cB, 2)) c
    workset((RiskCA(cA, 3), RiskCB(cB, 2), totPaycA(cA, 7), totPaycB(cB, 6)))
    *********** trial #682
    ceq pAAr(rA, cA, P1) c pAAc(rA, cA, P2) c ineqAA(rA, cA, P3) = (pAAc(rA, cA, ...
    rl dec c cognum(X2) c watch(X1) c worklist(L) c workset((S, maxtotIneqC(cB, X))) => watch(X1 + 1) c cognum(X2 + 1) c worklist(nil) c workset(empty) c playA c dIonly [label avoidMaxI] .
    X2 --> 3
    X1 --> 3

That code has output at each stage: partly for demo purposes and partly because I'm not entirely sure which parts you want. Only the last part, which I'm certain you wanted, is displayed; the full output is in the spoiler.

    Wanted extract:
    workset((RiskCA(cA, 3), RiskCB(cB, 2), totPaycA(cA, 7), totPaycB(cB, 6)))

Notes:

The `$paren_group_re` regex may look a little daunting; however, it's almost a verbatim copy of the code example in "perlre: Extended Patterns", so you can find a discussion there.

The data is read in blocks up to, and including, '`dIonly`' with '`local $/ = $end;`'. See "perlvar: Variables related to filehandles" for more details.

I've only read the filehandle (`DATA`) once; you can use a `while` loop and handle each instance of "`workset((...dIonly`" separately. You haven't given enough information about the input file for me to know, but maybe dealing with multiple shorter strings would be useful.

If your "`workset((RiskCA(cA, 3), ..., totPaycB(cB, 6)))`" data is split across multiple lines, the code I've posted will handle this without modification. If you want to get that back into one line and lose the additional whitespace, consider using y/// something like this: '`y/\n / /s`' ("perlop: Quote-Like Operators" has more complete details).

Everything else should be fairly straightforward, but do ask if there's something you don't understand.

-- Ken
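The `while` loop mentioned in the notes could look something like the sketch below. It is only an illustration under assumptions: the sample text and the helper name `records_ending_with` are hypothetical, and an in-memory filehandle stands in for the real input file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical input; the real file layout is not shown in the thread.
my $text = join ' c ',
    'workset((a(x, 1), b(y, 2)))', 'playA', 'dIonly',
    'noise', 'workset((c(z, 3)))', 'workset(empty)', 'dIonly', 'trailing text';

# Hypothetical helper: return every chunk of $input that ends with $end.
sub records_ending_with {
    my ($input, $end) = @_;

    # Read from a string through an in-memory filehandle.
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

    # Each read now returns text up to and including $end.
    local $/ = $end;

    my @records;
    while (my $record = <$fh>) {
        # The final chunk may lack the terminator; skip it.
        next unless $record =~ /\Q$end\E\z/;
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = records_ending_with($text, 'dIonly');
print scalar(@records), " records\n";
print "[$_]\n" for @records;
```

Each record can then be handed to the `index`/`substr` steps or to `$workset_re` on its own, which keeps the strings being matched short.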
Re: regex - problem with the loop I believe or maybe the regex itself ? by tangent (Parson) on Feb 12, 2014 at 18:52 UTC
> I just want to go from dIonly to the first workset(( including some text and then two parenthesis at the end ))

Using your original code this works for me. Note that I took out the colon (:) from dIonly.

    foreach $str (@reversedarray) {
        if ($str =~ /\bdIonly\b.*?\bworkset\b\(\(([^)]*)/ ) {
            print "content of workset before dIonly: $1\n";
        }
    }
Re^2: regex - problem with the loop I believe or maybe the regex itself ? by trummelbummel (Initiate) on Feb 13, 2014 at 02:20 UTC
Thanks, that is a good start. Could you please tell me how I can extend the content that it returns? Right now, firstly, it does not return the whole content of workset, only a subpart. Secondly, even if I apply the global modifier, it only does it for one part, but there are several dIonly in the string, so I would want to extract a number of these worksets.
Re^3: regex - problem with the loop I believe or maybe the regex itself ? by tangent (Parson) on Feb 13, 2014 at 03:52 UTC
If I understand correctly, and ignoring the reverse line stuff, what you want to extract is everything from the last occurrence of 'workset((..' up to 'dIonly', and there are a number of these in each line. If that is the case I would forget about reversing and just split each line on 'dIonly', then find the substring for each segment:

    use File::Slurp;    # assumed source of read_file()

    my @a_out;
    my $line = read_file('data.txt');
    if ($line =~ /\bdIonly\b/) {
        # remove everything after last 'dIonly'
        $line =~ s/(.*)\bdIonly\b.*?$/$1/;
        my @segments = split(/\bdIonly\b/, $line);
        for my $str (@segments) {
            if ($str =~ /.*(\bworkset\b\(\(.*)/ ) {
                my $workset = $1;
                print "workset content: $workset\n\n";
                push(@a_out, $workset);
            }
            else {
                print "no workset\n";
            }
        }
    }
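As a possible alternative to splitting, the same extraction can be sketched with a single global match. This is not the code from the post above: the sample line and the helper name `worksets_before_dionly` are hypothetical, and the pattern uses a negative lookahead so that each capture starts at the last 'workset((' before a 'dIonly'.

```perl
use strict;
use warnings;

# Hypothetical line; the OP's real data is not shown in full.
my $line = 'workset((a)) c x c workset((b, c)) c workset(empty) c playA c dIonly'
         . ' c y c workset((d)) c dIonly c rest';

# Hypothetical helper: one result per 'dIonly', starting at the nearest
# preceding 'workset((' marker.
sub worksets_before_dionly {
    my ($text) = @_;
    my @found;
    while ($text =~ m{
            (
                workset\(\(
                (?: (?! workset\(\( | dIonly ) . )*
            )
            \b dIonly \b
        }gxs) {
        my $workset = $1;
        $workset =~ s/\s+\z//;    # drop the whitespace left before 'dIonly'
        push @found, $workset;
    }
    return @found;
}

print "$_\n" for worksets_before_dionly($line);
```

The lookahead stops the capture from running across a later 'workset((' or across 'dIonly' itself, so the first 'workset((a))' in the sample is skipped in favour of the marker closest to the terminator.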
Re: regex - problem with the loop I believe or maybe the regex itself ? by 2teez (Vicar) on Feb 12, 2014 at 18:43 UTC
Hi trummelbummel,

> ..but I am reversing it for the above reasons to: dIonly c ....................................workset((.......)) and I need the items in the workset as well...

Maybe something like this could help:

    use warnings;
    use strict;

    my $node = 'workset((ab;joiret;garg)) c wasdobao; erhgahufdgah; c workset((adsghlia) c aghaoeriarg;oi c aasdfgohaerg c workset(empty) c ah;sorguiaerg c aoi;hgruio;ghaer c playA c dIonly';

    my $modified = join q{ } => reverse split /\s+/ => $node;

    # you could print modified to see the reversed string
    if ( my @dat = $modified =~ /\(\(?(.+?)\)/g ) {
        print "@dat\n";    # prints empty adsghlia ab;joiret;garg
    }

You might have to look into the documentation mentioned by AnomalousMonk. Hope that helps.

If you tell me, I'll forget. If you show me, I'll remember. If you involve me, I'll understand. --- Author unknown to me