regex man they are tough

tgolf4fun has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regex man they are tough by Transient (Hermit) on Apr 28, 2005 at 16:01 UTC
Is this search string a regexp or a normal string? If it is a normal string:
`foreach $line ( @data ) {
    $line =~ s/ //gm;
    $first_part = substr($line, 0, 16);   # get first 16
    $idx = index( $line, "Search String", 16 );
    $second_start = $idx > 30 ? $idx - 30 : 0;
    if ( $idx != -1 ) {
        $second_part = substr( $line, $second_start, 30 );
        $third_part = substr( $line, $idx+length("Search String"), 30 );
    }
    else {
        $second_part = $third_part = "";
    }
    $line = "";
}`
(untested) Update: Added some bounds checking to avoid errors.
Re^2: regex man they are tough by tgolf4fun (Novice) on Apr 28, 2005 at 17:31 UTC
The search string is a normal string, sorry about that, but this is giving me ideas about which way to go. Trying it out.
Re^2: regex man they are tough by cazz (Pilgrim) on Apr 29, 2005 at 14:56 UTC
I prefer to use regexps instead of your example because they handle the failure cases more briefly. Even so, you can tweak yours to be a bit more resilient. This bit doesn't take into account not having 30 characters before the search string: `$idx = index( $line, "Search String", 16 );` Since we know there must be 30 bytes of data before the Search String, do this: `$idx = index($line, "Search String", 16+30);` The same idea is true for the "third_part" bit in your code and isn't hard to handle.
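A minimal sketch of the offset idea above, assuming the 30 characters before the search string must not overlap the 16-character prefix. The helper name `extract_parts` and the sample record are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first 16 characters plus the 30
# characters before and up to 30 characters after $needle. Returns an
# empty list if $needle does not occur at offset 16+30 or later.
sub extract_parts {
    my ($line, $needle) = @_;

    # Start searching at 16+30 so a full 30 characters always precede a hit.
    my $idx = index($line, $needle, 16 + 30);
    return if $idx == -1;

    my $first  = substr($line, 0, 16);
    my $before = substr($line, $idx - 30, 30);
    # substr returns a shorter string if fewer than 30 characters remain.
    my $after  = substr($line, $idx + length($needle), 30);
    return ($first, $before, $after);
}

# Made-up sample record.
my $line = ('1' x 16) . ('a' x 30) . 'Search String' . ('b' x 30) . 'tail';
my ($first, $before, $after) = extract_parts($line, 'Search String');
print "$first|$before|$after\n";
```

Whether a record with a short "after" part should be kept or rejected is a design choice; this sketch keeps it.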
Re^2: regex man they are tough by tgolf4fun (Novice) on Apr 29, 2005 at 16:59 UTC
Where does the scalar $index come from in your code above?
Re^3: regex man they are tough by Transient (Hermit) on Apr 29, 2005 at 17:07 UTC
The scalar $idx contains the return value from the builtin function index.
Re: regex man they are tough by cowboy (Friar) on Apr 28, 2005 at 16:08 UTC
You should insert some code tags to make it easier to read. Other than that, your logic seems messed up here. You might try capturing what you want to keep in a s/// substitution, like below (untested):
`foreach my $i (@n) {
    # strip any white space
    $i =~ s/\s//gm;   # is the /m multi-line needed?
    # this checks for beginning with 16 digits, plus a matching string.
    # this will not match unless both the digits, and the string (plus 30 chars on each end)
    # are in your string.
    $i =~ s/^(\d{16}).*?(.{30}Some search string.{30})/$1$2/gm;   # again, is the /m needed?
}`
I would highly recommend reading a tutorial, such as
`perldoc perlrequick
perldoc perlretut`
Update: fixed typo in my regex
Re^2: regex man they are tough by Transient (Hermit) on Apr 28, 2005 at 16:11 UTC
Might want to make the last group optional, as it should still match for the first 16.
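One way the optional-group suggestion might look, as an untested-in-production sketch. The helper name `trim_record`, the placeholder search string, and the sample records are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the leading 16 digits and, when present, the
# search string with 30 characters on each side. Returns an empty list if
# the record does not start with 16 digits.
sub trim_record {
    my ($record, $needle) = @_;

    # The (?: ... )? wrapper makes the context part optional, so the
    # 16-digit prefix still matches when the search string is absent.
    # \Q...\E quotes any regex metacharacters in the plain search string.
    return unless $record =~ /^(\d{16})(?:.*?(.{30}\Q$needle\E.{30}))?/;
    return defined $2 ? "$1$2" : $1;
}

my $needle = 'Some search string';   # placeholder search string
my $with    = '1234567890123456' . ('x' x 40) . $needle . ('y' x 40);
my $without = '1234567890123456 nothing to see here';

print trim_record($with, $needle), "\n";
print trim_record($without, $needle), "\n";
```

Because the second capture may be undefined, the sketch checks `defined $2` instead of interpolating it directly, which avoids an "uninitialized value" warning.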
Re: regex man they are tough by tlm (Prior) on Apr 28, 2005 at 17:11 UTC
Is this what you want?
`foreach $i (@n) {
    my ( $first_sixteen ) = $i =~ /^\s*(\d{16})/;
    my ( $pre, $post ) = $i =~ /(.{30})Some search string(.{30})/;
    # parentheses around warn's argument so that next is not evaluated as part of warn's list
    warn( "something not right with $i\n" ), next
        if grep !defined $_, $first_sixteen, $pre, $post;
    # do something with $first_sixteen, $pre, $post
}`
the lowliest monk
Re: regex man they are tough by blazar (Canon) on Apr 28, 2005 at 16:29 UTC
"Hello oh knowledge masters. You guys have helped me along the way to learning about PERL, so I find myself stuck again. I have a large file that I read into an array, and I need to strip out information that is not needed. Now the first 16 characters of the data I want to keep (for every record), and then after that I want to keep 30 characters before and 30 characters after another set string."
Regexen are not only tough, they're cool too, which is probably the reason why you want to use them in the first place. That said, and considering that I've not read in detail the description of your problem nor your attempt, just take into account that: it's {sometimes,often} convenient to use two or more regexen rather than a single one (although trying to do it with the latter can be tempting or even appealing); and for fixed-length substring extraction it's convenient to take a look at substr and unpack as well.
Update: Incidentally, there is no such thing as PERL. See: `perldoc -q 'difference between "perl" and "Perl"'`.
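For the fixed-length part, a small sketch of what substr and unpack might look like side by side. The sample record is made up, and the `a` template letter is chosen here on the assumption that the data should be kept byte-for-byte (the `A` letter would strip trailing spaces):

```perl
use strict;
use warnings;

# Made-up sample record: a 16-character key followed by free-form data.
my $record = '1234567890123456' . 'rest of the record';

# substr: offset 0, length 16.
my $by_substr = substr($record, 0, 16);

# unpack: 'a16' takes 16 characters as-is, 'a*' takes whatever is left.
my ($by_unpack, $rest) = unpack 'a16 a*', $record;

print "$by_substr\n";
print "$by_unpack\n";
print "$rest\n";
```

unpack tends to pay off when a record has several fixed-width fields, since one template can split them all in a single call.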
Re: regex man they are tough by tlm (Prior) on Apr 28, 2005 at 17:34 UTC
It occurred to me that the snippet I posted, even if it does what you want, probably doesn't tell you much about why yours is not working, so here are just a couple of comments on your regexps. The `/m` modifier is useful only if you are matching a string that contains multiple lines; it tells perl to match `^` and `$` at the beginnings and ends of lines. Study this example and you will see what I mean:
`use strict;
use warnings;
my $string = "foo\nbar\nbaz\n";
print "1st match ", $string =~ /^bar/  ? "succeeded\n" : "failed\n";
print "2nd match ", $string =~ /^bar/m ? "succeeded\n" : "failed\n";
print "3rd match ", $string =~ /bar$/  ? "succeeded\n" : "failed\n";
print "4th match ", $string =~ /bar$/m ? "succeeded\n" : "failed\n";
__END__
1st match failed
2nd match succeeded
3rd match failed
4th match succeeded`
Next, the `/g` modifier makes sense only if you ~~are matching the same string multiple times~~ want to match the same regexp multiple times in the same string. For example, using the same string as for the example above:
`while ( $string =~ /(a\w+)/g ) {
    print "$1\n";
}
__END__
ar
az`
Lastly, the expression `$i =~ s/.//gm` simply sets $i to the empty string, apart from any newline characters, which `.` does not match (in this case the `/m` modifier does nothing; you'd get the same result without it). I don't think this gets you anything, but if that's what you wanted to do, it is simpler to just assign the empty string: `$i = ''`.
Update: Corrected sloppy wording. Thanks to Roy Johnson.
the lowliest monk
Re^2: regex man they are tough by Roy Johnson (Monsignor) on Apr 28, 2005 at 18:08 UTC
"the /g modifier makes sense only if you are matching the same string multiple times"
Not entirely true. In scalar context (such as when it's the conditional of a while), it's about repeated matching. In list context, it will do the global match and return all the captures at once.
`$_ = 'pig dog goggles';
my @hits = /g/g;
print "@hits\n";`
Caution: Contents may have been coded under pressure.
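The two contexts described above can be compared in one short sketch. It reuses the sample string from the post; the patterns and variable names are chosen only for illustration:

```perl
use strict;
use warnings;

my $text = 'pig dog goggles';

# List context: every capture from every match comes back at once.
my @words = $text =~ /(\w*g\w*)/g;
print "list: @words\n";

# Scalar context: each evaluation finds the next match, and pos()
# reports the offset where the previous match ended.
my @ends;
while ( $text =~ /(\w+)/g ) {
    push @ends, pos($text);
    print "scalar: $1 ends at ", pos($text), "\n";
}
```

When a pattern has no capture groups, as in the `/g/g` example, list context returns the whole matched text of each match instead.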