Re: regex man they are tough
by Transient (Hermit) on Apr 28, 2005 at 16:01 UTC
Is this search string a regexp or a normal string?
If it is a normal string:
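foreach $line ( @data ) {
    $line =~ s/ //gm;    # strip spaces (the real search string must then be space-free too)
    $first_part = substr($line, 0, 16); # get first 16
    $idx = index( $line, "Search String", 16 );
    # start of the 30 characters before the match, clamped to the start of the line
    $second_start = $idx > 30 ? $idx - 30 : 0;
    if ( $idx != -1 ) {
        $second_part = substr( $line, $second_start, 30 );
        $third_part = substr( $line, $idx+length("Search String"), 30 );
    } else {
        # search string not found: nothing to keep around it
        $second_part = $third_part = "";
    }
    $line = "";
}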
(untested)
Update:
Added some bounds checking to avoid errors.
The search string is a normal string, sorry about that. This is giving me ideas about which way to go, though. Trying it out now.
$idx = index( $line, "Search String", 16 );
Since we know there must be 30 bytes of data before the Search String, do this:
$idx = index($line, "Search String", 16+30);
The same idea applies to the "third_part" bit in your code, and it isn't hard to handle.
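A minimal sketch of that suggestion, under some assumptions: `extract_parts`, its `$context` parameter, and the placeholder needle in the usage lines are hypothetical names, not from the posts above. It starts the `index` search at offset 16 + 30 so a full window can precede the match, and takes one possible reading of "the same idea" for the third part by requiring a full window after the match as well.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the 16-character
# prefix plus $context characters on each side of $needle, or empty
# strings for the two context parts when a full window is not available.
sub extract_parts {
    my ($line, $needle, $context) = @_;
    $context = 30 unless defined $context;

    my $prefix = substr($line, 0, 16);

    # Search only from where a full window can sit between prefix and match.
    my $idx = index($line, $needle, 16 + $context);
    return ($prefix, '', '') if $idx == -1;

    # Assumed reading of "the same idea": require a full window after the match.
    my $after_start = $idx + length $needle;
    return ($prefix, '', '') if length($line) - $after_start < $context;

    my $before = substr($line, $idx - $context, $context);
    my $after  = substr($line, $after_start, $context);
    return ($prefix, $before, $after);
}

my $record = '1234567890123456' . ('a' x 30) . 'NEEDLE' . ('b' x 30);
my ($first, $second, $third) = extract_parts($record, 'NEEDLE');
print "$first\n$second\n$third\n";
```

Returning empty strings mirrors the else branch of the loop above; a caller that prefers to skip such records could test for an empty second part instead.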
Where does the scalar $index come from in your code above?
The scalar $idx contains the return value from the builtin function index.
Re: regex man they are tough
by cowboy (Friar) on Apr 28, 2005 at 16:08 UTC
You should insert some code tags to make it easier to read.
Other than that, your logic seems messed up here.
You might try capturing what you want to keep in
an s/// substitution, like below (untested):
foreach my $i (@n) {
    # strip any white space
    $i =~ s/\s//gm; # is the /m multi-line needed?
    # this checks for beginning with 16 digits, plus a matching string.
    # this will not match unless both the digits, and the string (plus
    # 30 chars on each end) are in your string.
    $i =~ s/^(\d{16}).*?(.{30}Some search string.{30})/$1$2/gm; # again, is the /m needed?
}
I would highly recommend reading a tutorial, such as
perldoc perlrequick
perldoc perlretut
Update: fixed typo in my regex
You might want to make the last group optional, since the line should still match on the first 16 digits alone.
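One way to sketch the optional-group idea, assuming a match-and-rebuild approach rather than the s/// above: the helper name `trim_record` and the placeholder needle are hypothetical, and the whitespace-stripping step is left out to keep the focus on the optional group.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first 16 digits and, when present,
# the needle with 30 characters of context on each side.
sub trim_record {
    my ($line, $needle) = @_;

    # The (?: ... )? wrapper makes the whole "context + needle" part optional,
    # so a record with 16 leading digits matches even without the needle.
    return unless $line =~ /^(\d{16})(?:.*?(.{30}\Q$needle\E.{30}))?/;

    # $2 is undefined when the optional group did not take part in the match.
    return defined $2 ? "$1$2" : $1;
}

my $with    = '1234567890123456' . 'xxxxx' . ('a' x 30) . 'NEEDLE' . ('b' x 30) . 'tail';
my $without = '1234567890123456rest of the record';
print trim_record($with, 'NEEDLE'), "\n";
print trim_record($without, 'NEEDLE'), "\n";
```

Building the result from the captures avoids interpolating an undefined $2, which a plain s/.../$1$2/ would do (with a warning) whenever the optional group is skipped.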
Re: regex man they are tough
by tlm (Prior) on Apr 28, 2005 at 17:11 UTC
foreach my $i (@n) {
    my ( $first_sixteen ) = $i =~ /^\s*(\d{16})/;
    my ( $pre, $post ) =
        $i =~ /(.{30})Some search string(.{30})/;
    # parenthesize warn's argument so that next runs after the warning,
    # not while warn's argument list is still being evaluated
    warn( "something not right with $i\n" ), next
        if grep !defined $_, $first_sixteen, $pre, $post;
    # do something with $first_sixteen, $pre, $post
}
Re: regex man they are tough
by blazar (Canon) on Apr 28, 2005 at 16:29 UTC
Hello oh knowledge masters. You guys have helped me along the way to learning about PERL, so I find myself stuck again. I have a large file that I read into an array, and I need to strip out information that is not needed. Now the first 16 characters of the data I want to keep (for every record), and then after that I want to keep 30 characters before and 30 characters after another set string.
Regexen are not only tough, they're cool too, which is probably the reason why you want to use them in the first place. That said, and bearing in mind that I have not read your problem description or your attempt in detail, keep in mind that:
- It's {sometimes,often} convenient to use two or more regexen rather than a single one (although doing it all in a single one can be tempting or even appealing),
- For fixed-length substring extraction it's worth taking a look at substr and unpack as well.
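As a rough illustration of the second point, here is a small sketch that pulls off a fixed-length prefix with both unpack and substr. The helper name `split_record` and the sample record are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record into its 16-character prefix and the rest.
sub split_record {
    my ($line) = @_;
    # 'a16' takes the first 16 characters as-is; 'a*' takes whatever remains.
    return unpack 'a16 a*', $line;
}

my $sample = '1234567890123456some payload';

my ($id, $rest) = split_record($sample);

# The same prefix via substr, for comparison.
my $same_id = substr($sample, 0, 16);

print "$id | $rest\n";
print "substr agrees\n" if $id eq $same_id;
```

The lowercase 'a' template leaves the data untouched; the uppercase 'A' variant would also strip trailing spaces, which may or may not be wanted for a given record layout.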
Update: Incidentally, there is no such thing as PERL. See: perldoc -q 'difference between "perl" and "Perl"'.
Re: regex man they are tough
by tlm (Prior) on Apr 28, 2005 at 17:34 UTC
It occurred to me that the snippet I posted, even if it does what you want, probably doesn't tell you much about why yours is not working, so here are a couple of comments on your regexps. The /m modifier is useful only if you are matching a string that contains multiple lines; it tells perl to let ^ and $ match at the beginnings and ends of lines inside the string, not just at the start and end of the whole string. Study this example and you will see what I mean:
use strict;
use warnings;
my $string = "foo\nbar\nbaz\n";
print "1st match ", $string =~ /^bar/ ? "succeeded\n" : "failed\n";
print "2nd match ", $string =~ /^bar/m ? "succeeded\n" : "failed\n";
print "3rd match ", $string =~ /bar$/ ? "succeeded\n" : "failed\n";
print "4th match ", $string =~ /bar$/m ? "succeeded\n" : "failed\n";
__END__
1st match failed
2nd match succeeded
3rd match failed
4th match succeeded
Next, the /g modifier makes sense only if you want to match the same regexp multiple times in the same string. For example, using the same string as for the example above:
while ( $string =~ /(a\w+)/g ) {
    print "$1\n";
}
__END__
ar
az
Lastly, the expression $i =~ s/.//gm deletes every character in $i except newlines, which . does not match; for a record with no newline in it, that simply sets $i to the empty string (in this case the /m modifier does nothing; you'd get the same result without it). I don't think this gets you anything, but if emptying $i is what you wanted to do, it is simpler to just assign the empty string: $i = ''.
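A quick way to check what s/.//g actually removes, as a small sketch with a made-up two-line record: by default . does not match a newline, /m does not change that, and /s does.

```perl
use strict;
use warnings;

my $record = "line one\nline two\n";

# Copy-then-substitute, so $record itself stays intact for each variant.
(my $plain  = $record) =~ s/.//g;    # newlines survive
(my $with_m = $record) =~ s/.//gm;   # /m only affects ^ and $, so same result
(my $with_s = $record) =~ s/.//gs;   # /s lets . match newlines too

printf "plain:  %d chars left\n", length $plain;
printf "with m: %d chars left\n", length $with_m;
printf "with s: %d chars left\n", length $with_s;
```

For a record that has already been chomped, all three variants leave the empty string, which is why a plain assignment is the simpler choice there.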
Update: Corrected sloppy wording. Thanks to Roy Johnson.
the /g modifier makes sense only if you are matching the same string multiple times
Not entirely true. In scalar context (such as when it's the conditional of a while), it's about repeated matching. In list context, it will do the global match and return all the captures at once.
$_ = 'pig dog goggles';
my @hits = /g/g;
print "@hits\n";
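To make the context difference concrete, here is a short sketch using the same string. The variable names are only illustrative; it shows /g in list context with and without a capture group, the list-assignment counting idiom, and the scalar-context loop with pos().

```perl
use strict;
use warnings;

my $text = 'pig dog goggles';

# List context, no capture groups: /g returns every whole match.
my @words = $text =~ /\w+/g;

# List context, one capture group: /g returns every capture instead.
my @after_g = $text =~ /g(\w)/g;

# Counting idiom: assign to an empty list, then take that in scalar context.
my $count = () = $text =~ /g/g;

# Scalar context: each iteration resumes where the previous match ended;
# pos() reports that end position, so pos - 1 is the offset of the "g".
my @offsets;
while ( $text =~ /g/g ) {
    push @offsets, pos($text) - 1;
}

print "words:   @words\n";
print "after g: @after_g\n";
print "count:   $count\n";
print "offsets: @offsets\n";
```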
Caution: Contents may have been coded under pressure.