Chomping most of a _long_ text string

RupertSw has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Chomping most of a _long_ text string by holli (Abbot) on May 29, 2005 at 15:34 UTC
Why not just `s/^.+_{70}//s`? holli, /regexed monk/
Re^2: Chomping most of a _long_ text string by RupertSw (Initiate) on May 29, 2005 at 15:58 UTC
That'll be fine on a long bit of text? I got the impression you were supposed to avoid REs on > ~1kb bits of text. In which case, thanks a lot, and I'll shut up and get on with it! Rupert
Re^3: Chomping most of a _long_ text string by ysth (Canon) on May 29, 2005 at 17:06 UTC
That should be `s/^.+?_{70}//s`.
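The difference between the greedy and non-greedy forms only shows up when the marker occurs more than once. A minimal sketch, assuming a made-up sample string with two marker lines (the variable names and sample text are illustrative, not from the thread):

```perl
use strict;
use warnings;

my $marker = '_' x 70;
# Hypothetical sample: junk, a marker, some body text, and a second marker.
my $text = "junk\n$marker\nbody one\n$marker\nbody two\n";

# Greedy .+ grabs as much as it can, so it removes through the LAST marker.
(my $greedy = $text) =~ s/^.+_{70}//s;

# Non-greedy .+? stops as early as it can, so it removes through the FIRST marker.
(my $lazy = $text) =~ s/^.+?_{70}//s;

printf "greedy keeps %d chars\n", length $greedy;
printf "lazy keeps   %d chars\n", length $lazy;
```

With a single marker in the string both forms leave the same text behind; the non-greedy one is the safer default when the body might itself contain a run of 70 underscores.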
Re^3: Chomping most of a _long_ text string by holli (Abbot) on May 29, 2005 at 16:00 UTC
No, regexes scale well with long strings. holli, /regexed monk/
Re^2: Chomping most of a _long_ text string by RupertSw (Initiate) on May 29, 2005 at 21:22 UTC
My goodness! Far more replies than I expected! Regexes are working wonderfully for me, and I'm pleased to say I got the anti-greediness question mark trick myself (but went to work as a waiter, hence not thanking everyone for the replies earlier). What I was actually parsing was a mailbox full of past issues of @Risk, a security list. My code now has a loop doing something like this:
`$$email =~ s/^.+?_{70}\n\n([[:digit:]])/$1/s;
while ($$email =~ /(\d{2}\.\d+\.\d) CVE: ([^\n]+)\nPlatform: ([^\n]+)\n +Title: ([^\n]+)\nDescription: (.+?)\nRef: (http[^\n]*)/gs) {
    print "CVE: $2\nPlat: $3\nTitle: $4\nDesc: $5\nURL: $6\n\n";
}`
And, yes, I have yet to tidy up the RE, but it works and is more than fast enough for what I need. Many thanks again, Rupert
Re: Chomping most of a _long_ text string by TedPride (Priest) on May 29, 2005 at 17:51 UTC
index works just fine here, no need to use regex.
`use strict;
use warnings;
my $u = 70; # Number of underscores
my $t = join '', <DATA>;
substr($t, 0, index($t, "\n" . '_' x $u . "\n") + $u + 2) = '';
print $t;
__DATA__
Junk data goes here
______________________________________________________________________
Useful data goes here`
Re^2: Chomping most of a _long_ text string by ihb (Deacon) on May 29, 2005 at 18:37 UTC
I pondered whether I preferred `index()` or `$+[0]`, and I concluded `$+[0]`. The `index()` expression becomes so cluttered, and it has a problem if the substring isn't found: `index()` then returns -1, so you'll destroy the beginning of the string anyway and remove `$u`+1 chars from it. I figured that most likely, if the marker isn't there, it's already removed. If not, you still need to perform some check, and for me the regex version is nicer. I'd like to flip the coin and say "matching works fine here, no need to use `index()`", but if one likes `index()` one should use `index()`. :-) `ihb` See perltoc if you don't know which perldoc to read!
Re: Chomping most of a _long_ text string by ihb (Deacon) on May 29, 2005 at 16:40 UTC
`# Find the mark.
$str =~ /_{70}/ and substr($str, 0, $+[0], '');
# Replace everything up to right after
# the match with the empty string.`
`ihb` See perltoc if you don't know which perldoc to read!
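For readers who have not met `$+[0]`: the `@-` and `@+` arrays hold the start and end offsets of the last successful match, with element 0 covering the whole match. A small sketch of the same idea, using an invented sample string (only the `/_{70}/` match and the four-argument `substr` come from the post):

```perl
use strict;
use warnings;

# Hypothetical sample text: junk, a 70-underscore marker, then the useful part.
my $str = "Junk data\n" . ('_' x 70) . "\nUseful data\n";

if ($str =~ /_{70}/) {
    # $-[0] is the offset where the match starts, $+[0] the offset just past its end.
    printf "marker spans offsets %d to %d\n", $-[0], $+[0];

    # Four-argument substr replaces the first $+[0] characters in place,
    # so nothing is removed when the match fails.
    substr($str, 0, $+[0], '');
}

print $str;
```

Because the `substr` only runs after a successful match, a string without the marker is left untouched.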
Re^2: Chomping most of a _long_ text string by ysth (Canon) on May 29, 2005 at 17:10 UTC
"See perltoc if you don't know which perldoc to read!" I would actually recommend starting with perldoc perl, not perltoc.
Re^3: Chomping most of a _long_ text string by ihb (Deacon) on May 29, 2005 at 18:29 UTC
Both documents are great, but I get the feeling perltoc is a far less well-known document, so I prefer to spread that instead. perltoc is right on target if you're looking for documentation, just like perlfunc is when looking for functions. (Reading "See perldoc perl" might also feel like the ultimate RTFM slap.) `ihb` See perltoc if you don't know which perldoc to read!
Re: Chomping most of a _long_ text string by davidrw (Prior) on May 29, 2005 at 18:52 UTC
I'm not sure if this is any better performance-wise than the regex suggestions, but you could use `split`: `(undef, $goodpart) = split(/_{70}/, $msg);` This works better if $msg is a single email at a time, though you could split a whole inbox as well...
Re^2: Chomping most of a _long_ text string by ihb (Deacon) on May 29, 2005 at 19:19 UTC
You'd better have a limit on that split as well, so that you don't chop up any e-mails that have 70 underscores in them: `(undef, $goodpart) = split /_{70}/, $msg, 2;` Note that since this supposedly is a huge string, you create a huge copy while doing this, even if you assign it back to `$msg`, afaik. `ihb` See perltoc if you don't know which perldoc to read!
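The effect of the limit is easiest to see on a message that contains the marker twice. A rough sketch with an invented message (the sample text and `@unlimited` are illustrative only):

```perl
use strict;
use warnings;

my $marker = '_' x 70;
# Hypothetical message whose body happens to contain a second marker line.
my $msg = "header junk\n$marker\nitem one\n$marker\nitem two\n";

# Without a limit, the string is cut at every marker.
my @unlimited = split /_{70}/, $msg;

# With a limit of 2, only the first marker splits the string;
# everything after it stays together in the second piece.
my (undef, $goodpart) = split /_{70}/, $msg, 2;

printf "without limit: %d pieces\n", scalar @unlimited;
print "with limit, second piece:\n$goodpart";
```

Without the limit, the two-variable list assignment would keep only the text between the first and second markers and silently drop the rest.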
Re: Chomping most of a _long_ text string by TedPride (Priest) on May 29, 2005 at 21:20 UTC
He said that the emails are automated, so I'm assuming that every email has the marker in it. It would also be easy to modify the code so that the substr is only done if index > -1. You're probably right though that the savings between regex and substr are minimal enough so that the neater code can be used rather than the most efficient.
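One way the "only if index > -1" modification might look, as a sketch rather than the code from the thread: `strip_through_marker` is a hypothetical helper name, and unlike the in-place `substr` above it returns a copy, which costs extra memory on a very large string.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything up to and including the marker line.
# $u is the number of underscores in the marker, as in the earlier post.
sub strip_through_marker {
    my ($text, $u) = @_;
    my $marker = "\n" . ('_' x $u) . "\n";
    my $pos = index($text, $marker);
    return $text if $pos < 0;                 # marker absent: leave text untouched
    return substr($text, $pos + length $marker);
}

my $with    = "Junk\n" . ('_' x 70) . "\nUseful data\n";
my $without = "Useful data only\n";

print strip_through_marker($with, 70);
print strip_through_marker($without, 70);
```

The explicit check avoids the case where a missing marker makes `index` return -1 and the start of the string is cut off anyway.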