How do I delete from a delimiter to the end of a file?

wcw has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, My knowledge of Perl is quite limited. Here's my problem: I have been able to parse 20K+ messages representing the archives of a forum at Yahoogroups by using mboxparser. To facilitate loading the individual elements of each message into MySQL, the fields are delimited by ctrl-A instead of \n. Each record is then delimited by ctrl-B. Each email message, however, ends with tag-line advertising that I want to remove. The tag lines begin with: "------ Yahoo". Can someone point me to a script or give me guidance on how to search a file for the term "------ Yahoo" and delete it and all text following it, up to the ctrl-B delimiter? With kind regards for your assistance, Bill

Replies are listed 'Best First'.
Re: How do I delete from a delimiter to the end of a file? by ikegami (Patriarch) on Aug 25, 2008 at 00:29 UTC
`$file =~ s/------ Yahoo.*?(?=\cB)//sg;` Update: Added "s" modifier.
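For anyone who wants to try the lookahead substitution `s/------ Yahoo.*?(?=\cB)//sg` on its own, here is a minimal sketch. The helper name `strip_tags` and the sample records are made up for illustration; only the regex comes from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: removes each Yahoo tag line up to, but not including,
# the next ctrl-B record delimiter.
sub strip_tags {
    my ($text) = @_;
    # /s lets "." match newlines; (?=\cB) is a lookahead, so ctrl-B is kept.
    $text =~ s/------ Yahoo.*?(?=\cB)//sg;
    return $text;
}

# Made-up sample: two records, fields separated by ctrl-A, records by ctrl-B.
my $sample = "From: a\cABody one\n------ Yahoo! Groups Sponsor\nad text\cB"
           . "From: b\cABody two\cB";

my $clean = strip_tags($sample);
```

To run it against a real archive you would first slurp the file into a scalar (for example with `local $/;` before reading the handle), which means the whole file has to fit in memory.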
Re^2: How do I delete from a delimiter to the end of a file? by JavaFan (Canon) on Aug 25, 2008 at 12:51 UTC
`.*?` usually comes with a speed penalty (unless the optimizer eliminates running `.*?` at all), as Perl needs to do bookkeeping for possible backtracking. I'd write it as:
`s/------ Yahoo[^\cb]+//g;     # Keeps the ^B`
`s/------ Yahoo[^\cb]+\cB//;   # Removes the ^B as well.`
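A small sketch of how the two negated-class variants differ, using a made-up two-record string. Note one assumption: the `/g` flag is added to the second substitution here so that both variants process every record.

```perl
use strict;
use warnings;

# Made-up sample: the first record carries a tag line, the second does not.
my $sample = "msg one\n------ Yahoo ad\cBmsg two\cB";

# [^\cB]+ consumes everything up to the next ctrl-B without backtracking.
(my $keep = $sample) =~ s/------ Yahoo[^\cB]+//g;      # ctrl-B survives
(my $drop = $sample) =~ s/------ Yahoo[^\cB]+\cB//g;   # ctrl-B is removed too
```

After this, `$keep` still has two ctrl-B delimiters while `$drop` has merged the two records into one, so the first form is probably the one wanted when loading records into a database. Because of the `+`, a tag line where "------ Yahoo" is immediately followed by ctrl-B would not match at all.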
Re^3: How do I delete from a delimiter to the end of a file? by ikegami (Patriarch) on Aug 25, 2008 at 20:34 UTC
I don't know from where you got your information, but it appears to be incorrect.

`                Rate        JavaFan JavaFan_noplus        ikegami`
`JavaFan        104/s             --            -5%           -16%`
`JavaFan_noplus 109/s             5%             --           -12%`
`ikegami        123/s            19%            13%             --`

`                Rate        JavaFan JavaFan_noplus        ikegami`
`JavaFan        109/s             --            -2%           -11%`
`JavaFan_noplus 110/s             2%             --           -10%`
`ikegami        122/s            13%            11%             --`

`                Rate        JavaFan JavaFan_noplus        ikegami`
`JavaFan        103/s             --            -5%           -21%`
`JavaFan_noplus 109/s             5%             --           -17%`
`ikegami        131/s            27%            20%             --`
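The benchmark code behind these numbers is not reproduced here. As a rough sketch of how such a comparison could be set up with the core `Benchmark` module, assuming synthetic input and made-up candidate names (`lazy_dot`, `negated_class`), something like this would do:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption): 1,000 records, each ending in a tag line.
my $data = join '', map {
    "Subject: msg $_\cAbody text\n------ Yahoo! Groups Sponsor\nad copy\cB"
} 1 .. 1000;

# Each candidate works on a fresh copy so the input is never modified.
my %candidates = (
    lazy_dot      => sub { (my $t = $data) =~ s/------ Yahoo.*?(?=\cB)//sg; $t },
    negated_class => sub { (my $t = $data) =~ s/------ Yahoo[^\cB]+//g;     $t },
);

# A negative count means: run each candidate for at least that many CPU seconds.
cmpthese(-1, \%candidates);
```

Results will vary with the Perl version, the machine, and above all the shape of the data, so numbers from a sketch like this should not be read as confirming or refuting either post.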
Re^4: How do I delete from a delimiter to the end of a file? by JavaFan (Canon) on Aug 26, 2008 at 06:41 UTC
Re: How do I delete from a delimiter to the end of a file? by kyle (Abbot) on Aug 25, 2008 at 02:46 UTC
I actually like ikegami's solution better, but this is what I thought of first: `perl -pi -e 'BEGIN{$/="\cB"} s{-{6}\sYahoo.*\z}{$/}ms' list of files`
Re^2: How do I delete from a delimiter to the end of a file? (slurp--) by tye (Sage) on Aug 25, 2008 at 03:12 UTC
I suspect yours will work better than ikegami's in a large number of situations. Reading 20k messages one-at-a-time is likely a better idea than requiring the entire archive of 20k messages to be read into memory at once. Yours is even a complete example, not just a single regex that leaves the process of replacing files and slurping as an exercise. - tye
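For reference, the record-at-a-time idea can also be written out as a small script rather than a one-liner. This is only a sketch: the helper name `strip_stream` and the in-memory filehandles are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: copies records from $in to $out, one ctrl-B-terminated
# record at a time, dropping the Yahoo tag line from each.
sub strip_stream {
    my ($in, $out) = @_;
    local $/ = "\cB";    # each read returns one record, including its ctrl-B
    while (my $record = <$in>) {
        # Replace the tag line and the rest of the record with the delimiter.
        $record =~ s{------ Yahoo.*\z}{\cB}s;
        print {$out} $record;
    }
    return;
}

# Demo on in-memory filehandles with made-up data.
my $input  = "one\n------ Yahoo ad\cBtwo\cB";
my $output = '';
open my $in,  '<', \$input  or die "open input: $!";
open my $out, '>', \$output or die "open output: $!";
strip_stream($in, $out);
close $out;
```

Only one record is held in memory at a time, which is the point tye makes about the 20k-message archive. With real files you would open them by name instead, or let `perl -pi` handle the in-place rewrite as in kyle's one-liner.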