Breaking huge text file apart

HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a very big text file that I want to break into pieces. The string below separates the parts:

---------------------------------
CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de                       и    <50 spaces here>                      11th Hour<newline>
---------------------------------

This is one single line. Is there any way I can rip out the "11th Hour" part? I thought of something like this:

/^CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de\W+(\w+)$/

Any suggestions? Thank you very much in advance.


Replies are listed 'Best First'.
Re: Breaking huge text file apart by andye (Curate) on Jul 28, 2001 at 20:49 UTC
A couple of easy options. If the file is small enough to fit in memory, then you could do something like this: `my $pattern = "----- blah de blah ----"; do_whatever($_) foreach split /$pattern/, join '', <>;`

If you don't want to read the whole thing into memory in one go, you could do something like this: `{ local $/ = "----- blah de blah ----"; while (my $this_section = <>) { chomp $this_section; do_whatever($this_section); } }`

andy.
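The second option above, reading one section at a time by changing the input record separator `$/`, can be sketched as follows. This is only an illustration: the separator string and the sample data are made up, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical separator and sample data, for illustration only.
my $separator = "=====\n";
my $data = "first section\n=====\nsecond section\n=====\nthird section\n";

# Read from an in-memory filehandle so the sketch is self-contained;
# a real script would open the big file here instead.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @sections;
{
    local $/ = $separator;          # one "line" is now one whole section
    while ( my $section = <$fh> ) {
        chomp $section;             # chomp strips a trailing $/ value, if present
        push @sections, $section;
    }
}
close $fh;

print scalar(@sections), " sections\n";
print "[$_]\n" for @sections;
```

Because `$/` is localized inside the bare block, ordinary line-by-line reading is restored once the block ends. Only one section is held in memory at a time, which is the point for a very large file.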
Re: Breaking huge text file apart by abstracts (Hermit) on Jul 28, 2001 at 21:25 UTC
Hello,

If I'm understanding correctly, you have a file consisting of lines. Each line looks like `CheatBook-DataBase 2001 v3.0 - http://.... (50 spaces) some text`

And you want to delete the spaces and the trailing text. If that's what you wanted, then you can use a simple substitution: `s/ {50}.*//;` to remove the first 50 consecutive spaces it sees and everything after them.

However, if what you want to keep is the stuff after the spaces, then you can use split this way and take the last element: `my $str = ...; my @array = split /\s+/, $str; my $title = pop(@array);` Note that this gives only the last whitespace-separated word, so a title containing a space is cut short.

Or you can even match 50 or more spaces and capture the remaining text: `$str =~ / {50,}(.*)/; my $title = $1;`

Hope this helps,
Aziz
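A minimal sketch of the capture approach, assuming the title is whatever follows the first run of 50 or more spaces on the line. The helper name `title_after_spaces` and the sample line are hypothetical. If the real data has more than one such run before the title, the capture will include the text in between.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after the first run of 50 or more
# spaces, or undef when the line has no such run.
sub title_after_spaces {
    my ($line) = @_;
    return unless defined $line;
    # (.*\S) stops at the last non-whitespace character, so a trailing
    # newline or trailing spaces are not captured.
    if ( $line =~ / {50,}(.*\S)/ ) {
        return $1;
    }
    return;
}

# Made-up sample line: header, 60 spaces, then the title.
my $line = 'CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de'
         . ( ' ' x 60 )
         . "11th Hour\n";

my $title = title_after_spaces($line);
print defined $title ? "$title\n" : "no title found\n";
```

Checking that the match succeeded before reading `$1` matters here: after a failed match, `$1` still holds the value from the previous successful match.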
Re: Breaking huge text file apart by grinder (Bishop) on Jul 30, 2001 at 12:45 UTC
You just want the 11th hour text, right? On my browser, I can see a weird ASCII character just before it. Let's say, for the sake of argument, that it is ASCII code 15. The following code would do the trick: `while( <IN> ) { chomp; my $title = substr( $_, rindex( $_, chr(15) ) + 1 ); }`

There will be a few spaces left before the first 1 in "11th" but I figure you know how to deal with that. For instance, if there are a fixed number of spaces, you could say `rindex( $_, chr(15) ) + 4`. The key to the solution is the use of rindex, which will be a good deal faster than using a regex.

-- `g r i n d e r`
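The rindex idea can be sketched like this, keeping the reply's assumption that the marker is `chr(15)`; the actual byte in the file may be different and should be checked first. The helper name `title_after_marker` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Assumption carried over from the reply: the marker is chr(15).
my $marker = chr(15);

# Hypothetical helper: return the text after the last marker with
# surrounding whitespace trimmed, or undef if the marker is absent.
sub title_after_marker {
    my ( $line, $mark ) = @_;
    my $pos = rindex( $line, $mark );
    return if $pos < 0;                       # rindex returns -1 when not found
    my $title = substr( $line, $pos + length($mark) );
    $title =~ s/^\s+|\s+$//g;                 # trim both ends, including the newline
    return $title;
}

# Made-up sample line: header, marker, 54 spaces, then the title.
my $line = 'CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de   '
         . $marker
         . ( ' ' x 54 )
         . "11th Hour\n";

print title_after_marker( $line, $marker ), "\n";
```

The explicit check for -1 is the main difference from the one-liner: when the marker is missing, `rindex(...) + 1` evaluates to 0 and `substr` returns the whole line.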
