HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a very big text file that I want to break into pieces. The string below separates the parts:
---------------------------------
CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de ÿÿÿÿÿÿÿÿ¨  <50 spaces here>  11th Hour<newline>
---------------------------------
This is one single line. Is there any way I can rip out the "11th Hour" part? I thought of something like this:
/^CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de\W+(\w+)$/
Any suggestions? Thank you very much in advance.

Re: Breaking huge text file apart
by andye (Curate) on Jul 28, 2001 at 20:49 UTC
    A couple of easy options.

    If the file is small enough to fit in memory, then you could do something like this:

    my $pattern = "----- blah de blah ----";
    # \Q...\E quotes any regex metacharacters in the separator
    do_whatever($_) foreach split /\Q$pattern\E/, join '', <>;

    If you don't want to read the whole thing into memory in one go, you could do something like this:

    {
        local $/ = "----- blah de blah ----";   # read up to each separator
        while (my $this_section = <>) {
            chomp $this_section;                # strips the separator held in $/
            do_whatever($this_section);
        }
    }
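To make the record-separator idea concrete, here is a minimal self-contained sketch. The separator string and the sample data are made up for illustration, and it reads from an in-memory filehandle rather than a real file, so adjust both to the actual data.

```perl
use strict;
use warnings;

# Hypothetical separator and sample data, for illustration only.
my $separator = "----- blah de blah ----\n";
my $data = "first part\n${separator}second part\n${separator}third part\n";

# Read from an in-memory filehandle so the sketch needs no external file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @sections;
{
    local $/ = $separator;          # one "line" now ends at the separator
    while ( my $section = <$fh> ) {
        chomp $section;             # chomp strips the current value of $/
        push @sections, $section;
    }
}
close $fh;

print scalar(@sections), " sections\n";
```

Only one section is held in memory at a time while reading, which is the point of this variant for a very large file. The sketch collects the sections in an array only so the result is easy to inspect.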
    andy.
Re: Breaking huge text file apart
by abstracts (Hermit) on Jul 28, 2001 at 21:25 UTC
    Hello

    If I'm understanding correctly, you have a file consisting of lines. Each line looks like

    CheatBook-DataBase 2001 v3.0 - http://.... (50 spaces) some text
    And you want to delete the spaces and the trailing text. If that's what you wanted, then you can use a simple substitution:
    s/ {50}.*//;
    to remove the first 50 consecutive spaces it sees and everything after them.

    However if what you want to keep is the stuff after the spaces, then you can use split this way:

    my $str = ...
    my @array = split /\s+/, $str;
    my $title = pop(@array);    # get the last element
                                # (only the last word, e.g. "Hour")

    # or even
    $str =~ / {50,}(.*)/;       # match 50 or more spaces and capture
                                # remaining text
    my $title = $1;
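The capture approach could be wrapped up roughly like this. It is only a sketch: `extract_title` is a hypothetical helper name, the sample line is modelled on the one in the question, and the pattern assumes the title always follows a run of at least 50 spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after a run of 50 or more spaces,
# with trailing whitespace removed, or undef if the line does not match.
sub extract_title {
    my ($line) = @_;
    return $line =~ / {50,}(\S.*?)\s*$/ ? $1 : undef;
}

# Made-up sample line modelled on the one in the question.
my $line = 'CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de '
         . ( ' ' x 50 ) . "11th Hour\n";

my $title = extract_title($line);
print defined $title ? "Title: $title\n" : "No title found\n";
```

Testing the match before using `$1` matters here, because after a failed match `$1` still holds whatever an earlier successful match left in it.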
    Hope this helps,,,

    Aziz,,,

Re: Breaking huge text file apart
by grinder (Bishop) on Jul 30, 2001 at 12:45 UTC
    You just want the "11th Hour" text, right? On my browser, I can see a weird ASCII character just before it. Let's say, for the sake of argument, that it is ASCII code 15. The following code would do the trick:

    while( <IN> ) {
        chomp;
        my $title = substr( $_, rindex( $_, chr(15) ) + 1 );
    }

    There will be a few spaces left before the first 1 in "11th" but I figure you know how to deal with that. For instance, if there are a fixed number of spaces, you could say rindex( $_, chr(15) ) + 4. The key to the solution is the use of rindex, which will be a good deal faster than using a regex.
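A slightly more defensive sketch of the rindex idea follows. It assumes, as above, that the marker is chr(15), which should be checked against the real file (for example with a hex dump). The name `title_after_marker` is hypothetical. The sketch also handles lines without the marker, where rindex returns -1 and the plain substr call would hand back the whole line.

```perl
use strict;
use warnings;

# Assumption for illustration: the marker character is chr(15).
my $marker = chr(15);

# Hypothetical helper: text after the last marker, with surrounding
# whitespace removed; undef when the marker is absent.
sub title_after_marker {
    my ( $line, $mark ) = @_;
    my $pos = rindex( $line, $mark );
    return undef if $pos < 0;            # rindex returns -1 when not found
    my $title = substr( $line, $pos + 1 );
    $title =~ s/^\s+|\s+$//g;            # trim leading and trailing whitespace
    return $title;
}

# Made-up sample line modelled on the one in the question.
my $line = "CheatBook-DataBase 2001 v3.0 - http://www.cheatbook.de "
         . $marker . ( ' ' x 50 ) . "11th Hour\n";
print title_after_marker( $line, $marker ), "\n";
```

Trimming with a substitution gives up a little of the speed argument, but it avoids depending on an exact count of spaces.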

    --
    g r i n d e r