in reply to speed up split?

I'm pretty sure you don't want the /g option to your regex in split, and the /s option is pointless with the pattern you're using. Also, your split will be throwing away those opening tag fragments. Is that what you want it to do?
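To make the point about discarded separators concrete, here is a minimal sketch. The input string and the `<td` pattern are made up for illustration, since the original poster's data and regex are not shown here. It compares a plain split with two ways of keeping the tag fragment.

```perl
use strict;
use warnings;

# Hypothetical input; the original poster's data and pattern are not shown.
my $html = '<td class="a">one<td class="b">two';

# Plain split: the matched "<td" separators are thrown away.
my @plain = split /<td/, $html;
# @plain is ('', ' class="a">one', ' class="b">two')

# Capturing parentheses make split return the separators as extra fields.
my @kept = split /(<td)/, $html;
# @kept is ('', '<td', ' class="a">one', '<td', ' class="b">two')

# A lookahead splits in front of each "<td" without consuming it.
my @whole = split /(?=<td)/, $html;
# @whole is ('<td class="a">one', '<td class="b">two')
```

Note that none of these patterns carries /g or /s: split already splits at every match, and /s only changes what `.` matches, so it has no effect on a pattern without a dot.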

Your s/// would be better written with tr///.

The HTML parsing portion of your program might be better replaced with HTML::TokeParser::Simple.


Caution: Contents may have been coded under pressure.

Re^2: speed up split?
by Anonymous Monk on Mar 22, 2005 at 17:10 UTC
    Well,
    in my tables and lists, the meaning of an item is given by the items before it. It is not just a matter of searching for some strings, and programming such a parser seems to me as complicated as doing it myself line by line.

    I would like to try tr///, though, but I feel a little uncertain. To get rid of all \n and \r, is this correct?

       $s =~ tr/\x0D\x0A//d;
    
    Or is there a better solution?
    Thanks in advance,
    Carl
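      Your use of tr is correct. It's a little easier to read if you use the escapes \r and \n instead of the hex codes:
      tr/\r\n/ /s;
      Note that this version does something slightly different: instead of deleting, it replaces each run of \n and/or \r with a single space.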
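As a rough sketch of the difference between the two tr forms discussed above, the following applies each of them to a made-up sample string (the variable names are illustrative only), plus the s/// equivalent of the squeezing form for comparison.

```perl
use strict;
use warnings;

my $text = "line one\r\nline two\n\nline three\r";  # made-up sample

# /d deletes every CR and LF, so neighbouring lines are joined directly.
(my $deleted = $text) =~ tr/\r\n//d;
# $deleted is "line oneline twoline three"

# Map CR and LF to a space; /s squeezes each run down to one space.
(my $squeezed = $text) =~ tr/\r\n/ /s;
# $squeezed is "line one line two line three "

# The same squeeze written as a substitution, for comparison.
(my $regex = $text) =~ s/[\r\n]+/ /g;
# $regex is the same string as $squeezed
```

Which form is right depends on the data: deleting can glue two words together across a line break, while squeezing to a space keeps them apart.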

        It's not only easier to read, but it's also more portable. There's nothing to say that \n will be ASCII 10 and \r will be ASCII 13. In fact, on old Macs, \n is ASCII 13 ... :-)

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.