curd8341 has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I would like some help with the following question. I have a string with this content:

<script></script> <script>var onWL=0;</script> <script> var a_fInDelItms = 0; </script>
I would like to write a Perl script to retrieve everything between <script> and </script> and save it into an array. Thanks in advance!


Re: how to retrieve data between <script></script> and save to an array
by Anonymous Monk on Apr 10, 2012 at 07:02 UTC

      Hi there, unfortunately the following code that you posted won't work:

      my $left  = quotemeta '<fun>';
      my $right = quotemeta '</nuf>';
      while ( $line =~ /$left(.*?)$right/i ) {
          my $between = $1;
      }

      In my string, the content between <script> and </script> is sometimes on a single line and sometimes spans multiple lines. How can I use a regular expression to handle both cases? Thanks, curd

        Sure it will, just add /s, as in m//si. With /s, the dot also matches newlines, so (.*?) can span multiple lines.
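        A minimal sketch of that suggestion, assuming the whole string is already in a scalar. The variable names ($html, @scripts) and the sample input are made up for illustration. The /g modifier in list context returns every capture at once, and [^>]* is an extra assumption that allows attributes in the opening tag:

```perl
use strict;
use warnings;

# Hypothetical sample input: one empty, one single-line and one multi-line script.
my $html = <<'END_HTML';
<script></script>
<script>var onWL=0;</script>
<script>
var a_fInDelItms = 0;
</script>
END_HTML

# /s: "." also matches newlines; /i: case-insensitive tag names;
# /g in list context: return the captured group of every match.
my @scripts = $html =~ m{<script\b[^>]*>(.*?)</script>}sig;

# Trim leading and trailing whitespace from each captured body.
s/^\s+|\s+$//g for @scripts;

print scalar(@scripts), " script blocks found\n";
print "[$_]\n" for @scripts;
```

        This is fine for simple input like the above, but it will cut a block short if a script body itself contains the literal text </script> inside a string, so a real parser is the more robust choice for arbitrary pages.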
Re: how to retrieve data between <script></script> and save to an array
by CountZero (Bishop) on Apr 10, 2012 at 14:09 UTC
    If you put your string inside a wrapper tag (XML requires a single root element), then XML::Twig makes it very easy to extract the contents of the script tags.
    use Modern::Perl;
    use Data::Dump qw/dump/;
    use XML::Twig;

    my $text;
    my @results;
    {
        local $/ = undef;
        $text = <DATA>;
    }

    my $t = XML::Twig->new( twig_handlers => { script => \&script } );
    $t->parse("<wrapper>$text</wrapper>");
    say dump(@results);

    sub script {
        my ( $t, $script ) = @_;
        my $text = $script->trimmed_text();
        push @results, $text;
    }

    __DATA__
    <script></script>
    <script>var onWL=0;</script>
    <script> var a_fInDelItms = 0; </script>
    Output:
    ("", "var onWL=0;", "var a_fInDelItms = 0;")
    

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      Hello CountZero, your code works very well! Thanks! However, what if the long string contains the following HTML page, from which I want to retrieve the content between <script> and </script>? How can I do that with your approach? Thanks,

      <html>
      <head>
      <script language="javascript" src="" id="scptMDtl"></script>
      <script language="javascript">var onWL=0;var oJS = new Object();</script>
      <script language="javascript">
      var a_fInDelItms = 0;
      var a_fPrv = 1;
      var a_fTxt = 0;
      </script>
      </head>
      <body>
      <textarea id="txtBdy" class="w100 txtBdy" style="display:none">
      This is a test.
      </textarea>
      <script language="javascript">
      var a_sId = "RgAAAADz6rvTg9+xRIGTnTWlmKbHBwCeJ0LFThYQQaU\/Nvmo3DG+AAACNa1IAABO5K\/6RD2SRbSFY4IMoQCbAAAAFHnvAAAJ";
      var a_sCK = "TuSv+kQ9kkW0hWOCDKEAmwAAABXdNQ==";
      </script>
      </body>
      </html>
        Just put that long string in $text and feed it to $t->parse("$text");

        It works as advertised!
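
        A sketch of what that could look like, assuming XML::Twig is installed from CPAN. The page in $text is a shortened, made-up stand-in for the one posted above, and the handler is written inline rather than as a named sub:

```perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, not part of core Perl

# Shortened, hypothetical stand-in for the page in question.
# It must be well-formed XML: one root element, quoted attributes,
# and no bare "<" or "&" inside the script bodies.
my $text = <<'END_HTML';
<html>
<head>
<script language="javascript" src="" id="scptMDtl"></script>
<script language="javascript">var onWL=0;var oJS = new Object();</script>
</head>
<body>
<script language="javascript">
var a_fPrv = 1;
</script>
</body>
</html>
END_HTML

my @results;

my $t = XML::Twig->new(
    twig_handlers => {
        # Called once for every <script> element, wherever it is nested.
        script => sub {
            my ( $twig, $elt ) = @_;
            push @results, $elt->trimmed_text;
        },
    },
);

# <html> is already a single root element, so no wrapper tag is needed.
$t->parse($text);

print "[$_]\n" for @results;
```

        Note that parse() dies on input that is not well-formed XML, which is common in real-world HTML (unquoted attributes, or a bare < in a JavaScript comparison), so for arbitrary pages an HTML parser is probably the safer tool.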

        CountZero