in reply to get string between two < tags > in .js file (xml)

Just do it the right way and be done. 16 megabytes is not “huge.” This is a JSON-formatted file, and within that file some of the records are in XML format. Therefore, first use a CPAN package that understands JSON. Then feed the extracted strings into another CPAN package that understands XML. From there, an XPath query can dive right into the XML and extract precisely whatever you need to know. Because of XPath, you do not have to write code to pick apart the XML structure yourself. In fewer than 50 lines of “code that you actually had to write,” you could be looking at a robust and reliable (i.e. “real”) solution to this task. Finito!
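The JSON-then-XML-then-XPath pipeline described above might look roughly like the following minimal sketch. It uses JSON::PP (a core module since Perl 5.14) and XML::LibXML (from CPAN). The record layout is an assumption, not the real orderbook format: it supposes the file is strict JSON holding a top-level array of objects, each carrying a well-formed XML document in a hypothetical field named "xml", and that the wanted values sit in <Symbol> elements. The function name symbols_from_json is likewise made up for illustration.

```perl
#!/usr/bin/perl
# Minimal sketch: JSON first, then XML, then XPath.
# ASSUMPTIONS (hypothetical, not the real orderbook format):
#   * the file is strict JSON: a top-level array of objects
#   * each object may hold a well-formed XML document in a field named "xml"
#   * the values wanted are the text of <Symbol> elements
use strict;
use warnings;
use JSON::PP;      # core module since Perl 5.14
use XML::LibXML;   # CPAN module (bindings to libxml2)

# Return the text of every <Symbol> element found in the XML strings
# embedded in $json_text. symbols_from_json is a hypothetical name.
sub symbols_from_json {
    my ($json_text) = @_;

    # Step 1: let a JSON parser take the outer layer apart.
    my $records = JSON::PP->new->decode($json_text);

    my @symbols;
    for my $record (@$records) {
        # Skip anything that is not an object with the assumed "xml" field.
        next unless ref $record eq 'HASH' && defined $record->{xml};

        # Step 2: hand the extracted string to an XML parser.
        my $doc = XML::LibXML->load_xml(string => $record->{xml});

        # Step 3: an XPath query does the digging; no hand-written tag matching.
        push @symbols, map { $_->textContent } $doc->findnodes('//Symbol');
    }
    return @symbols;
}

# Command-line use: perl extract_symbols.pl orderbook.json  (names hypothetical)
if (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $ARGV[0]
        or die "Couldn't open '$ARGV[0]': $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for symbols_from_json($text);
}
```

Adjust the field name and the XPath expression to the file's real structure. A strict JSON parser rejects input that is not strict JSON (for example, object keys without quotes), so check the format before relying on this.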

You are simply deconstructing the file in more or less the same way that it was originally constructed, probably using the same kind of tool. It is, if I may say, abjectly pointless to “prove” that something can be done the wrong way, even if you “succeed.” (And please take this stern-sounding advice impersonally: not as a flame, but as the pointed and direct admonition of an engineering colleague who deems it very important to get this point across.)

Re^2: get string between two < tags > in .js file (xml)
by kamchez (Initiate) on Jul 03, 2012 at 15:40 UTC

    Thank you for your reply. Yes, that would be the proper way of doing it; you are absolutely right, and I will look into it. For now, here is a quick-and-dirty solution that solved it for me:

    use strict;
    use warnings;
    use File::Basename;

    if (@ARGV == 1) {
        open my $file_in, "<", $ARGV[0]
            or die "Couldn't open file '$ARGV[0]': $!\nDid you specify a valid file?\n";

        # open a new file to write the changes to
        open my $file_out, ">", "$ARGV[0].new"
            or die "Can't write new file '$ARGV[0].new': $!\nDo you have write permissions?\n";

        # these are our currently active market makers
        my @list = ("BNP", "CBK", "CIT", "NDS", "OHD", "OHM", "RBN", "RBS", "SEK", "SGA");

        while (<$file_in>) {
            # copy every input line unchanged to the new file
            print $file_out $_;

            # if we find a match for any Symbol ...
            if (m%</Name><Symbol>([^<]+)%) {
                my $SYMBOL = $1;

                # ... and $SYMBOL is one of the active market makers in @list
                if (grep { $_ eq $SYMBOL } @list) {
                    # add the line marketMakerOrganization:"$SYMBOL", to $file_out
                    print $file_out "\t\tmarketMakerOrganization:\"$SYMBOL\",\n";
                }
            }
        }

        # report write errors (e.g. a full disk) instead of failing silently
        close $file_out or die "Couldn't finish writing '$ARGV[0].new': $!\n";
    }
    else {
        print "You need to specify an input file\n\n";
        print "They are usually located here:\n";
        print "/PATH/orderbooks-xx-hostname-yy.x.xxx.xx.js\n\n";
        print "Usage: " . basename($0) . " difffile.txt\n\n";
        exit 1;    # non-zero status: the script was called incorrectly
    }