kamchez has asked for the wisdom of the Perl Monks concerning the following question:

The txt files are not that large (about 1000 lines usually). What I want to achieve is this: get all occurrences of:

Orderbook ID: ID_
Symbol: xxxxx xxxxx
ISIN: SExxxxxxxxx

and print it out like so : ID:ISIN:SYMBOL

for example : ID_:SExxxxxxxxxx:xxxxx xxxxx

This is what I've written now :

use strict;
use warnings;
use File::Basename;
use Text::ParseWords;

open (FILE, $ARGV[0]);

sub getValue {
    $_ =~ s/\s//g;
    my ($name, $value) = split(/:/);
    chomp($value);
    return $value;
}

while (<FILE>) {
    my (@ID,@ISIN,@SYMBOL);
    if ($_ =~ m/ID:/) {
        @ID = getValue($_);
    }
    if ($_ =~ m/ISIN:/) {
        @ISIN = getValue($_);
    }
    if ($_ =~ m/Symbol:/) {
        @SYMBOL = getValue($_);
    }
    my @FULLVAR = (@ID,@ISIN,@SYMBOL);
    my $num = 0;
    my $count = 0;
    foreach (@ID,@ISIN,@SYMBOL) {
        printf "$_:";
    }
}
} else {
    print "You need to specify an input file \n";
    print "Usage : ".basename($0)." difffile.txt \n";
    exit;
}
Orderbook ID: XXX
Symbol: DUMMY 00OXX
ISIN: SE000123456
Market: Plain Dummy SE
## inactivationTime: [2012-01-01T00:00:00] => [2012-01-10T00:00:00]
## XML/Warrant/ReimbursementDay: [2012-01-06] => [2012-01-10]
## XML/Warrant/LastTradedDay: [2012-01-06] => [2012-01-10]
## unpublicationTime: [2012-01-01T00:00:00] => [2012-01-10T00:00:00
Can everybody please change all of the log file data to this DUMMY data? Please! It's really important!!

Re: matching strings into array from txtfile and printing on same line
by ww (Archbishop) on Jun 16, 2012 at 01:49 UTC

    Line 23: printf is not the operator to use (Perl is not C). Use print with a trailing newline where appropriate (or, with a recent version of Perl, say).

    You'll find this ( sorta' ) works:

    for (@ID,@ISIN,@SYMBOL, "\n",) {
        print "$_:";
    }
    ...

    That's a really ugly hack and produces many lines containing only a single colon.

    Nonetheless, for part one, see printf and compare to print.
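    To illustrate the printf/print distinction, here is a minimal sketch. The helper names format_field and format_record are hypothetical and do not appear in the thread. The point is that printf (and sprintf) treat their first argument as a format, so a value should be passed as data to a fixed format rather than interpolated into it:

```perl
use strict;
use warnings;

# Hypothetical helper: build one "value:" field. The format string is a
# fixed literal, so a percent sign inside $value is never interpreted.
sub format_field {
    my ($value) = @_;
    return sprintf '%s:', $value;
}

# Hypothetical helper: join the fields of one record with colons and
# terminate the line, which avoids a trailing colon.
sub format_record {
    my @fields = @_;
    return join( ':', @fields ) . "\n";
}

print format_field('100%s'), "\n";
print format_record( 'XXX', 'SE000123456', 'DUMMY00OXX' );
```

    With plain print (or say) the question does not arise at all, because print never interprets its arguments as a format.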

    For part two, writing a coherent set of suggestions about the other problems (unnecessary use of arrays which you then route thru a foreach; failing to give your series of "ifs" an escape route (better, perhaps, an if, two elsifs and an else); etc.) would take more time than I have just now. [<update> Besides, muba did a fine job on most of it while I was still fiddling. </update>] Sorry, but perhaps you can infer (or read docs) how this works:

    #!/usr/bin/perl
    use 5.014;

    sub getValue {
        my ($value, $name);    # $name is a throwaway; why?
        $_ = shift;
        ( $name, $value ) = split(/:/);
        $value =~ s/\s//g;
        return $value;
    }

    print "ID SYMBOL ISIN\n";

    while (<DATA>) {
        my ($ID,$ISIN,$SYMBOL);
        if ($_ =~ m/ID:/) {
            $ID = getValue($_);
            $ID = $ID . ":";    # append colon removed by split
            chomp $ID;
            print $ID;
        } elsif ($_ =~ m/Symbol:/) {
            $SYMBOL = getValue($_);
            chomp $SYMBOL;
            $SYMBOL .= ":";     # alternate method to append colon
            print $SYMBOL;
        } elsif ($_ =~ m/ISIN:/) {
            $ISIN = getValue($_);
            chomp $ISIN;
            print $ISIN ."\n";
        } else {
            next;
        }
    }
    __DATA__
    Deleted; see careless OP's request in node below and, as msgd to me:
    "Sorry for bothering you sir. But could you please change the data on node :
    http://www.perlmonks.org/?displaytype=displaycode;node_id=976512 to :
    http://www.perlmonks.org/?displaytype=displaycode;node_id=991009 , appreciate it!
    !prod data"

    output:

    ID SYMBOL ISIN
    QYQ:LUP2L100OHM:SE0004017929
    R1M:TLS2K50OHM:SE0004018539
    QNF:MINILONGOMXAO:SE0003990183
    QX8:ALF2K160OHM:SE0004017440
    NC0:BOL2K170OHM:SE0003842137
    NEV:NOK2K90OHM:SE0003843069

    OP or OP's management seems to believe one can make things disappear from the internet.

    Wrong. Can you spell "cache" or "waybackmachine?"

    OP also seems to think I have some hold over kenosis, as evidenced by this message (verbatim), also received by me:

    "kamchez says Re Re: matching strings into array from txtfile and printing on same line

    kenosis, could you please change all of the "data" to dummy data as follows, the reason is that the current data is important production data that shouldn't be online (my fault) and that my employer wants it removed asap "

      Could all of you please change all of the "data" to dummy data as follows? The reason is that the current data is important production data that shouldn't be online (my fault) and that my employer wants removed asap. I would greatly appreciate the help!!! You can change all of the data to the following:

      __DATA__
      contract diff, generated at 2012-06-00T00:00:48
      =========================================================
      Reading old orderbooks from: gzip -cd < /PATH/orderbook_20120600T0000.txt.gz|
      Reading new orderbooks from: gzip -cd < /PATH/orderbook_20120600T0002.txt.gz |
      Modified contracts: (Total 0)
      ------------------------------------------
      ------------------------------------------
      Deleted contracts: (Total 144)
      ------------------------------------------
      Orderbook ID: XXX
      Symbol: DUMMY 00OXX
      ISIN: SE000123456
      Market: Plain Dummy SE
      ## inactivationTime: [2012-01-01T00:00:00] => [2012-01-10T00:00:00]
      ## XML/Warrant/ReimbursementDay: [2012-01-06] => [2012-01-10]
      ## XML/Warrant/LastTradedDay: [2012-01-06] => [2012-01-10]
      ## unpublicationTime: [2012-01-01T00:00:00] => [2012-01-10T00:00:00

Re: matching strings into array from txtfile and printing on same line
by aaron_baugher (Curate) on Jun 16, 2012 at 04:17 UTC

    I'm kind of partial to setting the input record separator when there is a consistent separator between records, and then it's usually possible to fashion a regex to pluck out the values from a single record:

    #!/usr/bin/env perl
    use Modern::Perl;

    local $/ = "";    # split records on a blank line

    while (<DATA>) {
        if ( m[Orderbook\ ID: \s* (.+?) \n
               \s* Symbol: \s* (.+?) \n
               \s* ISIN: \s* (.+?) \n
              ]xs ){
            say "$1:$3:$2";
            # comment the previous line and uncomment the next three if you really
            # want to strip whitespace from your values, unlike your first
            # example
            # my $l = "$1:$3:$2";
            # $l =~ s/\s+//g;
            # say $l;
        }
    }
    __DATA__
    [Sample data removed by request of the original poster.]

    Aaron B.
    Available for small or large Perl jobs; see my home node.

      Nice work, Aaron!

Re: matching strings into array from txtfile and printing on same line
by muba (Priest) on Jun 16, 2012 at 01:10 UTC

    Try including a newline in your print statement.

    That being said, I felt there were some more things about your script that could use a little clean up.

    It's still not perfect though: not all invocations of the script seem to give an error message when you want one. I'll leave it as an exercise to you to find out how to solve that. Hint: check whether $ARGV[0] (or @ARGV) actually is set.
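    One way to act on that hint is sketched below. The helper name check_args and the exact wording of its messages are assumptions for illustration, not code from the thread. It inspects the argument list before any file is opened and returns an error message, or undef when exactly one readable file was named:

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: validate the command-line arguments.
# Returns an error message string, or undef if the arguments look usable.
sub check_args {
    my @args  = @_;
    my $usage = "Usage: " . basename($0) . " difffile.txt\n";
    return "You need to specify an input file\n$usage" unless @args;
    return "Too many arguments\n$usage" if @args > 1;
    return "Cannot read '$args[0]'\n$usage" unless -r $args[0];
    return;    # undef in scalar context: nothing to complain about
}

# Demonstration with made-up argument lists. A real script would instead do:
#   if ( defined( my $error = check_args(@ARGV) ) ) { die $error }
for my $case ( [], [ 'a.txt', 'b.txt' ] ) {
    my $error = check_args(@$case);
    print defined $error ? $error : "arguments look fine\n";
}
```

    Testing @ARGV in list-to-scalar context (unless @args) covers the "no argument at all" case that slips through when the script only checks the result of open.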

Re: matching strings into array from txtfile and printing on same line
by Kenosis (Priest) on Jun 16, 2012 at 03:38 UTC

    Here's another option with your data in the file data.txt:

    use Modern::Perl;
    use File::Slurp qw/read_file/;

    my $lines;

    for (read_file 'data.txt') {
        if ( /Orderbook/ .. /ISIN/ ) {
            $lines .= $_;
        }
        elsif ($lines) {
            my ( $id, $sy, $is ) =
              $lines =~ /ID:(.*)[\s\S]+Symbol:(.*)[\s\S]+ISIN:(.*)/;
            my $line = "$id:$sy:$is";
            $line =~ s/\s//g;
            say $line;
            $lines = '';
        }
    }

    Output:

    QYQ:LUP2L100OHM:SE0004017929
    R1M:TLS2K50OHM:SE0004018539
    QNF:MINILONGOMXAO:SE0003990183
    QX8:ALF2K160OHM:SE0004017440
    NC0:BOL2K170OHM:SE0003842137
    NEV:NOK2K90OHM:SE0003843069

Re: matching strings into array from txtfile and printing on same line
by kamchez (Initiate) on Jun 16, 2012 at 08:39 UTC

    Thank you all for taking the time to help me out! It's truly amazing how many ways there are to do something ;) I've learned a lot by reading your replies, and I've cleaned up the code a little and tweaked it with the input I got from each of you. Here it goes:

    use strict;
    use warnings;
    use File::Basename;
    use Text::ParseWords;

    if ($#ARGV == 0) {
        open my $file, "<", $ARGV[0]
            or die "Couldn't open file '$ARGV[0]': $! \nDid you specify a valid file?";
        my ($ID,$ISIN,$SYMBOL);
        while (<$file>) {
            if ($_ =~ m/ID:/) {
                $ID = getValue($_);
            }
            if ($_ =~ m/Symbol:/) {
                $SYMBOL = getValue($_);
            }
            if ($_ =~ m/ISIN:/) {
                $ISIN = getValue($_);
                print "$ID:$ISIN:$SYMBOL\n";
            }
        }
    } else {
        print "You need to specify an input file \n";
        print "Usage : ".basename($0)." difffile.txt \n";
        exit;
    }

    sub getValue {
        $_ =~ s/\s+//g;
        my ($name, $value) = split(/:/);
        chomp($value);
        return $value;
    }

    Please let me know if there are any more improvements that can be done to make it more "clean" and "proper" according to "proper coding standards" ;) Thank you all!!

    Output from diff_file, running :
    ./isin_parse.pm diff_20120614T1442.txt
    QYP:SE0001234567:LUP2L80OHM
    NGA:SE0001234567:SHB2K260OHM
    NJQ:SE0001234567:TRE2K70OHM
    NH0:SE0001234567:SKF3A190OHM
    NEY:SE0001234567:NOK3A70OHM
    QUA:SE0001234567:MINILONGLUPFO
    QPC:SE0001234567:MINISHRTOMXOO
    QP3:SE0001234567:MINISHRTOMXFO
    QU9:SE0001234567:MINILONGLUPEO
    P3B:SE0001234567:SWM2K240OHM
    P0N:SE0001234567:ASS3A160OHM
    NGF:SE0001234567:SHB3A260OHM
    R1B:SE0001234567:SKF2K210OHM
    QP8:SE0001234567:MINISHRTOMXKO
    R1K:SE0001234567:TEL3A123OHM
    P34:SE0001234567:SWE2K140OHM
    NAS:SE0001234567:ASS2K180OHM
    NC3:SE0001234567:BOL3A150OHM
    NDD:SE0001234567:HM2L280OHM
    NCX:SE0001234567:ERI3A120OHM
    P2D:SE0001234567:OXS3A1300OHM
    QNA:SE0001234567:MINILONGABBCO
    NGV:SE0001234567:SKF2K190OHM
    QPE:SE0001234567:MINISHRTOMXQO
    QU7:SE0001234567:MINILONGLUPCO