alkaloid has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a text file with the following code:
sub parse {
    open(CLOG, "<$flatfile") || die ("Cannot open $flatfile for reading... $!\n");
    foreach $tracknumber (<CLOG>) {
        $tracknumber =~ s/\<Num_0\>/0/g;
        $tracknumber =~ s/\<Num_1\>/1/g;
        $tracknumber =~ s/\<Num_2\>/2/g;
        $tracknumber =~ s/\<Num_3\>/3/g;
        $tracknumber =~ s/\<Num_4\>/4/g;
        $tracknumber =~ s/\<Num_5\>/5/g;
        $tracknumber =~ s/\<Num_6\>/6/g;
        $tracknumber =~ s/\<Num_7\>/7/g;
        $tracknumber =~ s/\<Num_8\>/8/g;
        $tracknumber =~ s/\<Num_9\>/9/g;
        $tracknumber =~ s/\<Num_\.\>/\./g;
        $tracknumber =~ s/\<Num_\-\>/\-/g;
        $tracknumber =~ s/\<Num_\/\>/\//g;
        $tracknumber =~ s/\<Tab\>/\t/g;
        $tracknumber =~ s/\<Tab//g;
        $tracknumber =~ s/\<Up\>//g;
        $tracknumber =~ s/\<Down\>//g;
        $tracknumber =~ s/\<Alt\>//g;
        $tracknumber =~ s/\<Ctrl\>//g;
        $tracknumber =~ s/\<PgDwn\>//g;
        $tracknumber =~ s/\<PgUp\>//g;
        $tracknumber =~ s/\xA0/ /g;
        $tracknumber =~ s/\xA1//g;
        $tracknumber =~ s/\xA2//g;
        $tracknumber =~ s/\xA4//g;
        $tracknumber =~ s/\x0D/\n/g;
        $tracknumber =~ s/\x0A/\n/g;
        $tracknumber =~ s/\<Num_//g;
        $tracknumber =~ s/\<Num//g;
        $tracknumber =~ s/\<Nu//g;
        $tracknumber =~ s/\<Numpad_3\>/\n/g;
        $tracknumber =~ s/pad_3\>/\n/g;
        open(OUTPUT, "+>>$output") || die ("Cannot open output file$!\n");
        print OUTPUT "$tracknumber";
        close OUTPUT;
    }
    close CLOG;
}
As you can see, I am doing quite a few search-and-replace operations, but the problem I am having is this: the text file I'm reading also contains tags such as <Left>, <Right>, and <Del> (it is the output from a keylogger), and I need to reflect the cursor movement these keys cause in the final output file. (I am parsing the output because the text is quite unreadable straight out of the logger.) Also, does anyone have an idea how I might parse the whole file in one big chunk instead of line by line (as "foreach" does)? The line-by-line business really messes up the final parsed output. Thanks guys/gals!

Replies are listed 'Best First'.
Re: Parsing Issue
by dvergin (Monsignor) on Feb 06, 2002 at 00:11 UTC
    Here's some working code to show the outline of a different approach. I've substituted in some self-contained data for demo purposes. Hope this helps.
    #!/usr/bin/perl -w
    use strict;

    sub parse {
        my $flatfile = shift;
        my %replace = ('<Tab>'      => "\t",
                       '<Numpad_3>' => "\n",
                       'pad_3'      => "\n",
                       'etc'        => "etc",
                      );
        my @delete = ('<Tab>', '<Up>', '<Down>', '<PgUp>',
                      '<Num_', '<Num', '<Nu', 'etc', 'etc',
                     );

        #open(CLOG, "<$flatfile")
        #    || die ("Cannot open $flatfile for reading... $!\n");
        #foreach $tracknumber (<CLOG>) {
        foreach my $tracknumber (<DATA>) {
            # Numeric keypad keys: <Num_4> becomes 4, and so on.
            # \Q..\E keeps '.' from matching any character.
            for my $chr ('0'..'9', '-', '/', '.') {
                $tracknumber =~ s/<Num_(\Q$chr\E)>/$chr/g;
            }
            # Keys that map to a replacement string.
            for (keys %replace) {
                $tracknumber =~ s/$_/$replace{$_}/g;
            }
            # Keys and fragments that are simply dropped.
            for (@delete) {
                $tracknumber =~ s/$_//g;
            }
            #open(OUTPUT, "+>>$output")
            #    || die ("Cannot open output file$!\n");
            #print OUTPUT "$tracknumber";
            #close OUTPUT;
            print "$tracknumber";
        }
        #close CLOG;
    }

    parse('somefile');

    __DATA__
    Since I have no idea what keylogger data looks
    <Tab>like,<Up> I'll just <PgUp> have to fudge
    <Tab>up<Down> something <Num_4> a test<Num_.>
    Regarding your line-by-line issue: I suspect something else is the cause of your messed-up output. (Specifics, please.) But if the file is not too big, you can suck the whole thing up into a single scalar and then do the three s/// alterations on the whole thing at once.
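
    A minimal sketch of the slurp idea, assuming the log fits comfortably in memory. The helper names slurp_file and clean_log and the condensed substitutions are illustrative only and are not part of the code above.

```perl
use strict;
use warnings;

# Read an entire file into one scalar by temporarily undefining the
# input record separator ($/). slurp_file is a hypothetical helper name.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path for reading: $!\n";
    local $/;                      # undef: <$fh> now returns the whole file
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

# Run each substitution once over the whole text instead of per line.
# clean_log is a hypothetical name; the patterns are a condensed sample.
sub clean_log {
    my ($text) = @_;
    $text =~ s/<Num_([0-9.\/-])>/$1/g;                 # keypad keys
    $text =~ s/<Tab>/\t/g;                             # replace
    $text =~ s/<(?:Up|Down|PgUp|PgDwn|Alt|Ctrl)>//g;   # delete
    return $text;
}
```

    Usage would then be a single call such as print clean_log(slurp_file($flatfile)); with the output file opened once rather than once per line.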

    Update: After sleeping on it, I repent of suggesting the use of a hash to store the %replace data. It works in this case but it's a risky pattern. The order in which keys are returned from a hash is not guaranteed. So consider the @delete data above and imagine if 'Num_', 'Num', and 'Nu' were replace items. If the data at hand were 'blah blah Num_ blah', it would make a difference if 'Nu' were evaluated before the other two. So we need to store these replace items in an array to ensure they are evaluated in the specified order. It could be an array of hashes, but there's no need for that; an array of arrays will do nicely and is more efficient. (untested snippets follow)

    my @replace = (['<Tab>',      "\t"],
                   ['<Numpad_3>', "\n"],
                   ['pad_3',      "\n"],
                   ['etc',        "etc"]);

    # ... later in loop
    foreach my $aref (@replace) {
        $tracknumber =~ s/$aref->[0]/$aref->[1]/g;
    }
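
    To make the ordering point concrete, here is a small self-contained sketch. The sub name apply_in_order and the sample pairs are made up for illustration, and \Q..\E is an addition so that each key is matched as a literal string.

```perl
use strict;
use warnings;

# Apply [pattern, replacement] pairs strictly in the order given.
# apply_in_order is a hypothetical helper name.
sub apply_in_order {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($from, $to) = @$pair;
        $text =~ s/\Q$from\E/$to/g;   # \Q..\E: treat $from literally
    }
    return $text;
}

my $sample = 'blah blah Num_ blah';

# Longest key first: the whole 'Num_' fragment is removed.
print apply_in_order($sample, ['Num_', ''], ['Num', ''], ['Nu', '']), "\n";

# Shortest key first: 'Nu' is removed and 'm_' is left behind.
print apply_in_order($sample, ['Nu', ''], ['Num', ''], ['Num_', '']), "\n";
```

    An array keeps the longest-first order fixed from run to run, which is exactly what a hash cannot promise.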

    ------------------------------------------------------------
    "Perl is a mess and that's good because the
    problem space is also a mess.
    " - Larry Wall

      Thank you for your help... I apparently did not proofread my post well enough. One of the main issues I'm having is when the parser encounters the output of "<Left>, <Right>, and <Del>." Is there some way to mirror the action of these keys before writing to a new output file?
      In other words, if I have some text that looks like:
      "requested to remove  LD  for 4.99 .. did tell her if she makes calls that asre   L D will be billed  before 02/25/02 <Right><Right><Right><Right><Right><Right><Right><Right><Rigany <Right><Right><Right><Right><Left><Left><Left><Left><Del><Del>calls placed she <Right><Right><Right><Right><Right><Right><Right><Right><Right>.. she said she understood ...eg"
      Is there some way for the parser to emulate the effect of all those Lefts, Rights, and Dels (as in overwriting characters when Left is used, or deleting characters when Del is used, etc.)? The text files being parsed will be under 50k... is this too much to parse at once, or can I get it all into a single scalar? Thanks again!
        In my innocence I thought cLive ;-)'s response was odd.

        But looking at the data and reflecting on the term "keylogger", I am now ready to ask also: "What color is your hat?"

        Beyond that, there is a lot more involved in this to accomplish what you now propose. Not the least of it is that parsing this data will require that the parsing script have a copy of the document being worked on since, from the snippet given above, it seems clear that the typist being logged is moving forward through a pre-existing document.
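
        For the simplest case only, the replay might be sketched as below. This is a rough sketch under several assumptions: the pre-existing text and the starting cursor position are known, the editor is in insert mode, everything is on one line, and only complete <Left>, <Right>, and <Del> tags appear (a truncated tag such as "<Rigany" in the sample would be treated as ordinary typed characters). The name replay_keys and its parameters are hypothetical.

```perl
use strict;
use warnings;

# Replay a keystroke log against a known starting text.
# replay_keys, $start_text and $cursor are hypothetical names;
# insert mode and a single line of text are assumed.
sub replay_keys {
    my ($start_text, $cursor, $log) = @_;
    my @buf = split //, $start_text;
    $cursor = 0    if $cursor < 0;
    $cursor = @buf if $cursor > @buf;

    # Walk the log token by token: a <Left>/<Right>/<Del> tag,
    # or otherwise a single typed character.
    while ($log =~ /\G(?:<(Left|Right|Del)>|(.))/gcs) {
        if (defined $1) {
            if    ($1 eq 'Left')  { $cursor-- if $cursor > 0 }
            elsif ($1 eq 'Right') { $cursor++ if $cursor < @buf }
            else                  { splice(@buf, $cursor, 1) if $cursor < @buf }
        }
        else {
            splice(@buf, $cursor, 0, $2);   # insert the typed character
            $cursor++;
        }
    }
    return join '', @buf;
}

# Fix "teh cat": step right, delete the 'e', step right, type 'e'.
print replay_keys('teh cat', 0, '<Right><Del><Right>e'), "\n";   # the cat
```

        Without the starting text and cursor position, the same log yields a different result, which is why the original document is needed.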

        Let me risk going a step further and suggest that this task is effectively beyond the range of any reasonable expenditure of effort. You list <Up>, <Down>, <PgUp>, and <PgDwn> as keys that will be in the keylogger data. Knowing where these keystrokes will take the cursor would require a lot of information and a lot of calculations on margins, font size, variable-width character metrics, etc.

        Good luck.

(cLive ;-) Re: Parsing Issue
by cLive ;-) (Prior) on Feb 05, 2002 at 23:27 UTC
    As always, the skeptic in me wonders... "What color is your hat?"

    .02

    cLive ;-)