Parsing/Removing Characters from Text File

mizmaster22 has asked for the wisdom of the Perl Monks concerning the following question:

Alright, I posted on here recently and did a terrible job of it. I want to retry and explain my situation as best I can. I wrote a script to query my telnet system and record the output into a text file. It does all of that and writes the correct data to the text file, but it also gives me all of these unwanted terminal characters, and I am having a very hard time removing them. I am extremely new to Perl programming and would appreciate the help. Here is my output (everything in bold is what I want removed):

list tvbs dnv 3334 06:00 [1;1H [24;0H [K 7 [1;1H [0;7m list tvbs dnv 3334 06:00 [0m 8 [23;0H [0;7m [0m [23;0H [2;1H [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B [K [B 7 [1;65H [0;7m Page 1 [0m 8 [3;1H [3;23H TVBS DIRECTORY REPORT [5;1HName: [5;47HDate: 11:43 am THU JUN, 2011 [6;9HDVN: 3555 [7;4HDVN Name: System [7;49HAcceptable Level: 20 [9;13H AVG AVG AVG CALLS % IN [10;13HCALLS ACD SPEED ABAND ABAND TALK/ CONN FLOW BUSY/ SERV [11;1HTIME [11;13HOFFERED CALLS ANSW CALLS TIME HOLD CALLS OUT DISC LEVL [13;1H 6:00- 7:00 [13;14H 12 [13;21H 0 [13;27H 0:00 [13;33H 0 [13;39H 0:00 [13;45H 0:00 [13;51H 0 [13;58H 12 [13;64H 0 [13;71H 0 [14;1H 7:00- 8:00 [14;14H 7 [14;21H 0 [14;27H 0:00 [14;33H 0 [14;39H 0:00 [14;45H 0:00 [14;51H 0 [14;58H 7 [14;64H 0 [14;71H 0 [15;1H 8:00- 9:00 [15;14H 15 [15;21H 0 [15;27H 0:00 [15;33H 2 [15;39H 0:08 [15;45H 0:00 [15;51H 0 [15;58H 13 [15;64H 0 [15;71H 0 [16;1H 9:00-10:00 [16;14H 12 [16;21H 0 [16;27H 0:00 [16;33H 0 [16;39H 0:00 [16;45H 0:00 [16;51H 0 [16;58H 12 [16;64H 0 [16;71H 0 [17;1H10:00-11:00 [17;14H 6 [17;21H 0 [17;27H 0:00 [17;33H 0 [17;39H 0:00 [17;45H 0:00 [17;51H 0 [17;58H 6 [17;64H 0 [17;71H 0 [18;1H----------- [18;14H------ [18;21H----- [18;27H----- [18;33H----- [18;39H----- [18;45H----- [18;51H------ [18;58H----- [18;64H----- [18;71H--- [19;1HSUMMARY [19;14H 52 [19;21H 0 [19;27H 0:00 [19;33H 2 [19;39H 0:08 [19;45H 0:00 [19;51H 0 [19;58H 50 [19;64H 0 [19;71H 0 7 [23;0H[0;7m

It should look like this:

list tvbs dnv 3334 TVBS DIRECTORY REPORT Name: Date: 11:53 am THU, 2011 DVN: 3555 DVN Name: Suppotrt Acceptable Level: 20 AVG AVG AVG CALLS % IN CALLS ACD SPEED ABAND ABAND TALK/ CONN FLOW BUSY/ SERV TIME OFFERED CALLS ANSW CALLS TIME HOLD CALLS OUT DISC LEVL 5:00- 6:00 1 0 0:00 0 0:00 0:00 0 1 0 0 6:00- 7:00 12 0 0:00 0 0:00 0:00 0 12 0 0 7:00- 8:00 7 0 0:00 0 0:00 0:00 0 7 0 0 8:00- 9:00 15 0 0:00 2 0:08 0:00 0 13 0 0 9:00-10:00 12 0 0:00 0 0:00 0:00 0 12 0 0 10:00-11:00 6 0 0:00 0 0:00 0:00 0 6 0 0 ----------- ------ ----- ----- ----- ----- ----- ------ ----- ----- --- SUMMARY 88 0 0:00 4

Thank you guys very much and sorry for the annoying formatting.


Replies are listed 'Best First'.
Re: Parsing/Removing Characters from Text File by roboticus (Chancellor) on Jun 16, 2011 at 17:45 UTC
mizmaster22:

As was mentioned on your previous thread on the same subject, if you can tell the software that your terminal doesn't handle color, much of that may go away, unless it's cursor-positioning stuff. But anyway, you can remove that stuff by writing some regular expressions that recognize the strings and remove them, something like:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $txt = "06:00\x1b[1;1H \x1b[24;0H \x1b[K 7 \x1b[1;1H "
            . "\x1b[0;7m list tvbs dnv 3334 06:00 \x1b[0m 8";

    # Remove strings like <esc>[<digits_or_semicolon><letter>
    $txt =~ s/\x1b\[[0-9;]+[A-Za-z]//g;

    print $txt;

All you need to do is figure out a good regular expression to match the bits you want to delete, and remove them as shown in the example. (The one I provided might delete too much, so be sure to test thoroughly.)

...roboticus

When your only tool is a hammer, all problems look like your thumb.
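One thing worth checking with a pattern of this kind: sequences such as `<esc>[K` and `<esc>[B` in the posted output carry no digits at all, so a pattern that requires at least one digit or semicolon would leave them behind. Below is a minimal sketch of a broader match, following the general shape of a CSI sequence (ESC, `[`, optional parameter bytes, optional intermediate bytes, one final byte). The helper name `strip_ansi` is made up for illustration, and the second substitution assumes that the stray `7` and `8` in the capture were originally `<esc>7` and `<esc>8` (save and restore cursor); that is a guess from the output, so verify it in a hex editor first.

```perl
use strict;
use warnings;

# strip_ansi is a hypothetical helper name, not something from the thread.
sub strip_ansi {
    my ($text) = @_;

    # CSI sequence: ESC [, then zero or more parameter bytes (0x30-0x3F),
    # zero or more intermediate bytes (0x20-0x2F), then one final byte
    # (0x40-0x7E). Using * instead of + also catches <esc>[K and <esc>[B.
    $text =~ s/\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]//g;

    # Assumption: the lone 7 and 8 in the capture were ESC 7 / ESC 8.
    $text =~ s/\x1b[78]//g;

    return $text;
}

my $sample = "06:00\x1b[1;1H\x1b[24;0H\x1b[K\x1b7\x1b[1;1H"
           . "\x1b[0;7m list tvbs dnv 3334 06:00\x1b[0m\x1b8";
print strip_ansi($sample), "\n";
```

Hex ranges are used inside the character classes on purpose: writing the final-byte class with a literal `@` invites Perl to interpolate an array there. Plain digits that are part of the report (times, counts) are untouched, because every pattern requires a leading ESC byte.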
Re^2: Parsing/Removing Characters from Text File by mizmaster22 (Novice) on Jun 16, 2011 at 19:05 UTC
roboticus, thank you so much. I have been racking my brain over this problem for a long time now. Regex and I do not get along very well at all. The output is nearly flawless now, except I am getting a carriage return or newline character in front of each line. Other than that it looks great. Here is my extremely simple parsing code:

    use warnings;
    use strict;
    use File::Slurp;

    my $s = read_file("calldata.txt");
    $s =~ s/\x1b\[[0-9;]+[A-Za-z]//g;
    write_file("calldata.txt", $s);
    __END__

I looked those symbols up with a hex editor and it looks like they are 0d and 0a. Once again, thank you very much for the help.
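The bytes 0d and 0a are carriage return and line feed, i.e. a CRLF line ending, which is common in telnet output. A minimal sketch of one way to tidy them up follows, working on an in-memory string so it runs on its own. The helper name `normalize_newlines` is made up for illustration, and collapsing blank lines is an assumption about what the final report should look like; drop that step if blank lines are meaningful.

```perl
use strict;
use warnings;

# normalize_newlines is a hypothetical helper name, not from the thread.
sub normalize_newlines {
    my ($text) = @_;

    $text =~ s/\r\n?/\n/g;    # CRLF (0d 0a) or a bare CR becomes a single LF
    $text =~ s/\n{2,}/\n/g;   # assumption: runs of empty lines are unwanted
    $text =~ s/\A\n//;        # drop a newline left at the very start

    return $text;
}

my $report = "\r\nName:\r\n\r\nDVN: 3555\r\n";
print normalize_newlines($report);
```

In the script above, this would be applied to `$s` after the escape sequences are removed and before `write_file` is called.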
Re: Parsing/Removing Characters from Text File by moritz (Cardinal) on Jun 16, 2011 at 17:40 UTC
So, what did you try? I kinda guess that the unwanted data starts (and maybe ends) with some non-printable control characters, so please try opening the data in a hex editor to verify or falsify this guess. If this guess is accurate, it might make it much easier to filter out the unwanted parts.

Perl 6 - second systems done right
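If a hex editor is not handy, the same check can be roughed out in Perl itself. The sketch below rewrites every byte outside printable ASCII as `\xNN` so that escape characters and line endings become visible; `show_controls` is a made-up helper name, and the sample string is illustrative rather than taken from the real capture file.

```perl
use strict;
use warnings;

# show_controls is a hypothetical helper name, not from the thread.
# It renders each character outside printable ASCII (0x20-0x7E) as \xNN.
sub show_controls {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $text;
}

print show_controls("06:00\x1b[1;1H\r\n"), "\n";
```

An escape character shows up as `\x1b`, and a CRLF line ending as `\x0d\x0a`, which makes it easier to see exactly what a cleanup pattern has to match.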