bajangerry has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,

I have been fighting with this project for a number of weeks now and have reached the point that I need to ask for some help from those who know more than I.

I am capturing a stream of data from a PBX, splitting it into substrings and saving these to specific columns in a database. This all works great except for one small problem: the PBX seems to send a signal to keep the socket alive when there is no data (i.e. no calls being made), and this adds a character (or more) to the start of the next string of data.

If I save the data to a text file this shows up as a "^@" at the beginning of the line of text. The number of these "^@" characters varies with the length of time between calls etc. What I need to know is how best to filter my data stream to eliminate these characters. The data stream will start with a blank space followed by the day's date, as in mm/dd or " 08/07" for the 7th of August.

All characters after that fall in specific locations in the string so once I can find the start i.e. the 08/07 then if I can remove anything before that I will have my data... I just can't figure out how to do this!

This will need to work on an ongoing basis, so using a specific number or date will not work. Also, the data may be old, so I cannot compare the date to today's date as a starting point either. Any suggestions are gratefully accepted. If the current code would be helpful I can post it; I didn't want to put too much info here to start with.

Thanks

P.S. Thought an example of the text file output might be helpful:

08/07 03:53P 00:00:46 130 9 4295794 08/07 03:56P 00:01:35 T001 001 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ 08/07 04:13P 00:00:04 08/07 04:13P 00:00:02 T001 001

Replies are listed 'Best First'.
Re: String manipulation
by moritz (Cardinal) on Aug 07, 2007 at 20:35 UTC
    That ^@ thingy looks a lot like some binary data with non-printable characters.

    As a start you could find out which char it is:

    # assuming the string is in $data:
    print ord(substr($data, 0, 1)), "\n";

    Then when you know exactly which character it is, you can remove it with a regular expression.
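
Checking only the first character can mislead if several different bytes precede the record. A small sketch that lists the code of every character before the first digit; `$data` here is an invented sample, not captured PBX output:

```perl
use strict;
use warnings;

# Invented sample: two NULs and the leading blank before the date.
my $data = "\0\0 08/07 04:13P 00:00:04";

# Capture everything up to (not including) the first digit.
my ($prefix) = $data =~ /\A(\D*)/;

# Show each leading character as its numeric code.
my @codes = map { ord($_) } split //, $prefix;
print "@codes\n";    # prints "0 0 32"
```

A 0 in that list would be a NUL byte, and 32 is the ordinary blank that starts each record.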

    The not-so-safe method would be to remove any leading non-space characters:

    $data =~ s/^[^ ]+//;

      ^@ is a common representation of char 0 (NUL in ASCII).
      I added your code so that it would print before the real data and this is the result:
      73
      08/07 04:48P 00:00:04 102 9 4249785
      73
      08/07 04:48P 00:00:02 T001 001
      As you can see I get a "73" printing before the line of data... strange!
      Actually, I stopped and restarted the script and guess what... the output has changed from "73" to "32"!
      I think I will try the "not so safe" method and see what happens. By the way, why do you say this is "not so safe"? Is it because of possibly deleting actual data, or is there another reason?
        I want to thank you guys for all your help, things look as if they are working fine now, you guys were great!
        I wrote it's "not so safe" because it deletes data if the real data does not begin with a blank. If the data is always formed as you say it is, the method is safe.
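
A stricter variant of the substitution above, as a minimal sketch: instead of deleting every leading non-space character, it deletes only what precedes the first " mm/dd " field, which the original poster says opens every record. The sub name `strip_before_date` and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop anything that precedes the first " mm/dd " field.
# If no such field is found, the line is returned unchanged.
sub strip_before_date {
    my ($line) = @_;
    $line =~ s{\A.*?(?= \d\d/\d\d )}{}s;
    return $line;
}

my $dirty = "\0\0\0 08/07 04:13P 00:00:04";
my $clean = strip_before_date($dirty);
print "[$clean]\n";    # prints "[ 08/07 04:13P 00:00:04]"
```

Because the pattern is anchored on the date layout rather than on "anything that is not a blank", a record that already starts cleanly passes through untouched.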
Re: String manipulation
by toolic (Bishop) on Aug 07, 2007 at 20:27 UTC
    The example input helps a lot.

    If you want to remove all instances of ^@, you could use a regular expression like this:

    #!/usr/bin/env perl
    use warnings;
    use strict;

    while (<DATA>) {
        s/\^@//g;
        print;
    }

    __DATA__
    08/07 03:53P 00:00:46 130 9 4295794 08/07 03:56P 00:01:35 T001 001 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ 08/07 04:13P 00:00:04 08/07 04:13P 00:00:02 T001 001

    This produces the following output:
    08/07 03:53P 00:00:46 130 9 4295794 08/07 03:56P 00:01:35 T001 001 08/07 04:13P 00:00:04 08/07 04:13P 00:00:02 T001 001
    If this is not what you are asking, then please give us more of your input, and expected output.
      OK, I understand what you did there, but there is more to this, I think. Let me show the code that reads and saves the data:

      my $sock = new IO::Socket::INET(
          PeerAddr => $host,
          PeerPort => $port,
          Proto    => "tcp",
      ) or die "Cannot connect to PBX on address $host port $port: $!";

      while (<$sock>) {
          print;    # print to screen as well to show raw data and connection
          chomp($_);
          open(DAT, ">>$filename") || die("Cannot open smdr file");
          print DAT $_;
          close(DAT);

      As you can see, I print the string to the screen before I save it to file, and on the screen it looks perfect: everything lines up fine, with no "^@" characters at all, ever. That makes me believe that this is not actually a string as such but some control character. That being the case, do you think the code you provided will still work? I will try it and let you know.
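
Whether a regex for the literal two-character text "^@" helps depends on what is really in the stream. A minimal sketch, assuming the keep-alive is a NUL byte (which is what "^@" commonly stands for when an editor displays a file); the sample string is invented:

```perl
use strict;
use warnings;

# Assumed sample: two NUL bytes followed by a normal record.
my $raw = "\0\0 08/07 04:13P 00:00:04";

# Matching the literal characters '^' and '@' finds nothing in a NUL byte.
(my $literal = $raw) =~ s/\^@//g;

# Matching the NUL byte itself (\0) removes the padding.
(my $nul = $raw) =~ s/\0//g;

printf "%-10s %d bytes\n", 'original', length $raw;
printf "%-10s %d bytes\n", 's/\^@//g', length $literal;
printf "%-10s %d bytes\n", 's/\0//g',  length $nul;
```

Under that assumption the first two lengths are equal and the third is two bytes shorter, which would also explain why nothing odd shows up on the terminal: a NUL has no visible glyph.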

        Looks like the "keep alive" is a null character (ASCII value 0).

        while (<$sock>) {
            s/^\0+//;    # Remove leading null characters
            print;       # print to screen as well to show raw data and connection
            chomp($_);

        should fix the problem.
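
Putting the pieces together, a minimal sketch of the read loop. To stay self-contained it reads from an in-memory filehandle standing in for the PBX socket, and it collects the cleaned records in an array instead of appending them to the SMDR file; the sample data and the names `$fake_stream` and `@records` are illustrative only. It uses `tr/\0//d`, which deletes NUL bytes anywhere in the line rather than only at the start.

```perl
use strict;
use warnings;

# Stand-in for the socket stream: the second record is preceded by NUL keep-alives.
my $fake_stream = " 08/07 03:56P 00:01:35 T001 001\n"
                . "\0\0\0 08/07 04:13P 00:00:04\n";

open my $in, '<', \$fake_stream or die "Cannot open in-memory stream: $!";

my @records;
while (my $line = <$in>) {
    $line =~ tr/\0//d;    # delete every NUL byte in the line
    chomp $line;
    push @records, $line;
}
close $in;

print "[$_]\n" for @records;
```

With a real socket, the same `tr` line could sit at the top of the `while (<$sock>)` loop, before the line is printed or written out.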


        DWIM is Perl's answer to Gödel