Storing unordered data from file in memory

Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have the following input file (Motorola S-record), which is hex-ASCII:

S00600004844521B
S315FC0000187C631A787C6001243C804C00388400643C
S315FC000028908309003C200100382100007C000278FE
S315FC0000387C1043A67C1143A67C1243A67C1343A6DC
S315FC00004870630000606320004C00012C7C60012476
S315FC0000587C0004AC4C00012C3C6080007C0004ACA9
S315FC0000684C00012C7C70FBA64C00012C7C0004ACDB
S315FC0000787C631A787C79FBA6FF80010CFF00010CD7
S315FC000088FE80010CFE00010CFD80010CFD00010C3C
S315FC000098FC80010CFC00010C4C00012C48000008FB
S315FC0000A83F8000007C6802A6C0030000C023000055
S315FC0000B8C0430000C0630000C0830000C0A300006A
S315FC0000C8C0C30000C0E30000C1030000C123000058
S315FC0000D8C1430000C1630000C1830000C1A3000046
S315FC0000E8C1C30000C1E30000C2030000C223000034
S315FC0000F8C2430000C2630000C2830000C2A3000022
S315FC000108C2C30000C2E30000C3030000C32300000F
S315FC000118C3430000C3630000C3830000C3A30000FD
S315FC000128C3C30000C3E300004C00012C706300004D
S315FC0001387C0004AC4C00012C7C6001244C00012C96
S315FC000148706300004C00012C7C6001A44C00012C5F
S315FC0001587C6101A44C00012C7C6201A44C00012C9E
S315FC0001687C6301A44C00012C7C6401A44C00012C8A
S315FC0001787C6501A44C00012C7C6601A44C00012C76
S705FC000018E6

My goal is to get the data part of the records starting with S3. Let's take the following S3 line to show how an S3 record is structured: S315FC0000187C631A787C6001243C804C00388400643C. Two hex-ASCII digits represent one binary byte.

ID (2 hex ascii digits): S3
Byte count, i.e. number of bytes for address, data and checksum (2 hex ascii digits): 15
Address (8 hex ascii digits): FC000018
Data (byte count - address (4 bytes) - checksum (1 byte), i.e. 21 (0x15) - 4 - 1 = 16 bytes, i.e. 32 hex ascii digits in this case): 7C631A787C6001243C804C0038840064
Checksum (2 hex ascii digits): 3C
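To make the field layout above concrete, here is a minimal sketch of a parser for a single S3 line. It assumes the standard S-record checksum rule: the ones' complement of the least significant byte of the sum of the byte-count, address and data bytes. For the example line that sum is 0x5C3, so the checksum is 0x3C. The helper name `parse_s3` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one S3 line into address and data and verify
# the checksum. Returns (numeric address, hex data), an empty list for
# non-S3 lines, and dies on a length or checksum mismatch.
sub parse_s3 {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # tolerate LF or CRLF line ends
    return unless $line =~ /^S3([0-9A-Fa-f]{2})([0-9A-Fa-f]{8})([0-9A-Fa-f]*)([0-9A-Fa-f]{2})$/;
    my ($count_hex, $addr_hex, $data_hex, $sum_hex) = ($1, $2, $3, $4);

    # Byte count = 4 address bytes + data bytes + 1 checksum byte.
    my $nb_data_bytes = hex($count_hex) - 4 - 1;
    die "length mismatch in '$line'\n" unless length($data_hex) == 2 * $nb_data_bytes;

    # Checksum = ones' complement of the low byte of the sum of the
    # byte-count, address and data bytes.
    my $sum = 0;
    $sum += $_ for unpack 'C*', pack 'H*', $count_hex . $addr_hex . $data_hex;
    die "checksum mismatch in '$line'\n" unless (~$sum & 0xFF) == hex $sum_hex;

    return (hex $addr_hex, $data_hex);
}

my ($addr, $data) = parse_s3('S315FC0000187C631A787C6001243C804C00388400643C');
printf "%08X %s\n", $addr, $data;    # prints: FC000018 7C631A787C6001243C804C0038840064
```

Verifying the checksum while reading costs little and catches corrupted lines before they end up in the reconstructed data.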

So I wrote the following code which seems to work.

#!/usr/bin/perl

use strict;
use warnings;

my $in_file_name = "D:/temp.s3";
my $data = "";  

&extractDataFromSrecFile($in_file_name, \$data);
print($data,"\n");

sub extractDataFromSrecFile
{
    my ($file_name, $ref_data) = @_;
    
    open(my $fh, "<", $file_name) || die "Could not open \"$file_name\": $!";

    $$ref_data = "";
    while( my $srec = <$fh> )
    {        
        chomp($srec);

        my $id            = substr($srec,0,2);
        my $nb_data_bytes = hex(substr($srec,2,2)) - 4 - 1;
        my $address       = substr($srec, 4, 8);        
        
        if( ($id eq "S3") &&
            ($address ne "FC000000") )
        {
            $$ref_data .= substr($srec, 12, ($nb_data_bytes * 2));
        }
    }

    close( $fh );
}

But now let's assume that the S3-Records are not ordered by address as before. Because my script just concatenates the data, the result would be wrong.

Here is an example of the same file, but unordered.

S00600004844521B
S315FC0000187C631A787C6001243C804C00388400643C
S315FC0000387C1043A67C1143A67C1243A67C1343A6DC
S315FC000028908309003C200100382100007C000278FE
S315FC00004870630000606320004C00012C7C60012476
S315FC0000587C0004AC4C00012C3C6080007C0004ACA9
S315FC0000787C631A787C79FBA6FF80010CFF00010CD7
S315FC000088FE80010CFE00010CFD80010CFD00010C3C
S315FC000098FC80010CFC00010C4C00012C48000008FB
S315FC0000684C00012C7C70FBA64C00012C7C0004ACDB
S315FC0000A83F8000007C6802A6C0030000C023000055
S315FC0000B8C0430000C0630000C0830000C0A300006A
S315FC0000C8C0C30000C0E30000C1030000C123000058
S315FC0000D8C1430000C1630000C1830000C1A3000046
S315FC0000E8C1C30000C1E30000C2030000C223000034
S315FC0000F8C2430000C2630000C2830000C2A3000022
S315FC000108C2C30000C2E30000C3030000C32300000F
S315FC000118C3430000C3630000C3830000C3A30000FD
S315FC000128C3C30000C3E300004C00012C706300004D
S315FC0001387C0004AC4C00012C7C6001244C00012C96
S315FC000148706300004C00012C7C6001A44C00012C5F
S315FC0001587C6101A44C00012C7C6201A44C00012C9E
S315FC0001687C6301A44C00012C7C6401A44C00012C8A
S315FC0001787C6501A44C00012C7C6601A44C00012C76
S705FC000018E6

Now my question is how to do it. Is the best way to go through the input file once, sort it by address and then proceed as before? I suspect this will perform poorly. You see only a small example here; the file can be really huge (e.g. 200 MB).

Or is it better to read in one line at a time and store its data at the right position in memory, based on the address? But then what should I use to store the data: a scalar, an array or a hash (key = address, value = data)? What is the most efficient way to do it?

It should also be possible to check whether the data is complete. For example, one S3-Record could be missing, so that no data is available at a certain address.

Also note that the lines may have different lengths.

Thank you

Dirk


Re: Storing unordered data from file in memory by BrowserUk (Patriarch) on Jul 21, 2010 at 13:02 UTC
You can avoid sorting altogether. And you should.
As your records are fixed length (barring the first and last?), there is a direct arithmetic relationship between the address of a record and its position in the file. So, use a big scalar (the most efficient form of Perl memory) and write records directly in place. Or even write them directly to disk and avoid large memory usage.
As for tracking whether all the records have been written, the same arithmetic properties can be used to reduce the absence/presence of a given record to a single bit in a bit-string. With 46-byte records, 200 MB reduces to a 1/2 MB bit-string for tracking. And this is easily and quickly checked for completion by a simple:
`if( $tracking =~ m[[^\0]] ) { ## ... }`
I know you are using threading. If the records of an individual file are being produced by different threads, then some care will need to be taken to ensure these large scalars are not replicated per thread. Perhaps the simplest way would be to use a Queue to a single writing and tracking thread.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy
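BrowserUk's suggestion (one big scalar written in place, plus a one-bit-per-record tracker) might look roughly like the following. It is a minimal sketch under stated assumptions: every S3 record carries the same number of data bytes, and the base address and the number of records are known up front. `place_records` and its parameters are hypothetical names. It stores binary data via `pack 'H*'`, skips checksum verification, and returns the indices of records that never arrived.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the "no sort" idea. Assumes every S3 record carries the same
# number of data bytes, and that the base address and the number of records
# are known in advance. place_records and its parameters are hypothetical.
sub place_records {
    my ($lines, $base, $bytes_per_rec, $n_records) = @_;

    # One preallocated scalar holds the whole binary image.
    my $image = "\0" x ($n_records * $bytes_per_rec);

    # One bit per record, initially 1 (= "not seen yet"); pack 'b*' fills bits
    # in the same order vec() uses and leaves any padding bits at 0.
    my $missing = pack 'b*', '1' x $n_records;

    for my $line (@$lines) {
        next unless $line =~ /^S3[0-9A-F]{2}([0-9A-F]{8})([0-9A-F]+)[0-9A-F]{2}\r?$/;
        my ($addr, $data) = (hex $1, $2);
        my $offset = $addr - $base;                       # byte offset in the image
        die sprintf("unexpected record at %08X\n", $addr)
            if $offset < 0
            || $offset % $bytes_per_rec
            || $offset / $bytes_per_rec >= $n_records
            || length($data) != 2 * $bytes_per_rec;
        my $rec_no = $offset / $bytes_per_rec;            # record index
        substr($image, $offset, $bytes_per_rec) = pack 'H*', $data;
        vec($missing, $rec_no, 1) = 0;                    # record is now present
    }

    # Completion test as in the reply: any non-zero byte means something is missing.
    my @gaps = $missing =~ /[^\0]/
             ? grep { vec($missing, $_, 1) } 0 .. $n_records - 1
             : ();
    return (\$image, \@gaps);
}

# Two records of the example file, out of order, for an image of 3 records.
my ($image, $gaps) = place_records(
    [ 'S315FC000028908309003C200100382100007C000278FE',
      'S315FC0000187C631A787C6001243C804C00388400643C' ],
    0xFC000018, 16, 3);
printf "missing record(s): %s\n", @$gaps ? join(', ', @$gaps) : 'none';
```

Each record is placed in O(1), so the whole pass is linear in the number of lines; the bit-string only needs to be scanned bit by bit when the cheap regex test reports a gap.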
Re^2: Storing unordered data from file in memory by Dirk80 (Pilgrim) on Jul 21, 2010 at 15:11 UTC
In my example all S-Records have the same length. But this does not have to be the case. Each S3-Record has a field which indicates the number of bytes for address, data and checksum. Can you explain the tracking to me in more detail, please? I do not understand your idea completely. You mean that I have a bit-string in memory. And I initialise it with 0 and if I find a record then I put a 1 at this position. At the end I have to check if all positions in the bit-string are set to 1. Is this correct?
Re^3: Storing unordered data from file in memory by BrowserUk (Patriarch) on Jul 21, 2010 at 15:35 UTC
But this does not have to be the case.
Unless all the records in a given file are the same length (or at least the records in each given portion of a file, though that does complicate things considerably), the idea doesn't work. But if they are, then it converts an O(n log n) process to O(n), which will have a far more profound effect upon your processing performance than specific sort implementations ever will.
The fundamental requirement for this to work is a simple arithmetic calculation to convert the address of a record to its position in the file. I envisioned this being something like:
`my $filePos = ( $recAddr - $firstRecAddr ) / $dataBytesPerRec * ( $recLen + 1 ) + $lenOfHeader;`
This would then allow you to seek directly to the appropriate file (or ramfile) position and write the record in its final position directly. A similar calculation can be used to address the appropriate bit in the tracking bit-vector.
You mean that I have a bit-string in memory. And I initialise it with 0 and if I find a record then I put a 1 at this position. At the end I have to check if all positions in the bit-string are set to 1. Is this correct?
You have the general idea, but I inverted the state of the bits. I.e. I envisaged initialising all the bits to 1:
`my $tracking = chr(255) x ( $noOfRecords / 8 );`
Then unsetting the bits as the records are written:
`vec( $tracking, $recNo, 1 ) = 0;`
Then testing for completion by searching the tracking vector for non-zero bytes:
`if( $tracking =~ m[[^\0]] ) {...`
But doing it the other way (initialising to 0, setting bits and then searching for non-0xff bytes) is equivalent.
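The seek-based variant, writing each fixed-length line straight to its final position, could be sketched as follows. Assumptions: all S3 lines have the same length and consecutive, aligned addresses, and the output is an in-memory "ramfile" opened on a preallocated scalar (a real file opened with `+<` should behave the same way). `write_in_place` and its parameters are hypothetical names; `$bytes_per_rec` is the number of data bytes per record, which turns an address difference into a record number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put one fixed-length S3 line at its sorted position.
# Position = record number * (line length + newline) + header length.
sub write_in_place {
    my ($out_fh, $line, $first_addr, $bytes_per_rec, $header_len) = @_;
    my $rec_len = length $line;                          # characters, without "\n"
    my $rec_no  = (hex(substr($line, 4, 8)) - $first_addr) / $bytes_per_rec;
    my $pos     = $rec_no * ($rec_len + 1) + $header_len;
    seek($out_fh, $pos, 0) or die "seek failed: $!";
    print {$out_fh} $line, "\n";
    return $rec_no;
}

my @s3 = (    # two lines of the example, out of order
    'S315FC000028908309003C200100382100007C000278FE',
    'S315FC0000187C631A787C6001243C804C00388400643C',
);
my $header = "S00600004844521B\n";

# Preallocate the "ramfile" so every write lands inside the existing string.
my $buf = ' ' x (length($header) + @s3 * (length($s3[0]) + 1));
open(my $fh, '+<', \$buf) or die "Could not open in-memory file: $!";
print {$fh} $header;
write_in_place($fh, $_, 0xFC000018, 16, length $header) for @s3;
close($fh);
print $buf;    # header, then the FC000018 line, then the FC000028 line
```

Preallocating the buffer keeps every write within the existing string, so the result does not depend on how the handle treats seeks past the end.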
Re: Storing unordered data from file in memory by Corion (Patriarch) on Jul 21, 2010 at 10:50 UTC
For ease of use, I would just read the whole file into an array and then sort that array by address (which should be identical to sorting it by the elements directly, judging from the format, except for the last row). Otherwise, if memory becomes a real issue, I would try to reconstruct the memory as a scalar, using substr to write the payload data into the correct location. If the string is too short for the current location, expand the string appropriately.
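Corion's second suggestion (rebuild the memory as one scalar with `substr`, expanding the string when needed) might be sketched like this. It handles records of different lengths, as Dirk80 requires. Assumptions: `build_image` and the `$base` parameter (lowest expected address) are hypothetical. Because NUL padding cannot be told apart from genuine zero bytes, completeness is checked by walking the much smaller list of record offsets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: rebuild memory in one scalar with substr, growing the string when a
# record lands past its current end. Records may have different lengths.
# build_image and $base (lowest expected address) are hypothetical.
sub build_image {
    my ($lines, $base) = @_;
    my $image = '';
    my %len_at;                               # offset => number of data bytes

    for my $line (@$lines) {
        next unless $line =~ /^S3[0-9A-F]{2}([0-9A-F]{8})([0-9A-F]*)[0-9A-F]{2}\r?$/;
        my ($addr, $bin) = (hex $1, pack('H*', $2));
        my $offset = $addr - $base;
        die sprintf("address %08X is below the base address\n", $addr) if $offset < 0;
        my $end = $offset + length $bin;
        # Expand with NUL bytes first: substr cannot assign past the end of a string.
        $image .= "\0" x ($end - length $image) if length $image < $end;
        substr($image, $offset, length $bin) = $bin;
        $len_at{$offset} = length $bin;
    }

    # Completeness: walk the record offsets in order and collect [first, last] holes.
    my @holes;
    my $expect = 0;
    for my $off (sort { $a <=> $b } keys %len_at) {
        push @holes, [ $expect, $off - 1 ] if $off > $expect;
        my $end = $off + $len_at{$off};
        $expect = $end if $end > $expect;
    }
    return (\$image, \@holes);
}

my ($image, $holes) = build_image(
    [ 'S315FC0000387C1043A67C1143A67C1243A67C1343A6DC',     # FC000028 left out
      'S315FC0000187C631A787C6001243C804C00388400643C' ],
    0xFC000018);
printf "hole at offsets %d..%d\n", @$_ for @$holes;          # hole at offsets 16..31
```

Only the offset list is sorted here, not the data, so the sort stays cheap even for a large file.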
Re^2: Storing unordered data from file in memory by Dirk80 (Pilgrim) on Jul 21, 2010 at 12:58 UTC
I tried your solution. Here is my code, which seems to work.

#!/usr/bin/perl

use strict;
use warnings;

my $in_file_name = "D:/temp.s3";
my $data = "";

&extractDataFromSrecFile($in_file_name, \$data);
print($data,"\n");

sub extractDataFromSrecFile
{
    my ($file_name, $ref_data) = @_;
    my @in_data;

    # read file into memory
    open(my $fh, "<", $file_name) || die "Could not open \"$file_name\": $!";
    while( my $srec = <$fh> )
    {
        chomp($srec);

        my $id      = substr($srec,0,2);
        my $address = substr($srec, 4, 8);

        if( ($id eq "S3") && ($address ne "FC000000") )
        {
            push(@in_data, $srec);
        }
    }
    close( $fh );

    # store sorted data and check if it is complete
    $$ref_data = "";
    my $next_address = 0xFC000018; # start address
    for( sort{ hex(substr($a, 4, 8)) <=> hex(substr($b, 4, 8)) } @in_data )
    {
        my $srec = $_;
        my $address = hex(substr($srec, 4, 8));
        my $nb_data_bytes = hex(substr($srec,2,2)) - 4 - 1;

        # completeness check
        if( $address != $next_address )
        {
            $$ref_data = "";
            print "ERROR: S3-Record with address " . sprintf("%08X", $next_address) . " is missing in input file!\n";
            return;
        }

        $$ref_data .= substr($srec, 12, ($nb_data_bytes * 2));
        $next_address = $address + $nb_data_bytes;
    }
}

Because I want to learn, feel free to improve the code and give hints on what I could do better.
Thank you
Dirk
Re^3: Storing unordered data from file in memory by toolic (Bishop) on Jul 21, 2010 at 13:21 UTC
Because I want to learn, feel free to improve the code and give hints on what I could do better.
Use printf instead of print and sprintf. This:
`print "ERROR: S3-Record with address " . sprintf("%08X", $next_address) . " is missing in input file!\n";`
can be written more briefly (and perhaps more clearly) as:
`printf "ERROR: S3-Record with address %08X is missing in input file!\n", $next_address;`
Re: Storing unordered data from file in memory by Mr. Muskrat (Canon) on Jul 21, 2010 at 16:21 UTC
This is so weird. I was just working with S-record files yesterday. Here's the start of my S-record parser.

#!/usr/bin/perl
use strict;
use warnings;

process( $ARGV[0] );

sub process {
    my $file = shift;

    # First, grab the contents of the firmware hex (S19) file
    my $contents = slurp( $file );
    my @lines = split /\n/, $contents;

    my %data;
    for my $line ( @lines ) {
        next if $line =~ /^S0/; # skip comments
        next if $line =~ /^S7/; # skip 32-bit termination records
        next if $line =~ /^S8/; # skip 24-bit termination records
        next if $line =~ /^S9/; # skip 16-bit termination records

        # We could do all sorts of things to bulletproof this but it works
        my ( $rectype, $length, $address, $data, $checksum ) = $line =~ /
            ^               # start of line
            (S[123])        # S1, S2, or S3 record
            ([0-9A-F]{2})   # record length including the address and checksum
            ([0-9A-F]{8})   # address
            ([0-9A-F]+)     # data or payload
            ([0-9A-F]{2})   # checksum
            \r?$            # end of line
        /x;

        my @bytes = unpack "(A2)*", $data;
        my $numbytes = scalar @bytes;
        my $nextaddr = sprintf( "%X", hex( $address ) + $numbytes );

        $data{ $address } = {
            rectype  => $rectype,
            length   => $length,
            data     => $data,
            checksum => $checksum,
            bytes    => [ @bytes ],
            numbytes => $numbytes,
            nextaddr => $nextaddr,
        };
    }

    my @ordered = sort { hex( $a ) <=> hex( $b ) } keys %data;
    for my $recno ( 0 .. $#ordered ) {
        my $curraddr = $ordered[ $recno ];
        if ( ! exists $data{ $curraddr } ) {
            print "$curraddr <no data found>\n";
            next;
        }
        my $rec = $data{ $curraddr };
        my $nextrec = $ordered[ $recno + 1 ];
        print "$rec->{rectype} $curraddr $rec->{length} $rec->{data} $rec->{checksum}\n";
        if ( defined $nextrec && $nextrec ne $rec->{nextaddr} ) {
            print "  <no data until $nextrec>\n";
        }
    }
}

sub slurp {
    my $file = shift;
    my $text = do { local( @ARGV, $/ ) = $file ; <> };
    return $text;
}
Re^2: Storing unordered data from file in memory by Dirk80 (Pilgrim) on Jul 21, 2010 at 19:59 UTC
Good solution. You chose to store everything in a hash. I like this approach. Of course, you load the complete file into memory, but if you have enough memory I think it is a very good way.
UPDATE: Untested: But I think the data is hex-ascii. So I think that the unpack command should use '(H2)' instead of '(A2)'.
Re^3: Storing unordered data from file in memory by AnomalousMonk (Archbishop) on Jul 21, 2010 at 21:02 UTC
No. See Intel HEX in perlpacktut for examples. (~~Also: See code by other respondents!~~ Update: Actually, now I look, I don't see any such examples.)
Re^3: Storing unordered data from file in memory by Mr. Muskrat (Canon) on Jul 22, 2010 at 20:37 UTC
Untested: But I think the data is hex-ascii. So I think that the unpack command should use '(H2)' instead of '(A2)'. Only if you want it in binary. For my purposes, I prefer to keep it in hex until I get one of two specific addresses and then convert their data. This allows me to view the data without corrupting my terminal settings.
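The difference between the templates discussed above is easy to see on a few bytes of payload. `(A2)*` just cuts the hex text into two-character strings, `(H2)*` hex-dumps the ASCII characters themselves (so it is not a way to decode the payload), and `pack 'H*'` is what turns hex-ASCII into real binary bytes. A minimal illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = '7C631A78';                    # hex-ASCII payload taken from an S3 record

my @hex_pairs = unpack '(A2)*', $data;    # two-character strings: still text
my @ascii_hex = unpack '(H2)*', $data;    # hex codes of the ASCII characters '7', 'C', ...
my $binary    = pack 'H*', $data;         # real binary bytes 0x7C 0x63 0x1A 0x78
my @values    = unpack 'C*', $binary;     # numeric byte values

print "@hex_pairs\n";                     # 7C 63 1A 78
print "@ascii_hex\n";                     # 37 43 36 33 31 41 37 38
print "@values\n";                        # 124 99 26 120
```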
Re: Storing unordered data from file in memory by Utilitarian (Vicar) on Jul 21, 2010 at 11:07 UTC
I would have thought a hash with key for address and value for data, then
`my $prev_address = 0; for my $address (sort keys %data){ ...`
`print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`
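Utilitarian's hash idea, filled out into something runnable, might look like the sketch below. Assumptions: keys are the 8-digit address strings (so a plain string `sort` orders them correctly, since all keys have the same width and case), values are the hex data, and a gap is reported as the address that should have followed the previous record. `collect` and `check_and_join` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# key = address (8 uppercase hex digits), value = hex-ASCII data
sub collect {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        $data{$1} = $2
            if $line =~ /^S3[0-9A-F]{2}([0-9A-F]{8})([0-9A-F]*)[0-9A-F]{2}\r?$/;
    }
    return \%data;
}

# Concatenate data in address order and note where records are missing.
sub check_and_join {
    my ($data) = @_;
    my $joined = '';
    my $next_address;                                   # undef until the first record
    my @missing;
    for my $address (sort keys %$data) {                # string sort is fine for fixed-width hex
        my $addr = hex $address;
        push @missing, sprintf('%08X', $next_address)
            if defined $next_address && $addr != $next_address;
        $joined      .= $data->{$address};
        $next_address = $addr + length($data->{$address}) / 2;   # 2 hex digits per byte
    }
    return ($joined, \@missing);
}

my ($joined, $missing) = check_and_join(collect(
    'S315FC0000387C1043A67C1143A67C1243A67C1343A6DC',
    'S315FC0000187C631A787C6001243C804C00388400643C',
));
print "missing: @$missing\n";    # missing: FC000028
```

Computing the next expected address from each record's own data length is what lets this check cope with records of different lengths.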
Re: Storing unordered data from file in memory by JavaFan (Canon) on Jul 21, 2010 at 12:03 UTC
Now my question is how to do it. Is the best way to go through the input file once, sort it by address and then proceed as before? I suspect this will perform poorly. You see only a small example here; the file can be really huge (e.g. 200 MB).
If the file is huge, I'd use `sort(1)` (that is, the external command) to sort the file. You may want to use grep to filter out unwanted lines (those not starting with S3), and a filter to put the address part into a column for sorting. One of the advantages is that `sort(1)` doesn't easily run out of memory (although, with modern computers, 200 MB should be OK): it will resort to using temp files if it doesn't have enough memory to do it all in memory.
Re^2: Storing unordered data from file in memory by Dirk80 (Pilgrim) on Jul 21, 2010 at 13:05 UTC
Is there also a Perl sort command for files? I've seen that File::Sort exists. What do you think of it? I want the script to be independent of the OS and of the availability of an external command.
Re: Storing unordered data from file in memory by repellent (Priest) on Jul 22, 2010 at 02:57 UTC
Sort by fixed-width ASCII addresses? S-records are fixed-width also, with characters `/[0-9A-F]/`, except for the `'S'`?! What more could we ask for? :) Here it is using Unix:
`sort -k 1.5 s3.txt | grep S3 > s3_sorted_by_addr.txt`
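Since Dirk80 asked for something that does not depend on an external command, a rough pure-Perl counterpart of the pipeline above is sketched here. It is an in-memory stand-in, not a replacement for sort(1): it holds every line in memory instead of spilling to temporary files. `sort_s3_lines` is a hypothetical name. Like `-k 1.5`, it compares from the fifth character onward, so the address decides the order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough in-memory counterpart of: sort -k 1.5 s3.txt | grep S3
# Keeps only S3 lines and orders them by the text from the 5th character on,
# i.e. by address first. Unlike sort(1) it never spills to temporary files,
# so the whole file has to fit in memory.
sub sort_s3_lines {
    my @lines = @_;
    return sort { substr($a, 4) cmp substr($b, 4) } grep { /^S3/ } @lines;
}

# Usage (script name is arbitrary): perl sort_s3.pl s3.txt > s3_sorted_by_addr.txt
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file) or die "Could not open \"$file\": $!";
    print sort_s3_lines(<$fh>);
    close($fh);
}
```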