<?xml version="1.0"?>
<sections>
<build>
<field name="build time"></field>
<!--
This is a non-delimited section of text that will be parsed ...
-->
<fields type="section of text 1">
<!--
A token-delimited section of text that will be parsed into this area...
-->
</fields>
</build>
<run>
<errors>
<error time="some_integer" some_attr="more helpful info about error gets put here" />
<!--
... etc
-->
</errors>
<mismatches>
<mismatch time="another_integer" some_attr="more helpful info about mismatch goes here" />
<!--
... etc
-->
</mismatches>
<perf_stats>
<stat type="performance item name">value</stat>
<!--
stats about speed, test time, etc go here ...
-->
</perf_stats>
</run>
</sections>
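For illustration, here is a minimal sketch (Python, xml.etree.ElementTree) of how a parsed flat log could be emitted in the structure above. The element and attribute names follow the schema; build_result_doc, its arguments, and the sample values are hypothetical placeholders, not part of the actual parser.

import xml.etree.ElementTree as ET

def build_result_doc(build_fields, errors, mismatches, perf_stats):
    root = ET.Element("sections")

    # <build> holds per-build fields such as "build time".
    build = ET.SubElement(root, "build")
    for name, text in build_fields.items():
        field = ET.SubElement(build, "field", name=name)
        field.text = text

    # <run> holds the errors, mismatches, and performance stats from one run.
    run = ET.SubElement(root, "run")
    errs = ET.SubElement(run, "errors")
    for t, info in errors:
        ET.SubElement(errs, "error", time=str(t), some_attr=info)

    mis = ET.SubElement(run, "mismatches")
    for t, info in mismatches:
        ET.SubElement(mis, "mismatch", time=str(t), some_attr=info)

    perf = ET.SubElement(run, "perf_stats")
    for name, value in perf_stats.items():
        stat = ET.SubElement(perf, "stat", type=name)
        stat.text = str(value)

    return ET.ElementTree(root)

# Hypothetical usage:
#   doc = build_result_doc({"build time": "42s"}, [(3, "bad opcode")], [], {"run time": "1.7s"})
#   doc.write("result.xml")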
Extra notes (about flat logs):
- The flat logs being parsed range from roughly 400 to 100k lines.
- The current system compares sections of text against a 'gold log' (a known-good existing output), using either specialized checking of a subset of fields or a straight diff(1). With a large number of tests this breaks down: the number of logs is ( 1000 tests * (1-2 logs) * (1-3 test sets) ) => roughly 1000 to 6000 logs. So whenever the file format changes (e.g. a new feature is introduced to the toolchain), all affected logs change too, and bringing the 'gold logs' up to date consumes a lot of unnecessary time. This is going to keep happening in the future.
Thus, by moving to a more structured data store, I can get away from the flat file formats and move to a content-based comparison system.
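As a rough sketch of what that content-based comparison could look like, the snippet below compares only the parsed error and mismatch records of two result files instead of diff(1)'ing whole logs. The element paths follow the schema above; compare_runs, the tuple layout, and the file names are hypothetical.

import xml.etree.ElementTree as ET

def records(tree, path, attr="some_attr"):
    # Collect (time, info) pairs for errors or mismatches, order-independent.
    return {(e.get("time"), e.get(attr)) for e in tree.findall(path)}

def compare_runs(gold_file, new_file):
    gold = ET.parse(gold_file)
    new = ET.parse(new_file)
    problems = []
    for path in ("run/errors/error", "run/mismatches/mismatch"):
        missing = records(gold, path) - records(new, path)
        extra = records(new, path) - records(gold, path)
        if missing or extra:
            problems.append((path, missing, extra))
    return problems  # empty list => runs match on content, regardless of text layout

# Hypothetical usage:
#   for path, missing, extra in compare_runs("gold.xml", "latest.xml"):
#       print(path, "missing:", missing, "unexpected:", extra)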