in reply to How to process variable length fields in delimited file.

Thank you all for your responses. I will review them for a better understanding. I am very much a novice in Perl, so I would like to read the details to get a good understanding of the proposed methods.

I could have posted the exact code to begin with, but I did not want to get too long-winded; at times, though, details are better. One reason I was thinking of using the \f character is that I don't care about printing the data (I say that now); once the data is in a readable delimited file, it will be passed to the SPLUNK application for end use. The problem is that the text contains just about every character. There are maybe 1,000,000 lines of text a day, and as the message below shows, these are messages from network devices, which include characters such as #@$^|}{[]<> and about every other character I could think of. They contain tabs as well. I finally grepped several days of output and did not find a single \f. Another possibility is to use a multicharacter delimiter such as @#!, which is unlikely to appear together in standard text.
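To make the multicharacter-delimiter idea concrete, here is a minimal sketch. The helper names join_fields and split_fields are made up for illustration, and '@#!' is only the candidate delimiter mentioned above. The writer refuses any field that already contains the delimiter, so a collision is caught when the file is written rather than discovered later.

```perl
use strict;
use warnings;

# Candidate delimiter from the post; any string that never occurs in the data would do.
my $DELIM = '@#!';

# Join fields with the delimiter, refusing any field that already contains it.
sub join_fields {
    my @fields = @_;
    for my $f (@fields) {
        die "field contains delimiter: $f\n" if index($f, $DELIM) >= 0;
    }
    return join $DELIM, @fields;
}

# Split a line on the literal delimiter.
# \Q...\E quotes regex metacharacters; the -1 limit keeps trailing empty fields.
sub split_fields {
    my ($line) = @_;
    return split /\Q$DELIM\E/, $line, -1;
}

my $line = join_fields('router|174', "a b\tc", '', 'x{y}[z]');
print join(' / ', map { "<$_>" } split_fields($line)), "\n";
```

The same two helpers would work with "\f" as the delimiter; only the value of $DELIM changes.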

Here is the devil in the details of the true layout and an example of 1 data line. I will review and when I have time, comment on the solution. Thank you all

For each message:

   1. Record Starter: "====>"
   2. Message ID (uuid)
   3. Condition ID (uuid, for future use)
   4. Network Type of message node:
        IP Node       1
        Non IP Node   5
   5. IP Address (see A.)
   6. String length of the nodename
   7. Nodename
   8. Network Type of message generation node (see 4.)
   9. IP Address of message generation node (see A.)
  10. String length of the message generation nodename
  11. Nodename of message generation node
  12. Log only flag
  13. Unmatched flag
  14. Message source type
        Console       0x0001
        Message API   0x0002
        Logfile       0x0004
        Monitor       0x0008
        SNMP          0x0010
        Server MSI    0x0020
        Agent MSI     0x0040
        Legacy Link   0x0080
        Schedule      0x0100
        Internal      0x1000
        Subproduct    0x2000
  15. Notification flag
  16. Trouble ticket flag
  17. Acknowledge on troubleticket flag
  18. Message creation date and time (see B. for the format)
  19. Message receipt date and time (see B. for the format)
  20. Unbuffer time
  21. Severity
        UNKNOWN       0x01
        NORMAL        0x02
        WARNING       0x04
        CRITICAL      0x08
        MINOR         0x10
        MAJOR         0x20
  22. Status of the auto action
        Failed        2
        Started       8
        Finished      9
        Defined       11
        Undefined     12
  23. Network Type of auto action node (see 4.)
  24. IP address of the node where the auto action is executed (see A.)
  25. String length of the nodename where the auto action is executed
  26. Nodename of the node where the auto action is executed
  27. Auto action creates annotation flag
  28. Acknowledge flag of the auto action
  29. Status of the operator initiated action (see 15.)
  30. Network Type of operator initiated action node (see 4.)
  31. IP address of the node where the operator initiated action is executed
  32. String length of the nodename where the oper. initiated action is executed
  33. Nodename of the node where the operator initiated action is executed
  34. Operator initiated action creates annotation flag
  35. Acknowledge flag of the operator initiated action
  36. Time and date when the message has been acknowledged (see B. for the format)
  37. String length of the operator who has acknowledged the message
  38. Name of the operator who has acknowledged the message
  39. String length of message source
  40. Message source
  41. String length of application
  42. Application
  43. String length of messagegroup
  44. Messagegroup
  45. String length of object
  46. Object
  47. String length of notification service name(s)
  48. Notification service name(s)
  49. String length of auto action call
  50. Auto action call
  51. String length of operator initiated action call
  52. Operator initiated action call
  53. String length of message text
  54. Message text
  55. String length of original message text
  56. Original message text
  57. Number of annotations
  58. String length of message type
  59. Message type
  60. Escalate flag
  61. Assign flag
  62. Escalation type
  63. Date and time when the message was escalated (see B. for the format)
  64. Network Type of escalation node (see 4.)
  65. Escalation server IP address
  66. String length of escalation server node name
  67. Escalation server node name
  68. String length of the operator who has escalated the message
  69. Name of the operator who has escalated the message
  70. Instruction type:
        No instruction          0
        Instruction text        1
        Instruction Interface   2
        Internal instruction    3
  71. Read only flag
  72. Original message number (uuid)
  73. Time difference in seconds between agent time zone and GMT
  74. String length of instruction ID or name
  75. Instruction ID, instruction interface name or message numbers of
      internal instructions (depends on instruction type)
  76. Length of Instruction Interface parameters
  77. Instruction Interface parameters
  78. String length of service name
  79. Service name
  80. String length of message key
  81. Message key
  82. Duplicate count
  83. Date/time when last duplicate was received (see B. for the format).
      This field is 0 if message has no duplicates.
  84. CMA count. Number of custom message attributes.

      For each CMA:
        1. CMA record starter: "CMA"
        2. String length of the CMA name
        3. CMA name
        4. String length of the CMA value
        5. CMA value

      For each annotation:
        1. Annotation record starter: "ANNO"
        2. Date and time of the annotation (see B. for the format)
        3. Annotation number
        4. String length of the author of the annotation
        5. Author of the annotation
        6. String length of the annotation text
        7. Annotation text

A. All IP addresses are in binary format; the following script can be
   used to convert the IP address:

   #cat convert.sh
   #!/bin/ksh
   # convert.sh
   # usage convert <IP_ADDRESS_IN_BINARY_FORMAT>
   OPC_IP_ADDR=$(echo $1| awk '{printf("%d.%d.%d.%d\n", \
       ((int($1)/16777216)%256), \
       ((int($1)/65536)%256), \
       ((int($1)/256)%256), \
       ((int($1))%256) \
   )}')
   echo "$1 = ${OPC_IP_ADDR}"
   #end of convert.sh

B. All time specifications are in seconds since 1.1.1970 GMT

1 Example data line

====> 064191a8-7db9-71e6-12cc-abbb01aa0000 45f86528-d563-71e0-03bd-8a239ed50000 1 175337506 39 router174.network.microsoft.com 1 -1413807702 44 syslog152.network.microsoft.com 1 0 4 0 0 0 1474214430 1474214431 0 2 12 0 0 0 0 0 12 0 0 0 0 0 1474214431 3 OpC 22 GNS_IOS_SYSLOG_2(1.71) 35 SYSLOG-cisco-ios-RADIUS-SERVERALIVE 4 DATA 13 mxgamdrnb08e 0 0 0 116 RADIUS-6-SERVERALIVE: Group ACCT_GROUP: Radius server 17.24.174.55:1645,1646 is responding again (previously dead). 235 2016-09-18T10:59:45.932408-05:00 mxgamdrnb08e.microsoft.com local7.info 21395: Sep 18 15:59:44.907 GMT: %RADIUS-6-SERVERALIVE: Group ACCT_GROUP: Radius server 17.24.174.55:1645,1646 is responding again (previously dead). 0 0 0 0 0 0 0 0.0.0.0 0 0 0 0 000000000000000000000000000000000000 18000 0 0 44 systlog152.network.microsoft.com 70 SYSLOG:mxgamdrnb08e:RADIUS-SERVER_STATUS:17.24.174.55:1645,1646:good 0 1474214431 20 CMA 15 ATRIUM_CATEGORY 6 SWITCH CMA 13 ATRIUM_IMPACT 0 CMA 17 ATRIUM_IP_ADDRESS 12 10.15.212.34 CMA 15 ATRIUM_MAILCODE 7 GA8-895 CMA 19 ATRIUM_MANUFACTURER 5 CISCO CMA 17 ATRIUM_NODE_GROUP 50 MANAGENOC DATA SITE TYPE A2 CSCTG62793_DISABLE_RD CMA 15 ATRIUM_PRIORITY 10 PRIORITY_5 CMA 14 ATRIUM_PRODUCT 18 Catalyst 3560x-24P CMA 13 ATRIUM_REGION 2 US CMA 17 ATRIUM_SITE_GROUP 5 US-GA CMA 14 ATRIUM_URGENCY 0 CMA 13 ATRIUM_ciName 12 MXGAWDRNB08E CMA 13 MSC_IN_ATRIUM 1 Y CMA 11 EventSource 10 MS_Network CMA 15 REMEDY_ticketID 1 N CMA 14 condition_name 55 SYSLOG-cisco-ios-RADIUS-SERVERALIVE (resolution) [1628] CMA 15 gns.alarm.class 8 BreakFix CMA 15 gns.alarm.state 10 REGISTERED CMA 19 gns.alarm.subobject 22 17.24.174.55:1645,1646 CMA 25 gns.cmdb.auto.ticket.flag 4 none
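Since the goal is to do the processing in Perl, notes A. and B. can be handled without shelling out to ksh and awk. The following is a minimal sketch; the sub names ip_to_dotted and epoch_to_utc are made up for illustration. One assumption goes beyond the vendor script: a negative value such as -1413807702 from the example line is treated as a signed 32-bit integer and masked to its unsigned form before the octets are extracted.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Note A: convert an integer IP address to dotted-quad notation.
# ASSUMPTION: negative values are signed 32-bit integers, so mask to 32 bits.
sub ip_to_dotted {
    my ($n) = @_;
    my $unsigned = ($n + 0) & 0xFFFFFFFF;
    # 'N' packs a 32-bit big-endian integer; 'C4' reads back its four octets.
    return join '.', unpack 'C4', pack 'N', $unsigned;
}

# Note B: convert seconds since 1.1.1970 GMT to a readable UTC timestamp.
sub epoch_to_utc {
    my ($seconds) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($seconds));
}

# Values taken from the example data line.
print ip_to_dotted(175337506),   "\n";   # 10.115.112.34
print ip_to_dotted(-1413807702), "\n";   # 171.187.1.170
print epoch_to_utc(1474214430),  "\n";   # 2016-09-18 16:00:30
```

The second address comes out as 171.187.1.170, which is ab.bb.01.aa in hex and matches the "abbb01aa" segment of the message ID in the example, so the signed 32-bit reading looks plausible for this data.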

Re^2: How to process variable length fields in delimited file.
by shmem (Chancellor) on Oct 06, 2016 at 20:10 UTC
    Here is the devil in the details of the true layout and an example of 1 data line

    The squirrel is always in the details, since the devil is a squirrel. But I can't help you here with the data you provided (only one record? seriously?) since in "39 router174.network.microsoft.com" - well, "router174.network.microsoft.com" is just 31 chars long, not 39. Even with a NULL terminator it would be 32 chars long, not 39. Hence, the following is just bull - you know, garbage in => garbage out.

    use strict;
    use warnings;

    my (@lengths, %names);

    while (<>) {
        s/\r?\n//; # strip line endings
        # get field numbers and field description
        if (/\s{2,3}(\d+)\. (.+)/) {
            my ($number, $text) = ($1, $2);
            $number--; # since first element of an array is 0, not 1
            # if this field denotes string length, store it
            if ($text =~ /string length/i) {
                push(@lengths, $number);
            }
            # remember field number and text (only if not previously seen)
            $names{$number} = $text unless $names{$number};
            next; # nothing else to do for this line.
        }
        # now process the one line of data, if at hand
        if (/^====>/) { # Record Starter, right?
            # split line at whitespace
            my @array = split;
            # for all length indicators, concatenate
            # subsequent array elements into one
            # complain if the size doesn't fit
            for my $index (@lengths) {
                next unless defined $array[$index];
                my $length  = $array[$index];
                my $string  = '';
                my $counter = 0; # number of array elements consumed
                while (length($string) < $length and $index + $counter + 1 < @array) {
                    # join array elements with space to rebuild the field
                    $counter++;
                    $string = $counter == 1
                        ? $array[$index + 1]
                        : join " ", $string, $array[$index + $counter];
                }
                warn "length mismatch for $string: $length <=> " . length($string) . "\n"
                    if length($string) != $length;
                # replace the concatenated elements with the rebuilt field
                # (a zero-length field inserts an empty element, keeping indexes aligned)
                splice @array, $index + 1, $counter, $string;
            }
            # done, output the fields
            for (sort {$a <=> $b} keys %names) {
                print "$names{$_}: ", $array[$_] // '', "\n";
            }
        }
    }

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
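Splitting at whitespace and rejoining with a single space cannot restore tabs or runs of spaces inside a field. An alternative, sketched below, walks the record by character offset and takes exactly as many characters as each length field announces. This is only a sketch under stated assumptions: the names make_reader and parse_cmas are made up, exactly one separator character is assumed between a length and its string, and lengths are assumed to count characters. The sample is a set of CMA entries from the posted record whose lengths do match their values.

```perl
use strict;
use warnings;

# Build two closures sharing one cursor over the record text.
sub make_reader {
    my ($rec) = @_;
    my $pos = 0;

    # Next whitespace-delimited token, or undef at end of record.
    my $token = sub {
        pos($rec) = $pos;
        return undef unless $rec =~ /\G\s*(\S+)/g;
        $pos = pos($rec);
        return $1;
    };

    # Length-prefixed string: a length token, one separator, then exactly
    # that many characters (which may themselves contain spaces or tabs).
    my $string = sub {
        my $len = $token->();
        die "expected a length near offset $pos\n"
            unless defined $len and $len =~ /^\d+$/;
        return '' if $len == 0;
        $pos++;    # ASSUMPTION: exactly one separator follows the length
        die "record ends inside a $len-character field\n"
            if $pos + $len > length($rec);
        my $s = substr($rec, $pos, $len);
        $pos += $len;
        return $s;
    };

    return ($token, $string);
}

# Parse a run of "CMA <len> <name> <len> <value>" entries into [name, value] pairs.
sub parse_cmas {
    my ($text) = @_;
    my ($token, $string) = make_reader($text);
    my @pairs;
    while (defined(my $starter = $token->())) {
        die "expected CMA, got '$starter'\n" unless $starter eq 'CMA';
        my $name  = $string->();
        my $value = $string->();
        push @pairs, [$name, $value];
    }
    return @pairs;
}

my $sample = 'CMA 17 ATRIUM_IP_ADDRESS 12 10.15.212.34 '
           . 'CMA 14 ATRIUM_PRODUCT 18 Catalyst 3560x-24P '
           . 'CMA 13 ATRIUM_IMPACT 0 CMA 13 ATRIUM_REGION 2 US';
printf "%s = '%s'\n", @$_ for parse_cmas($sample);
```

With a wrong length, such as the 39 given for the 31-character nodename, this approach would misread the fields that follow just as the token-based version does, so the lengths in the real data need to be trustworthy either way.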

      You are correct, sorry about giving only 1 record. And some have additional info. Without getting too lengthy, I included 10 records. I am reviewing comments and proceeding.