
Each line is an individual syslog message, which means that related lines will rarely be consecutive, though all related lines for an event should arrive within a very short time. In this data example, there are two separate events whose lines need to be related to each other, but the events themselves are independent of each other. The events can be identified by 1) node=xxxxxxxxxx; and 2) node=aaaaaaaaaa
node=xxxxxxxxxx type=SYSCALL msg=audit(1485583201.776:5485082): arch=c000003e syscall=82 per=400000 success=yes exit=0 a0=7fc164006990 a1=7fc164006b70 a2=7fc164006b70 a3=7fc230853278 items=4 ppid=xxxxx pid=xxxxx auid=xxxxx uid=xxxxx gid=xxxxx euid=xxxxx suid=xxxxx fsuid=xxxxx egid=xxxxx sgid=xxxxx fsgid=xxxxx tty=(none) ses=4294967295 comm="somecommand" exe="/full/path/to/somecommand" key="delete"
node=xxxxxxxxxx type=CWD msg=audit(1485583201.776:5485082):  cwd="/another/cwd"
node=aaaaaaaaaa type=SYSCALL msg=audit(1485583203.459:5485148): arch=c000003e syscall=59 success=no exit=-2 a0=7f30b9d87149 a1=7f30b9d86860 a2=7f30b9d86bd8 a3=7f30b9d9c8c0 items=1 ppid=xxxxx pid=xxxxx auid=xxxxx uid=xxxxx gid=xxxxx euid=xxxxx suid=xxxxx fsuid=xxxxx egid=xxxxx sgid=xxxxx fsgid=xxxxx tty=(none) ses=16439 comm="command" exe="/bin/ksh93" key="cmdlineExecution"
node=xxxxxxxxxx type=PATH msg=audit(1485583201.776:5485082): item=0 name="arg-data-0" inode=268805 dev=fd:14 mode=040740 ouid=xxxxx ogid=xxxxx rdev=00:00 nametype=PARENT
node=aaaaaaaaaa type=CWD msg=audit(1485583203.459:5485148):  cwd="/a/cwd"
node=xxxxxxxxxx type=PATH msg=audit(1485583201.776:5485082): item=1 name="arg-data-1" inode=268805 dev=fd:14 mode=040740 ouid=xxxxx ogid=xxxxx rdev=00:00 nametype=PARENT
node=aaaaaaaaaa type=PATH msg=audit(1485583203.459:5485148): item=0 name="/etc/uname" nametype=UNKNOWN
node=xxxxxxxxxx type=PATH msg=audit(1485583201.776:5485082): item=2 name="arg-data-2" inode=269256 dev=fd:14 mode=0100640 ouid=xxxxx ogid=xxxxx rdev=00:00 nametype=DELETE
node=aaaaaaaaaa type=EOE msg=audit(1485583203.459:5485148):
node=xxxxxxxxxx type=PATH msg=audit(1485583201.776:5485082): item=3 name="arg-data-3" inode=269256 dev=fd:14 mode=0100640 ouid=xxxxx ogid=xxxxx rdev=00:00 nametype=CREATE
node=xxxxxxxxxx type=EOE msg=audit(1485583201.776:5485082):

Re^3: Multi-CPU when reading STDIN and small tasks
by BrowserUk (Patriarch) on Jan 28, 2017 at 22:55 UTC

    Is it correct to assume that:

    1. the audit parameters (timestamp+???) on all the parts of a single multiline entry from a given node will be the same?
    2. When a line is received from a node with different audit parameters to the last line received from that node, that previous multi-line entry is complete?
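
A minimal Python sketch of what the two assumptions above would allow, if they hold: keep one open entry per node and close it as soon as that node sends a line carrying a different audit stamp. The function name `entries_by_id_change` and the `HEADER` pattern are hypothetical, introduced only for illustration, and the pattern assumes every line starts with `node=` and carries `msg=audit(...)` as in the sample data.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Captures the node name and the whole audit(...) stamp (timestamp:serial).
# Assumes the line layout shown in the sample data.
HEADER = re.compile(r'^node=(\S+) .*?msg=audit\(([^)]*)\)')

def entries_by_id_change(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (node, stamp, lines) once a node moves on to a new stamp."""
    current: Dict[str, Tuple[str, List[str]]] = {}  # node -> (stamp, lines)
    for line in lines:
        m = HEADER.match(line)
        if m is None:
            continue  # not an audit record; simply skipped in this sketch
        node, stamp = m.group(1), m.group(2)
        held = current.get(node)
        if held is not None and held[0] != stamp:
            # Assumption 2: a new stamp from this node closes the previous entry.
            yield node, held[0], held[1]
            held = None
        if held is None:
            held = (stamp, [])
            current[node] = held
        held[1].append(line)
    # End of input: emit whatever is still open for each node.
    for node, (stamp, held_lines) in current.items():
        yield node, stamp, held_lines
```

Only one open entry per node is kept at a time, so no per-type knowledge of how many lines an event has is needed. If assumption 2 turns out to be false for some nodes, entries would be split incorrectly.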

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I don't think the assumptions are correct. Let me attempt to clarify/correct.

      1. Each event has a type (these both happen to be SYSCALL), and each type has its own definition of both the number of lines making up the event and the content of each line. Most event types contain an End of Event marker (the type=EOE record) indicating that all of the data for that event has been sent. This is not 100% true, as at least one event is only a single line with no EOE (one use of the expiry). Each node will have any and all of the different event types.
      2. The multi-line entry (I may interchange this with "event" but we are referring to the same thing) is complete when the EOE record has been received or ?? for the event types which do not contain an EOE line. I haven't attempted to look up all of the event types and their definitions, as I assumed that matching on more patterns would be heavier on the processing side.

      Additional maybe-useful information:

      • The unique key is a combination of the node and the part after the colon in the msg=audit() section.
      • I'm trying to get from the start to this state (below) as quickly as possible and, if constrained to a single thread, with the least amount of CPU usage. The "dedupe" regex seems to account for roughly 50% of the time in the loose testing I've performed. It also contributes to some of the CPU usage, but from observing before and after adding it I feel it accounts for less than 50% of the CPU.
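
To make the key and the EOE rule described above concrete, here is a rough Python sketch that buffers lines under (node, serial) and releases an event when its type=EOE record arrives. The names `events_until_eoe` and `RECORD` are hypothetical. How to close event types that never send an EOE is left open above, so this sketch only flushes them at end of input, which is a placeholder rather than an answer.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# node, record type, and the serial after the colon inside msg=audit(...).
RECORD = re.compile(r'^node=(\S+) type=(\S+) msg=audit\([^:)]*:(\d+)\)')

Key = Tuple[str, str]  # (node, serial)

def events_until_eoe(lines: Iterable[str]) -> Iterator[Tuple[Key, List[str]]]:
    """Yield ((node, serial), lines) when the matching EOE record is seen."""
    pending: Dict[Key, List[str]] = {}
    for line in lines:
        m = RECORD.match(line)
        if m is None:
            continue  # unrecognised line; skipped in this sketch
        node, rtype, serial = m.groups()
        key = (node, serial)
        if rtype == 'EOE':
            # The EOE record itself carries no data, so it is not stored.
            yield key, pending.pop(key, [])
        else:
            pending.setdefault(key, []).append(line)
    # Placeholder: events with no EOE stay buffered until input ends.
    for key, held in pending.items():
        yield key, held
```

On an endless stream the `pending` dict would grow with every EOE-less event, so a real version would need some expiry rule for those entries.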

      If this helps, this is the final result of the processing (the blank line is added for readability and to signify that a new line is being started):

      node=xxxxxxxxxx type=SYSCALL msg=audit(1485583201.776:5485082): arch=c000003e syscall=82 per=400000 success=yes exit=0 a0=7fc164006990 a1=7fc164006b70 a2=7fc164006b70 a3=7fc230853278 items=4 ppid=xxxxx pid=xxxxx auid=xxxxx uid=xxxxx gid=xxxxx euid=xxxxx suid=xxxxx fsuid=xxxxx egid=xxxxx sgid=xxxxx fsgid=xxxxx tty=(none) ses=4294967295 comm="somecommand" exe="/full/path/to/somecommand" key="delete" type=CWD  cwd="/another/cwd" type=PATH item=0 name="arg-data-0" inode=268805 dev=fd:14 mode=040740 ouid=xxxxx ogid=xxxxx rdev=00:00 nametype=PARENT item=1 name="arg-data-1" item=2 name="arg-data-2" inode=269256 mode=0100640 nametype=DELETE item=3 name="arg-data-3" nametype=CREATE
      
      node=aaaaaaaaaa type=SYSCALL msg=audit(1485583203.459:5485148): arch=c000003e syscall=59 success=no exit=-2 a0=7f30b9d87149 a1=7f30b9d86860 a2=7f30b9d86bd8 a3=7f30b9d9c8c0 items=1 ppid=xxxxx pid=xxxxx auid=xxxxx uid=xxxxx gid=xxxxx euid=xxxxx suid=xxxxx fsuid=xxxxx egid=xxxxx sgid=xxxxx fsgid=xxxxx tty=(none) ses=16439 comm="command" exe="/bin/ksh93" key="cmdlineExecution" type=CWD  cwd="/a/cwd" type=PATH item=0 name="/etc/uname" nametype=UNKNOWN
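
The merged lines above are consistent with one simple rule: keep the first record verbatim, then append from each later record only those key=value pairs not yet seen in the event, and drop the EOE marker. The "dedupe" regex actually in use is not shown in the thread, so the Python sketch below is an inference from the sample output, not the poster's code; `merge_event` and `TOKEN` are hypothetical names, and whitespace is normalised to single spaces.

```python
import re
from typing import List

# key=value tokens; a value is a double-quoted string or a run of non-spaces.
TOKEN = re.compile(r'(\w+)=("[^"]*"|\S*)')

def merge_event(lines: List[str]) -> str:
    """Join one event's records, dropping repeated key=value pairs."""
    if not lines:
        return ''
    seen = set(TOKEN.findall(lines[0]))  # pairs from the first record
    parts = [lines[0].rstrip()]          # first record is kept verbatim
    for line in lines[1:]:
        for key, value in TOKEN.findall(line):
            if (key, value) in seen:
                continue  # covers node=, msg=audit(...), repeated type=, etc.
            seen.add((key, value))
            if (key, value) == ('type', 'EOE'):
                continue  # the end-of-event marker is not part of the output
            parts.append(f'{key}={value}')
    return ' '.join(parts)
```

A single pass with one precompiled pattern and a set lookup per token may be cheaper than repeated substitutions over the joined string, though that would need measuring. One caveat of this rule: a pair that reverts to an earlier value (say, an inode seen two records ago) is dropped as well.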
      

        Hm. According to Red Hat Audit log files, assuming you and they are talking about the same thing, my assumptions are correct.

        If they are, your processing can be sped up considerably by utilising them.

