Syscall Auditing in Production with Go-Audit

01 Nov 2018

Introduction

There’s a lot of interesting material on auditd/go-audit out there but in my opinion these are mostly toy examples that are good for exposure to auditd but don’t really paint an accurate picture of what it’s like to deploy it to production. I did this at work recently and found myself asking questions and dealing with things that these tutorials didn’t really cover. This post is going to be half braindump and half how-to, so it might get a little messy.

The goal of this post is to get you to a position where you can start using system call auditing to detect anomalies and odd behavior to create actionable security alerts and augment your incident response program.

What is Auditd?

Auditd is Linux’s audit daemon. It can be used to gather information from various parts of the Linux kernel such as system call execution, SELinux, and other anomalies. You can get a quick overview from audit.h but for now I’m just going to focus on syscall auditing. With auditd we can tell the kernel to record all executions of the syscalls we care about, including valuable context like their parameters, the process that executed them, and the user who launched that process.

Why go-audit?

I would check out Ryan Huber’s original blog post for more information. The first difference becomes immediately obvious when you try out both: the format the records come in. go-audit uses JSON, which every modern log shipper/parsing platform has native support for. Traditional auditd on the other hand is a parsing nightmare of multiline key=value pairs which is workable but in my experience is difficult to get right and will still lead to debugging headaches, performance issues, and quite possibly a stroke. Additionally, go-audit is easy to configure with a straightforward YAML file. This makes keeping track of state simple as you no longer need to interact with auditctl commands just to figure out what’s going on.

Prerequisites

Before you even consider rolling out go-audit and auditd to production there are a few things you need to have in place that I won’t go into much detail on here:

an internal rpm/deb package repository to distribute the go-audit package
a config management system to centrally manage and deploy go-audit and configs
a logging pipeline (eg an ELK stack) that covers shipping, parsing, and searching logs
performance metrics to measure the non-zero impact this will have on your infrastructure

If you don’t have these four things auditd might not be the best use of your time. Working with auditd is not one of those “quick win” projects – it’s time consuming and difficult to get to an actionable state. For example you might want to look into deploying osquery before going down this route.

Starting Out

Let’s start out by taking inventory of the syscalls we care about for now. I’ll keep this list small so we don’t get overwhelmed with noise.

connect/listen: These syscalls are the building blocks for network IO. Whether it’s a malicious netcat listener, a reverse connect shell, or just plain regular behavior, it’ll all start here.
execve: This is the syscall used to launch new processes. It’ll be extremely noisy but it’s really useful for incident response, among other things.
ptrace/process_vm_readv/process_vm_writev: these syscalls are used for process introspection. Ideally these should not be taking place at all in your production environment so this won’t take much effort to turn into useful alerts.
open: This syscall is the building block for all file IO. It’s extremely noisy but we can filter it down based on what files are being opened or what’s being returned. The rules here are defined to log file open failures due to permission errors (EPERM, EACCES), which may be a sign of someone already on the system doing some recon. This could easily be adjusted to log all file opens (successful and not) or even just particular files such as /etc/passwd being touched by a process that doesn’t usually need to touch it (ie a web process).

Deploying These Changes

I’ll cover a quick snippet of a go-audit configuration that monitors the aforementioned syscalls.

output:
  stdout:
    enabled: true
    attempts: 1

filters:
  - syscall: 42
    message_type: 1306
    regex: saddr=(0200....7F|01)

rules:
  - -b 1024
  # 1. connect/listen: network io
  - -a exit,always -S connect -k netconns_out
  - -a exit,always -S listen  -k netconns_in
  # 2. execve: all new applications being executed
  - -a exit,always -S execve  -k execve
  # 3. ptrace/process_vm_readv/process_vm_writev: process introspection
  - -a exit,always -S ptrace  -F a0=16    -k ptrace_attach
  - -a exit,always -S ptrace  -F a0=16902 -k ptrace_seize
  - -a exit,always -S process_vm_readv    -k process_vm
  - -a exit,always -S process_vm_writev   -k process_vm
  # 4. open file failures
  - -a always,exit -S open -S openat -F exit=-EPERM  -k open_fail
  - -a always,exit -S open -S openat -F exit=-EACCES -k open_fail

Despite the huge difference in the types of syscall we’re monitoring, the rule format is largely the same:

-a exit,always: this tells auditd to log syscalls on function return and to always create an event for it (auditctl also accepts the reverse order, always,exit, which the open rules above use)
-S: the syscall name
-F: filter on a field of the record, such as a syscall argument (a0) or the return value (exit). I added this to ptrace because the first parameter is an enum. Because regular ptrace behavior (attaching, peeking, poking, stepping, etc) all goes through the same syscall, even the most minimal use of ptrace will trigger a ton of events. We could log all ptrace calls regardless and filter them out at the alerting stage, but that’s up to you; I’ll cover that tradeoff below.
-k: a key for the rule so we can tell from the audit log which rule triggered it
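The two a0 values in the ptrace rules are the numeric values of PTRACE_ATTACH and PTRACE_SEIZE from sys/ptrace.h. One thing that tripped me up: the rule takes the value in decimal, while the resulting audit record prints a0 in hex (the gdb record further down shows a0=10). Here’s a minimal Python sketch of mapping the logged value back to a name; the ptrace_request_name helper and its lookup table are my own illustration and not part of go-audit.

```python
# Values from <sys/ptrace.h> on Linux; only the two requests the rules match on.
PTRACE_REQUESTS = {
    16: "PTRACE_ATTACH",
    0x4206: "PTRACE_SEIZE",  # 16902 in decimal, as written in the rule
}


def ptrace_request_name(a0_hex: str) -> str:
    """Translate the hex a0 field of a ptrace syscall record into a request name."""
    request = int(a0_hex, 16)
    return PTRACE_REQUESTS.get(request, f"unknown({request})")


if __name__ == "__main__":
    print(ptrace_request_name("10"))    # a0 as logged for gdb -p
    print(ptrace_request_name("4206"))
```

If you later decide to log every ptrace call and filter at the alerting stage instead, a table like this is where the rest of the request values would go.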

Once you’ve got that saved and have go-audit running (go-audit -config file.yaml) you should start seeing entries fill up stdout. Some fun commands to generate log entries are nc -vlp 12345, nc google.com 80, gdb -p <pid> and so on. While you work on new rules it’s important to have example behavior that intentionally triggers them.

"manually formatted for readability"
"proctitle=gdb -p 13333"

{
    "sequence":1786,"timestamp":"1543629512.563",
    "messages": [
        {"type":1300,"data":"arch=c000003e syscall=101 success=yes exit=0 a0=10 a1=3415 a2=0 a3=0 items=0 ppid=12714 pid=14117 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts11 ses=4918 comm=\"gdb\" exe=\"/usr/bin/gdb\" key=\"ptrace_attach\""},
        {"type":1318,"data":"opid=13333 oauid=0 ouid=0 oses=4918 ocomm=\"python\""},{"type":1327,"data":"proctitle=676462002D70003133333333"}
    ],
    "uid_map":{"0":"root"}
}

"proctitle=nc google.com 80"
"saddr=67.207.67.3:53"
{
    "sequence":1936,"timestamp":"1543630887.101",
    "messages":[
        {"type":1300,"data":"arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=7ff30b8fba94 a2=10 a3=12a items=0 ppid=12714 pid=14304 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts11 ses=4918 comm=\"nc\" exe=\"/bin/nc.openbsd\" key=\"netconns_out\""},
        {"type":1306,"data":"saddr=0200003543CF43030000000000000000"},
        {"type":1327,"data":"proctitle=6E6300676F6F676C652E636F6D003830"}
    ],
    "uid_map":{"0":"root"}
}

"proctitle=nc google.com 80"
"saddr=172.217.10.238:80"
{
    "sequence":1937,"timestamp":"1543630887.105",
    "messages":[
        {"type":1300,"data":"arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=245e2e0 a2=10 a3=100007fffff0000 items=0 ppid=12714 pid=14304 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts11 ses=4918 comm=\"nc\" exe=\"/bin/nc.openbsd\" key=\"netconns_out\""},
        {"type":1306,"data":"saddr=02000050ACD90AEE0000000000000000"},
        {"type":1327,"data":"proctitle=6E6300676F6F676C652E636F6D003830"}],
    "uid_map":{"0":"root"}
}

I manually formatted these since they’re pretty long and the hex blobs might be a little difficult to read. The first record is from me using gdb to attach to another process. There’s a whole treasure trove of information here that’s valuable for anyone interested in IR; we can see who did it, who they did it as (uid/euid), the tty they used (which can be tied to an ssh connection, if applicable), the process that initiated the syscall, and whether it was successful or not. The next two records have similar information plus some added networking context; we have a dump of the sockaddr struct (the saddr field) which gives us the IP and port of the source/destination (depending on who initiated the connection). These two records were generated by me running nc google.com 80: the first entry is from the DNS lookup (to get the A record for google.com) and the second entry is the actual connection to google. Because these two syscalls were initiated by the same process, a lot of the same information is repeated. This comes in handy when investigating because you can ask questions like “what did this shady looking process actually do?” and get a pretty big starting off point from this data.
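To make those hex blobs less mysterious, here’s a minimal Python sketch of decoding them by hand. It assumes an x86_64 host (so the address family is little-endian) and only handles AF_INET; the function names are mine and this is not how go-audit or any plugin does it, just the layout of the data as I understand it.

```python
import socket
import struct


def decode_proctitle(hex_blob: str) -> str:
    """proctitle is the hex-encoded argv, with NUL bytes between arguments."""
    raw = bytes.fromhex(hex_blob)
    return raw.replace(b"\x00", b" ").decode("utf-8", errors="replace").strip()


def decode_inet_saddr(hex_blob: str) -> str:
    """Decode an AF_INET sockaddr: 2-byte family (host byte order), then a
    2-byte port and 4-byte address (both network byte order)."""
    raw = bytes.fromhex(hex_blob)
    (family,) = struct.unpack_from("<H", raw, 0)  # assumes little-endian host
    if family != socket.AF_INET:
        raise ValueError(f"unsupported address family {family}")
    (port,) = struct.unpack_from("!H", raw, 2)
    ip = socket.inet_ntoa(raw[4:8])
    return f"{ip}:{port}"


if __name__ == "__main__":
    print(decode_proctitle("676462002D70003133333333"))           # gdb -p 13333
    print(decode_inet_saddr("0200003543CF43030000000000000000"))  # 67.207.67.3:53
    print(decode_inet_saddr("02000050ACD90AEE0000000000000000"))  # 172.217.10.238:80
```

Unix sockets, IPv6, and netlink addresses each have their own layout, so anything beyond a quick sanity check is better left to a proper parser.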

Formatting for Better Alerting

In its unformatted and unfiltered state, go-audit’s records are only a small step above the default with auditd. go-audit’s maintainers insist on keeping its code at the absolute minimum for performance reasons which makes sense but is still a bit tedious because it means we need to introduce another moving part to the pipeline.

If you want to deploy go-audit to production you will absolutely need something to translate these to human readable values (ie something that parses the saddr, proctitle, etc blobs). This isn’t just so you can understand the log lines faster (though it does help significantly!); you’ll need it to write maintainable and actionable alerts. For example, which whitelist would you rather maintain?

[
    '0200003543CF43030000000000000000',
    '02000050ACD90AEE0000000000000000',
    '02000016000000000000000000000000'
]

or

['67.207.67.3:53', '172.217.10.238:80', '0.0.0.0:22']

The top set is not ideal; it’s hard to read as a human and it will take you longer to write and respond to alerts. The second makes it immediately clear which hosts and ports are whitelisted.

If you use Logstash you’ll be happy to hear that there exists logstash-filter-goaudit that does exactly this. My project relied on fluentd which is written in Ruby so I was able to reuse most of that code (thanks Marcin!) Tools like this are great because they’re easy to drop in and immediately get a lot of value. After running the last audit record from the previous section through this plugin, here’s what we get:

{
  "@timestamp": "2018-12-01 02:21:27 +0000",
  "data": {
    "sequence": 1937,
    "unknown": [],
    "syscall": {
      "success": "yes",
      "exit": "0",
      "a0": "3",
      "a1": "245e2e0",
      "a2": "10",
      "a3": "100007fffff0000",
      "items": "0",
      "ppid": "12714",
      "pid": "14304",
      "auid": {"name": "root", "id": "0"},
      "uid": {"name": "root", "id": "0"},
      "gid": "0",
      "euid": {"name": "root", "id": "0"},
      "suid": {"name": "root", "id": "0"},
      "fsuid": {"name": "root", "id": "0"},
      "egid": "0",
      "sgid": "0",
      "fsgid": "0",
      "tty": "pts11",
      "key": "netconns_out",
      "arch": {"bits": 64, "endianness": "little", "name": "x86_64"},
      "id": "42",
      "session_id": "4918",
      "name": "connect",
      "command": "nc",
      "executable": "/bin/nc.openbsd"
    },
    "socket_address": {
      "family": "inet",
      "port": 80,
      "ip": "172.217.10.238",
      "unknown": "0000000000000000"
    },
    "proctitle": "nc google.com 80",
    "message": "root succeeded to connect to `172.217.10.238:80` via `/bin/nc.openbsd` as `nc`"
  },
  "error": null
}

So much better! We have a human readable summary (message + proctitle) and individual fields pulled out for us. Need to look for all accesses to an IP address and any port? Just filter for data.socket_address.ip = "<ip>". What about all failed connections made by root across the fleet, perhaps an attacker port scanning the internal network? data.syscall.name = "connect" data.syscall.uid.name = "root" data.syscall.exit != "0". This data was always accessible but now it’s easier to navigate thanks to the plugin.

Monitoring for Network Anomalies

Having go-audit is great for producing a lot of data but in many cases the only way to get actionable security alerts is to establish a baseline. For example, how do you distinguish a connect back shell as part of an exploit from a connection to an external API made by your application? With some SIEMs a lot of this comes for free; ie being able to ask questions like “have I seen this IP address before?” or “has this binary ever been seen anywhere?” Sometimes this isn’t available out of the box and you need to build tooling around it.

My approach for network connections was the following:

Collect a week’s worth of logs (after being processed similar to the previous section) from each type of machine you have.
Aggregate connection statistics so for each type of machine you have a list of connected IP addresses, ports, and the count of connections made to it.
Take a look at the outliers (one off connections or connections made very few times) and see if these need to be dropped. In my case these were atypical and were safe to drop (some, for example, were just me connecting to these machines directly for test purposes).
Take a look at the top connections and determine if they’re safe for whitelisting. If they are, you can instruct your alerting infrastructure to ignore connections to these ip addresses and particular ports.

Essentially you’re building a list of expected connections made by each server. This is difficult and error prone, and for larger applications with more complex behavior it may be unattainable. These steps are intentionally loosely defined and have no code because this part varies across different types of infrastructure; when I did this I leaned heavily on a lot of the operations tooling we already had in place so being hyper specific would likely be of no use to most people.

Alongside whitelisting connections so you get alerted on unexpected ones, I was able to take advantage of the threat intelligence subscription (ie Crowdstrike, Fireeye, etc) we had. If you have access to one of these subscriptions it’s relatively low cost to use it here and immediately know when there’s an outbound connection to known bad IP addresses. Utilizing that data here will let you alert on high quality signals of malicious behavior such as connections to Tor, known C2 infrastructure, etc.
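Purely to show the shape of this work (your tooling will look different), here’s a rough Python sketch of the counting and classification described above. It assumes records shaped like the parsed plugin output from the previous section; the function names, the min_seen threshold, and the idea of a flat set of known bad IPs are all placeholders I made up for illustration.

```python
from collections import Counter


def connection_counts(records):
    """Count (ip, port) destinations across parsed connect records."""
    counts = Counter()
    for record in records:
        addr = record["data"].get("socket_address")
        if addr and addr.get("family") == "inet":
            counts[(addr["ip"], addr["port"])] += 1
    return counts


def split_by_frequency(counts, min_seen=5):
    """Separate rarely seen destinations (review by hand) from common ones
    (whitelist candidates). min_seen is an arbitrary threshold."""
    rare = {dest for dest, n in counts.items() if n < min_seen}
    common = {dest for dest, n in counts.items() if n >= min_seen}
    return rare, common


def classify(dest, whitelist, known_bad_ips):
    """Return 'known_bad', 'expected', or 'unexpected' for one (ip, port)."""
    ip, _port = dest
    if ip in known_bad_ips:
        return "known_bad"
    if dest in whitelist:
        return "expected"
    return "unexpected"
```

The threat intel check runs first on purpose: a known bad IP should alert even if it somehow ended up on the whitelist.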

Detecting (Some) Exploitation

At the system call level, there are signals we can monitor for certain types of exploitation. Instead of writing alerts specific to some piece of software or known exploits we can leverage auditd to alert us to some of the primitives used in common memory corruption exploits.

In the example from earlier we configured auditd to log all execve calls. The obvious benefit of this is we have a complete listing of every single thing that runs on the system. However, with the extra context auditd gives us, we can actually look for shells that have been spawned as a result of a successful memory corruption exploit.

From the man pages, this is execve’s function prototype with a shell spawning example below it:

int execve(const char *filename, char *const argv[], char *const envp[]);
char *const argv[] = {"/bin/sh", NULL};
execve("/bin/sh", argv, envp);

One interesting thing here is that Linux will let you leave argv and envp as NULL and your code will still execute. An execve call with the second or third parameters NULL isn’t regular behavior… except when it’s from shellcode. Most off the shelf shellcode will leave these parameters NULL for the sake of simplicity and space (setting registers to NULL vs writing shellcode that creates multiple null terminated string arrays). In fact you can take a look at shell-storm’s shellcode repository and you’ll find most shell spawning shellcode will have at least one NULL parameter. I’m not saying there will be no false positives but this behavior is unlikely to be exhibited by benign code.

You might ask “well, what’s stopping someone from writing better shellcode?” Nothing! However this will catch a lot of the low hanging fruit; the off the shelf shellcode and public exploits that otherwise you would have no way of knowing was happening.

In my case the alert I wrote to catch this behavior was:

data.syscall.name="execve" data.syscall.a0!="0" (data.syscall.a1="0" || data.syscall.a2="0")

Or, in plain English: alert me on an execve where the first parameter isn’t null but the second or third is. You can narrow this alert down to /bin/sh if you like but honestly this behavior is shady enough that it shouldn’t be happening at all.
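If your alerting system doesn’t speak that query syntax, the same check is easy to express in code. Here’s a minimal Python sketch over the data.syscall object from the parsed record shown earlier; the function name is mine, and it assumes a0 through a2 arrive as hex strings like they do in that record.

```python
def is_suspicious_execve(syscall: dict) -> bool:
    """True for an execve whose filename pointer is set but whose argv or
    envp pointer is NULL, the pattern typical of off the shelf shellcode."""
    if syscall.get("name") != "execve":
        return False
    # Audit records print syscall arguments as hex without a 0x prefix.
    a0, a1, a2 = (int(syscall[key], 16) for key in ("a0", "a1", "a2"))
    return a0 != 0 and (a1 == 0 or a2 == 0)
```

Comparing parsed integers instead of the raw strings is a small safety net in case a parser ever hands you "00" or "0x0" rather than "0".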

Other exploitation primitives worth looking into are mmap or mprotect calls that mark memory pages as RWX. This is more likely to have false positives (ie many things that do JIT will allocate RWX pages) but it’s also not uncommon behavior for exploits, as it lets them inject code into memory, whether that’s a multistage payload or code injected into other processes. Either way it’s not great and it’s behavior you can alert on (ie mprotect(..., ..., PROT_READ|PROT_WRITE|PROT_EXEC)).
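As a rough sketch of what that check could look like in Python: for both mmap and mprotect the protection flags are the third argument, so they show up as a2 in the audit record. This assumes you’ve added rules for those two syscalls (which the config above does not, and which will be noisy) and that the record looks like the parsed output from earlier; the function name is made up.

```python
# Protection flag values from <sys/mman.h> on Linux.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
RWX = PROT_READ | PROT_WRITE | PROT_EXEC


def requests_rwx(syscall: dict) -> bool:
    """True if an mmap or mprotect call asks for read+write+execute pages."""
    if syscall.get("name") not in ("mmap", "mprotect"):
        return False
    prot = int(syscall["a2"], 16)  # arguments are logged as hex strings
    return prot & RWX == RWX
```

Expect to pair this with a whitelist of executables that legitimately JIT, otherwise the alert will mostly be noise.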

Log Volume Trade-Offs (or: where do we filter?)

While working on this stuff you’ll notice in an even slightly complex infrastructure there’s a lot of room to do the same thing in different places. Do we run logstash-filter-goaudit on the node we’re monitoring? Do we run it on the log aggregator? Where do we drop uninteresting or irrelevant auditd records? What about disk space for all the things we do end up logging?

The answer is really boring: it depends. I’m not going to get into the capacity planning aspect of this problem but generally what you want to do is move as much alerting logic away from the monitored nodes as possible. The goal is to maximize visibility (for yourself) while minimizing the information an attacker could use to plan their next move, all while keeping log volume in mind.

For example if an attacker ends up on one of your machines, the next logical step would be to do some recon and read all the alerting configuration. If they see execve(/usr/bin/curl, ...) or that you alert on traffic on non-standard ports (ie alerting on port not in [80, 443]) it would be very easy for them to switch up tactics and evade detection. However if you move all this logic to another server that requires another compromise, the only thing they’ll learn is that all execve and all connects are logged, but not necessarily anything about their parameters.

At a high level this is what the logging pipeline I designed looks like.

In this example, no special parsing or processing takes place at the individual node; they get their go-audit logs and ship them to the fluentd aggregator as soon as they possibly can. It’s at the aggregator where the magic happens; this is a dedicated place for fluentd-filter-goaudit to enrich the audit logs (with the added bonus of minimizing the CPU impact security logging will have on each host). If an attacker were to breach any of the individual nodes, they would have to also compromise an aggregator (which ideally would be enough to trigger an alert…) to truly figure out what type of events are being paid attention to.

Like I said, the right answer depends on the capacity of your infrastructure, and even then can still vary. For example in the above scenario, I found it made sense to drop uninteresting open syscalls at the node level just because it’s so much more volume than I care about. execve on the other hand is kept intact and processed later in the pipeline because my definition of “interesting” here may change later. It’s better those are kept around just in case. You’ll see in my above example I’m doing the same thing with ptrace where I filter syscalls based on a0 on the node, as opposed to deeper in the pipeline.

When addressing dropping/filtering data early on there are a few questions you should ask:

What is the impact of not having this data? For example: Are we going to run into an issue where not logging 100% of all open syscalls affects an investigation? How confident are you with that answer? Are you sure at all?
What would an attacker gain knowing this data is being inspected/dropped? For example: If an attacker sees all PTRACE_ATTACH/SEIZE are being logged, how might their next steps change if at all? Would it be too late for them anyway? Are there other process manipulation techniques they might use or would their off the shelf tools not care?
If log volume is a concern, is it possible to keep the logs on disk but perhaps not searchable? In many scenarios it’s the log indexing that gets expensive. It might make sense to keep all the logs in one place and keep the security relevant alerting logs somewhere else. For example, you might want to keep all open syscalls on disk somewhere else but only pull down those logs when your SIEM suggests there’s a process doing something funky.

What’s next?

Documentation

This should go without saying but you need to have good documentation around your alerts. Documentation should cover every step:

Writing and maintaining the alert: As I mentioned earlier it’s important to have a list of commands that intentionally triggers the alerts you write so you have a better understanding of the behavior you’re monitoring in the first place. This is useful for when you revisit alerts (ie when you need to determine whether you need to keep it around) and write new ones.
Triaging the alert: If an auditd alert goes off, how is the person on call (not necessarily the person who worked on the alert!) supposed to triage it? Should they take a snapshot of the process? Are there any ancillary logs they should investigate? Alerts based on different syscalls may have different steps to triage so it’s important the responder knows how to react.

Reducing Attack Surface with Seccomp

All the alerts above serve the purpose of getting your attention to something bad happening as it happens. However when you reach a certain level of confidence that the behavior you’re alerting on no longer has false positives, it might be worth exploring blocking that behavior outright. For example, it’s great to know when someone is using gdb or any other memory introspection tool thanks to our ptrace alerts but realistically, that should never be happening in production. In cases like this, where there’s no need for the functionality in production and there are no false positives, it’s worth exploring turning that behavior off entirely. What you want here is seccomp-bpf. There’s a good introduction to it here and here. I’m not going to go into details here but you can drastically reduce attack surface by deploying seccomp. In short what it can do is cut off access to syscalls you don’t need that could otherwise be used by attackers to escalate privileges (ie kernel exploits) or move laterally (ie attacking other processes or machines on the network). If you’ve got a good handle on what resources your applications need then this is a great next step.

Using a Firewall

A lot of my examples above involved inspecting connect and listen system calls. In cases where you already know what traffic should and shouldn’t look like, it might make more sense to just deploy a firewall. Of course just like with seccomp this requires fully understanding what your applications do on top of how the rest of the system behaves (eg can your package manager still function with restricted network access?). For example if your web server doesn’t make outbound connections, gets package updates from somewhere on the network, and only talks to database servers, you might be able to drop all traffic that doesn’t fit that description.

Gotchas

While working with auditd there are a few things you might run into that’ll cause some headaches if you don’t know what’s going on.

Only a single daemon can be connected to the audit netlink socket. What this means is the first process to subscribe to auditd notifications will be the only process with access to that information. If you find that go-audit isn’t writing any events, check that the default auditd daemon isn’t running or that you don’t have another go-audit process stashed inside a tmux session somewhere else.

go-audit (and auditd) does not play well with containers. If you’re using something like Docker or LXC you might find that go-audit and auditd will install and even start up cleanly but you won’t see any events. The solution is to run go-audit on the parent host, but even then you’ll still run into issues where audit records don’t tell you which container caused the event.

Use the -k flag when experimenting with rules. This marks the record with a string that makes it easy to grep for and quickly spot whenever your rule is hit. Even if you’re not doing anything complicated it at least makes incident response a little easier by letting you filter stuff you care or don’t care about.

ancat