** The most powerful system troubleshooting tools all tend to have one trait in common: a non-negligible learning curve. As the saying goes, if you want to drive a race car, you have to learn to drive stick*. Now to be fair, we designed sysdig from the ground up to be as intuitive and straightforward to use as possible. But to get the most out of the tool, it definitely helps to have an understanding of how system level information is presented in sysdig output. In this post, I will walk through some basic operating system elements and try to explain how they are represented by sysdig, including:
- Processes and Threads
- File Descriptors
- Events, System Calls and Switches
Sysdig output basics
Running sysdig without any filters will produce output that looks like this (colors added for easier explanation, below):
1 01:40:19.601363716 1 httpd (7513) > accept
2 01:40:19.601374197 1 httpd (7513) < accept fd=14(127.0.0.1:39175->127.0.0.1:80) tuple=127.0.0.1:39175->127.0.0.1:80 queuepct=0
3 01:40:19.601506564 1 httpd (7513) > read fd=14(127.0.0.1:39175->127.0.0.1:80) size=8000
4 01:40:19.601512497 1 httpd (7513) < read res=85 data=GET /textfile.txt HTTP/1.1..User-Agent: curl/7.35.0..Host: 127.0.0.1..Accept: */
5 01:40:19.601516976 0 httpd (3750) > switch next=0 pgft_maj=0 pgft_min=522 vm_size=350196 vm_rss=9304 vm_swap=0
6 01:40:19.601661779 1 httpd (7513) > open
7 01:40:19.601668329 1 httpd (7513) < open fd=15(/opt/lampp/htdocs/textfile.txt) name=/opt/lampp/htdocs/textfile.txt flags=1(O_RDONLY) mode=0
8 01:40:19.601699335 1 httpd (7513) > read fd=14(127.0.0.1:39175->127.0.0.1:80) size=8000
9 01:40:19.601701560 1 httpd (7513) < read res=-11(EAGAIN) data=
10 01:40:19.601855764 1 httpd (7513) > close fd=15(/opt/lampp/htdocs/textfile.txt)
11 01:40:19.601857490 1 httpd (7513) < close res=0** **
The fields printed by sysdig are:
- Incremental event number
- Event timestamp – customize this with the -t command line flag (more info)
- CPU ID
- Command name
- Thread ID
- Event direction – ‘>’ means ‘enter’, while ‘<’ means ‘exit’
- Event type
- Event arguments
Now let’s dive into this output a little further and see how some fundamental operating system elements are represented here by sysdig.
Processes and Threads
Threads are the core execution unit for the operating system and, as a consequence, for sysdig as well.Multiple threads can exist within the same process or command and share resources such as memory. This is clearly shown in the capture snippet above: note how all the events in the snippet are generated by the httpd command, but one of them (number 5) comes from a different thread, on a different CPU.
The thread ID (aka TID) is the basic identifier that you want to follow when tracking execution activity in your machine. You do that by just looking at the TID number, or by filtering out the noise with a command line like this one:
> sysdig thread.tid=7513
which will preserve only the execution flow for thread 7513.
Threads live inside processes, which are identified by a process ID (or PID). Most of the processes running on an average Linux box are single threaded, and in that case thread.tid is the same as proc.pid. Filtering by proc.pid is useful to observe how different threads interact with each other inside a process.
And here’s a couple of little tricks: if you want to see only events for single threaded processes, you can use this command line:
> sysdig proc.nchilds=0
While if you want to trace a specific user session or script, you can use:
> sysdig proc.apid = X
…where X is the pid of the shell running the script
You may have noticed that several of the events above have an argument called “fd”.
If a thread ID identifies a flow of execution, a file descriptor (FD) identifies an input/output flow.
In unix, a lot of stuff is abstracted as a file, including:
- network connections (sockets)
- standard input, standard output, and standard error
An FD is a numeric ID that uniquely identifies a file inside a process. This means that you can see the same FD number used more than once, but it will have to be in different processes. By following a process/FD combination, you can track specific I/O activity, and as usual you can let a filter help you:
> sysdig proc.name=httpd and fd.num=14
Sysdig is also nice enough to resolve FD numbers into human readable strings. For example, in the snippet above, notice how FD 14 is represented as 127.0.0.1:39175->127.0.0.1:80, which means:
- this is an IPv4 socket (4)
- the socket L4 protocol is TCP (t)
- the connection has been established by port 39175 on 127.0.0.1
- the destination of the connection is port 80 on 127.0.0.1
All of this information can conveniently be used in filters to isolate interesting activity. For example:
> sysdig fd.type=ipv4 > sysdig fd.l4proto=tcp > sysdig fd.sip=127.0.0.1 > sysdig fd.sport=39157
Most of the events that sysdig captures and displays are system calls. The system call interface includes a number of ‘functions’ that the operating system exports to the applications running on top of it to do stuff like open files, create network connections, read and write from an FD, and so on.
Usually, you can get more information about one of the sysdig events by typing:
> man 2 <event_name>
in a terminal or in google. Be aware, however, that differently from a tool like strace, sysdig doesn’t mirror the system calls’ syntax with absolute precision.** This means that, even if the system call manual page will give you a very good indication of what the event does and what its arguments mean, there might be discrepancies between the original system call and the corresponding sysdig event. If you want to know what the arguments are for any event, type:
> sysdig -L
As you’ll see if you run this command, sysdig exports many events. However, some of them tend to be more useful for debugging or exploration. For example:
- clone() and execve() give you insight into process creation and command execution.
- open(), close(), and the FD read and write functions offer visibility on disk I/O.
- socket(), connect(), and accept() give insight into network activity.
If you are interested in what these system calls do and how to interpret them, come back to the sysdig blog tomorrow for a whole post focused on that.
In the meantime, there’s an event that I want to mention because people sometimes get confused about it: switch.
The Switch Event
Some people don’t realize that sysdig is not limited to capturing system calls. Sysdig taps into the kernel to report other types of events as well. The switch event, for example, is generated every time there is a context switch, i.e. when the process scheduler puts a thread to sleep to execute another one. Observing scheduler activity is useful for scenarios such as determining when a process/thread runs with high granularity, debugging process synchronization issues, and monitoring when a thread is migrated to a different CPU.
switch is not a system call, so you won’t find a manual page for it. It’s also one of the few sysdig events that doesn’t have an exit event, so you will only see enter as a direction. switch can also be annoying at times, because sysdig tends to print many of them even on systems with relatively low activity. Fortunately, it’s easy to filter it out if you don’t need it:
> sysdig evt.type!=switch
Processes, threads, file descriptors, system calls and context switches are some of the fundamental elements of the Linux operating system. Hopefully it’s more clear now how sysdig represents information about these elements, and how you can use sysdig to explore your system in some pretty powerful ways. ;) If you’d like to dive into even more information on sysdig, be sure to check out the sysdig User Guide on our wiki.
And now that you understand what all that sysdig output on your screen means, it’s time to explore which system calls to monitor and what useful information they hide. More on this in my next post tomorrow – so be sure to come back!
And remember to follow us on twitter for cool sysdig commands with the #DigoftheDay.
**This might not be a saying per se, but my friend Gerald Combs did just say it… so I guess it’s a saying now. Thanks Gerald! :)*
***In case you’re wondering, the main reason for this is performance. sysdig’s primary design goal is efficiency, so you can use it in production with minimal overhead. This puts constraints on the order the arguments are returned, to minimize page fault probability. In addition, sysdig adds some useful arguments to some of the system calls. For example, memory-related events like brk or mmap return the total process memory usage, which tends to be handy when troubleshooting memory-related issues. *