I recently ran into an issue with an embedded system already deployed in the field for several customers.
Those systems run a daemon of ours which, at some point after a few hours or days, was reported to have stopped doing the work it was expected to do.
A closer analysis showed that the daemon process was still running, but it was taking 100% of the CPU, which was not a good sign, even in scenarios where the daemon should have been doing almost no work at all.
The system reported that all the CPU load from the process was being spent in userspace, which looked even stranger, because that daemon was expected to offload most of its work to the kernel. That quickly pointed to the issue being some kind of tight infinite loop in userspace that issued no system calls, probably caused by some data structure corruption we had not been able to trigger during our test phase. A quick check with strace, showing no syscalls being issued by the thread taking 100% of the CPU, confirmed the suspicion.
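For reference, that check is as simple as attaching strace to the busy thread for a few seconds; here $TID is a placeholder for the thread ID, taken for instance from top -H or ps -eLf:
# A healthy thread prints a steady stream of syscalls;
# a userspace tight loop prints nothing at all.
strace -p "$TID"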
We knew what kind of issue we had in front of us, but how could we find out where it was occurring? If we had known how to trigger it, we could have reproduced it in our test environment, where we have plenty of tools to analyze what's going on: modifying the binary to print more information, using a debugger, and so on. However, so far we had only been able to see it in production environments, which meant we needed a way to gather at least some minimal information to know where to start looking. The problem with the production environment, which is used by customers, is that it contains no analysis or debugging tools, which makes it quite difficult to find out where the program is actually stuck.
Luckily for us, the Linux kernel exposes information about running processes which can be really useful in this scenario. I am talking about the /proc/$PID/task/$TID/stat file here. This file contains information related to a specific thread $TID which is part of process $PID. You can find more details about the kind of information this file provides by looking at the function fs/proc/array.c:proc_tid_stat in the kernel. More specifically, we are interested in the 30th value appearing in that file, which gives us the EIP value of the thread identified by the file's path. The EIP contains the address of the memory currently being executed by the thread, which means that, in general, if we print that value from time to time we should be able to follow which part of our source code the thread is executing, provided of course that we are able to transform those in-memory process addresses into source code filename & line references.
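As a minimal sketch (assuming plain whitespace field counting is good enough, i.e. the process name in the second field of stat contains no spaces), something like the following prints that value in hex, ready to be fed to addr2line later; $PID and $TID are placeholders:
# Field 30 of the stat file is kstkeip, the thread's current instruction pointer.
# It is stored as a decimal number, so convert it to hex for later use.
awk '{ printf "%x\n", $30 }' "/proc/$PID/task/$TID/stat"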
And of course we can do that too, and the best part of this approach is that we don't need to install any specific tools on the production system. We can usually do all the conversion on our PC, taking only a small amount of information from the running system for later post-processing. We basically need two things from the running production system (a small collection sketch follows the list below):
- The list of EIP values from the offending thread, obtained as explained above.
- The content of the file /proc/$PID/maps from the same process the offending thread belongs to. We will explain later why this content is sometimes needed.
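For collecting both, a small sketch along these lines could be dropped onto the production system; the script name, the output file names (eip-samples.$TID, maps.$PID) and the one-second sampling period are just assumptions of mine:
#!/bin/sh
# Hypothetical usage: ./collect-eip.sh <PID> <TID>
PID="$1"
TID="$2"
# Keep a copy of the memory map for later offline post-processing on the PC.
cat "/proc/$PID/maps" > "maps.$PID"
# Sample the instruction pointer (30th stat field, converted to hex) once per
# second; stop with Ctrl-C and copy eip-samples.$TID off the device.
while sleep 1; do
    awk '{ printf "%x\n", $30 }' "/proc/$PID/task/$TID/stat"
done >> "eip-samples.$TID"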
On our development PC, we will mainly need:
- The addr2line tool (part of binutils).
- The binary file of the offending process compiled with debug symbols (i.e., built with the -g flag).
Now, what is the process to convert a recorded in-memory address into file:line information we can use to look at the corresponding code? In general it's really easy; just call addr2line like this, where $EXEFILE is the binary and $ADDRESS the recorded address:
addr2line -a -p -C -f -i -e "$EXEFILE" "$ADDRESS"
For instance, let's say our offending process comes from a binary called infinite and we found that at some point it was running at address 400597:
$ addr2line -a -p -C -f -i -e infinite 400597
0x0000000000400597: infinite_function at infinite.c:19 (discriminator 1)
Cool! We now know the infinite loop includes infinite.c line 19. We can pass a few more addresses to get a better idea of which code is being executed the most.
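For instance, assuming the samples were collected into a file like the eip-samples.$TID sketched earlier, something along these lines shows the hottest addresses first and resolves all of them in one go (addr2line accepts several addresses on one command line):
# Which sampled addresses show up most often?
sort "eip-samples.$TID" | uniq -c | sort -rn | head
# Resolve every unique address in a single addr2line run.
sort -u "eip-samples.$TID" | xargs addr2line -a -p -C -f -i -e infinite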
And now you may be wondering… why the hell do I need that /proc/$PID/maps file?! Well… because unfortunately this is not as easy as shown here for all scenarios. This simple scenario only covers the case in which the process is running code from the main binary. However, if the process is running some code inside some shared library at the time we record the EIP value, a bit more work must be done.
First of all, we generally need to find out which module that in-memory code address belongs to. That's the kind of information the maps file provides. Let's have a look at an example with the infinite binary running:
$ cat /proc/$(pidof infinite)/maps
00400000-00401000 r-xp 00000000 08:03 3546387 /infinite
00600000-00601000 r--p 00000000 08:03 3546387 /infinite
00601000-00602000 rw-p 00001000 08:03 3546387 /infinite
020ff000-02120000 rw-p 00000000 00:00 0 [heap]
7f334b793000-7f334b928000 r-xp 00000000 08:03 658652 /usr/lib/libc-2.24.so
7f334b928000-7f334bb27000 ---p 00195000 08:03 658652 /usr/lib/libc-2.24.so
7f334bb27000-7f334bb2b000 r--p 00194000 08:03 658652 /usr/lib/libc-2.24.so
7f334bb2b000-7f334bb2d000 rw-p 00198000 08:03 658652 /usr/lib/libc-2.24.so
7f334bb2d000-7f334bb31000 rw-p 00000000 00:00 0
7f334bb31000-7f334bb54000 r-xp 00000000 08:03 658651 /usr/lib/ld-2.24.so
7f334bd1b000-7f334bd1d000 rw-p 00000000 00:00 0
7f334bd53000-7f334bd54000 r--p 00022000 08:03 658651 /usr/lib/ld-2.24.so
7f334bd54000-7f334bd55000 rw-p 00023000 08:03 658651 /usr/lib/ld-2.24.so
7f334bd55000-7f334bd56000 rw-p 00000000 00:00 0
7ffc57345000-7ffc57366000 rw-p 00000000 00:00 0 [stack]
7ffc573c5000-7ffc573c7000 r--p 00000000 00:00 0 [vvar]
7ffc573c7000-7ffc573c9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Let's now imagine we recorded the address 7f334b84afe0. The maps file shows that the code section of libc-2.24.so (the code section can be identified because it has the x executable flag) is mapped into the process address space starting at 7f334b793000 and going up to 7f334b928000. As our recorded address falls between those two values, it must be inside the code section of libc.
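That range check is just an integer comparison, which can be verified directly in the shell with the numbers above:
# Is 0x7f334b84afe0 within [0x7f334b793000, 0x7f334b928000)?
[ $((0x7f334b84afe0)) -ge $((0x7f334b793000)) ] &&
[ $((0x7f334b84afe0)) -lt $((0x7f334b928000)) ] &&
echo "address is inside the libc code section"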
Once we know the code comes from libc and not from our main binary, we know we need to inspect that library file instead of the main binary when using addr2line or other tools like objdump or nm.
When using those tools, we need to remember that we should usually use the offset from the start of that module rather than the memory address we recorded on the production system. In the last example, that means that instead of 0x7f334b84afe0 we want to look at offset 0xB7FE0 inside that module (0x7f334b84afe0 - 0x7f334b793000). Let's see the difference in this scenario:
$ addr2line -a -p -C -f -i -e ./infinite 7f334b84afe0
0x00007f334b84afe0: ?? ??:0
$ addr2line -a -p -C -f -i -e /usr/lib/libc-2.24.so 7f334b84afe0
0x00007f334b84afe0: ?? ??:0
$ addr2line -a -p -C -f -i -e /usr/lib/libc-2.24.so B7FE0
0x00000000000b7fe0: __nanosleep_nocancel at :?
We can see that the last one, using the offset from the start of the libc code section, works correctly, as we were indeed calling usleep() in the code. There is no source information (:? is displayed instead of the filename & line) because the libc I used for the experiment does not contain debug information (hence we see here, too, that having a binary on your PC compiled with -g is important to get all the information).
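Putting the pieces together, a small resolver sketch like the one below (script name and arguments are my own placeholders) looks up which mapping in the saved maps copy contains a recorded address, subtracts the start of that mapping and hands the resulting offset to addr2line. It implements the shared-library case described above; for the non-PIE main binary you would pass the raw address directly, as in the first example. Also remember that the path stored in maps refers to the production system, so on your PC you will usually want to point -e at your local copy of that module, ideally one built with debug information.
#!/bin/sh
# Hypothetical usage: ./resolve-eip.sh <hex-address-without-0x> <maps-file>
ADDR="$1"
MAPS="$2"
while read -r range perms offset dev inode path; do
    start=$((0x${range%-*}))
    end=$((0x${range#*-}))
    if [ $((0x$ADDR)) -ge "$start" ] && [ $((0x$ADDR)) -lt "$end" ]; then
        off=$((0x$ADDR - start))
        # Replace "$path" with your local (debug) copy of the module if needed.
        addr2line -a -p -C -f -i -e "$path" "$(printf '%x' "$off")"
        break
    fi
done < "$MAPS"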
You can find a testbed environment to play with what is discussed here in the following repository I set up: https://github.com/pespin/trace-eip