Native Application Tracing

h2hack

We frequently conduct security assessments on embedded Linux devices. After initially rooting these devices, we can dive deeper into their components, often discovering monolithic software responsible for managing inter-process communication (IPC) and services ranging from IEC to web servers. However, these binaries tend to be large, making it challenging to identify the initial execution path of attacker-controlled input data. This post outlines several strategies that have proven effective when trying to map these paths.

    
               ┌─┐                                                        
               └─┼──────────────────┐                 │                   
                 └──────┐           │                 │                   
                        │           │                 │                   
                        │           │                 │   │               
                        │           │                 │   │               
                        │           │                 │   │       │       
                        │           │                 │   │       │       
                        │      ┌────┼──────@          │   │       ▼       
                        │      │    │                 ▼   │               
                        │      │    │                     │    │          
       ┌──────┐         │      │    │                   │ │    │          
       │      │         │      │    │                   │ ▼    │          
       │      │         │      │    │                   │      │          
0──────┘      │         │      │    │                   │      │          
              │         │    ┌─┼────┼┐                  │      │          
              │   ┌─────┘    │ │    └┘                  │      │          
              │   └─┐        │ │                        │      │          
         ┌────┴     │        └─┘                        │      │          
         │          │                                   │      │          
         └──────────┘                                   │      │          
                                                        ▼      │          
                                                               │          
                                                               │          
                                                               ▼

Basic Blocks

By counting the distinct instructions that were executed and dividing that number by the total number of instructions, you obtain a metric known as code coverage. Coverage gives you a general idea of which code in your binary is actually used. If you also record the program counter, you can reconstruct the paths taken through your target. However, context switching after every instruction is very resource-intensive.
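As a minimal sketch of this metric, assume a trace is simply a list of recorded program counter values and that the addresses of all instructions are known from a disassembler. Both inputs below are made-up illustration values, and each instruction is counted once no matter how often it runs.

```python
def coverage(trace, all_instructions):
    """Return the fraction of distinct instructions that appear in the trace."""
    executed = set(trace) & set(all_instructions)
    return len(executed) / len(all_instructions)


# Hypothetical addresses of every instruction in a tiny binary.
all_instructions = [0x1000, 0x1001, 0x1003, 0x1008, 0x100A, 0x100B, 0x1010, 0x1012]
# Hypothetical recorded program counters, in execution order.
trace = [0x1000, 0x1001, 0x1003, 0x1008, 0x100A, 0x100B, 0x1000, 0x1001]

print(f"coverage: {coverage(trace, all_instructions):.0%}")  # prints "coverage: 75%"
```

Because the trace keeps its order, the same list also describes the path that was taken, not just how much code was reached.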

To optimize this, we can leverage the fact that straight-line code without branch instructions always executes as a unit. A sequence of instructions between two points, A and B, that is entered only at A and contains no branch before B is referred to as a basic block. By instrumenting all basic blocks, we can reconstruct the program execution path without losing any tracing information.

    
                        [ BASIC BLOCK EXAMPLES ]


arm:             mips:               ppc:             x86:               
┌──────────────┐ ┌─────────────────┐ ┌──────────────┐ ┌─────────────────┐
│MOV R0, #1    │ │li  $t0, 1       │ │li  r3, 1     │ │mov eax, 1       │
│ADD R1, R0, #2│ │addi $t1, $t0, 2 │ │addi r4, r3, 2│ │lea ebx, [eax+2] │
│SUB R2, R1, R0│ │sub $t2, $t1, $t0│ │sub r5, r4, r3│ │sub ebx, eax     │
│BX LR         │ │jr $ra           │ │blr           │ │ret              │
└──────────────┘ └─────────────────┘ └──────────────┘ └─────────────────┘
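To make the definition concrete, here is a small sketch that splits a linear instruction listing into basic blocks. It assumes a toy representation in which every instruction is an (address, mnemonic, branch_target) tuple; the program below is invented for illustration, and a real tool would take this information from a disassembler.

```python
def split_basic_blocks(instructions):
    """Group (address, mnemonic, branch_target) tuples into basic blocks.

    A new block starts at the first instruction, at every branch target,
    and directly after every branch or return instruction.
    """
    addresses = [addr for addr, _, _ in instructions]
    leaders = {addresses[0]}
    for i, (_, mnemonic, target) in enumerate(instructions):
        ends_block = target is not None or mnemonic == "ret"
        if target is not None:
            leaders.add(target)
        if ends_block and i + 1 < len(addresses):
            leaders.add(addresses[i + 1])

    blocks = []
    for insn in instructions:
        if insn[0] in leaders:
            blocks.append([])  # the first instruction is always a leader
        blocks[-1].append(insn)
    return blocks


# Hypothetical if/else shaped function.
program = [
    (0x00, "mov", None),
    (0x02, "cmp", None),
    (0x04, "je", 0x0A),
    (0x06, "add", None),
    (0x08, "jmp", 0x0C),
    (0x0A, "sub", None),
    (0x0C, "ret", None),
]

for block in split_basic_blocks(program):
    print([hex(addr) for addr, _, _ in block])
```

One tracepoint on the first address of each block is enough to know that every instruction inside it ran.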

Strategies

Recording basic blocks provides excellent coverage and insight into a program's execution flow, but it comes with a significant drawback. Depending on the size of the binary, a program might consist of tens of thousands of basic blocks. Instrumenting every single block can therefore still have a substantial impact on performance.

This issue becomes even more critical when working with embedded systems. Such systems often have limited hardware resources and may run on real-time kernels like Linux-RT. These environments impose strict timing constraints, and any disruption to the execution flow can cause the system to miss deadlines, often resulting in program termination.

To mitigate these challenges, several strategies can be employed. One effective approach is to reduce the number of tracepoints. This can be achieved by splitting the tracing process into multiple stages. For example, you can start with a higher-level abstraction by only intercepting the first basic block of each function. Once you identify areas of interest, you can refine the tracing to focus on the basic blocks within those specific functions.
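The two stages can be sketched as follows. The sketch assumes each function is described by a list of its basic block start addresses with the entry block first; the function names and addresses are hypothetical.

```python
def stage_one_tracepoints(functions):
    """Stage 1: one tracepoint per function, placed on its entry block."""
    return {blocks[0] for blocks in functions.values()}


def stage_two_tracepoints(functions, hit_entries):
    """Stage 2: every basic block, but only in functions whose entry was hit."""
    points = set()
    for blocks in functions.values():
        if blocks[0] in hit_entries:
            points.update(blocks)
    return points


# Hypothetical functions mapped to their basic block start addresses.
functions = {
    "parse_packet": [0x1000, 0x1010, 0x1024],
    "log_msg": [0x2000, 0x2008],
    "handle_cmd": [0x3000, 0x3014, 0x3020, 0x3030],
}

coarse = stage_one_tracepoints(functions)
# Pretend that only these entries fired while sending our input.
hits = {0x1000, 0x3000}
fine = stage_two_tracepoints(functions, hits)
print(len(coarse), len(fine))
```

In this example the second stage needs seven tracepoints instead of the nine required to cover every block, and the saving grows with the amount of code the input never reaches.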

Another important consideration is the type of tracepoint used. The simplest method for capturing trace data is setting a breakpoint at the desired location and recording the program counter (PC) when a trap signal occurs. While breakpoints are straightforward and relatively cost-efficient in terms of setup, handling the associated signals can be performance-intensive. This makes breakpoints a trade-off between performance impact and the level of detail captured.

A more advanced alternative is the technique implemented by Frida's Stalker engine. Stalker dynamically copies the instructions that are about to execute and interlaces the copy with custom logging code. This approach offers flexibility, as you can tailor the overhead to your logging requirements. Additionally, since the original instructions remain untouched in memory, code that checksums itself still computes the expected values after instrumentation.

An example of this technique is visualized below.

    
                ┌──────────────────────┐                
                │00001000 push ebp     │                
                │00001001 mov ebp, esp │                
                │00001003 call 1234    │                
                │00001008 mov esp, ebp │                
                │0000100A pop ebp      │                
                │0000100B ret          │                
                └───────────┬──────────┘                
                            │ Copy and                  
                            │ Interlace                 
┌───────────────────────────┴──────────────────────────┐
│00004000 call log_handler                             │
│00004005 push ebp                                     │
│00004006 call log_handler                             │
│0000400B mov ebp, esp                                 │
│0000400D call log_handler                             │
│00004012 push 00001008 ; CALL STACK side-effect       │
│00004013 push 1234     ; arg2/2: branch target        │
│00004014 push exec_ctx ; arg1/2: execution context    │
│00004019 call gum_exec_ctx_replace_current_block_with │
└──────────────────────────────────────────────────────┘
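The copy-and-interlace step can be modeled in a few lines. This is a toy model that only rewrites a list of instruction strings; it is not how Stalker is implemented, and it leaves out the block transition handling shown at the end of the diagram. The name log_handler is taken from the diagram.

```python
def interlace(block, log_call="call log_handler"):
    """Return an instrumented copy of `block`; the original list stays untouched."""
    copy = []
    for insn in block:
        copy.append(log_call)  # logging code goes in front of every instruction
        copy.append(insn)
    return copy


original = ["push ebp", "mov ebp, esp", "call 1234"]
instrumented = interlace(original)

print("\n".join(instrumented))
```

The important property is that the original block is never modified, only the copy that actually executes.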

However, we have observed that copying the instructions takes significantly longer to set up than inserting breakpoints, which often makes this approach infeasible for systems with tight timing constraints.

So how can we trace a program?

FRIDA

When your target allows it, Frida is probably the best way to instrument a binary. At the time of writing, Frida supports the following architectures on the Linux platform to varying degrees:

x86/x64
arm64
armhf
mips
mips64
mips64el
mipsel

Even though this list is quite impressive, we encountered various issues when attempting to run frida-server on embedded devices. One notable issue is that the armhf version is compiled for armv7, which means the binary cannot run on older ARM CPUs.
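Before picking a frida-server build, it helps to check which architecture the target's binaries were compiled for. The sketch below reads the relevant fields from an ELF header; the sample header is constructed by hand for illustration. Note that it only tells you the instruction set family, word size, and endianness. It cannot distinguish ARMv7 from older ARM cores.

```python
import struct

# e_machine values from the ELF specification.
E_MACHINE = {3: "x86", 62: "x86_64", 40: "arm", 183: "arm64", 8: "mips"}


def describe_elf(header):
    """Return (machine, bits, endianness) from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]   # EI_CLASS
    little = header[5] == 1            # EI_DATA: 1 = little, 2 = big endian
    (machine,) = struct.unpack_from("<H" if little else ">H", header, 18)
    return E_MACHINE.get(machine, f"unknown({machine})"), bits, "little" if little else "big"


# Hand-built header of a 32-bit big-endian MIPS executable (e_type=2, e_machine=8).
sample = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 2, 8)
print(describe_elf(sample))
```

On a real device you would pass the first 20 bytes of a binary such as /bin/busybox instead of the sample header.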

The following example demonstrates the basic usage of frida-trace. In this scenario, we connect to a frida-server running on port 1337/TCP. The target process, identified by pid 456, loads the module libar.so.

If the target application lacks symbol information, you can use the [MODULE!OFFSET] format to specify the function offset you want to trace.

$ frida-trace -H 192.168.19.67:1337 -i libar.so!some_function -p 456
$ frida-trace -H 192.168.19.67:1337 -a target_wo_symbols!0x59eb0 -p 456
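If you only have an absolute address, for example from a debugger session, you first need to convert it into an offset relative to the module base. A minimal sketch, assuming you can read /proc/<pid>/maps on the target; the mappings below are made up so that the result matches the offset used above.

```python
def to_module_offset(maps_text, address):
    """Translate an absolute address into a 'MODULE!0xOFFSET' string.

    `maps_text` is the content of /proc/<pid>/maps. The module base is taken
    as the lowest start address among the mappings that share the same path.
    """
    mappings = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append((start, end, fields[5]))

    for start, end, path in mappings:
        if start <= address < end:
            base = min(s for s, _, p in mappings if p == path)
            return f"{path.rsplit('/', 1)[-1]}!{hex(address - base)}"
    return None


# Hypothetical excerpt of /proc/456/maps.
SAMPLE_MAPS = """\
00400000-00452000 r--p 00000000 b3:02 1234 /usr/bin/target_wo_symbols
00452000-004f1000 r-xp 00052000 b3:02 1234 /usr/bin/target_wo_symbols
7f000000-7f010000 rw-p 00000000 00:00 0
"""

print(to_module_offset(SAMPLE_MAPS, 0x459EB0))
```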

Once the function is traced, you can instrument it using the generated JavaScript handler located at __handlers__/<module>/<function>.js.

$ cat __handlers__/libc.so.6/recv.js
defineHandler({
    onEnter(log, args, state) {
        // recv(int sockfd, void *buf, size_t len, int flags)
        this.fd = args[0];
        this.buffer = args[1];
        this.size = args[2].toInt32();
    },

    onLeave(log, retval, state) {
        // recv returns the number of bytes received, or -1 on error
        const received = retval.toInt32();
        if (received > 0) {
            const buf = this.buffer.readByteArray(received);
            log(`recv(sockfd=${this.fd}, buf=${this.buffer}, len=${this.size})`);
            log(hexdump(buf));
        }
    }
});

The code above shows how to instrument recv to print the buffer content directly to stdout:

$ frida-trace -H 192.168.19.67:1337 -i recv -i some_other_function -p 456
Instrumenting...
recv: Loaded handler at "./__handlers__/libc.so.6/recv.js"
Started tracing 3 functions. Web UI available at http://localhost:1337/
[...]
        /* TID 0x2dc */
12462 ms  recv(sockfd=0x84, buf=undefined, len=1024)
12462 ms
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000010  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000020  41 0a

frida-trace is an excellent tool for rapid development but lacks flexibility. If you have special requirements or need custom behavior when a tracepoint is hit, you can use the Interceptor interface instead.

var functions = [];
functions.push({name: 'frame_dummy', start: 0x1180});
functions.push({name: 'main', start: 0x1189});
[...]
functions.push({name: '_ZN4BirdC1Ev', start: 0x155a});
functions.push({name: '_fini', start: 0x15b4});

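// Base address of the main module, returned as a NativePointer
var base = Module.getBaseAddress('a.out');

var count = 0;
console.log("[!] Base: " + base);

functions.forEach(entry => {
    // NativePointer arithmetic avoids the precision loss parseInt can cause on 64-bit addresses
    var func = base.add(entry.start);
    Interceptor.attach(func, {
        onEnter: function (args) {
            // On entry, pc points at the start of the hooked function
            var func_addr = this.context.pc;
            console.log("[*] " + func_addr);
        }
    });
    count++;
});
console.log("[+] attached functions: " + count);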

$ frida -l frida_trace.js ./a.out
Spawning ./a.out...
[!] Base: 0x572bfe877000
[+] attached functions: 28
[...]
[*] 0x572bfe878189
[*] 0x572bfe87849e
[*] 0x572bfe878480
[*] 0x572bfe8782a8
[*] 0x572bfe878090
[*] 0x572bfe878070
[*] 0x572bfe878418
[*] 0x572bfe878090
[*] 0x572bfe878090

To use Stalker for basic block tracing, you can leverage a script written by yrp to generate drcov-based output. Keep in mind that this script only works for local usage. If you need to attach to a remote server, you can apply the following patch:

--- frida-drcov.py  2024-11-28 16:32:25.125664889 +0100
+++ frida-drcov.py.remote   2024-10-02 16:53:26.071038695 +0200
@@ -279,12 +279,18 @@
    parser.add_argument('-D', '--device',
            help='select a device by id [local]',
            default='local')
+    parser.add_argument('-H', '--host',
+            help='select a device by host')

    args = parser.parse_args()

    outfile = args.outfile

-    device = frida.get_device(args.device)
+    if args.host:
+        device = frida.get_device_manager().add_remote_device(args.host)
+    else:
+        device = frida.get_device(args.device)
+

    target = -1
    for p in device.enumerate_processes():
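For post-processing it is useful to know what the basic block table in a drcov file looks like. The sketch below assumes the 8-byte entry layout documented for DynamoRIO's drcov (32-bit start offset relative to the module base, 16-bit block size, 16-bit module id) and little-endian byte order. The text header that precedes the table is left out, and the block values are hypothetical.

```python
import struct

# Assumed drcov entry layout: start offset, block size, module id.
BB_ENTRY = struct.Struct("<IHH")


def pack_blocks(blocks):
    """Serialize (start, size, mod_id) tuples into a binary basic block table."""
    return b"".join(BB_ENTRY.pack(start, size, mod_id) for start, size, mod_id in blocks)


def unpack_blocks(data):
    """Parse a binary basic block table back into (start, size, mod_id) tuples."""
    return [BB_ENTRY.unpack_from(data, off) for off in range(0, len(data), BB_ENTRY.size)]


# Hypothetical blocks: two in module 0, one in module 1.
blocks = [(0x1189, 12, 0), (0x149E, 7, 0), (0x2040, 30, 1)]
table = pack_blocks(blocks)
print(len(table), unpack_blocks(table))
```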

Breakpoint Based

You can also write your own tracer using ptrace(2) to attach to a tracee and examine or modify its memory as needed. If you have specific requirements or are simply curious about this approach, consider checking out the ftrace implementation by finixbit or the blog series by Kain.

An easy alternative is to leverage the abstraction layer provided by gdb and write a script to insert and manage breakpoints. We adapted this idea from cy1337, who developed a Ghidra plugin for this purpose. Our port to Binary Ninja can be found here.

Keep in mind that the overhead introduced by gdb makes this approach more suitable for function tracing. In any non-trivial binary, the sheer number of basic blocks can make this impractical. Nonetheless, this is an effective and relatively fast way to gather tracing information, as long as you have a gdbserver running on the target and a gdb client built with Python support.

$ gdb -q -x gdb_script.py binary
[ ... ]
Function: entry
Function: FUN_00136d20
Function: _DT_INIT
Function: _INIT_0
Function: FUN_001347e0
Function: FUN_00136a55
[ ... ]
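The script used above is not reproduced here. As a rough sketch of the same idea without gdb's Python API, the following generates a plain gdb command file that sets one breakpoint per function, prints the function name, and continues. The addresses are derived from the Ghidra-style names in the output and are assumed to match the addresses of the running process, which is not the case for a relocated PIE binary.

```python
def gdb_trace_commands(functions):
    """Build a gdb command file that logs every function entry and continues."""
    lines = ["set pagination off"]
    for name, address in functions:
        lines += [
            f"break *{address:#x}",
            "commands",
            "silent",
            f'printf "Function: {name}\\n"',
            "continue",
            "end",
        ]
    return "\n".join(lines) + "\n"


# Names taken from the output above; addresses assumed from the FUN_<address> naming.
functions = [("FUN_00136d20", 0x136D20), ("FUN_001347e0", 0x1347E0)]
script = gdb_trace_commands(functions)
print(script)
```

The result can be saved to a file and loaded with gdb -x, followed by run or continue depending on whether the process is started locally or attached through gdbserver.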

Other

While these solutions might not work on embedded Linux devices, they are still valuable to know if you need to trace a desktop application.

Intel PIN

Pin is a dynamic binary instrumentation framework for Intel IA-32 and x86-64 CPUs. Originally created as a tool for computer architecture analysis, it has been repurposed to build various helpful utilities, such as CodeCoverage by Agustin Gianni. As the name suggests, this pintool collects code coverage data in a format compatible with Lighthouse.

Usage example:

$ pin -t obj-intel64/CodeCoverage.dylib -- ./test
CodeCoverage tool by Agustin Gianni (agustingianni@gmail.com)
White-listed images not specified, instrumenting every module by default.
Logging code coverage information to: trace.log
Loaded image: 0x0000000101bf1000:0x0000000101bf1fff -> test
Loaded image: 0x00007fff6d167000:0x00007fff6d1dafff -> dyld
Loaded image: 0x00007fff94b07000:0x00007fff94b5afff -> libc++.1.dylib
Loaded image: 0x00007fff942fa000:0x00007fff942fbfff -> libSystem.B.dylib
Loaded image: 0x00007fff8bf30000:0x00007fff8bf59fff -> libc++abi.dylib
Loaded image: 0x00007fff875ac000:0x00007fff875b0fff -> libcache.dylib

DynamoRIO

DynamoRIO is a robust library for runtime code manipulation. It is specifically designed for profiling and instrumentation, making it an excellent choice for generating coverage data. DynamoRIO includes a built-in tracer called drcov, and it currently supports x86-64, armhf, and aarch64.

Although DynamoRIO’s API is versatile and comes with multiple examples, you still need to write your own tools. For instance, axtaxt created a simple hit tracer with about 150 lines of code. However, using Frida’s JavaScript API, similar functionality could be achieved with fewer than 30 lines.

DynamoRIO has its specific use cases, but in our experience, we often opted for Frida due to its flexibility and rapid development capabilities.

Debugger

Some debuggers support trace recording. While these traces are often imprecise and difficult to analyze, they can still be useful when other methods fail.

For IDA Pro, a manual is available for its trace functionality, which supports most architectures. However, we have encountered numerous issues, such as incomplete traces and skipped instructions or functions, making it unreliable in many cases.

Other tools, such as Cheat Engine, offer tracing capabilities with features like Ultimap and Ultimap 2. Similar to Intel PIN, Ultimap 2 works exclusively with Intel processors. Additionally, x64dbg, OllyDbg, and Immunity Debugger provide tracing support to varying extents.

Conclusion

Tracing and instrumenting binaries is a powerful skill for understanding program behavior during reverse engineering. There are numerous tools and techniques available, each with its own set of limitations.

For embedded systems, where performance and timing constraints can be critical, lightweight and tailored tracing techniques should be used. Ultimately, the best approach depends on your specific requirements, including the target platform, performance constraints, and the granularity of data you need.

In the next post, we will demonstrate how to use these techniques to find security bugs in firmware, using a real-world example.

    
                                                      
─────────────────►     ┌───────┐                   
                ───────┘       └──────┬─┐          
       ─────────────────────►  ─────► │ └►         
            ┌─────────────────►       └───────────►
            │                                      
      ──────┘

201224