Is the Secure Kernel really secure? Diving into Apple's Exclave world with the GLx Research Platform
Introduction
Apple's guardian over Exclaves is an L4-based microkernel called cL4. This appears to be its official name in various places, including the XNU source code, but it is also referred to as the Secure Kernel (SK). Little is publicly known about this kernel, because it is locked away in a binary bundle and there is no mechanism to introspect iOS or macOS devices at that level. This is one of the main reasons we developed our GLx Research Platform: it lets us run the Secure Kernel inside a controlled hypervisor environment and attach a debugger or other introspection tools.
During our research into the Secure Kernel we repeatedly wondered why it is called the “Secure Kernel”. Aside from being secure from the prying eyes of security researchers and defenders, there does not seem to be much “secure” about it. In this post we want to share what we mean by this, and why we were surprised while reading its code.
Experiment
The Secure Kernel runs in the Guarded World at GL1, one level below the Secure Page Table Monitor (SPTM). It is the guardian over the Exclave world, which runs across many different address spaces in GL0, roughly the userland of the Guarded World. For our experiment we applied a small patch to the code in the "sharedcache" component, which hosts, among other things, the Exclave scheduler. The purpose of this patch was to demonstrate how any code running in Exclave userland can crash the Secure Kernel. The experiment simulates what an attacker could do after exploiting a memory corruption bug or gaining code execution in any Exclave userland component.
Triggering a Crash
When we run the patched binaries, the GLx Research Platform immediately returns and reports a panic triggered by the Secure Kernel.
% ./glx-runner --cl4=patchedcl4 --silent
SPTM Panic: SK panic request: exception.c:35
XRT Panic Buffer:
We get a panic from SK with the string "exception.c:35". This is the panic string SK emits whenever there is an unhandled exception at the SK execution level.
Attaching the Debugger
Obviously that panic does not reveal much about what happened. So we run the whole thing again, but this time we tell the GLx Research Platform to start a GDB stub and to hook the native exception handlers of cL4.
% ./glx-runner --cl4=patchedcl4 --gdb 1234 --gdb-wait --hook-cl4-exception-handlers
[gdb] listening on 127.0.0.1:1234
[gdb] waiting for debugger connection
Now we can attach LLDB and see what is going on.
% lldb
(lldb) gdb-remote 127.0.0.1:1234
Unable to find file at address 0xfffff8001fbd0000
Process 1 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x0000000009ffc000 kernel
->  0x9ffc000: b 0x9ffc800 ; start
    0x9ffc004: udf #0x0
    0x9ffc008: ; unknown opcode
    0x9ffc00c: ; unknown opcode
Target 0: (kernel) stopped.
(lldb) image list
[ 0] 0F7B4C4F-1289-34A1-9479-09451801E26F 0x0000000009ffc000 /System/Volumes/Data/Users/sesser/Desktop/DEEPDIVE26/DATA/kernel
      /System/Volumes/Data/Users/sesser/Desktop/DEEPDIVE26/DATA/kernel.dSYM/Contents/Resources/DWARF/kernel
(lldb) c
Process 1 resuming
The gdbstub automatically informs LLDB that we are looking at the cL4 kernel, which allows LLDB to look up the DWARF debugging information for this binary.
After letting the code run for a while, we hit an exception.
Process 1 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x003038383a632e6b
error: memory read failed for 0x3038383a632e00
Target 0: (kernel) stopped.
(lldb) re re esr_el1 far_el1 pc spsr_el1
esr_el1 = 0x000000008a000000
far_el1 = 0x003038383a632e6b
pc = 0x003038383a632e6b
spsr_el1 = 0x0000000020400bc5
At this point we could decode the ESR_EL1 value to understand what triggered the exception, but it is already pretty clear that the program counter is not what it is supposed to be. It is therefore no surprise that ESR_EL1 decodes to ESR_EC_PC_ALIGNMENT_FAULT: the exception class field (bits [31:26]) is 0x22, and the low two bits of the PC are not zero, as they must be for a 4-byte A64 instruction. This means that whatever the Exclave userland code did somehow changed the program counter of the Secure Kernel. Let's investigate how that was possible.
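The decoding can be reproduced with a few lines of Python. This is a sketch that assumes only the standard Arm ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). The register values are copied from the session above, and the helper names are our own. The sketch also lays out the PC bytes in memory order. They turn out to be the ASCII text "k.c:880" followed by a NUL, which appears to be the tail of a "file.c:line" string rather than a code address.

```python
import struct

EC_PC_ALIGNMENT_FAULT = 0x22  # Arm exception class "PC alignment fault"

def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into exception class, IL bit and ISS."""
    return {
        "ec": (esr >> 26) & 0x3F,
        "il": (esr >> 25) & 0x1,
        "iss": esr & 0x1FFFFFF,
    }

def pc_is_aligned(pc: int) -> bool:
    """A64 instructions are 4 bytes, so a valid PC has its low two bits clear."""
    return pc & 0x3 == 0

def as_le_bytes(value: int) -> bytes:
    """The 8 bytes of a 64-bit value in little-endian (memory) order."""
    return struct.pack("<Q", value)

esr_el1 = 0x000000008A000000
pc = 0x003038383A632E6B

fields = decode_esr(esr_el1)
print(hex(fields["ec"]), fields["ec"] == EC_PC_ALIGNMENT_FAULT)  # 0x22 True
print(pc_is_aligned(pc))                                        # False
print(as_le_bytes(pc))                                          # b'k.c:880\x00'
```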
Dumping Process Info
Before we go any further, we can query the xrt_process_info structures reachable via TPIDR_EL0 to confirm that the crash was really triggered by our code patch.
(lldb) oreg_dump_process_info
xrt_process @ 0x0000000e214a26b0
  images = 1
  threads = 5
  ASID = 0xffffffffffffffca
  addrspace = "schedulerAddrSpace"
=== Images ===
xrt_image @ 0x0000000e21630930 main
  flags = 0x0000000000000081
  uuid = 8377dd49-f896-38c6-8afb-f1ab7e1a4456
  aslr_slide = 0x0000000e183d4000
  load_addr = 0x0000000e203d0000
  length = 0x1100000
  name = "sharedcache"
=== Threads ===
xrt_thread @ 0x00000003fff97b78 name4 = "H000" tid = 0x5 (5)
xrt_thread @ 0x00000003fffffb78 name4 = "expt" tid = 0x4 (4)
xrt_thread @ 0x0000000e214f3b78 name4 = "entr" tid = 0x3 (3)
xrt_thread @ 0x0000000e214e3b78 name4 = "CPU0" tid = 0x2 (2)
xrt_thread @ 0x0000000e21623b78 name4 = "init" tid = 0x1 (1)
=== Components ===
xrt_component @ 0x0000000e21440900 tag4 = "schd" (0x64686373) long_name = "scheduler"
xrt_component @ 0x0000000e214408c8 tag4 = "shda" (0x61646873) long_name = "native-scheduler.allocator"
xrt_component @ 0x0000000e214408a0 tag4 = "shds" (0x73646873) long_name = "native-scheduler.scheduler"
This output confirms that the offending code ran within schedulerAddrSpace, which is no surprise to us because that is where we applied the patch.
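The tag4 values in this output appear to be four ASCII characters packed into a little-endian 32-bit integer. That is our reading of the output, not a documented format. The following minimal Python sketch decodes them; the function names are hypothetical.

```python
import struct

def tag4_to_str(tag: int) -> str:
    """Decode a 32-bit tag by laying its bytes out in little-endian order."""
    return struct.pack("<I", tag).decode("ascii")

def str_to_tag4(name: str) -> int:
    """Inverse: pack four ASCII characters into a little-endian 32-bit tag."""
    if len(name) != 4:
        raise ValueError("tag4 names are exactly four characters")
    return struct.unpack("<I", name.encode("ascii"))[0]

# Components as printed by oreg_dump_process_info above.
components = {
    0x64686373: "scheduler",
    0x61646873: "native-scheduler.allocator",
    0x73646873: "native-scheduler.scheduler",
}

for tag, long_name in components.items():
    print(f"{tag4_to_str(tag)} (0x{tag:08x}) -> {long_name}")
# schd (0x64686373) -> scheduler
# shda (0x61646873) -> native-scheduler.allocator
# shds (0x73646873) -> native-scheduler.scheduler
```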
Crash Triage
Back to the crash. Let’s dump the registers to see where the program counter ended up and what controlled it.
(lldb) find-runtime-base --from-lr
[find-runtime-base] Kind: bundle
[find-runtime-base] Found base: 0xffffff8000000000
[find-runtime-base] UUID: 0F7B4C4F-1289-34A1-9479-09451801E26F
[find-runtime-base] Module file base: 0xffffff8000000000
[find-runtime-base] Slide (signed): 0 (0x0 as u64)
[find-runtime-base] ✅ applied module load addresses
(lldb) re re
general:
x0 = 0x0000000e21636180
x1 = 0x0000000000000000
x2 = 0x0000000000000000
x3 = 0x0000000e214e3e00
x4 = 0x0000000000000002
x5 = 0x0000000a76605d20
x6 = 0x0000000000001167
x7 = 0x0000000e216347c0
x8 = 0x003038383a632e6b
x9 = 0x0000000000000058
x10 = 0xffffff8000012690
...
x28 = 0x0000000e214e36f0
x29 = 0xfffffffffffb3ff0
x30 = 0xffffff80000084ec kernel`ExclaveSystemCallHandler + 484
sp = 0xfffffffffffb3fc0
pc = 0x003038383a632e6b
cpsr = 0x20400bc5
From this we learn that our return address (x30) points into ExclaveSystemCallHandler in the Secure Kernel. Moreover, x8 holds exactly the same value as the bogus program counter.
So let's have a look at the code just before our return address.
(lldb) x/15i $lr-0x30
0xffffff80000084bc: cbnz x9, 0xffffff8000008728 ; <+1056>
0xffffff80000084c0: mov w8, #0x4 ; =4
0xffffff80000084c4: b 0xffffff8000008764 ; <+1116>
0xffffff80000084c8: ldr x8, [x0]
0xffffff80000084cc: lsr x8, x8, #58
0xffffff80000084d0: mov w9, #0x58 ; =88
0xffffff80000084d4: adrp x10, 10
0xffffff80000084d8: add x10, x10, #0x690
0xffffff80000084dc: umaddl x8, w8, w9, x10
0xffffff80000084e0: ldr x8, [x8, #0x28]
0xffffff80000084e4: mov x1, x19
0xffffff80000084e8: blr x8 <----------------------------- XXXXX
0xffffff80000084ec: b 0xffffff8000008780 ; <+1144>
0xffffff80000084f0: and x8, x19, #0xfffffffc0
0xffffff80000084f4: mov x18, x8
I want to highlight two things before we continue tracing the crash back to its origin. First, cL4 is loaded at address 0xffffff8000000000 with a slide of 0, which means the load address of the Secure Kernel is not randomized. Second, we crashed on a plain blr x8 instruction rather than an authenticated branch such as blraa. There is no PAC protection of pointers anywhere nearby.
Looking in more detail at what happens under the hood, the code above decompiles to:
value_from_memory = MEMORY[X0];
func_ptr = *(function_table + 0x58 * (value_from_memory >> 58) + 0x28);
func_ptr(X0, X19);
This code indexes a function table with the upper 6 bits of a value read from memory. Shifting right by 58 leaves an index from 0 to 63, and that index is used without any bounds check. To understand the surrounding logic, we can load the binary into a decompiler such as Binary Ninja.
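To double-check this reading, here is a small Python model of the instruction sequence. The constants come from the disassembly (x10 after adrp/add, the 0x58 multiplier in umaddl, and the 0x28 load offset) and from the runtime base reported by find-runtime-base. The two 16-byte rows are copied from the hexdump of the area around the function table later in this post. The helper names are hypothetical. Both index 0 and index 1 land exactly on the 0xffffff800000e8fc pointers in that dump. This is consistent with 0x58-byte table entries that hold their function pointer at offset 0x28.

```python
import struct

KERNEL_BASE = 0xFFFFFF8000000000        # runtime base, slide 0
FUNCTION_TABLE = KERNEL_BASE + 0x12690  # x10 after adrp/add
ENTRY_SIZE = 0x58                       # w9, multiplier in umaddl
INVOKE_OFFSET = 0x28                    # ldr x8, [x8, #0x28]

# Two 16-byte rows from the kernel hexdump (offsets relative to KERNEL_BASE).
DUMP_ROWS = {
    0x126B0: bytes.fromhex("00 00 00 00 00 00 00 00 fc e8 00 00 80 ff ff ff"),
    0x12710: bytes.fromhex("fc e8 00 00 80 ff ff ff 00 00 00 00 00 00 00 00"),
}

def table_index(word: int) -> int:
    """lsr x8, x8, #58: only the top 6 bits survive, giving 0..63."""
    return (word & 0xFFFFFFFFFFFFFFFF) >> 58

def slot_address(word: int) -> int:
    """umaddl plus the ldr offset: where the function pointer is loaded from."""
    return FUNCTION_TABLE + table_index(word) * ENTRY_SIZE + INVOKE_OFFSET

def read_u64(va: int) -> int:
    """Little-endian 8-byte load from the copied dump rows."""
    off = va - KERNEL_BASE
    row = DUMP_ROWS[off & ~0xF]
    return struct.unpack_from("<Q", row, off & 0xF)[0]

for idx in (0, 1):
    word = idx << 58  # the value the kernel reads from [x0]
    slot = slot_address(word)
    print(idx, hex(slot), hex(read_u64(slot)))
# 0 0xffffff80000126b8 0xffffff800000e8fc
# 1 0xffffff8000012710 0xffffff800000e8fc
```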
ffffff8000008308  void ExclaveSystemCallHandler(void* param1, void* param2, void* syscallnumber)
ffffff8000008308  {
ffffff8000008308      int32_t syscall = (uint32_t)syscallnumber;
ffffff800000832c      void* entry_a4;
ffffff800000832c      void* entry_a5;
ffffff800000832c      void* entry_a6;
ffffff800000832c      void* entry_a7;
ffffff800000832c      void* entry_a8;
ffffff800000832c      sub_FFFFFF800000ADEC(param1, param2, syscallnumber, entry_a4, entry_a5,
ffffff800000832c          entry_a6, entry_a7, entry_a8);
ffffff8000008334      void* const x0;
ffffff8000008334
ffffff8000008334      if (syscall > 2)
ffffff8000008334      {
                          ...
ffffff8000008334      }
ffffff8000008334      else if (!syscall)
ffffff8000008338      {
ffffff80000083d8          int64_t* x0_1 = param1 & 0xfffffffc0;
ffffff80000083d8
ffffff80000083f0          if (x0_1)
ffffff80000084e8              (*(uint64_t*)((uint32_t)(*(uint64_t*)x0_1 >> 58) * 0x58 +
ffffff80000084e8                  &function_table[0].invoke))(x0_1, param2);
ffffff80000083f0          else
ffffff800000844c              *(uint64_t*)*(uint64_t*)_ReadMSR(tpidr_gl1) = 4;
ffffff8000008338      }
That code is pretty easy to understand. When system call 0 is triggered, the first parameter is masked with 0xfffffffc0 and used as a pointer to memory. The 64-bit value at that address is then trusted: its upper 6 bits select the function table entry whose pointer gets called. That means all an attacker has to do is call SVC 0 with a pointer to a crafted value.
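Here is a hypothetical Python model of the syscall 0 path, following the decompiled code above. The mask keeps bits [35:6] of param1, so the pointer is always 64-byte aligned and below 2^36, and a NULL result takes the other branch. The example pointer reuses the x0 value from the register dump. The word stored there (63 << 58) is our assumption, chosen to request the highest index a 6-bit field allows. The function names are made up for the sketch.

```python
from typing import Dict, Optional

POINTER_MASK = 0xFFFFFFFC0  # "param1 & 0xfffffffc0" in the decompiled handler

def sanitize_pointer(param1: int) -> int:
    """Keep bits [35:6]: the result is 64-byte aligned and below 2**36."""
    return param1 & POINTER_MASK

def selected_index(param1: int, caller_memory: Dict[int, int]) -> Optional[int]:
    """Function table index the handler would use, or None for the NULL branch."""
    ptr = sanitize_pointer(param1)
    if ptr == 0:
        return None  # the handler stores 4 via tpidr_gl1 instead of dispatching
    word = caller_memory[ptr]  # 64-bit value the caller placed at ptr
    return word >> 58          # top 6 bits, never bounds-checked

# Example pointer: x0 from the register dump. The stored word is an assumption.
buffer = 0x0000000E21636180
memory = {buffer: 63 << 58}

print(hex(sanitize_pointer(buffer)))   # 0xe21636180 (already aligned, unchanged)
print(selected_index(buffer, memory))  # 63
print(selected_index(0x3F, memory))    # None: masking turns 0x3f into NULL
```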
Security implications
This small experiment demonstrates two things.
First, SK appears to trust Exclave userland to not supply it with malicious input in this code path. Second, SK does not appear to defend itself with common hardening techniques you would normally expect in a kernel at this privilege level, such as base address randomization and protecting function pointers (for example with PAC). We are aware that return addresses are protected with PAC when placed on the stack in many places.
Considering how the Exclave world implements privilege separation with many isolated address spaces, each of which makes heavy use of image load address randomization, PAC, and even MTE, it seems like a serious oversight not to harden the Secure Kernel against abuse originating from Exclave code.
Is this exploitable?
Obviously the burning question is whether this is exploitable. At the time of writing, we believe it is not, because of the layout of the cL4 kernel binary and the limit on how far past the end of the function table the 6-bit index can reach.
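The reach can be bounded with a little arithmetic. This Python sketch assumes that the table starts at offset 0x12690 (x10 in the disassembly) and ends where the hexdump marks "past end" at 0x12f80. That span holds exactly 26 entries of 0x58 bytes. The 6-bit index can therefore address 38 slots beyond the table, and the farthest 8-byte load covers offsets 0x13c60 to 0x13c67.

```python
FUNCTION_TABLE = 0x12690  # table offset in the kernel image (x10 minus the base)
TABLE_END = 0x12F80       # first offset marked "past end" in the hexdump
ENTRY_SIZE = 0x58         # multiplier in umaddl
INVOKE_OFFSET = 0x28      # offset of the loaded function pointer in an entry
MAX_INDEX = (1 << 6) - 1  # value >> 58 leaves 6 bits, so 0..63

def slot(idx: int) -> int:
    """Offset of the 8-byte function pointer that index idx loads."""
    return FUNCTION_TABLE + idx * ENTRY_SIZE + INVOKE_OFFSET

entries, remainder = divmod(TABLE_END - FUNCTION_TABLE, ENTRY_SIZE)
out_of_bounds = range(entries, MAX_INDEX + 1)

print(entries, remainder)           # 26 0
print(len(out_of_bounds))           # 38
print(hex(slot(out_of_bounds[0])))  # 0x12fa8
print(hex(slot(MAX_INDEX) + 7))     # 0x13c67
```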
So what is behind our function table?
00012690 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000126b0 00 00 00 00 00 00 00 00 fc e8 00 00 80 ff ff ff |................|
000126c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00012710 fc e8 00 00 80 ff ff ff 00 00 00 00 00 00 00 00 |................|
00012720 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00012760 00 00 00 00 00 00 00 00 00 24 01 00 80 ff ff ff |.........$......|
00012770 00 00 00 00 00 00 00 00 18 24 01 00 80 ff ff ff |.........$......|
...
00012f50 60 57 00 00 80 ff ff ff dc 5b 00 00 80 ff ff ff |`W.......[......|
00012f60 30 5c 00 00 80 ff ff ff 30 5c 00 00 80 ff ff ff |0\......0\......|
00012f70 78 5c 00 00 80 ff ff ff ac 5c 00 00 80 ff ff ff |x\.......\......|
00012f80 53 4b 20 70 61 6e 69 63 20 72 65 71 75 65 73 74 |SK panic request| <--- past end
00012f90 3a 20 00 53 4b 20 75 73 65 72 20 70 61 6e 69 63 |: .SK user panic|
00012fa0 20 72 65 71 75 65 73 74 3a 20 00 61 72 6d 2d 69 | request: .arm-i|
00012fb0 6f 00 72 61 6e 67 65 73 00 63 6f 6d 70 61 74 69 |o.ranges.compati|
00012fc0 62 6c 65 00 64 61 72 74 2c 74 38 31 31 30 00 65 |ble.dart,t8110.e|
00012fd0 78 63 6c 61 76 65 2d 73 69 64 00 64 61 72 74 2d |xclave-sid.dart-|
00012fe0 69 64 00 64 65 76 69 63 65 74 72 65 65 2e 63 3a |id.devicetree.c:|
00012ff0 36 39 00 64 61 72 74 2d 6f 70 74 69 6f 6e 73 00 |69.dart-options.|
00013000 64 65 76 69 63 65 74 72 65 65 2e 63 3a 37 39 00 |devicetree.c:79.|
00013010 72 65 61 6c 2d 74 69 6d 65 00 69 6e 73 74 61 6e |real-time.instan|
00013020 63 65 00 72 65 67 00 64 65 76 69 63 65 74 72 65 |ce.reg.devicetre|
00013030 65 2e 63 3a 39 37 00 64 65 76 69 63 65 74 72 65 |e.c:97.devicetre|
00013040 65 2e 63 3a 31 30 35 00 64 65 76 69 63 65 74 72 |e.c:105.devicetr|
00013050 65 65 2e 63 3a 31 31 32 00 64 65 76 69 63 65 74 |ee.c:112.devicet|
00013060 72 65 65 2e 63 3a 31 32 34 00 64 65 76 69 63 65 |ree.c:124.device|
00013070 74 72 65 65 2e 63 3a 37 35 00 64 65 66 61 75 6c |tree.c:75.defaul|
00013080 74 73 00 65 78 63 6c 61 76 65 2d 69 6f 2d 72 61 |ts.exclave-io-ra|
00013090 6e 67 65 73 00 64 65 76 69 63 65 74 72 65 65 2e |nges.devicetree.|
...
00013840 35 00 6f 62 6a 5f 65 63 2e 63 3a 31 37 35 37 00 |5.obj_ec.c:1757.|
00013850 6f 62 6a 5f 65 63 2e 63 3a 31 37 36 34 00 6f 62 |obj_ec.c:1764.ob|
00013860 6a 5f 65 63 2e 63 3a 31 37 36 38 00 6f 62 6a 5f |j_ec.c:1768.obj_|
00013870 65 63 2e 63 3a 31 37 37 30 00 6f 62 6a 5f 65 63 |ec.c:1770.obj_ec|
00013880 2e 63 3a 31 37 37 37 00 6f 62 6a 5f 65 63 2e 63 |.c:1777.obj_ec.c|
00013890 3a 31 37 37 39 00 6f 62 6a 5f 75 6e 74 79 70 65 |:1779.obj_untype|
000138a0 64 2e 63 3a 33 34 34 00 6f 62 6a 5f 75 6e 74 79 |d.c:344.obj_unty|
000138b0 70 65 64 2e 63 3a 33 35 37 00 6f 62 6a 5f 75 6e |ped.c:357.obj_un|
000138c0 74 79 70 65 64 2e 63 3a 33 35 39 00 6f 62 6a 5f |typed.c:359.obj_|
000138d0 75 6e 74 79 70 65 64 2e 63 3a 33 36 31 00 6f 62 |untyped.c:361.ob|
000138e0 6a 5f 75 6e 74 79 70 65 64 2e 63 3a 32 30 00 6f |j_untyped.c:20.o|
000138f0 62 6a 5f 75 6e 74 79 70 65 64 2e 63 3a 32 32 00 |bj_untyped.c:22.|
00013900 6f 62 6a 5f 75 6e 74 79 70 65 64 2e 63 3a 32 35 |obj_untyped.c:25|
00013910 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00013c30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00013c40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
From the memory dump above we can see that the function table is directly followed by the string table, and then by zero padding until the end of the segment. There are no double NUL bytes between strings, and all strings contain only printable characters. This means an 8-byte pointer read from that region should never, ever yield a memory address that an attacker can control. This can be verified by dumping the page tables via the gdbstub and looking at the translation configuration.
=== ARM64 Page Table Walk (16KB granule) ===
TCR_EL1 = 0x010000037519b51c
  IPS: 42-bit PA (mask 0x000003ffffffffff)
  T0SZ=28 (VA bits 36), start level L2
  T1SZ=25 (VA bits 39), start level L1
TTBR0_EL1 = 0x0002000011a58000 (ASID 0x0002, base PA 0x0000000011a58000)
TTBR1_EL1 = 0x00c400001161c000 (ASID 0x00c4, base PA 0x000000001161c000)
TTBR0 VA region: 0x0000000000000000 - 0x0000000fffffffff
TTBR1 VA region: 0xffffff8000000000 - 0xffffffffffffffff
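The T0SZ and T1SZ settings make it impossible for the string content to form a valid pointer. A TTBR0 address must have bits [63:36] clear, which would require NUL bytes in the top three byte positions of the 8-byte value. A TTBR1 address must have bits [63:39] set, which would require 0xff bytes that never occur in printable strings.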
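The argument can be checked mechanically. The Python sketch below derives both VA ranges from TCR_EL1, using the architectural field positions (T0SZ in bits [5:0], T1SZ in bits [21:16]). It then slides an 8-byte window over every offset of the strings visible in the dump. The string list is our hand transcription from offset 0x12f80 up to "exclave-io-ranges"; the elided part of the dump is not covered, and the helper names are our own. No window lands in either range, and the crashing PC falls outside both as well.

```python
TCR_EL1 = 0x010000037519B51C  # value from the page table walk above

def va_regions(tcr):
    """Return (last TTBR0 VA, first TTBR1 VA) from T0SZ = TCR[5:0], T1SZ = TCR[21:16]."""
    t0sz = tcr & 0x3F
    t1sz = (tcr >> 16) & 0x3F
    ttbr0_last = (1 << (64 - t0sz)) - 1
    ttbr1_first = (1 << 64) - (1 << (64 - t1sz))
    return ttbr0_last, ttbr1_first

TTBR0_LAST, TTBR1_FIRST = va_regions(TCR_EL1)

def translatable(value):
    """True if a 64-bit value falls inside either configured VA region."""
    return value <= TTBR0_LAST or value >= TTBR1_FIRST

# Strings visible in the dump from offset 0x12f80 on, each with one NUL terminator.
STRINGS = [b"SK panic request: ", b"SK user panic request: ", b"arm-io", b"ranges",
           b"compatible", b"dart,t8110", b"exclave-sid", b"dart-id",
           b"devicetree.c:69", b"dart-options", b"devicetree.c:79", b"real-time",
           b"instance", b"reg", b"devicetree.c:97", b"devicetree.c:105",
           b"devicetree.c:112", b"devicetree.c:124", b"devicetree.c:75",
           b"defaults", b"exclave-io-ranges"]
blob = b"".join(s + b"\x00" for s in STRINGS)

# Check an 8-byte little-endian window at every byte offset, aligned or not.
hits = [off for off in range(len(blob) - 7)
        if translatable(int.from_bytes(blob[off:off + 8], "little"))]

print(hex(TTBR0_LAST), hex(TTBR1_FIRST))  # 0xfffffffff 0xffffff8000000000
print(hits)                               # []
print(translatable(0x003038383A632E6B))   # False: the crashing PC
```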
Conclusion
The purpose of this little experiment was to demonstrate two things. First, the Secure Kernel seems to trust Exclave userland to not supply it with malicious input. Second, it does not protect itself with basic hardening measures, such as randomizing its own base address or protecting the function pointers in its function table with PAC.
In this specific case the problem does not appear exploitable, but it is the simplest test case we could think of to illustrate the broader issue: a malicious Exclave app might try to subvert the kernel by feeding it carefully crafted input.
Stefan Esser