Transasm - x86 redundancy
A Python tool that transpiles x86 instructions into equivalent x86 instructions, exploiting machine code redundancy.
Demo
/transasm$ poetry run transasm
> add eax, ebx
== input:
mnemonic: add eax, ebx
bytes: 0x01 0xd8
prefix: 0x00 0x00 0x00 0x00
opcode: 0x01 0x00 0x00 0x00
rex: 0x00
modrm: 0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000)
modrm offset: 0x01
disp: 0x00
sib: 0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

== alternative:
mnemonic: add eax, ebx
bytes: 0x03 0xc3
prefix: 0x00 0x00 0x00 0x00
opcode: 0x03 0x00 0x00 0x00
rex: 0x00
modrm: 0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011)
modrm offset: 0x01
disp: 0x00
sib: 0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

>
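The `modrm` lines of this output can be reproduced with plain bit shifts. The following is a minimal sketch, not transasm's own code; `decode_modrm` and `encode_modrm` are hypothetical helper names chosen for illustration.

```python
def decode_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields.

    Hypothetical helper: layout is mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
    """
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def encode_modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three fields back into a single ModR/M byte."""
    return (mod & 0b11) << 6 | (reg & 0b111) << 3 | (rm & 0b111)


# The two ModR/M bytes shown in the demo output.
print(decode_modrm(0xD8))
print(decode_modrm(0xC3))
```

Decoding and re-encoding are inverse operations, which is what makes it safe to rewrite individual fields in the later examples.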
Tests
The following is an extract of the unit tests of transasm.
It shows some of the transformations the tool is able to provide. Take a look at the test_prime_x86_64_program and test_primes_x86_program tests.
/transasm$ poetry run test
test_prime_x86_64_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
<< 31 d2 -> xor edx, edx
>> 33 d2 -> xor edx, edx
<< 31 c9 -> xor ecx, ecx
>> 33 c9 -> xor ecx, ecx
<< 83 f9 02 -> cmp ecx, 2
>> 81 f9 02 00 00 00 -> cmp ecx, 2
<< 89 c8 -> mov eax, ecx
>> 8b c1 -> mov eax, ecx
<< 31 db -> xor ebx, ebx
>> 33 db -> xor ebx, ebx
<< 83 e9 01 -> sub ecx, 1
>> 81 e9 01 00 00 00 -> sub ecx, 1
<< 83 f9 01 -> cmp ecx, 1
>> 81 f9 01 00 00 00 -> cmp ecx, 1
<< 01 ca -> add edx, ecx
>> 03 d1 -> add edx, ecx
<< 48 89 d6 -> mov rsi, rdx
>> 48 8b f2 -> mov rsi, rdx
<< 48 31 ff -> xor rdi, rdi
>> 48 33 ff -> xor rdi, rdi
ok
test_primes_x86_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
<< 31 d2 -> xor edx, edx
>> 33 d2 -> xor edx, edx
<< 31 c9 -> xor ecx, ecx
>> 33 c9 -> xor ecx, ecx
<< 83 f9 02 -> cmp ecx, 2
>> 81 f9 02 00 00 00 -> cmp ecx, 2
<< 89 c8 -> mov eax, ecx
>> 8b c1 -> mov eax, ecx
<< 31 db -> xor ebx, ebx
>> 33 db -> xor ebx, ebx
<< 83 e9 01 -> sub ecx, 1
>> 81 e9 01 00 00 00 -> sub ecx, 1
<< 83 f9 01 -> cmp ecx, 1
>> 81 f9 01 00 00 00 -> cmp ecx, 1
<< 01 ca -> add edx, ecx
>> 03 d1 -> add edx, ecx
<< 66 83 c4 08 -> add sp, 8
>> 66 81 c4 08 00 -> add sp, 8
ok
test_try_transform_acc_with_imm (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_duplicate_opcode_extensions (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_duplicate_x86_opcodes (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_gv_ev_instruction (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_gv_ev_instruction_using_displ (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_imm_operand_size (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_x86_64_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_x86_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_zero_scale_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_zeroing (transasm.tests.unit.test_transform.TestUtils) ... ok
test_modrm_type (transasm.tests.unit.test_types.TestTypes) ... ok
test_sib_type (transasm.tests.unit.test_types.TestTypes) ... ok
test_clear_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_x86_64_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_x86_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_register_operands (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_rex_prefix (transasm.tests.unit.test_utils.TestUtils) ... ok
test_is_bit_set (transasm.tests.unit.test_utils.TestUtils) ... ok
test_set_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
test_swap_base_index_in_sib (transasm.tests.unit.test_utils.TestUtils) ... ok
test_swap_reg_rm_in_modrm (transasm.tests.unit.test_utils.TestUtils) ... ok
test_yield_x86_64_instructions (transasm.tests.unit.test_utils.TestUtils) ... ok

----------------------------------------------------------------------
Ran 26 tests in 0.067s

OK
Examples
Example with register operands and the ModR/M byte
Some x86 instructions have two opcodes so we can write the following two forms:
instructions | bytes | opcode reference | opcode table | ModR/M |
---|---|---|---|---|
add dword ptr [rcx], eax | 01 01 | reg/mem32, reg | Ev, Gv | 0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001) |
add eax, dword ptr [rcx] | 03 01 | reg, reg/mem32 | Gv, Ev | 0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001) |
When both operands are registers, the two opcodes are redundant: either one can encode the same instruction. To obtain the same instruction with the other opcode, we need to rewrite the ModR/M byte by swapping its reg and rm fields.
instructions | bytes | opcode reference | opcode table | ModR/M |
---|---|---|---|---|
add eax, ebx | 01 d8 | reg/mem32, reg | Ev, Gv | 0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000) |
add eax, ebx | 03 c3 | reg, reg/mem32 | Gv, Ev | 0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011) |
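The rewrite can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tool's implementation: `swap_direction` and `EV_GV_OPCODES` are hypothetical names, and the input is assumed to be a bare two-byte instruction (opcode and ModR/M, no prefixes) using one of the listed opcodes.

```python
# Assumed set of "Ev, Gv" opcodes whose "Gv, Ev" twin is opcode | 0x02:
# add, or, adc, sbb, and, sub, xor, cmp, mov.
EV_GV_OPCODES = {0x01, 0x09, 0x11, 0x19, 0x21, 0x29, 0x31, 0x39, 0x89}


def swap_direction(code: bytes) -> bytes:
    """Re-encode a register/register instruction with the other opcode.

    Hypothetical helper: expects exactly two bytes (opcode, ModR/M).
    """
    opcode, modrm = code
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if opcode & ~0x02 not in EV_GV_OPCODES:
        raise ValueError("opcode has no known Ev,Gv / Gv,Ev twin")
    if mod != 0b11:
        raise ValueError("only register-direct operands are interchangeable")
    # Bit 1 of the opcode selects the direction; reg and rm trade places.
    return bytes([opcode ^ 0x02, 0b11 << 6 | rm << 3 | reg])


print(swap_direction(bytes.fromhex("01d8")).hex(" "))
```

Applying the function twice returns the original bytes, since both the opcode bit flip and the field swap are their own inverses.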
Example with register operands and the SIB byte
If the scale factor is 1 (sib.scale == 0b00), then it is possible to swap the base and the index register:
instructions | bytes | SIB |
---|---|---|
mov rax, qword ptr [rbx + rcx] | 48 8b 04 0b | 0x0b (scale: 0b00) (index: 0b001, rcx) (base: 0b011, rbx) |
mov rax, qword ptr [rcx + rbx] | 48 8b 04 19 | 0x19 (scale: 0b00) (index: 0b011, rbx) (base: 0b001, rcx) |
mov byte ptr [eax + ebx], 5 | 67 c6 04 18 05 | 0x18 (scale: 0b00) (index: 0b011, ebx) (base: 0b000, eax) |
mov byte ptr [ebx + eax], 5 | 67 c6 04 03 05 | 0x03 (scale: 0b00) (index: 0b000, eax) (base: 0b011, ebx) |
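Note that this manipulation alters the literal representation of the assembly: the effective address is unchanged, but a disassembler prints the two registers in the opposite order.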
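A possible sketch of the swap, working on the SIB byte alone. `swap_sib_registers` is a hypothetical name, and the function is deliberately conservative: it refuses the register numbers 0b100 and 0b101 in either field, because those carry special meanings (no index, disp32 without base, esp/ebp semantics). It also assumes no REX.X or REX.B bits are set, which would have to be swapped as well.

```python
def swap_sib_registers(sib: int) -> int:
    """Swap the base and index fields of a SIB byte whose scale is 1.

    Hypothetical helper, conservative on purpose (see the checks below).
    """
    scale, index, base = sib >> 6, (sib >> 3) & 0b111, sib & 0b111
    if scale != 0b00:
        raise ValueError("the swap is only valid when the scale factor is 1")
    if index in (0b100, 0b101) or base in (0b100, 0b101):
        # 0b100 as index means "no index"; 0b101 as base can mean "disp32 only".
        raise ValueError("special register encodings are not handled")
    return index | base << 3


print(hex(swap_sib_registers(0x0B)))
```

Rejecting the special encodings loses a few valid swaps, but it keeps the sketch from silently changing the meaning of an address.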
Example with duplicate opcode extensions
Some instructions have two opcode extensions, like TEST when used with an immediate operand:
With Group 3 Eb (0xf6):
instructions | bytes | ModR/M reg |
---|---|---|
test bl, 0x10 | f6 c3 10 | 000 |
test bl, 0x10 | f6 cb 10 | 001 |
With Group 3 Ev (0xf7):
instructions | bytes | ModR/M reg |
---|---|---|
test ebx, 0xaabbccdd | f7 c3 dd cc bb aa | 000 |
test ebx, 0xaabbccdd | f7 cb dd cc bb aa | 001 |
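Since the two extensions differ only in the lowest bit of the ModR/M reg field, switching between them is a single XOR. The sketch below assumes an unprefixed instruction whose first byte is the opcode; `toggle_test_extension` is a hypothetical name, not a transasm function.

```python
def toggle_test_extension(code: bytes) -> bytes:
    """Switch a Group 3 TEST between the /0 and /1 opcode extensions.

    Hypothetical helper: expects opcode, ModR/M, then the remaining bytes.
    """
    opcode, modrm = code[0], code[1]
    reg = (modrm >> 3) & 0b111
    if opcode not in (0xF6, 0xF7) or reg not in (0b000, 0b001):
        raise ValueError("not a Group 3 TEST encoding")
    # Bit 3 of the ModR/M byte is the lowest bit of the reg field.
    return bytes([opcode, modrm ^ 0b0000_1000]) + code[2:]


print(toggle_test_extension(bytes.fromhex("f6c310")).hex(" "))
```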
Example with duplicate opcode on x86
Group 1 has duplicate opcodes, 0x80 and 0x82, on x86 (0x82 is invalid in 64-bit mode):
instructions | bytes |
---|---|
add byte ptr [eax], 0x10 | 80 00 10 |
add byte ptr [eax], 0x10 | 82 00 10 |

For both opcodes, the ModR/M reg field selects the instruction:

ModR/M reg | Instruction |
---|---|
000 | ADD |
001 | OR |
010 | ADC |
011 | SBB |
100 | AND |
101 | SUB |
110 | XOR |
111 | CMP |
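As a small illustration, the sketch below flips between the two opcodes and names the instruction selected by the reg field. `toggle_group1_opcode` and `GROUP1` are hypothetical names, and the input is assumed to be a 32-bit mode instruction without prefixes.

```python
# Group 1 instructions, indexed by the ModR/M reg field.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def toggle_group1_opcode(code: bytes) -> tuple:
    """Swap opcode 0x80 for 0x82 (or back) and report the mnemonic.

    Hypothetical helper: only meaningful in 32-bit mode.
    """
    if code[0] not in (0x80, 0x82):
        raise ValueError("expected a Group 1 Eb, Ib opcode (0x80 or 0x82)")
    mnemonic = GROUP1[(code[1] >> 3) & 0b111]
    # The two opcodes differ only in bit 1.
    return mnemonic, bytes([code[0] ^ 0x02]) + code[1:]


print(toggle_group1_opcode(bytes.fromhex("800010")))
```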
Example with variable immediate operand sizes
instructions | bytes |
---|---|
add eax, 0x10 | 83 c0 10 |
add eax, 0x10 | 81 c0 10 00 00 00 |
add rax, 0x10 | 48 83 c0 10 |
add rax, 0x10 | 48 81 c0 10 00 00 00 |
Under some constraints, we can also use the Eb, Ib versions with opcodes 80 and 82.
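Going from the short immediate form (0x83) to the long one (0x81) means sign-extending the 8-bit immediate to the operand size. Here is a minimal sketch, assuming a single register-direct instruction with at most one 0x66 prefix and one REX prefix; `widen_imm8` is a hypothetical name and the prefix handling is simplified.

```python
def widen_imm8(code: bytes) -> bytes:
    """Rewrite `83 /r ib` as `81 /r iw/id` for a register-direct operand.

    Hypothetical helper: accepts [66] [REX] 83 modrm imm8 with mod == 0b11.
    """
    if len(code) < 3:
        raise ValueError("instruction too short")
    i = 0
    operand16 = code[i] == 0x66           # operand-size override prefix
    if operand16:
        i += 1
    if 0x40 <= code[i] <= 0x4F:           # REX prefix (assumes 64-bit mode)
        i += 1
    if len(code) != i + 3 or code[i] != 0x83 or code[i + 1] >> 6 != 0b11:
        raise ValueError("expected [66] [REX] 83 /r ib with mod == 0b11")
    modrm, imm8 = code[i + 1], code[i + 2]
    # The CPU sign-extends imm8, so the wide immediate must be sign-extended too.
    value = imm8 - 0x100 if imm8 & 0x80 else imm8
    width = 2 if operand16 else 4
    wide = value.to_bytes(width, "little", signed=True)
    return code[:i] + bytes([0x81, modrm]) + wide


print(widen_imm8(bytes.fromhex("83c010")).hex(" "))
```

The reverse direction is only possible when the wide immediate fits in a signed byte, which is presumably one of the constraints a full implementation has to check.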
Example with opcodes targeting the accumulator register
There is also a special case with the accumulator register where we can use the x4 and x5 opcodes:
instructions | bytes | ModR/M reg |
---|---|---|
add al, 0x10 | 04 10 | 000 |
add al, 0x10 | 80 c0 10 | 000 |
add eax, 0x10 | 05 10 00 00 00 | 000 |
add eax, 0x10 | 81 c0 10 00 00 00 | 000 |
add eax, 0x10 | 83 c0 10 | 000 |
add rax, 0x10 | 48 05 10 00 00 00 | 000 |
add rax, 0x10 | 48 81 c0 10 00 00 00 | 000 |
add rax, 0x10 | 48 83 c0 10 | 000 |
This works with the following common instructions: and, or, adc, sbb, sub, xor, cmp. See the adc equivalences below:
instructions | bytes | ModR/M reg |
---|---|---|
adc al, 0x10 | 14 10 | 000 |
adc al, 0x10 | 80 d0 10 | 010 |
adc eax, 0x10 | 15 10 00 00 00 | 000 |
adc eax, 0x10 | 81 d0 10 00 00 00 | 010 |
adc eax, 0x10 | 83 d0 10 | 010 |
adc rax, 0x10 | 48 15 10 00 00 00 | 000 |
adc rax, 0x10 | 48 81 d0 10 00 00 00 | 010 |
adc rax, 0x10 | 48 83 d0 10 | 010 |
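The pattern in these tables is regular enough to generate: the x5 opcode is 0x05 plus 8 times the Group 1 index, and the same index goes into the ModR/M reg field of the 0x81 and 0x83 forms. The sketch below covers only the 32-bit eax case; `eax_imm_encodings` and `GROUP1` are hypothetical names, and `imm` is assumed to be a signed 32-bit value.

```python
# Group 1 instructions, indexed by their ModR/M reg extension.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def eax_imm_encodings(mnemonic: str, imm: int) -> list:
    """List equivalent encodings of `<mnemonic> eax, imm`.

    Hypothetical helper: 32-bit operand size, no prefixes.
    """
    n = GROUP1.index(mnemonic)
    modrm = 0b11 << 6 | n << 3 | 0b000     # register-direct, rm = eax
    imm32 = imm.to_bytes(4, "little", signed=True)
    forms = [
        bytes([0x05 + 8 * n]) + imm32,     # accumulator form, no ModR/M
        bytes([0x81, modrm]) + imm32,      # Group 1 with a 32-bit immediate
    ]
    if -128 <= imm <= 127:
        # Group 1 with a sign-extended 8-bit immediate.
        forms.append(bytes([0x83, modrm]) + imm.to_bytes(1, "little", signed=True))
    return forms


for form in eax_imm_encodings("adc", 0x10):
    print(form.hex(" "))
```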
Example with zero displacement
When one operand is a memory reference, the displacement size depends on the ModR/M mod field, so a zero displacement can be encoded in zero, one or four bytes:
instructions | bytes | ModR/M |
---|---|---|
add dword ptr [eax], eax | 67 01 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
add dword ptr [eax + 00], eax | 67 01 40 00 | 0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000) |
add dword ptr [eax + 00000000], eax | 67 01 80 00 00 00 00 | 0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000) |
add qword ptr [rax], rax | 48 01 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
add qword ptr [rax + 00], rax | 48 01 40 00 | 0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000) |
add qword ptr [rax + 00000000], rax | 48 01 80 00 00 00 00 | 0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000) |
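The padded forms can be derived from the mod 0b00 ModR/M byte alone. A minimal sketch, assuming 32-bit or 64-bit addressing and a plain register base; `pad_zero_displacement` is a hypothetical name, and the rm values 0b100 (SIB follows) and 0b101 (disp32 or RIP-relative) are rejected rather than handled.

```python
def pad_zero_displacement(modrm: int) -> list:
    """Return the ModR/M + displacement tails for disp8 and disp32 zero padding.

    Hypothetical helper: input must use mod 0b00 with a plain register base.
    """
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b00:
        raise ValueError("expected a ModR/M byte without displacement")
    if rm in (0b100, 0b101):
        raise ValueError("SIB and disp32-only forms are not handled")
    return [
        bytes([0b01 << 6 | reg << 3 | rm, 0x00]),   # mod 0b01 + one zero byte
        bytes([0b10 << 6 | reg << 3 | rm]) + bytes(4),  # mod 0b10 + four zero bytes
    ]


for tail in pad_zero_displacement(0x00):
    print(tail.hex(" "))
```

Prefixes and the opcode are left untouched, so the returned bytes simply replace the original ModR/M byte.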
Example with the SIB byte
The SIB byte has a corner case when it comes to the index and base parts: the index and base registers may not be encoded at all (e.g. direct addressing). Depending on the presence of the SIB byte and on SIB.scale, we can craft up to 5 different but equivalent encodings for a single instruction:
In 32-bit mode:
instructions | bytes | SIB |
---|---|---|
mov byte ptr [0xaabbccdd], 0xff | c6 05 dd cc bb aa ff | |
mov byte ptr [0xaabbccdd], 0xff | c6 04 25 dd cc bb aa ff | 0x25 (scale: 0b00) (index: 0b100) (base: 0b101) |
mov byte ptr [0xaabbccdd], 0xff | c6 04 65 dd cc bb aa ff | 0x65 (scale: 0b01) (index: 0b100) (base: 0b101) |
mov byte ptr [0xaabbccdd], 0xff | c6 04 a5 dd cc bb aa ff | 0xa5 (scale: 0b10) (index: 0b100) (base: 0b101) |
mov byte ptr [0xaabbccdd], 0xff | c6 04 e5 dd cc bb aa ff | 0xe5 (scale: 0b11) (index: 0b100) (base: 0b101) |

instructions | bytes | SIB |
---|---|---|
mov byte ptr [esp - 0x56], 0xff | c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |
mov byte ptr [esp - 0x56], 0xff | c6 44 64 aa ff | 0x64 (scale: 0b01) (index: 0b100) (base: 0b100) |
mov byte ptr [esp - 0x56], 0xff | c6 44 a4 aa ff | 0xa4 (scale: 0b10) (index: 0b100) (base: 0b100) |
mov byte ptr [esp - 0x56], 0xff | c6 44 e4 aa ff | 0xe4 (scale: 0b11) (index: 0b100) (base: 0b100) |

An esp base has no SIB-less form, because ModR/M rm 0b100 is the SIB escape: c6 45 aa ff would encode [ebp - 0x56] instead.

instructions | bytes | SIB |
---|---|---|
mov byte ptr [ebp + 0x56], 0xff | c6 45 56 ff | |
mov byte ptr [ebp + 0x56], 0xff | c6 44 25 56 ff | 0x25 (scale: 0b00) (index: 0b100) (base: 0b101) |
mov byte ptr [ebp + 0x56], 0xff | c6 44 65 56 ff | 0x65 (scale: 0b01) (index: 0b100) (base: 0b101) |
mov byte ptr [ebp + 0x56], 0xff | c6 44 a5 56 ff | 0xa5 (scale: 0b10) (index: 0b100) (base: 0b101) |
mov byte ptr [ebp + 0x56], 0xff | c6 44 e5 56 ff | 0xe5 (scale: 0b11) (index: 0b100) (base: 0b101) |
In 64-bit mode:
instructions | bytes | SIB |
---|---|---|
mov byte ptr [rsp - 0x56], 0xff | c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |
… | … | … |

instructions | bytes | SIB |
---|---|---|
mov byte ptr [esp - 0x56], 0xff | 67 c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |
… | … | … |

With rsp or esp as the base, every encoding carries a SIB byte: ModR/M rm 0b100 cannot name the register directly, and c6 45 aa ff would address [rbp - 0x56].
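When the index field is 0b100 (no index), the scale bits are ignored, so the four SIB variants can be enumerated mechanically. A minimal sketch with a hypothetical `no_index_scale_variants` helper; it assumes REX.X is clear in 64-bit mode, since index 0b100 with REX.X set would select r12 instead of "no index".

```python
def no_index_scale_variants(sib: int) -> list:
    """Enumerate the SIB bytes that differ only in their (ignored) scale bits.

    Hypothetical helper: the SIB byte must encode "no index" (index == 0b100).
    """
    index, base = (sib >> 3) & 0b111, sib & 0b111
    if index != 0b100:
        raise ValueError("scale bits only become redundant without an index")
    return [scale << 6 | 0b100 << 3 | base for scale in range(4)]


print([hex(s) for s in no_index_scale_variants(0x25)])
```

Together with the SIB-less encoding, when one exists, this gives the equivalent forms listed in the tables.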
Example with legacy prefixes
In 32-bit mode, we can omit some legacy prefixes:
instructions | bytes |
---|---|
add dword ptr [eax], eax | 01 00 |
add dword ptr [eax], eax | 67 01 00 |
Some instructions might accept one or more prefixes:
instructions | bytes |
---|---|
nop | 90 |
nop | 66 90 |
nop | 66 67 90 |
nop | 66 66 67 90 |
Logic transformation
Zeroing registers:
instructions |
---|
mov eax, 0x0 |
xor eax, eax |
sub eax, eax |

instructions | bytes | ModR/M |
---|---|---|
xor bx, bx | 66 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
xor ebx, ebx | 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
xor rbx, rbx | 48 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
sub bx, bx | 66 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
sub ebx, ebx | 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
sub rbx, rbx | 48 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
mov bx, 0 | 66 bb 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
mov eax, 0 | b8 00 00 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
mov ebx, 0 | bb 00 00 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
mov rax, 0 | 48 c7 c0 00 00 00 00 | 0xc0 (mod: 0b11) (reg: 0b000) (rm: 0b000) |
mov rbx, 0 | 48 c7 c3 00 00 00 00 | 0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011) |
To switch between xor and sub, we only have to switch opcodes; the ModR/M byte stays the same.
Switching between xor and mov is not supported yet.
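The xor/sub switch could look like the sketch below. It is an illustration with hypothetical names (`swap_zeroing_idiom`, `ZEROING_SWAP`), assuming the last two bytes are the opcode and the ModR/M byte and that any REX prefix comes immediately before the opcode. Both idioms leave zero in the register, but they are not strictly identical for the flags: AF is undefined after xor.

```python
# xor <-> sub, for both the "Ev, Gv" (31/29) and "Gv, Ev" (33/2b) opcodes.
ZEROING_SWAP = {0x31: 0x29, 0x29: 0x31, 0x33: 0x2B, 0x2B: 0x33}


def swap_zeroing_idiom(code: bytes) -> bytes:
    """Turn `xor r, r` into `sub r, r` (or back) when both operands match.

    Hypothetical helper: expects [prefixes] opcode modrm.
    """
    *prefixes, opcode, modrm = code
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if opcode not in ZEROING_SWAP or mod != 0b11 or reg != rm:
        raise ValueError("not a register zeroing idiom")
    if prefixes and 0x40 <= prefixes[-1] <= 0x4F:
        rex = prefixes[-1]
        # REX.R extends reg and REX.B extends rm: they must agree as well.
        if (rex >> 2) & 1 != rex & 1:
            raise ValueError("operands are different registers")
    return bytes(prefixes) + bytes([ZEROING_SWAP[opcode], modrm])


print(swap_zeroing_idiom(bytes.fromhex("31db")).hex(" "))
```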
Going further
You can use these techniques to build more cool stuff:
- Obfuscation / diversification before or after compilation (think CMake module, binary rewriting with LIEF, LLVM pass, etc.)
- Steganography (take a look at Hydan)
- On the fly payload/shellcode polymorphism (within your favourite engine)
Download
Get a copy at github.com/valkheim/transasm.