Transasm - x86 redundancy

A Python tool that transpiles x86 instructions into equivalent x86 instructions, exploiting machine code redundancy.

Demo

    /transasm$ poetry run transasm
    > add eax, ebx
    == input:
    mnemonic:     add eax, ebx
    bytes:        0x01 0xd8
    prefix:       0x00 0x00 0x00 0x00
    opcode:       0x01 0x00 0x00 0x00
    rex:          0x00
    modrm:        0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000)
    modrm offset: 0x01
    disp:         0x00
    sib:          0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

    == alternative:
    mnemonic:     add eax, ebx
    bytes:        0x03 0xc3
    prefix:       0x00 0x00 0x00 0x00
    opcode:       0x03 0x00 0x00 0x00
    rex:          0x00
    modrm:        0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011)
    modrm offset: 0x01
    disp:         0x00
    sib:          0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

    >

Info

The demo mode uses Keystone (assembler) and Capstone (disassembler), as in my ASMShell project.

Tests

The following is an extract of the unit tests of transasm.

It shows some of the transformations the tool is able to provide. Take a look at the test_prime_x86_64_program and test_primes_x86_program tests.

    /transasm$ poetry run test
    test_prime_x86_64_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
    << 31 d2 -> xor edx, edx
    >> 33 d2 -> xor edx, edx
    << 31 c9 -> xor ecx, ecx
    >> 33 c9 -> xor ecx, ecx
    << 83 f9 02 -> cmp ecx, 2
    >> 81 f9 02 00 00 00 -> cmp ecx, 2
    << 89 c8 -> mov eax, ecx
    >> 8b c1 -> mov eax, ecx
    << 31 db -> xor ebx, ebx
    >> 33 db -> xor ebx, ebx
    << 83 e9 01 -> sub ecx, 1
    >> 81 e9 01 00 00 00 -> sub ecx, 1
    << 83 f9 01 -> cmp ecx, 1
    >> 81 f9 01 00 00 00 -> cmp ecx, 1
    << 01 ca -> add edx, ecx
    >> 03 d1 -> add edx, ecx
    << 48 89 d6 -> mov rsi, rdx
    >> 48 8b f2 -> mov rsi, rdx
    << 48 31 ff -> xor rdi, rdi
    >> 48 33 ff -> xor rdi, rdi
    ok
    test_primes_x86_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
    << 31 d2 -> xor edx, edx
    >> 33 d2 -> xor edx, edx
    << 31 c9 -> xor ecx, ecx
    >> 33 c9 -> xor ecx, ecx
    << 83 f9 02 -> cmp ecx, 2
    >> 81 f9 02 00 00 00 -> cmp ecx, 2
    << 89 c8 -> mov eax, ecx
    >> 8b c1 -> mov eax, ecx
    << 31 db -> xor ebx, ebx
    >> 33 db -> xor ebx, ebx
    << 83 e9 01 -> sub ecx, 1
    >> 81 e9 01 00 00 00 -> sub ecx, 1
    << 83 f9 01 -> cmp ecx, 1
    >> 81 f9 01 00 00 00 -> cmp ecx, 1
    << 01 ca -> add edx, ecx
    >> 03 d1 -> add edx, ecx
    << 66 83 c4 08 -> add sp, 8
    >> 66 81 c4 08 00 -> add sp, 8
    ok
    test_try_transform_acc_with_imm (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_duplicate_opcode_extensions (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_duplicate_x86_opcodes (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_gv_ev_instruction (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_gv_ev_instruction_using_displ (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_imm_operand_size (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_x86_64_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_x86_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_zero_scale_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_try_transform_zeroing (transasm.tests.unit.test_transform.TestUtils) ... ok
    test_modrm_type (transasm.tests.unit.test_types.TestTypes) ... ok
    test_sib_type (transasm.tests.unit.test_types.TestTypes) ... ok
    test_clear_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_get_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_get_x86_64_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_get_x86_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_has_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_has_register_operands (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_has_rex_prefix (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_is_bit_set (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_set_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_swap_base_index_in_sib (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_swap_reg_rm_in_modrm (transasm.tests.unit.test_utils.TestUtils) ... ok
    test_yield_x86_64_instructions (transasm.tests.unit.test_utils.TestUtils) ... ok

    ----------------------------------------------------------------------
    Ran 26 tests in 0.067s

    OK

Examples

Example with register operands and the ModR/M byte

Some x86 instructions have two opcodes, one for each operand direction, so we can write the following two forms:

instructions	bytes	opcode reference	opcode table	ModR/M
`add dword ptr [rcx], eax`	`01 01`	`reg/mem32, reg`	`Ev, Gv`	0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001)
`add eax, dword ptr [rcx]`	`03 01`	`reg, reg/mem32`	`Gv, Ev`	0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001)

When both operands are registers, the two opcodes are redundant. To obtain the same instruction with the other opcode, we need to rewrite the ModR/M byte by swapping its reg and rm fields.

instructions	bytes	opcode reference	opcode table	ModR/M
`add eax, ebx`	`01 d8`	`reg/mem32, reg`	`Ev, Gv`	0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000)
`add eax, ebx`	`03 c3`	`reg, reg/mem32`	`Gv, Ev`	0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011)
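The swap described above can be sketched in a few lines of Python. This is only an illustration, not transasm's actual code: the names `swap_reg_rm` and `flip_direction` are hypothetical, and the sketch assumes a prefix-less, two-byte instruction whose opcode carries a direction bit (bit 1).

```python
def swap_reg_rm(modrm: int) -> int:
    """Exchange the reg (bits 5..3) and rm (bits 2..0) fields, keeping mod."""
    mod = modrm & 0b11000000
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod | (rm << 3) | reg


def has_direction_bit(opcode: int) -> bool:
    """True for the ALU Eb,Gb / Ev,Gv / Gb,Eb / Gv,Ev opcodes and for mov 0x88..0x8b."""
    is_alu = opcode < 0x40 and (opcode & 0b100) == 0
    is_mov = 0x88 <= opcode <= 0x8B
    return is_alu or is_mov


def flip_direction(code: bytes) -> bytes:
    """Re-encode a two-byte `op reg, reg` instruction with the mirrored opcode.

    Assumption: no prefixes, one opcode byte followed by one ModR/M byte.
    """
    opcode, modrm = code
    if not has_direction_bit(opcode):
        raise ValueError("opcode has no mirrored Ev,Gv / Gv,Ev form")
    if modrm >> 6 != 0b11:
        raise ValueError("only register-to-register forms are interchangeable")
    # Toggling bit 1 switches between the Ev,Gv and Gv,Ev opcodes.
    return bytes([opcode ^ 0b10, swap_reg_rm(modrm)])


print(flip_direction(bytes.fromhex("01d8")).hex())  # 03c3
```

The check on mod matters: with a memory operand, flipping the direction bit would turn a store into a load.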

Example with register operands and the SIB byte

If the scale factor is 1 (sib.scale == 0b00), then it is possible to swap the base and the index register:

instructions	bytes	SIB
`mov rax, qword ptr [rbx + rcx]`	`48 8b 04 0b`	0x0b (scale: 0b00) (index: 0b001, rcx) (base: 0b011, rbx)
`mov rax, qword ptr [rcx + rbx]`	`48 8b 04 19`	0x19 (scale: 0b00) (index: 0b011, rbx) (base: 0b001, rcx)
`mov byte ptr [eax + ebx], 5`	`67 c6 04 18 05`	0x18 (scale: 0b00) (index: 0b011, ebx) (base: 0b000, eax)
`mov byte ptr [ebx + eax], 5`	`67 c6 04 03 05`	0x03 (scale: 0b00) (index: 0b000, eax) (base: 0b011, ebx)

Note that this manipulation alters the literal representation of the assembly.
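As a minimal sketch of the base/index swap (the function name `swap_base_index` is hypothetical and this is not transasm's implementation), the SIB byte can be rebuilt as follows. The sketch is deliberately conservative: it refuses the field values 0b100 and 0b101, which have special meanings, and it ignores the REX.X and REX.B extension bits, which would also have to be swapped in 64-bit mode.

```python
def swap_base_index(sib: int) -> int:
    """Swap the base and index fields of a SIB byte whose scale factor is 1."""
    scale = sib >> 6
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    if scale != 0b00:
        raise ValueError("base and index only commute when the scale factor is 1")
    # 0b100 means "no index" in the index field and esp/rsp in the base field.
    # 0b101 can mean "no base, disp32" in the base field (when mod == 0b00),
    # and ebp/rbp as base changes the default segment. Skip both to stay safe.
    if index in (0b100, 0b101) or base in (0b100, 0b101):
        raise ValueError("special-cased register encoding, not swapped")
    return (base << 3) | index


print(hex(swap_base_index(0x0B)))  # 0x19
print(hex(swap_base_index(0x18)))  # 0x3
```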

Example with duplicate opcode extensions

Some instructions are reachable through two opcode extensions (the reg field of the ModR/M byte), like TEST when used with an immediate operand:

With Group 3 Eb (0xf6):

instructions	bytes	ModR/M reg
`test bl, 0x10`	`f6 c3 10`	`000`
`test bl, 0x10`	`f6 cb 10`	`001`

With Group 3 Ev (0xf7):

instructions	bytes	ModR/M reg
`test ebx, 0xaabbccdd`	`f7 c3 dd cc bb aa`	`000`
`test ebx, 0xaabbccdd`	`f7 cb dd cc bb aa`	`001`
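The two tables above differ only in bit 3 of the ModR/M byte. A small sketch of the toggle, assuming a prefix-less instruction and using the hypothetical name `toggle_test_extension`:

```python
def toggle_test_extension(code: bytes) -> bytes:
    """Switch a Group 3 TEST between its /0 and /1 opcode extensions.

    Assumption: code starts with the opcode (0xf6 or 0xf7), no prefixes.
    """
    opcode, modrm = code[0], code[1]
    if opcode not in (0xF6, 0xF7):
        raise ValueError("not a Group 3 opcode")
    reg = (modrm >> 3) & 0b111
    if reg not in (0b000, 0b001):
        raise ValueError("not a TEST instruction (reg must be 000 or 001)")
    # Flipping the lowest bit of the reg field maps /0 to /1 and back.
    return bytes([opcode, modrm ^ 0b00001000]) + code[2:]


print(toggle_test_extension(bytes.fromhex("f6c310")).hex())  # f6cb10
```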

Example with duplicate opcode on x86

Group 1 has duplicated opcodes 0x80 and 0x82 on x86 (0x82 is not valid in 64-bit mode):

instructions	bytes
`add byte ptr [eax], 0x10`	`80 00 10`
`add byte ptr [eax], 0x10`	`82 00 10`

ModR/M reg	Instruction
`000`	ADD
`001`	OR
`010`	ADC
`011`	SBB
`100`	AND
`101`	SUB
`110`	XOR
`111`	CMP
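A short sketch of this equivalence, with the reg field decoded through the table above. The names `GROUP1`, `group1_mnemonic` and `swap_group1_opcode` are made up for the illustration, and the `bits` parameter is an assumption about how the caller tracks the CPU mode.

```python
# Group 1 mnemonics indexed by the ModR/M reg field.
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")


def group1_mnemonic(modrm: int) -> str:
    """Return the Group 1 mnemonic selected by the reg field of modrm."""
    return GROUP1[(modrm >> 3) & 0b111]


def swap_group1_opcode(code: bytes, bits: int = 32) -> bytes:
    """Switch an Eb, Ib Group 1 instruction between opcodes 0x80 and 0x82."""
    if bits == 64:
        raise ValueError("opcode 0x82 is invalid in 64-bit mode")
    if code[0] not in (0x80, 0x82):
        raise ValueError("not an Eb, Ib Group 1 opcode")
    # 0x80 and 0x82 differ by a single bit.
    return bytes([code[0] ^ 0x02]) + code[1:]


print(group1_mnemonic(0x00), swap_group1_opcode(bytes.fromhex("800010")).hex())
# add 820010
```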

Example with variable immediate operand sizes

instructions	bytes
`add eax, 0x10`	`83 c0 10`
`add eax, 0x10`	`81 c0 10 00 00 00`
`add rax, 0x10`	`48 83 c0 10`
`add rax, 0x10`	`48 81 c0 10 00 00 00`

Under some constraints, we can also use the Eb, Ib versions with opcodes 80 and 82.
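Opcode 0x83 sign-extends its 8-bit immediate, so going from 0x83 to 0x81 is always possible, while the reverse only works when the immediate fits in a signed byte. Below is a sketch of the widening direction under a few assumptions: a register operand (mod == 0b11), at most a 0x66 prefix and a REX prefix, and a 32-bit or 64-bit default operand size. The name `widen_imm8` is hypothetical.

```python
def widen_imm8(code: bytes, bits: int = 32) -> bytes:
    """Rewrite `83 /r ib` as `81 /r iw` or `81 /r id`."""
    i = 0
    operand_size_override = code[i] == 0x66
    if operand_size_override:
        i += 1
    rex_w = False
    if bits == 64 and 0x40 <= code[i] <= 0x4F:
        rex_w = bool(code[i] & 0b1000)
        i += 1
    if len(code) != i + 3 or code[i] != 0x83 or code[i + 1] >> 6 != 0b11:
        raise ValueError("expected 83 /r ib with a register operand")
    modrm = code[i + 1]
    # The 8-bit immediate is sign-extended by the CPU, so we do the same here.
    value = int.from_bytes(code[i + 2:i + 3], "little", signed=True)
    # 0x66 selects a 16-bit immediate unless REX.W is set; otherwise 32 bits.
    imm_size = 2 if operand_size_override and not rex_w else 4
    imm = value.to_bytes(imm_size, "little", signed=True)
    return code[:i] + bytes([0x81, modrm]) + imm


print(widen_imm8(bytes.fromhex("83c010")).hex())             # 81c010000000
print(widen_imm8(bytes.fromhex("4883c010"), bits=64).hex())  # 4881c010000000
```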

Example with opcodes targeting the accumulator register

There is also a special case with the accumulator register where we can use the x4 and x5 opcodes:

instructions	bytes	ModR/M reg
`add al, 0x10`	`04 10`	`000`
`add al, 0x10`	`80 c0 10`	`000`
`add eax, 0x10`	`05 10 00 00 00`	`000`
`add eax, 0x10`	`81 c0 10 00 00 00`	`000`
`add eax, 0x10`	`83 c0 10`	`000`
`add rax, 0x10`	`48 05 10 00 00 00`	`000`
`add rax, 0x10`	`48 81 c0 10 00 00 00`	`000`
`add rax, 0x10`	`48 83 c0 10`	`000`

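This also works with the other Group 1 instructions: or, adc, sbb, and, sub, xor, cmp. The short accumulator forms have no ModR/M byte, so the 000 in their reg column is only a placeholder. See the adc equivalences below: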

instructions	bytes	ModR/M reg
`adc al, 0x10`	`14 10`	`000`
`adc al, 0x10`	`80 d0 10`	`010`
`adc eax, 0x10`	`15 10 00 00 00`	`000`
`adc eax, 0x10`	`81 d0 10 00 00 00`	`010`
`adc eax, 0x10`	`83 d0 10`	`010`
`adc rax, 0x10`	`48 15 10 00 00 00`	`000`
`adc rax, 0x10`	`48 81 d0 10 00 00 00`	`010`
`adc rax, 0x10`	`48 83 d0 10`	`010`
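The short accumulator opcodes follow a regular pattern (0x04 and 0x05 plus 8 times the Group 1 extension), so the long form can be derived mechanically. The following is a sketch under assumptions, not the tool's code: `accumulator_to_group1` is a hypothetical name, and the caller is assumed to know how many prefix bytes precede the opcode (`prefix_len`).

```python
def accumulator_to_group1(code: bytes, prefix_len: int = 0) -> bytes:
    """Rewrite `op al/eax/rax, imm` (short form) as its 0x80 / 0x81 Group 1 form."""
    opcode = code[prefix_len]
    if opcode >= 0x40 or (opcode & 0b111) not in (0b100, 0b101):
        raise ValueError("not a short accumulator-immediate opcode")
    extension = opcode >> 3   # 0=add, 1=or, 2=adc, ... 7=cmp
    is_wide = opcode & 1      # 0: al with imm8, 1: eax/rax with imm16/32
    # mod=0b11 (register), reg=extension, rm=0b000 (the accumulator).
    modrm = 0b11000000 | (extension << 3)
    return (code[:prefix_len] + bytes([0x80 | is_wide, modrm])
            + code[prefix_len + 1:])


print(accumulator_to_group1(bytes.fromhex("1410")).hex())                        # 80d010
print(accumulator_to_group1(bytes.fromhex("480510000000"), prefix_len=1).hex())  # 4881c010000000
```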

Example with zero displacement

When used with a register-indirect memory operand, the displacement size depends on the ModR/M mod field, so a zero displacement can be encoded on 0, 1 or 4 bytes:

instructions	bytes	ModR/M
`add dword ptr [eax], eax`	`67 01 00`	0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000)
`add dword ptr [eax + 00], eax`	`67 01 40 00`	0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000)
`add dword ptr [eax + 00000000], eax`	`67 01 80 00 00 00 00`	0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000)
`add qword ptr [rax], rax`	`48 01 00`	0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000)
`add qword ptr [rax + 00], rax`	`48 01 40 00`	0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000)
`add qword ptr [rax + 00000000], rax`	`48 01 80 00 00 00 00`	0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000)
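A sketch of how the zero displacement could be added, assuming a one-byte opcode and a known number of prefix bytes. The name `add_zero_displacement` and the `prefix_len` parameter are hypothetical. The rm values 0b100 (SIB follows) and 0b101 (disp32 or RIP-relative when mod is 0b00) are left out to keep the example short.

```python
def add_zero_displacement(code: bytes, disp_size: int, prefix_len: int = 0) -> bytes:
    """Turn a mod=0b00 memory operand into its disp8 or disp32 form with a zero displacement."""
    if disp_size not in (1, 4):
        raise ValueError("displacement must be 1 or 4 bytes")
    modrm_index = prefix_len + 1  # assumes a one-byte opcode
    modrm = code[modrm_index]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod != 0b00 or rm in (0b100, 0b101):
        raise ValueError("expected a plain [reg] operand without SIB")
    new_mod = 0b01 if disp_size == 1 else 0b10
    new_modrm = (new_mod << 6) | (modrm & 0b00111111)
    # The displacement bytes go right after ModR/M, before any immediate.
    return (code[:modrm_index] + bytes([new_modrm]) + bytes(disp_size)
            + code[modrm_index + 1:])


print(add_zero_displacement(bytes.fromhex("670100"), 1, prefix_len=1).hex())  # 67014000
print(add_zero_displacement(bytes.fromhex("480100"), 4, prefix_len=1).hex())  # 48018000000000
```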

Example with the SIB byte

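The SIB byte has a corner case when it comes to its index and base fields: the index and base registers may not be encoded at all (e.g. direct addressing). An index of 0b100 means that no index register is used, in which case the scale bits are ignored. Depending on the presence of the SIB byte and on SIB.scale, we can craft up to 5 different but equivalent encodings for a single instruction. Addressing through esp/rsp always needs a SIB byte, because rm 0b100 in the ModR/M byte is the SIB escape, so only the 4 SIB variants exist in that case: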

In 32-bit mode:

instructions	bytes	SIB
`mov byte ptr [0xaabbccdd], 0xff`	`c6 05 dd cc bb aa ff`
`mov byte ptr [0xaabbccdd], 0xff`	`c6 04 25 dd cc bb aa ff`	0x25 (scale: 0b00) (index: 0b100) (base: 0b101)
`mov byte ptr [0xaabbccdd], 0xff`	`c6 04 65 dd cc bb aa ff`	0x65 (scale: 0b01) (index: 0b100) (base: 0b101)
`mov byte ptr [0xaabbccdd], 0xff`	`c6 04 a5 dd cc bb aa ff`	0xa5 (scale: 0b10) (index: 0b100) (base: 0b101)
`mov byte ptr [0xaabbccdd], 0xff`	`c6 04 e5 dd cc bb aa ff`	0xe5 (scale: 0b11) (index: 0b100) (base: 0b101)
———————————–	—————————	—————————————————
`mov byte ptr [esp - 0x56], 0xff`	`c6 44 24 aa ff`	0x24 (scale: 0b00) (index: 0b100) (base: 0b100)
`mov byte ptr [esp - 0x56], 0xff`	`c6 44 64 aa ff`	0x64 (scale: 0b01) (index: 0b100) (base: 0b100)
`mov byte ptr [esp - 0x56], 0xff`	`c6 44 a4 aa ff`	0xa4 (scale: 0b10) (index: 0b100) (base: 0b100)
`mov byte ptr [esp - 0x56], 0xff`	`c6 44 e4 aa ff`	0xe4 (scale: 0b11) (index: 0b100) (base: 0b100)
———————————–	—————————	—————————————————
`mov byte ptr [ebp + 0x56], 0xff`	`c6 45 56 ff`
`mov byte ptr [ebp + 0x56], 0xff`	`c6 44 25 56 ff`	0x25 (scale: 0b00) (index: 0b100) (base: 0b101)
`mov byte ptr [ebp + 0x56], 0xff`	`c6 44 65 56 ff`	0x65 (scale: 0b01) (index: 0b100) (base: 0b101)
`mov byte ptr [ebp + 0x56], 0xff`	`c6 44 a5 56 ff`	0xa5 (scale: 0b10) (index: 0b100) (base: 0b101)
`mov byte ptr [ebp + 0x56], 0xff`	`c6 44 e5 56 ff`	0xe5 (scale: 0b11) (index: 0b100) (base: 0b101)

In 64-bit mode:

instructions	bytes	SIB
`mov byte ptr [rsp - 0x56], 0xff`	`c6 44 24 aa ff`	0x24 (scale: 0b00) (index: 0b100) (base: 0b100)
…	…	…
———————————–	—————————	—————————————————
`mov byte ptr [esp - 0x56], 0xff`	`67 c6 44 24 aa ff`	0x24 (scale: 0b00) (index: 0b100) (base: 0b100)
…	…	…
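The scale variants in the tables above can be enumerated with a few lines of Python. This is a sketch with a hypothetical function name (`sib_scale_variants`), assuming a one-byte opcode, a known prefix length, and no REX.X bit (with REX.X set, index 0b100 would select r12 instead of "no index").

```python
def sib_scale_variants(code: bytes, prefix_len: int = 0) -> list[bytes]:
    """Return the four encodings that differ only by the ignored SIB.scale bits."""
    modrm = code[prefix_len + 1]  # assumes a one-byte opcode
    sib_index = prefix_len + 2
    sib = code[sib_index]
    if modrm >> 6 == 0b11 or modrm & 0b111 != 0b100:
        raise ValueError("instruction has no SIB byte")
    if (sib >> 3) & 0b111 != 0b100:
        raise ValueError("an index register is in use, scale is significant")
    return [
        code[:sib_index] + bytes([(scale << 6) | (sib & 0b00111111)])
        + code[sib_index + 1:]
        for scale in range(4)
    ]


for variant in sib_scale_variants(bytes.fromhex("c64424aaff")):
    print(variant.hex(" "))
```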

Example with legacy prefixes

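In 32-bit mode, we can omit some legacy prefixes that the same instruction needs in 64-bit mode. For example, the 0x67 address-size prefix required to address memory through eax in 64-bit mode is unnecessary in 32-bit mode:

instructions	bytes
`add dword ptr [eax], eax`	`01 00`
`add dword ptr [eax], eax`	`67 01 00`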

Some instructions accept one or more prefixes that do not change their behavior:

instructions	bytes
`nop`	`90`
`nop`	`66 90`
`nop`	`66 67 90`
`nop`	`66 66 67 90`
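To illustrate, the nop variants above can be enumerated as follows. This is only a sketch: `nop_variants` is a hypothetical helper, and it only considers the two prefixes shown in the table (0x66 and 0x67). The 15-byte cap is the architectural limit on x86 instruction length.

```python
import itertools

NOP = 0x90
NOP_PREFIXES = (0x66, 0x67)    # the prefixes used in the table above
MAX_INSTRUCTION_LENGTH = 15    # x86 instructions cannot exceed 15 bytes


def nop_variants(max_prefixes: int = 3) -> list[bytes]:
    """Enumerate nop encodings carrying up to max_prefixes prefixes."""
    variants = []
    for count in range(max_prefixes + 1):
        for combo in itertools.product(NOP_PREFIXES, repeat=count):
            candidate = bytes(combo) + bytes([NOP])
            if len(candidate) <= MAX_INSTRUCTION_LENGTH:
                variants.append(candidate)
    return variants


print(len(nop_variants(3)))  # 15
```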

Logic transformation

Zeroing registers:

instructions
`mov eax, 0x0`
`xor eax, eax`
`sub eax, eax`

instructions	bytes	ModR/M
`xor bx, bx`	`66 31 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`xor ebx, ebx`	`31 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`xor rbx, rbx`	`48 31 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`sub bx, bx`	`66 29 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`sub ebx, ebx`	`29 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`sub rbx, rbx`	`48 29 db`	0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011)
`mov bx, 0`	`66 bb 00 00`	0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000)
`mov eax, 0`	`b8 00 00 00 00`	0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000)
`mov ebx, 0`	`bb 00 00 00 00`	0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000)
`mov rax, 0`	`48 c7 c0 00 00 00 00`	0xc0 (mod: 0b11) (reg: 0b000) (rm: 0b000)
`mov rbx, 0`	`48 c7 c3 00 00 00 00`	0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011)

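To switch between xor and sub, we only have to switch opcodes (0x31 and 0x29); the ModR/M byte stays the same. The switch between xor and mov is not supported yet. Note that xor and sub update the flags while mov does not, so that last substitution is only safe when the flags are not read afterwards.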
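The xor/sub switch can be sketched as below. The name `swap_zeroing_idiom` is hypothetical, the prefix length is assumed to be known by the caller, and with a REX prefix the sketch assumes REX.R and REX.B are equal so that both operands really name the same register.

```python
XOR_EV_GV = 0x31
SUB_EV_GV = 0x29
ZEROING_SWAP = {XOR_EV_GV: SUB_EV_GV, SUB_EV_GV: XOR_EV_GV}


def swap_zeroing_idiom(code: bytes, prefix_len: int = 0) -> bytes:
    """Switch `xor reg, reg` to `sub reg, reg` (or back), keeping ModR/M as is."""
    opcode, modrm = code[prefix_len], code[prefix_len + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if opcode not in ZEROING_SWAP:
        raise ValueError("not an xor/sub Ev, Gv opcode")
    # Only a register combined with itself is a zeroing idiom.
    if mod != 0b11 or reg != rm:
        raise ValueError("operands are not the same register")
    return code[:prefix_len] + bytes([ZEROING_SWAP[opcode]]) + code[prefix_len + 1:]


print(swap_zeroing_idiom(bytes.fromhex("31db")).hex())                  # 29db
print(swap_zeroing_idiom(bytes.fromhex("4831db"), prefix_len=1).hex())  # 4829db
```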

Going further

You can use these techniques to build more cool stuff:

- Obfuscation / diversification before or after compilation (think CMake module, LIEF dissecting, LLVM pass, etc.)
- Steganography (take a look at Hydan)
- On-the-fly payload/shellcode polymorphism (within your favourite engine)

Download

Get a copy at github.com/valkheim/transasm.