Transasm - x86 redundancy

A Python tool that transpiles x86 instructions into equivalent x86 instructions, exploiting machine code redundancy.

Demo

/transasm$ poetry run transasm
> add eax, ebx
== input:
mnemonic:     add eax, ebx
bytes:        0x01 0xd8
prefix:       0x00 0x00 0x00 0x00
opcode:       0x01 0x00 0x00 0x00
rex:          0x00
modrm:        0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000)
modrm offset: 0x01
disp:         0x00
sib:          0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

== alternative:
mnemonic:     add eax, ebx
bytes:        0x03 0xc3
prefix:       0x00 0x00 0x00 0x00
opcode:       0x03 0x00 0x00 0x00
rex:          0x00
modrm:        0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011)
modrm offset: 0x01
disp:         0x00
sib:          0x00 (scale: 0b00) (index: 0b000) (base: 0b000)

>

Info: the demo mode uses Keystone and Capstone, as in my ASMShell project.

Tests

The following is an extract of the unit tests of transasm.

It shows some of the transformations the tool is able to provide. Take a look at the test_prime_x86_64_program and test_primes_x86_program tests.

/transasm$ poetry run test
test_prime_x86_64_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
<< 31 d2 -> xor edx, edx
>> 33 d2 -> xor edx, edx
<< 31 c9 -> xor ecx, ecx
>> 33 c9 -> xor ecx, ecx
<< 83 f9 02 -> cmp ecx, 2
>> 81 f9 02 00 00 00 -> cmp ecx, 2
<< 89 c8 -> mov eax, ecx
>> 8b c1 -> mov eax, ecx
<< 31 db -> xor ebx, ebx
>> 33 db -> xor ebx, ebx
<< 83 e9 01 -> sub ecx, 1
>> 81 e9 01 00 00 00 -> sub ecx, 1
<< 83 f9 01 -> cmp ecx, 1
>> 81 f9 01 00 00 00 -> cmp ecx, 1
<< 01 ca -> add edx, ecx
>> 03 d1 -> add edx, ecx
<< 48 89 d6 -> mov rsi, rdx
>> 48 8b f2 -> mov rsi, rdx
<< 48 31 ff -> xor rdi, rdi
>> 48 33 ff -> xor rdi, rdi
ok
test_primes_x86_program (transasm.tests.func.test_primes_program.TestPrimesProgram) ...
<< 31 d2 -> xor edx, edx
>> 33 d2 -> xor edx, edx
<< 31 c9 -> xor ecx, ecx
>> 33 c9 -> xor ecx, ecx
<< 83 f9 02 -> cmp ecx, 2
>> 81 f9 02 00 00 00 -> cmp ecx, 2
<< 89 c8 -> mov eax, ecx
>> 8b c1 -> mov eax, ecx
<< 31 db -> xor ebx, ebx
>> 33 db -> xor ebx, ebx
<< 83 e9 01 -> sub ecx, 1
>> 81 e9 01 00 00 00 -> sub ecx, 1
<< 83 f9 01 -> cmp ecx, 1
>> 81 f9 01 00 00 00 -> cmp ecx, 1
<< 01 ca -> add edx, ecx
>> 03 d1 -> add edx, ecx
<< 66 83 c4 08 -> add sp, 8
>> 66 81 c4 08 00 -> add sp, 8
ok
test_try_transform_acc_with_imm (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_duplicate_opcode_extensions (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_duplicate_x86_opcodes (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_gv_ev_instruction (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_gv_ev_instruction_using_displ (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_imm_operand_size (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_x86_64_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_x86_using_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_zero_scale_sib (transasm.tests.unit.test_transform.TestUtils) ... ok
test_try_transform_zeroing (transasm.tests.unit.test_transform.TestUtils) ... ok
test_modrm_type (transasm.tests.unit.test_types.TestTypes) ... ok
test_sib_type (transasm.tests.unit.test_types.TestTypes) ... ok
test_clear_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_x86_64_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
test_get_x86_instruction (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_ev_gv_equivalent_opcode_for_reg_ops (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_register_operands (transasm.tests.unit.test_utils.TestUtils) ... ok
test_has_rex_prefix (transasm.tests.unit.test_utils.TestUtils) ... ok
test_is_bit_set (transasm.tests.unit.test_utils.TestUtils) ... ok
test_set_bit (transasm.tests.unit.test_utils.TestUtils) ... ok
test_swap_base_index_in_sib (transasm.tests.unit.test_utils.TestUtils) ... ok
test_swap_reg_rm_in_modrm (transasm.tests.unit.test_utils.TestUtils) ... ok
test_yield_x86_64_instructions (transasm.tests.unit.test_utils.TestUtils) ... ok

----------------------------------------------------------------------
Ran 26 tests in 0.067s

OK

Examples

Example with register operands and the ModR/M byte

Some x86 instructions have two opcodes so we can write the following two forms:

| instructions | bytes | opcode reference | opcode table | ModR/M |
|---|---|---|---|---|
| add dword ptr [rcx], eax | 01 01 | reg/mem32, reg | Ev, Gv | 0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001) |
| add eax, dword ptr [rcx] | 03 01 | reg, reg/mem32 | Gv, Ev | 0x01 (mod: 0b00) (reg: 0b000) (rm: 0b001) |

When both operands are registers, either opcode can encode the same instruction, so the choice of opcode is redundant. To obtain the same instruction with the other opcode, we rewrite the ModR/M byte by swapping its reg and rm fields.

| instructions | bytes | opcode reference | opcode table | ModR/M |
|---|---|---|---|---|
| add eax, ebx | 01 d8 | reg/mem32, reg | Ev, Gv | 0xd8 (mod: 0b11) (reg: 0b011) (rm: 0b000) |
| add eax, ebx | 03 c3 | reg, reg/mem32 | Gv, Ev | 0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011) |
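
The reg/rm swap can be sketched in a few lines of Python. This is an illustration rather than Transasm's actual code: `swap_reg_rm` and `flip_direction` are hypothetical names, and the sketch assumes a two-byte instruction (opcode, ModR/M) from an opcode family where bit 1 of the opcode selects between the Ev, Gv and Gv, Ev forms.

```python
def swap_reg_rm(modrm: int) -> int:
    """Swap the reg and rm fields of a ModR/M byte, keeping mod untouched."""
    mod = modrm & 0b11000000
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod | (rm << 3) | reg


def flip_direction(code: bytes) -> bytes:
    """Re-encode a reg/reg instruction with the opposite-direction opcode.

    Assumption: code is exactly [opcode, modrm] and the opcode belongs to a
    family where bit 1 is the direction bit (add 01/03, xor 31/33, mov 89/8b).
    """
    opcode, modrm = code
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct operands are interchangeable")
    return bytes([opcode ^ 0b10, swap_reg_rm(modrm)])


print(flip_direction(bytes.fromhex("01d8")).hex(" "))  # 03 c3
```

The same call turns 89 c8 into 8b c1 (mov eax, ecx), which matches the test output shown earlier.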

Example with register operands and the SIB byte

If the scale factor is 1 (sib.scale == 0b00), then it is possible to swap the base and the index register:

| instructions | bytes | SIB |
|---|---|---|
| mov rax, qword ptr [rbx + rcx] | 48 8b 04 0b | 0x0b (scale: 0b00) (index: 0b001, rcx) (base: 0b011, rbx) |
| mov rax, qword ptr [rcx + rbx] | 48 8b 04 19 | 0x19 (scale: 0b00) (index: 0b011, rbx) (base: 0b001, rcx) |
| mov byte ptr [eax + ebx], 5 | 67 c6 04 18 05 | 0x18 (scale: 0b00) (index: 0b011, ebx) (base: 0b000, eax) |
| mov byte ptr [ebx + eax], 5 | 67 c6 04 03 05 | 0x03 (scale: 0b00) (index: 0b000, eax) (base: 0b011, ebx) |

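Note that this manipulation alters the literal representation of the assembly: a disassembler prints [rcx + rbx] instead of [rbx + rcx], even though the computed address is the same.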
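
A minimal sketch of the base/index swap, with a hypothetical `swap_base_index` helper (not taken from Transasm). It assumes no REX.X or REX.B extension bits are set. It also refuses the special encodings where 0b100 means "no index" or an esp/rsp base, and where 0b101 with mod 0b00 means "no base".

```python
def swap_base_index(sib: int, mod: int = 0b00) -> int:
    """Swap the base and index fields of a SIB byte when the scale is 1.

    Assumption: REX.X and REX.B are clear, so both fields name one of the
    eight legacy registers.
    """
    scale = sib >> 6
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    if scale != 0b00:
        raise ValueError("base and index only commute when the scale is 1")
    if 0b100 in (index, base):
        raise ValueError("0b100 encodes 'no index' or an esp/rsp base")
    if mod == 0b00 and 0b101 in (index, base):
        raise ValueError("0b101 with mod 0b00 encodes 'no base, disp32'")
    return (base << 3) | index


print(f"{swap_base_index(0x0b):#04x}")  # 0x19
print(f"{swap_base_index(0x18):#04x}")  # 0x03
```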

Example with duplicate opcode extensions

Some instructions have two opcode extensions, such as TEST when used with an immediate operand:

With Group 3 Eb (0xf6):

| instructions | bytes | ModR/M reg |
|---|---|---|
| test bl, 0x10 | f6 c3 10 | 000 |
| test bl, 0x10 | f6 cb 10 | 001 |

With Group 3 Ev (0xf7):

| instructions | bytes | ModR/M reg |
|---|---|---|
| test ebx, 0xaabbccdd | f7 c3 dd cc bb aa | 000 |
| test ebx, 0xaabbccdd | f7 cb dd cc bb aa | 001 |
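
Switching between the two TEST extensions amounts to flipping the low bit of the ModR/M reg field. Here is a small sketch under that assumption. `toggle_test_extension` is a hypothetical name, and the input is expected to start directly with the 0xf6 or 0xf7 opcode (no prefixes).

```python
def toggle_test_extension(code: bytes) -> bytes:
    """Switch a Group 3 TEST between its /0 and /1 opcode extensions.

    Assumption: code starts with the opcode byte (f6 or f7), no prefixes.
    """
    opcode, modrm = code[0], code[1]
    if opcode not in (0xF6, 0xF7):
        raise ValueError("not a Group 3 opcode")
    if (modrm >> 3) & 0b111 not in (0b000, 0b001):
        raise ValueError("the reg field does not select TEST")
    # Bit 3 of the ModR/M byte is the lowest bit of the reg field.
    return bytes([opcode, modrm ^ 0b00001000]) + code[2:]


print(toggle_test_extension(bytes.fromhex("f6c310")).hex(" "))  # f6 cb 10
```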

Example with duplicate opcode on x86

Group 1 has duplicate opcodes on x86: 0x80 and 0x82 encode the same Eb, Ib instructions (0x82 is not valid in 64-bit mode):

| instructions | bytes |
|---|---|
| add byte ptr [eax], 0x10 | 80 00 10 |
| add byte ptr [eax], 0x10 | 82 00 10 |

| ModR/M reg | Instruction |
|---|---|
| 000 | ADD |
| 001 | OR |
| 010 | ADC |
| 011 | SBB |
| 100 | AND |
| 101 | SUB |
| 110 | XOR |
| 111 | CMP |
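
The two tables above translate into a short sketch: the reg field indexes the Group 1 mnemonic, and the 0x80/0x82 alias is a one-bit difference in the opcode. The names `GROUP1`, `group1_mnemonic` and `toggle_80_82` are hypothetical, and the input is assumed to start with the opcode byte, in 32-bit mode.

```python
# Group 1 mnemonics, indexed by the ModR/M reg field.
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")


def group1_mnemonic(modrm: int) -> str:
    """Return the Group 1 instruction selected by the ModR/M reg field."""
    return GROUP1[(modrm >> 3) & 0b111]


def toggle_80_82(code: bytes) -> bytes:
    """Switch between the 0x80 and 0x82 aliases (32-bit mode only).

    Assumption: code starts with the opcode byte, no prefixes.
    """
    if code[0] not in (0x80, 0x82):
        raise ValueError("expected opcode 0x80 or 0x82")
    return bytes([code[0] ^ 0x02]) + code[1:]


print(group1_mnemonic(0x00))                            # add
print(toggle_80_82(bytes.fromhex("800010")).hex(" "))  # 82 00 10
```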

Example with variable immediate operand sizes

| instructions | bytes |
|---|---|
| add eax, 0x10 | 83 c0 10 |
| add eax, 0x10 | 81 c0 10 00 00 00 |
| add rax, 0x10 | 48 83 c0 10 |
| add rax, 0x10 | 48 81 c0 10 00 00 00 |

Under some constraints, we can also use the Eb, Ib versions with opcodes 80 and 82.
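
Going from the short immediate form (0x83, imm8) to the long one (0x81, imm16/imm32) means sign-extending the immediate. The sketch below uses a hypothetical `widen_imm8` helper and assumes 64-bit mode decoding rules for prefixes (an optional 0x66, then an optional REX) and a register-direct operand (mod 0b11).

```python
def widen_imm8(code: bytes) -> bytes:
    """Rewrite `83 /r ib` as `81 /r iw` or `81 /r id`, sign-extending the imm8.

    Assumptions: 64-bit mode, at most one 0x66 prefix followed by at most one
    REX prefix, and a register operand (mod == 0b11).
    """
    i, imm_size = 0, 4
    if code[i] == 0x66:          # operand-size prefix: 16-bit operand, imm16
        imm_size = 2
        i += 1
    if 0x40 <= code[i] <= 0x4F:  # REX prefix
        if code[i] & 0x08:       # REX.W: 64-bit operand, still an imm32
            imm_size = 4
        i += 1
    if code[i] != 0x83 or code[i + 1] >> 6 != 0b11 or len(code) != i + 3:
        raise ValueError("expected `83 /r ib` with a register operand")
    imm = int.from_bytes(code[i + 2:], "little", signed=True)
    wide = imm.to_bytes(imm_size, "little", signed=True)
    return code[:i] + bytes([0x81, code[i + 1]]) + wide


print(widen_imm8(bytes.fromhex("83c010")).hex(" "))    # 81 c0 10 00 00 00
print(widen_imm8(bytes.fromhex("6683c408")).hex(" "))  # 66 81 c4 08 00
```

The second call reproduces the add sp, 8 pair from the test output. The reverse direction is only possible when the wide immediate fits in a sign-extended byte.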

Example with opcodes targeting the accumulator register

There is also a special case with the accumulator register where we can use the x4 and x5 opcodes:

| instructions | bytes | ModR/M reg |
|---|---|---|
| add al, 0x10 | 04 10 | 000 |
| add al, 0x10 | 80 c0 10 | 000 |
| add eax, 0x10 | 05 10 00 00 00 | 000 |
| add eax, 0x10 | 81 c0 10 00 00 00 | 000 |
| add eax, 0x10 | 83 c0 10 | 000 |
| add rax, 0x10 | 48 05 10 00 00 00 | 000 |
| add rax, 0x10 | 48 81 c0 10 00 00 00 | 000 |
| add rax, 0x10 | 48 83 c0 10 | 000 |

This works with the following common instructions: and, or, adc, sbb, sub, xor, cmp. See the adc equivalences below:

| instructions | bytes | ModR/M reg |
|---|---|---|
| adc al, 0x10 | 14 10 | 000 |
| adc al, 0x10 | 80 d0 10 | 010 |
| adc eax, 0x10 | 15 10 00 00 00 | 000 |
| adc eax, 0x10 | 81 d0 10 00 00 00 | 010 |
| adc eax, 0x10 | 83 d0 10 | 010 |
| adc rax, 0x10 | 48 15 10 00 00 00 | 000 |
| adc rax, 0x10 | 48 81 d0 10 00 00 00 | 010 |
| adc rax, 0x10 | 48 83 d0 10 | 010 |
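
The tables suggest a simple rule: the short opcode is the Group 1 reg field shifted left by three, plus 4 (byte form) or 5 (word/dword/qword form). A sketch of that rule, using a hypothetical `to_accumulator_form` helper and assuming 64-bit mode prefix rules, could look like this. The 0x83 form is not handled because its imm8 would first have to be widened.

```python
def to_accumulator_form(code: bytes) -> bytes:
    """Rewrite `80 /r ib` or `81 /r id` on al/ax/eax/rax as the short opcode.

    Assumptions: 64-bit mode, optional 0x66 then optional REX prefix, and a
    register-direct operand with rm == 0b000.
    """
    i = 0
    if code[i] == 0x66:
        i += 1
    if 0x40 <= code[i] <= 0x4F:
        if code[i] & 0x01:
            raise ValueError("REX.B selects r8, not the accumulator")
        i += 1
    opcode, modrm = code[i], code[i + 1]
    if opcode not in (0x80, 0x81):
        raise ValueError("expected opcode 0x80 or 0x81")
    if modrm >> 6 != 0b11 or modrm & 0b111 != 0b000:
        raise ValueError("the operand is not the accumulator register")
    operation = (modrm >> 3) & 0b111  # add=0, or=1, adc=2, ...
    short = (operation << 3) | (0x04 if opcode == 0x80 else 0x05)
    # The ModR/M byte disappears; prefixes and the immediate are kept.
    return code[:i] + bytes([short]) + code[i + 2:]


print(to_accumulator_form(bytes.fromhex("81d010000000")).hex(" "))  # 15 10 00 00 00
```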

Example with zero displacement

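With a memory operand, the displacement size depends on the mod field of the ModR/M byte, so a zero displacement can be left out (mod 0b00) or encoded on 1 byte (mod 0b01) or 4 bytes (mod 0b10):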

| instructions | bytes | ModR/M |
|---|---|---|
| add dword ptr [eax], eax | 67 01 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
| add dword ptr [eax + 00], eax | 67 01 40 00 | 0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000) |
| add dword ptr [eax + 00000000], eax | 67 01 80 00 00 00 00 | 0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000) |
| add qword ptr [rax], rax | 48 01 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
| add qword ptr [rax + 00], rax | 48 01 40 00 | 0x40 (mod: 0b01) (reg: 0b000) (rm: 0b000) |
| add qword ptr [rax + 00000000], rax | 48 01 80 00 00 00 00 | 0x80 (mod: 0b10) (reg: 0b000) (rm: 0b000) |
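
The three encodings can be generated from the displacement-free one by rewriting the mod field and appending zero bytes. The sketch below is illustrative only. `zero_displacement_variants` is a hypothetical name, `head` stands for the prefixes plus opcode, and rm values 0b100 (SIB follows) and 0b101 (disp32 or RIP-relative) are rejected to keep it short.

```python
from typing import List


def zero_displacement_variants(head: bytes, modrm: int) -> List[bytes]:
    """Return the mod 0b00, 0b01 and 0b10 encodings of a [reg] operand.

    Assumptions: head is prefixes + opcode, modrm has mod == 0b00, and rm is
    a plain base register (neither 0b100 nor 0b101).
    """
    rm = modrm & 0b111
    if modrm >> 6 != 0b00 or rm in (0b100, 0b101):
        raise ValueError("expected a plain [reg] operand without displacement")
    low = modrm & 0b00111111  # reg and rm fields, unchanged
    return [
        head + bytes([low]),                     # no displacement
        head + bytes([0b01000000 | low, 0x00]),  # disp8 = 0
        head + bytes([0b10000000 | low]) + bytes(4),  # disp32 = 0
    ]


for variant in zero_displacement_variants(bytes.fromhex("6701"), 0x00):
    print(variant.hex(" "))
# 67 01 00
# 67 01 40 00
# 67 01 80 00 00 00 00
```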

Example with the SIB byte

The SIB byte has a corner case in its index and base fields: the index or base register may not be encoded at all (e.g. direct addressing). An index of 0b100 means "no index register", and in that case SIB.scale is ignored. Depending on whether a SIB byte is present and on the value of SIB.scale, we can craft up to 5 different but equivalent encodings for a single instruction:

In 32-bit mode:

Absolute address (disp32, no base, no index):

| instructions | bytes | SIB |
|---|---|---|
| mov byte ptr [0xaabbccdd], 0xff | c6 05 dd cc bb aa ff | (no SIB) |
| mov byte ptr [0xaabbccdd], 0xff | c6 04 25 dd cc bb aa ff | 0x25 (scale: 0b00) (index: 0b100) (base: 0b101) |
| mov byte ptr [0xaabbccdd], 0xff | c6 04 65 dd cc bb aa ff | 0x65 (scale: 0b01) (index: 0b100) (base: 0b101) |
| mov byte ptr [0xaabbccdd], 0xff | c6 04 a5 dd cc bb aa ff | 0xa5 (scale: 0b10) (index: 0b100) (base: 0b101) |
| mov byte ptr [0xaabbccdd], 0xff | c6 04 e5 dd cc bb aa ff | 0xe5 (scale: 0b11) (index: 0b100) (base: 0b101) |

esp base. Here rm is 0b100, so a SIB byte is mandatory and there is no SIB-less form (c6 45 aa ff decodes as mov byte ptr [ebp - 0x56], 0xff):

| instructions | bytes | SIB |
|---|---|---|
| mov byte ptr [esp - 0x56], 0xff | c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |
| mov byte ptr [esp - 0x56], 0xff | c6 44 64 aa ff | 0x64 (scale: 0b01) (index: 0b100) (base: 0b100) |
| mov byte ptr [esp - 0x56], 0xff | c6 44 a4 aa ff | 0xa4 (scale: 0b10) (index: 0b100) (base: 0b100) |
| mov byte ptr [esp - 0x56], 0xff | c6 44 e4 aa ff | 0xe4 (scale: 0b11) (index: 0b100) (base: 0b100) |

ebp base:

| instructions | bytes | SIB |
|---|---|---|
| mov byte ptr [ebp + 0x56], 0xff | c6 45 56 ff | (no SIB) |
| mov byte ptr [ebp + 0x56], 0xff | c6 44 25 56 ff | 0x25 (scale: 0b00) (index: 0b100) (base: 0b101) |
| mov byte ptr [ebp + 0x56], 0xff | c6 44 65 56 ff | 0x65 (scale: 0b01) (index: 0b100) (base: 0b101) |
| mov byte ptr [ebp + 0x56], 0xff | c6 44 a5 56 ff | 0xa5 (scale: 0b10) (index: 0b100) (base: 0b101) |
| mov byte ptr [ebp + 0x56], 0xff | c6 44 e5 56 ff | 0xe5 (scale: 0b11) (index: 0b100) (base: 0b101) |

In 64-bit mode:

| instructions | bytes | SIB |
|---|---|---|
| mov byte ptr [rsp - 0x56], 0xff | c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |
| mov byte ptr [esp - 0x56], 0xff | 67 c6 44 24 aa ff | 0x24 (scale: 0b00) (index: 0b100) (base: 0b100) |

An rsp or esp base always requires a SIB byte: the SIB-less bytes c6 45 aa ff and 67 c6 45 aa ff decode as [rbp - 0x56] and [ebp - 0x56] respectively. The scale variants 0x64, 0xa4 and 0xe4 are equivalent to 0x24 here as well.
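
Since the scale bits are ignored when no index register is encoded, the equivalent SIB bytes can be enumerated mechanically. A minimal sketch, assuming REX.X is clear (otherwise index 0b100 would select r12) and using a hypothetical `sib_scale_variants` helper:

```python
from typing import List


def sib_scale_variants(sib: int) -> List[int]:
    """List the four SIB bytes that differ only by an ignored scale field.

    Assumption: REX.X is clear, so index 0b100 really means "no index".
    """
    if (sib >> 3) & 0b111 != 0b100:
        raise ValueError("an index register is encoded, the scale matters")
    low = sib & 0b00111111  # index and base fields, unchanged
    return [(scale << 6) | low for scale in range(4)]


print([f"{v:#04x}" for v in sib_scale_variants(0x25)])
# ['0x25', '0x65', '0xa5', '0xe5']
```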

Example with legacy prefixes

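In 32-bit mode, we can omit some legacy prefixes. For example, the 0x67 address-size prefix that 64-bit mode needs in order to address memory through eax is unnecessary in 32-bit mode, where 32-bit addressing is the default:

| instructions | bytes | mode |
|---|---|---|
| add dword ptr [eax], eax | 01 00 | 32-bit |
| add dword ptr [eax], eax | 67 01 00 | 64-bit |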

Some instructions might accept one or more prefixes:

| instructions | bytes |
|---|---|
| nop | 90 |
| nop | 66 90 |
| nop | 66 67 90 |
| nop | 66 66 67 90 |

Logic transformation

Zeroing registers:

| instructions |
|---|
| mov eax, 0x0 |
| xor eax, eax |
| sub eax, eax |

| instructions | bytes | ModR/M |
|---|---|---|
| xor bx, bx | 66 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| xor ebx, ebx | 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| xor rbx, rbx | 48 31 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| sub bx, bx | 66 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| sub ebx, ebx | 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| sub rbx, rbx | 48 29 db | 0xdb (mod: 0b11) (reg: 0b011) (rm: 0b011) |
| mov bx, 0 | 66 bb 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
| mov eax, 0 | b8 00 00 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
| mov ebx, 0 | bb 00 00 00 00 | 0x00 (mod: 0b00) (reg: 0b000) (rm: 0b000) |
| mov rax, 0 | 48 c7 c0 00 00 00 00 | 0xc0 (mod: 0b11) (reg: 0b000) (rm: 0b000) |
| mov rbx, 0 | 48 c7 c3 00 00 00 00 | 0xc3 (mod: 0b11) (reg: 0b000) (rm: 0b011) |

To switch between xor and sub, we only have to switch opcodes; the prefixes and the ModR/M byte stay the same. Switching between xor and mov is not supported yet.
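
The xor/sub switch can be sketched as an opcode lookup guarded by a "same register on both sides" check. This is an illustration under stated assumptions, not Transasm's implementation: `ZEROING_SWAP` and `swap_zeroing_idiom` are hypothetical names, and prefixes are decoded with 64-bit mode rules (optional 0x66, then optional REX).

```python
# xor <-> sub, for both operand directions (Ev, Gv and Gv, Ev).
ZEROING_SWAP = {0x31: 0x29, 0x29: 0x31, 0x33: 0x2B, 0x2B: 0x33}


def swap_zeroing_idiom(code: bytes) -> bytes:
    """Turn `xor r, r` into `sub r, r` (or back) when both operands match.

    Assumptions: 64-bit mode, optional 0x66 then optional REX prefix.
    """
    i = 0
    if code[i] == 0x66:
        i += 1
    if 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        # REX.R and REX.B extend reg and rm: they must agree too.
        if bool(rex & 0x04) != bool(rex & 0x01):
            raise ValueError("operands are different registers")
        i += 1
    if len(code) != i + 2:
        raise ValueError("expected prefixes, one opcode byte and ModR/M")
    opcode, modrm = code[i], code[i + 1]
    same_register = modrm >> 6 == 0b11 and (modrm >> 3) & 0b111 == modrm & 0b111
    if opcode not in ZEROING_SWAP or not same_register:
        raise ValueError("not a xor/sub zeroing idiom")
    return code[:i] + bytes([ZEROING_SWAP[opcode], modrm])


print(swap_zeroing_idiom(bytes.fromhex("4831db")).hex(" "))  # 48 29 db
```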

Going further

You can use these techniques to build more cool stuff:

  • Obfuscation / diversification pre/post compilation (think CMake module, LIEF dissection, LLVM pass, etc.)
  • Steganography (take a look at Hydan)
  • On the fly payload/shellcode polymorphism (within your favourite engine)

Download

Get a copy at github.com/valkheim/transasm.