Description
Work environment
| Questions | Answers |
|---|---|
| OS/arch/bits | x86_64 Ubuntu 20.04 |
| Architecture | x86_64 |
| Source of Capstone | git clone, default on master branch. |
| Version/git commit | llvm-20git, f08278 |
minimum PoC disassembler
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

/* Assumed buffer size; the original report does not show this definition. */
#define MAX_OUTPUT_LENGTH 256

int main(int argc, char *argv[]) {
    /*
    some input sanity check of hex string from argv
    (elided here; it is expected to fill raw_bytes and bytes_len)
    */

    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                      LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    // Returns the decoded instruction length in bytes, or 0 on failure
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));
    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}
Instruction bytes giving faulty results
0f 1a de
Expected results
It should be:
nop esi, ebx
Actual results
$ ./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.
Other cases seem to work
$ ./min_llvm_disassembler "0f1f00"
nop dword ptr [rax]
Additional Logs, screenshots, source code, configuration dump, ...
Instructions with opcodes ranging from 0f 18 to 0f 1f are defined as multi-byte NOP (No Operation) instructions in the x86 ISA. Please refer to the StackOverflow post for more details. The bytes should be decoded as follows.
- "0x0f 0x1a" is the extended opcode.
- The ModR/M byte DE translates to binary 11011110 (0xde).
  - Bits 7-6 (Mod): 11 (binary) = 3 (decimal). Indicates register-direct addressing mode.
  - Bits 5-3 (Reg): 011 (binary) = 3 (decimal). Corresponds to the EBX (or RBX in 64-bit mode) register.
  - Bits 2-0 (R/M): 110 (binary) = 6 (decimal). Corresponds to the ESI (or RSI in 64-bit mode) register.
XED also translates "0f 1a de" into "nop esi, ebx".
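
For reference, the ModR/M breakdown above can be reproduced with a few lines of C. This is only a minimal sketch to check the arithmetic, not how LLVM or XED decode instructions: the names `struct modrm`, `decode_modrm`, `reg32_names`, and `format_hint_nop` are hypothetical helpers introduced here, and the sketch assumes 32-bit operand size, no REX prefix, and the `nop <r/m>, <reg>` operand order that XED prints.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 ModR/M byte (hypothetical helper type for illustration). */
struct modrm {
    uint8_t mod; /* bits 7-6: addressing mode */
    uint8_t reg; /* bits 5-3: register operand */
    uint8_t rm;  /* bits 2-0: register or memory operand */
};

/* 32-bit register names indexed by their 3-bit encoding (assumes no REX prefix). */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Split a ModR/M byte into its three fields. */
static struct modrm decode_modrm(uint8_t byte) {
    struct modrm m;
    m.mod = (uint8_t)((byte >> 6) & 0x3);
    m.reg = (uint8_t)((byte >> 3) & 0x7);
    m.rm  = (uint8_t)(byte & 0x7);
    return m;
}

/*
 * Format the register-direct form as "nop <r/m>, <reg>".
 * Returns the number of characters written, or -1 when mod != 3
 * (memory forms are outside the scope of this sketch).
 */
static int format_hint_nop(uint8_t modrm_byte, char *out, size_t out_size) {
    struct modrm m = decode_modrm(modrm_byte);
    if (m.mod != 3) {
        return -1;
    }
    return snprintf(out, out_size, "nop %s, %s",
                    reg32_names[m.rm], reg32_names[m.reg]);
}
```

Calling `format_hint_nop(0xde, buf, sizeof(buf))` on a sufficiently large `buf` yields `nop esi, ebx`, matching the expected output: Mod = 3, Reg = 3 (ebx), R/M = 6 (esi).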