[x86][MC] Fail to decode some long multi-byte NOPs #117304

Open
@venkyqz

Description

Work environment

Questions           Answers
OS/arch/bits        x86_64 Ubuntu 20.04
Architecture        x86_64
Source of Capstone  git clone, default on master branch.
Version/git commit  llvm-20git, f08278

minimum PoC disassembler

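#include <err.h>
#include <stdint.h>
#include <stdio.h>

#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

// Output buffer size; the exact value is not given in the report (assumed here).
#define MAX_OUTPUT_LENGTH 256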

int main(int argc, char *argv[]){
    /*
       Input sanity check of the hex string from argv (elided).
       It is expected to fill `raw_bytes` (a uint8_t buffer) and `bytes_len`.
    */
    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                        LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

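    // LLVMDisasmInstruction() returns 0 when the bytes cannot be decoded
    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}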
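The input handling elided in the PoC above could look roughly like the following. This is only a sketch: `hex_nibble` and `parse_hex` are hypothetical helper names, not part of the original program or of LLVM, and the behavior on malformed input (returning 0) is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one hex digit to its value, or -1 if invalid. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: parse a hex string such as "0f1ade" into bytes.
 * Returns the number of bytes written, or 0 if the string is empty,
 * has odd length, contains a non-hex character, or does not fit. */
static size_t parse_hex(const char *hex, uint8_t *out, size_t out_cap) {
    size_t n = strlen(hex);
    if (n == 0 || n % 2 != 0 || n / 2 > out_cap) return 0;
    for (size_t i = 0; i < n / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return 0;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return n / 2;
}
```

With something like this, `raw_bytes` could be a 15-byte buffer (the maximum x86 instruction length) and `bytes_len` the return value of `parse_hex(argv[1], raw_bytes, sizeof raw_bytes)`.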

Instruction bytes giving faulty results

0f 1a de

Expected results

It should be:

nop esi, ebx

Actual results

$ ./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.

Other cases seem to work

$ ./min_llvm_disassembler "0f1f00"
nop     dword ptr [rax]

Additional Logs, screenshots, source code, configuration dump, ...

Instructions with opcodes from 0f 18 to 0f 1f are defined as multi-byte NOP (No Operation) instructions in the x86 ISA. Please refer to the StackOverflow post for more details. The bytes 0f 1a de should be decoded as follows:

  • "0x0f 0x1a" is the two-byte (extended) opcode.
  • The ModR/M byte DE translates to binary 11011110 (0xde).
    • Bits 7-6 (Mod): 11 (binary) = 3 (decimal)
      Indicates register-direct addressing mode.
    • Bits 5-3 (Reg): 011 (binary) = 3 (decimal)
      Corresponds to the EBX (or RBX in 64-bit mode) register.
    • Bits 2-0 (R/M): 110 (binary) = 6 (decimal)
      Corresponds to the ESI (or RSI in 64-bit mode) register.
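The field breakdown above can be checked with a few shifts and masks. This is a minimal sketch, not LLVM's decoder: `modrm_fields`, `split_modrm`, and `reg32_names` are illustrative names, and the register table assumes 32-bit operands with no REX prefix extending the reg or r/m fields.

```c
#include <stdint.h>

/* The three fields of an x86 ModR/M byte. */
typedef struct {
    unsigned mod; /* bits 7-6: addressing mode (3 = register-direct) */
    unsigned reg; /* bits 5-3: register operand */
    unsigned rm;  /* bits 2-0: register or memory operand */
} modrm_fields;

/* Split a ModR/M byte into its mod, reg, and r/m fields. */
static modrm_fields split_modrm(uint8_t b) {
    modrm_fields f;
    f.mod = (b >> 6) & 0x3;
    f.reg = (b >> 3) & 0x7;
    f.rm  = b & 0x7;
    return f;
}

/* 32-bit general-purpose register names indexed by encoding
 * (assumes no REX.R / REX.B extension). */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};
```

For the byte 0xde, `split_modrm` yields mod = 3, reg = 3, r/m = 6, so `reg32_names[f.rm]` is "esi" and `reg32_names[f.reg]` is "ebx", matching the expected "nop esi, ebx".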

XED also translates "0f 1a de" into "nop esi, ebx".
