ghidra: automate architecture detection #86

Open
wants to merge 12 commits into
base: main

kaoudis
Collaborator

@kaoudis kaoudis commented May 21, 2025

Architecture detection is often done manually in reverse engineering. Ghidra can make a guess, but the guess isn't always correct; for example, it doesn't handle all Bloodlight addresses correctly. This PR improves on Ghidra's guess the same way one might refine it by hand, using readelf and file. The goal is to improve the accuracy of the p-code we create, and therefore the accuracy of the C we produce and patch.

Bloodlight

What file thinks about Bloodlight: firmwares/bloodlight-firmware.elf: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped

What readelf -A shows for Bloodlight:

Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "7E-M"
  Tag_CPU_arch: v7E-M
  Tag_CPU_arch_profile: Microcontroller
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv4-D16
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: small
  Tag_ABI_HardFP_use: SP only
  Tag_ABI_VFP_args: VFP registers
  Tag_ABI_optimization_goals: Aggressive Size
  Tag_CPU_unaligned_access: v6
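
As a rough illustration of how these attributes could drive the guess, here is a minimal sketch. It assumes that an M-profile `Tag_CPU_arch_profile` (or a `Tag_CPU_arch` value ending in `-M`) should map to Ghidra's `ARM:LE:32:Cortex` language. The function name and the mapping rule are hypothetical and are not taken from this PR's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `readelf -A` text on stdin and print a Ghidra
# language ID. The mapping rule is an assumption made for illustration only.
guess_language_from_attributes() {
    local line profile="" arch=""
    # The `|| [ -n "$line" ]` part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *Tag_CPU_arch_profile:*) profile="${line##*: }" ;;
            *Tag_CPU_arch:*)         arch="${line##*: }" ;;
        esac
    done
    # An M-profile core is treated as Cortex-M.
    if [ "$profile" = "Microcontroller" ]; then
        echo "ARM:LE:32:Cortex"
        return 0
    fi
    # Fall back to the architecture name, e.g. v6S-M or v7E-M.
    case "$arch" in
        *-M|*-M.*) echo "ARM:LE:32:Cortex" ;;
        *) return 1 ;;   # no opinion: let the caller keep Ghidra's own guess
    esac
}
```

One would call it as `readelf -A firmwares/bloodlight-firmware.elf | guess_language_from_attributes`. A non-zero exit status means the attributes gave no hint, so the caller can keep Ghidra's default.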

What readelf -S shows for Bloodlight:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08000000 001000 003bf8 00  AX  0   0 64 <--- means Cortex
  [ 2] .preinit_array    PREINIT_ARRAY   08003bf8 00d444 000000 04  WA  0   0  1
  [ 3] .init_array       INIT_ARRAY      08003bf8 00d444 000000 04  WA  0   0  1
  [ 4] .fini_array       FINI_ARRAY      08003bf8 00d444 000000 04  WA  0   0  1
  [ 5] .ARM.exidx        ARM_EXIDX       08003bf8 004bf8 000008 00  AL  1   0  4
  [ 6] .data             PROGBITS        20000000 005000 008444 00  WA  0   0  4 <--- means Cortex
  [ 7] .bss              NOBITS          20008444 00d444 0018a8 00  WA  0   0  4
  [ 8] .ccm              PROGBITS        10000000 00d444 000000 00   W  0   0  1 <--- means Cortex
  [ 9] .debug_info       PROGBITS        00000000 00d444 01474c 00      0   0  1
  [10] .debug_abbrev     PROGBITS        00000000 021b90 004e4c 00      0   0  1
  [11] .debug_loclists   PROGBITS        00000000 0269dc 00663e 00      0   0  1
  [12] .debug_aranges    PROGBITS        00000000 02d020 001300 00      0   0  8
  [13] .debug_rnglists   PROGBITS        00000000 02e320 001126 00      0   0  1
  [14] .debug_macro      PROGBITS        00000000 02f446 0098f5 00      0   0  1
  [15] .debug_line       PROGBITS        00000000 038d3b 00bf2c 00      0   0  1
  [16] .debug_str        PROGBITS        00000000 044c67 021234 01  MS  0   0  1
  [17] .comment          PROGBITS        00000000 065e9b 000026 01  MS  0   0  1
  [18] .ARM.attributes   ARM_ATTRIBUTES  00000000 065ec1 000032 00      0   0  1
  [19] .debug_frame      PROGBITS        00000000 065ef4 0028a8 00      0   0  4
  [20] .debug_line_str   PROGBITS        00000000 06879c 0000a8 01  MS  0   0  1
  [21] .symtab           SYMTAB          00000000 068844 003b90 10     22 605  4
  [22] .strtab           STRTAB          00000000 06c3d4 001f63 00      0   0  1
  [23] .shstrtab         STRTAB          00000000 06e337 000106 00      0   0  1
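
The lines marked "means Cortex" above can be checked mechanically. The sketch below is one possible version of such a check, not the PR's actual code: it assumes that `.text` loaded at `0x08......` together with `.data` or `.bss` at `0x20......` is enough to call the layout Cortex-M-like. The function name and the exact rule are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `readelf -S` text on stdin and succeed when the
# section addresses look like a Cortex-M flash/SRAM layout (assumed rule).
looks_like_cortex_m_layout() {
    awk '
        { sub(/^[^]]*\]/, "") }   # drop the "[Nr]" column so $1 is the name
        $1 == ".text" && $3 ~ /^08/ { text = 1 }                    # code in flash
        ($1 == ".data" || $1 == ".bss") && $3 ~ /^20/ { ram = 1 }   # data in SRAM
        END { exit !(text && ram) }   # status 0 only when both were seen
    '
}
```

Used as `readelf -S firmwares/bloodlight-firmware.elf | looks_like_cortex_m_layout && echo "ARM:LE:32:Cortex"`. Address checks like this are only a heuristic, so it seems safer to treat them as supporting evidence next to the `readelf -A` attributes rather than as the sole signal.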

What we got for Bloodlight by default from Ghidra was: ARM:LE:32:v8:default
What the guesser function provides to Ghidra instead: ARM:LE:32:Cortex

PulseOx

What file thinks about PulseOx: firmwares/pulseox-firmware.elf: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped

What readelf -A shows for PulseOx:

Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "6S-M"
  Tag_CPU_arch: v6S-M
  Tag_CPU_arch_profile: Microcontroller
  Tag_THUMB_ISA_use: Thumb-1
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: small
  Tag_ABI_optimization_goals: Aggressive Size

What readelf -S shows for PulseOx:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08000000 001000 004ee0 00  AX  0   0  4 <--- Cortex
  [ 2] .preinit_array    PREINIT_ARRAY   08004ee0 006098 000000 04  WA  0   0  1
  [ 3] .init_array       INIT_ARRAY      08004ee0 006098 000000 04  WA  0   0  1
  [ 4] .fini_array       FINI_ARRAY      08004ee0 006098 000000 04  WA  0   0  1
  [ 5] .data             PROGBITS        20000000 006000 000098 00  WA  0   0  4 <--- Cortex
  [ 6] .bss              NOBITS          20000098 006098 0007ac 00  WA  0   0  4 <--- Cortex
  [ 7] .debug_info       PROGBITS        00000000 006098 012e51 00      0   0  1
  [ 8] .debug_abbrev     PROGBITS        00000000 018ee9 004fd3 00      0   0  1
  [ 9] .debug_loclists   PROGBITS        00000000 01debc 006d9d 00      0   0  1
  [10] .debug_aranges    PROGBITS        00000000 024c60 000aa0 00      0   0  8
  [11] .debug_rnglists   PROGBITS        00000000 025700 000d81 00      0   0  1
  [12] .debug_macro      PROGBITS        00000000 026481 002945 00      0   0  1
  [13] .debug_line       PROGBITS        00000000 028dc6 00b485 00      0   0  1
  [14] .debug_str        PROGBITS        00000000 03424b 00dad9 01  MS  0   0  1
  [15] .comment          PROGBITS        00000000 041d24 000026 01  MS  0   0  1
  [16] .ARM.attributes   ARM_ATTRIBUTES  00000000 041d4a 00002a 00      0   0  1
  [17] .debug_frame      PROGBITS        00000000 041d74 001688 00      0   0  4
  [18] .debug_line_str   PROGBITS        00000000 0433fc 0000a3 01  MS  0   0  1
  [19] .symtab           SYMTAB          00000000 0434a0 002ee0 10     20 443  4
  [20] .strtab           STRTAB          00000000 046380 0014c3 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 047843 0000f6 00      0   0  1

What we got for PulseOx by default from Ghidra: ARM:LE:32:v8:default
What the guesser function instead provides to Ghidra: ARM:LE:32:Cortex
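
Note that the default above carries a fifth field (the compiler spec, `default`), while the guessed ID has only four. If the ID is handed to Ghidra's headless analyzer through its `-processor` and `-cspec` options, the two forms need slightly different handling. The sketch below shows one way to do that; the function name is hypothetical, and whether the entrypoint in this PR passes the ID this way is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "proc:endian:size:variant[:cspec]" into
# analyzeHeadless options, printed one per line.
headless_language_args() {
    local id="$1"
    case "$id" in
        *:*:*:*:*:*) return 1 ;;                        # too many fields
        *:*:*:*:*)   printf '%s\n' -processor "${id%:*}" -cspec "${id##*:}" ;;
        *:*:*:*)     printf '%s\n' -processor "$id" ;;  # no compiler spec given
        *)           return 1 ;;                        # not a language ID
    esac
}
```

For `ARM:LE:32:Cortex` this prints `-processor` and the ID, leaving the compiler spec for Ghidra to choose.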

One can also test this change with the LIT tests after rebuilding the headless decomp container to use the new entrypoint function:

./scripts/ghidra/build-headless-docker.sh
cmake --build builds/default/ -j$((`nproc`+1)) --preset debug --target test    

@kaoudis kaoudis requested a review from kumarak May 21, 2025 09:34
@kaoudis kaoudis self-assigned this May 21, 2025
@kaoudis kaoudis requested a review from xlauko as a code owner May 21, 2025 09:34
@kaoudis
Collaborator Author

kaoudis commented May 21, 2025

LIT in CI appears to fail, which is odd, since it worked locally - I shall try it locally again directly and then also with act

@kaoudis
Collaborator Author

kaoudis commented May 21, 2025

cannot reproduce the failure locally even with release / ci builds, trying to rerun jobs

@kaoudis
Collaborator Author

kaoudis commented May 21, 2025

for whatever reason, looks like when we run file on input.o for the insert_substring test, it comes up as a directory in CI 🤔 so I guess I'm going to try to see what on earth we're doing in the CI workflow that we don't do when we run LIT locally?

++ file /input.o
+ local 'file_output=/input.o: directory'

but on the bright side,

********************
PASS: PatchestryTest :: ghidra/array.c (5 of 27)
PASS: PatchestryTest :: ghidra/fread.c (6 of 27)
PASS: PatchestryTest :: ghidra/fwrite.c (7 of 27)
PASS: PatchestryTest :: ghidra/matrix.c (8 of 27)
PASS: PatchestryTest :: ghidra/list.c (9 of 27)
PASS: PatchestryTest :: ghidra/prime.c (10 of 27)
PASS: PatchestryTest :: ghidra/reverse.c (11 of 27)
PASS: PatchestryTest :: ghidra/sort.c (12 of 27)
PASS: PatchestryTest :: ghidra/queue.c (13 of 27)
PASS: PatchestryTest :: ghidra/struct.c (14 of 27)
PASS: PatchestryTest :: ghidra/structb.c (15 of 27)
PASS: PatchestryTest :: ghidra/sub.c (16 of 27)
PASS: PatchestryTest :: ghidra/test.c (17 of 27)
PASS: PatchestryTest :: patchir-decomp/init_mmio.json (18 of 27)
PASS: PatchestryTest :: patchir-decomp/open_memmap.json (19 of 27)
PASS: PatchestryTest :: patchir-decomp/plat_arm_get_alt_image_source.json (20 of 27)
PASS: PatchestryTest :: patchir-opt/measurement_update.json (21 of 27)
PASS: PatchestryTest :: patchir-opt/patch_spo2_lookup.c (22 of 27)
PASS: PatchestryTest :: pcode-translate/function.json (23 of 27)
PASS: PatchestryTest :: pcode-translate/help.json (24 of 27)
PASS: PatchestryTest :: patchir-decomp/register_io_dev_memmap.json (25 of 27)

cf. https://github.com/lifting-bits/patchestry/actions/runs/15158732553/job/42630693519?pr=86#step:8:103
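
Since `file` happily reports `directory` instead of failing, a guard before the guess would make this kind of CI problem show up as a clear error. This is a minimal sketch with a hypothetical function name, not part of the PR. One possible cause worth ruling out (unconfirmed here) is a Docker `-v` bind mount whose host path does not exist, which Docker creates as a directory.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail early when the input is not a regular file.
require_regular_file() {
    if [ -d "$1" ]; then
        echo "error: $1 is a directory, expected an ELF or object file" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "error: $1 does not exist or is not a regular file" >&2
        return 1
    fi
}
```

Called as `require_regular_file /input.o || exit 1` at the top of the entrypoint, it would stop the run before `file` and `readelf` are consulted.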

@kaoudis kaoudis force-pushed the kaoudis/arch-guessing-fix branch from 586a5f2 to ceb126d Compare May 27, 2025 11:12
Collaborator

@kumarak kumarak left a comment

Looks good to me. I have one comment about fixing the indentation.

kaoudis added 2 commits May 28, 2025 08:17
… since it looks a little funky to just check section headers for one type of arch