IDA backend: include api features of renamed calls #252

Closed
mr-tz opened this issue Aug 27, 2020 · 10 comments
Labels: enhancement (New feature or request), ida-explorer (Related to IDA Pro plugin)

Comments

@mr-tz
Collaborator

mr-tz commented Aug 27, 2020

Summary

After renaming dynamic calls in IDA, the backend should emit the respective API calls.

Most often the dynamic calls will be to global addresses, but calls via registers or even local variables are also possible.

Motivation

Malware often contains obfuscated API calls as shown below.

Before
[screenshot: 2020-08-27_10-06-25]

After
[screenshot: 2020-08-27_10-06-32]

Describe alternatives you've considered

Alternatively, an additional analysis engine could try to automatically recover/identify API calls, e.g. using emulation. This could then work in the standalone version as well.

@mr-tz mr-tz added the enhancement New feature or request label Aug 27, 2020
@mike-hunhoff
Collaborator

mike-hunhoff commented Jul 2, 2021

+1 for this feature. I'm analyzing some shellcode that dynamically resolves all API calls and stores the addresses in a large structure. We should investigate solutions to include these sorts of API calls as well.

@mike-hunhoff mike-hunhoff added the ida-explorer Related to IDA Pro plugin label Jul 13, 2021
@mike-hunhoff
Collaborator

Maybe we can adapt the code at https://github.com/arizvisa/ida-minsc/blob/master/base/instruction.py#L819 to extract API calls from user-defined structures.

@mike-hunhoff
Collaborator

mike-hunhoff commented Jul 30, 2021

After doing additional research it appears that attempting to pull structure member names from a user-defined structure using IDAPython can get messy real quick - especially when working with large, nested structures.

A simpler solution may be to check if a call/jmp references a structure offset using something like idaapi.is_stroff and then parse the structure member using a regular expression, something like (call|jmp)\s+\[.+\s*\+\s*(.+)\].

I don't like the idea of parsing the disassembly like this, but it's the simplest solution and may require less overhead than attempting to go through IDAPython.
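As a rough illustration of the regex approach, the sketch below applies the pattern quoted above to a disassembly line and keeps the last dotted component as the member name. The helper name `member_from_disasm`, the sample lines, and the assumption that the operand is rendered as `Struct.member` are hypothetical; a line containing a size prefix such as `dword ptr` before the bracket would not match this pattern as written.

```python
import re

# Pattern quoted from the comment above; group 2 captures the text after the '+'.
STROFF_CALL = re.compile(r"(call|jmp)\s+\[.+\s*\+\s*(.+)\]")


def member_from_disasm(line):
    """Return the structure member name from a call/jmp line, or None."""
    match = STROFF_CALL.search(line)
    if match is None:
        return None
    # Assumption: the operand is rendered as "Struct.member"; keep the last component.
    return match.group(2).strip().rsplit(".", 1)[-1]


# Hypothetical disassembly lines, not taken from a real sample.
print(member_from_disasm("call    [esi+ApiTable.GetMessageW]"))
print(member_from_disasm("call    eax"))
```

In IDA, the input line would presumably come from the disassembly text of an instruction whose operand passes the `idaapi.is_stroff` check mentioned above.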

@mike-hunhoff
Collaborator

Another problem to tackle - how do we map something like GetMessageW to user32.GetMessageW? Most capa rules specify the DLL name as part of the rule which means we can't match with just GetMessageW.

One option is to only support specific annotation formats, e.g. WIN.api.user32_GetMessageW.
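To make the annotation idea concrete, here is a minimal sketch that splits a name of the form WIN.api.<dll>_<api> into a DLL and an API name. The helper name `split_annotation`, the default prefix, and the rule of splitting at the first underscore are assumptions for illustration only; a DLL name that itself contains an underscore would need a different separator.

```python
def split_annotation(name, prefix="WIN.api."):
    """Split 'WIN.api.user32_GetMessageW' into ('user32', 'GetMessageW').

    Returns None when the name does not follow the assumed format.
    """
    if not name.startswith(prefix):
        return None
    # Split at the first underscore so API names with a leading underscore survive.
    dll, sep, api = name[len(prefix):].partition("_")
    if not sep or not dll or not api:
        return None
    return dll.lower(), api


result = split_annotation("WIN.api.user32_GetMessageW")
if result is not None:
    dll, api = result
    print(f"{dll}.{api}")
```

The pair could then be joined as `dll.api` to match rules that include the DLL name, or the API part could be used alone.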

@Ana06 Ana06 self-assigned this Jul 9, 2024
@Ana06
Member

Ana06 commented Jul 9, 2024

@mike-hunhoff

Another problem to tackle - how do we map something like GetMessageW to user32.GetMessageW? Most capa rules specify the DLL name as part of the rule which means we can't match with just GetMessageW.

I think in the extractor we only use the API name:

yield API(name), ih.address
(I think this has been changed after you wrote this comment)

@Ana06
Member

Ana06 commented Jul 9, 2024

Adding support for calls to global variables can be implemented by getting the name of the call/jump operand. I'll send a PR to support it. 😉
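A rough sketch of the idea, not the code from the PR: once the operand's name is known, auto-generated names can be skipped and anything else treated as an API candidate. The helper name `api_from_operand_name` and the prefix list are assumptions; inside IDA the name could be fetched with something like `idc.get_name(idc.get_operand_value(ea, 0))`.

```python
# Prefixes IDA typically assigns to auto-generated names (assumed, not exhaustive).
AUTO_PREFIXES = ("sub_", "loc_", "off_", "unk_", "byte_", "word_", "dword_", "qword_")


def api_from_operand_name(name):
    """Return a user-assigned operand name as an API candidate, else None."""
    if not name or name.startswith(AUTO_PREFIXES):
        return None
    return name


# Inside IDA the name could come from something like:
#   name = idc.get_name(idc.get_operand_value(ea, 0))
print(api_from_operand_name("GetMessageW"))
print(api_from_operand_name("dword_403010"))
```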

@Ana06
Member

Ana06 commented Jul 9, 2024

@mike-hunhoff

I'm analyzing some shellcode that dynamically resolves all API calls and stores the addresses in a large structure.

Can you share the hash of the sample?

A simpler solution may be to check if a call/jmp references a structure offset using something like idaapi.is_stroff and then parse the structure member using a regular expression, something like (call|jmp)\s+\[.+\s*\+\s*(.+)\].

I think this could work, but as I understand it, this requires the structure to have been applied in the disassembly. If the pseudocode view works well, I often apply structures only in the pseudocode view, and this approach wouldn't work in that case. We could also parse the pseudocode view with code like the following, which returns the pseudocode line corresponding to the call instruction:

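import ida_hexrays

cfunc = ida_hexrays.decompile(ea)  # decompile the function containing ea
item = cfunc.body.find_closest_addr(ea)  # ctree item closest to the call instruction
coord = cfunc.find_item_coords(item)  # (x, y) position of the item in the pseudocode
cfunc.get_pseudocode()[coord[1]].line  # raw pseudocode line (may include IDA color tags)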

But I am not sure how efficient parsing the pseudocode is. We would likely want to ensure we decompile each function only once.
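One way to decompile each function only once is to memoize the result per function start address. The sketch below is an assumption-laden illustration: `DecompileCache` and `fake_decompile` are hypothetical names, and the stand-in decompiler exists only so the example runs outside IDA. Inside IDA, the callable could be `ida_hexrays.decompile`, and the key could be the function start address, e.g. `idaapi.get_func(ea).start_ea`.

```python
class DecompileCache:
    """Memoize a decompiler callable so each function is decompiled at most once."""

    def __init__(self, decompile):
        self._decompile = decompile  # e.g. ida_hexrays.decompile inside IDA
        self._results = {}

    def get(self, func_start):
        # Decompile on first request only; later requests reuse the stored result.
        if func_start not in self._results:
            self._results[func_start] = self._decompile(func_start)
        return self._results[func_start]


calls = []


def fake_decompile(func_start):
    # Stand-in for the real decompiler so the sketch runs outside IDA.
    calls.append(func_start)
    return f"pseudocode for {func_start:#x}"


cache = DecompileCache(fake_decompile)
cache.get(0x401000)
cache.get(0x401000)
print(len(calls))  # number of times the stand-in decompiler actually ran
```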

@Ana06
Member

Ana06 commented Jul 9, 2024

Registers are also tricky, as they often hold the value returned by a function that resolves the APIs (and this function can differ a lot between samples). An example from a sample I analysed yesterday:
[image]

I often add a comment in the call instruction with the API. Not sure if this is a standard practice, but if it is, getting the API from the comment should be easy. Parsing the pseudocode could also be used for this case.

@Ana06
Member

Ana06 commented Jul 9, 2024

Does someone have a sample where the resolved APIs are stored in local variables? I think this case could also be tricky if the variable is reused.

@mr-tz
Collaborator Author

mr-tz commented Sep 30, 2024

closed via #2201

@mr-tz mr-tz closed this as completed Sep 30, 2024