Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IR2Builder tool for converting LLVM IR files into C++ IRBuilder API #117129

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mark-sed
Copy link

IR2Builder tool can be used to convert LLVM IR files into C++ programs or code snippets, which can generate calls to IRBuilder, which will generate the original LLVM IR.

There is a RFC on Discourse where I go more into detail about this: https://discourse.llvm.org/t/rfc-ir2builder-an-llvm-ir-converter-to-c-irbuilder-calls/83122, but I'll also try to put a TLDR version here.

  • It is easier to write llvm IR and then convert it using this tool. Also this can be used for learning llvm and how to construct IR.
  • Can generate function or whole module.
  • Can create standalone runnable program (good for testing).
  • There are 2 main things which yet need to be implemented (suggestions on how to do it are more than welcome).
    • Incorrect GEP return type generation - This issue is mostly just me not knowing the correct call to get the return type that I need to provide to CreateGEP call.
    • ConstantExpr creation.
  • I had run llvm opt tests through this tool:
    • Current success rate is almost ~64 % for over 12 500 tests.
    • The GEP issue has 1 030 mentions in these failures (4.6k failed, but there can be multiple mentions in one test).
    • The missing ConstExpr has 17 316 mentions in tests (there are usually multiple constexprs in a single test).
  • The tool counts on new stuff being added to LLVM and not having it updated straight away, it generates TODO comments instead with description of what is missing and user can add it (it does not break the tool).

The discourse post has a list of most important issues/not implemented features and any suggestions or fixes are more than welcome.

If you want to try it out, you can just pass it the ll file you want to convert and you'll get the cpp code on stdout:

./ir2builder code.ll

If you want to create a fully runnable program which will also generate the input you can use the following commands:

./ir2builder code.ll --runnable > code.cpp
g++ -g -O3 code.cpp `./llvm-config --cxxflags --ldflags --system-libs --libs core` -I /home/llvm/include/ -o main
./main > generated.ll

You will get generated.ll which you can also llvm-diff with the code.ll to see if it worked correctly.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 42da81582ea5a0e5bb0e18af74e6c101f0307f36 233e25f9157f42c7fa68d78ca950dbf25b538117 --extensions cpp -- llvm/tools/ir2builder/ir2builder.cpp
View the diff from clang-format here.
diff --git a/llvm/tools/ir2builder/ir2builder.cpp b/llvm/tools/ir2builder/ir2builder.cpp
index 8583c995c3..dd13b07a37 100644
--- a/llvm/tools/ir2builder/ir2builder.cpp
+++ b/llvm/tools/ir2builder/ir2builder.cpp
@@ -198,9 +198,7 @@ public:
   void convert(const Instruction *I, raw_ostream &OS);
 };
 
-std::string IR2Builder::getNextVar() {
-  return "v0" + std::to_string(varI++);
-}
+std::string IR2Builder::getNextVar() { return "v0" + std::to_string(varI++); }
 
 static std::string to_str(bool b) { return b ? "true" : "false"; }
 

@mark-sed
Copy link
Author

Ping for those that joined the discourse conversation: @jurahul @efriedma-quic @kparzysz @danilaml @nhaehnle

@nikic
Copy link
Contributor

nikic commented Nov 21, 2024

ConstantExpr creation.

You can create ConstantExprs using the usual IRBuilder APIs for instructions. IRBuilder will automatically convert to ConstExpr where possible.

Copy link
Contributor

@nvjle nvjle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some preliminary stylistic (Coding Standards) comments.

Also, I did not see any unit tests. Please add some in a new directory llvm/test/tools/ir2builder. There are many examples there for the other LLVM tools.

} else if (isa<ConstantTokenNone>(c)) {
return llvmPrefix + "ConstantTokenNone::get(" + ctx + ")";
} else if (auto ce = dyn_cast<ConstantExpr>(c)) {
(void)ce;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary code, but use [[maybe_unused]] ce instead?

@mark-sed
Copy link
Author

@nvjle Thank you so much for these comments and tips, I'll try to implement these asap.

@mark-sed
Copy link
Author

ConstantExpr creation.

You can create ConstantExprs using the usual IRBuilder APIs for instructions. IRBuilder will automatically convert to ConstExpr where possible.

@nikic is your idea to replace all the ConstantExprs with call to getAsInstruction()?

@mark-sed
Copy link
Author

mark-sed commented Mar 5, 2025

Kind ping.
Is there something else that is needed to merge this (other than the constexpr generation fix)?

return Tmp;
}

static std::string sanitize(std::string S) {
Copy link
Contributor

@jurahul jurahul Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::string& or use a StringRef?

@jurahul
Copy link
Contributor

jurahul commented Mar 6, 2025

Reiterating some earlier comments, is there some unit test for this? Do we need something basic for smoke testing?

Also, what's the evolution story for this tool? Are folks expected to update it as IR changes happen? May be @nikic can clarify these points (need not block the merge though)

@mark-sed
Copy link
Author

mark-sed commented Mar 6, 2025

Reiterating some earlier comments, is there some unit test for this? Do we need something basic for smoke testing?

Adding some unittests is a good idea. I'll look into that.

Also, what's the evolution story for this tool? Are folks expected to update it as IR changes happen?

That would be the ideal case, but of course I expect that there will be cases where people forget about adding the new IR here. But fortunately this tool is made in a way that it will still work, it might generate integer instead of an enum value or a TODO: which will also point to the missing constructs.
What would really need updating are some major changes to IRBuilder, that is stuff that could make the generator create non-compilable code.

}
if (auto *CE = dyn_cast<GlobalValue>(C)) {
// This should not really happen as asStr for Value should be always called.
return asStr(cast<Value>(CE));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're casting to a base class you can use static_cast. cast contains an assert to check the cast is valid, that's for casting to a derived class.

// TODO: Sign has to be determined.
auto CVal = CI->getValue();
return formatv("{0}ConstantInt::get({1}, {2})", LLVMPrefix,
asStr(C->getType()), CVal.getSExtValue());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle ConstantInts not representable as int64_t. getSExtValue() will assert. You should handle this case or use trySExtValue so you can gracefully fail without asserting.

if (auto *CI = dyn_cast<ConstantInt>(C)) {
// TODO: Sign has to be determined.
auto CVal = CI->getValue();
return formatv("{0}ConstantInt::get({1}, {2})", LLVMPrefix,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ConstantInt::getSigned since you used getSExtValue(). They need to agree.

ClassName = "ConstantArray";
Values = formatv(
"{0}ArrayRef<{0}Constant *>({1}{2}{3})", LLVMPrefix,
(AT->getNumOperands() > 1 ? std::string("{") : std::string("")),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need std::string here? Can formatv take them as const char *?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants