Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C++ #532

Draft
wants to merge 36 commits into
base: master
Choose a base branch
from
Draft

Add C++ #532

wants to merge 36 commits into from

Conversation

jorg-vr
Copy link
Contributor

@jorg-vr jorg-vr commented Jul 17, 2024

This pr adds c++ to TESTed.

As the c++ syntax has a fair amount of similarities with c, I have inherited from the c language.
For this I restructured the c/generators.py into a class, instead of just methods. This made it easier to inherit and only adjust the required parts of functions.

Overwritten parts:

  • supported types: C++ adds supports for a lot more complex structures, such as objects, sets, lists and maps
  • use g++ for compilation
  • use std types instead of default c types for typing
  • The c++ code to write out values also had to be written fro scratch as this is not inheritable

Convention choices:

Right now all types are prefixed with std::. We could avoid this by adding using namespace std to the generated code file. For me it is unclear what the common approach is for this. Are students used to writing std::string or simply string?

I have also chosen to prefer std types over standard c types. So even though you could write valid c++ without using any std types, this won't work well with our judge.

I used the string key cpp instead of c++. Both are commonly used, but the first is less likely to cause issues of being parsed incorrectly.

Limitations

For custom oracles written in c++ an object evaluator is expected.
This is different to c# and java where static class methods are used.
But because in c++ static class methods use a different operator :: instead of ->. And :: is currently not supported as TESTed doesn't make this differentiation.

Support for lists, maps and tuples with varying types of content is limited. TESTed tries to define one type that matches all values. This could result in generic types such as Object in java. But c++ does not have such a generic type that could match anything.

No support for tuples within sequences such as list<tuple<int, int>> as TESTed does not provide enough information to infer the tuple's length in that case. We also don't support tuples with different types, as we can only extract one type.

Providing better support for this would require changesto the internal workings of TESTed. (which I tried to avoid in this PR) Mostly keeping track of types in more details would be required.

Observations

Adding a language to TESTed requires three main parts:

  1. a config.py with a class inheriting from the default language class
  2. a generators.py which translates the internal TESTed format into a testfile in the desired language
  3. a templates folder containing language specific helper functions to convert return values and errors to the json format expected by tested

Extending the config.py is properly done using inheretance. Filling out the required methods, which are also mostly documented.

The generators.py is badly organized. There seems to be some kind of method naming convention, as all generators.py files seem to be largely copied from each other.
This results in lots of duplicate code. Because there is no clear API to implement, it is also not always clear which cases should be implemented. This part was the hardest part to write, but is mostly skimmed over in the documentation.
I think this part of the code could really benefit with some types of abstraction, such as an abstract class with methods to implement. Maybe with some different abstract subclasses for typed and untyped languages, which could also reduce code duplication.

The templates folder was not really documented either. It took me a while to figure out what the files actually did. These files are technically not required by TESTed, but helpers for the code generated in generators.py. But we seem to use this for every language. So some form of specification/standardization would be helpful.
This was also hard to implement for me in c++, but that is just because my knowledge of c++ wasn't good enough to properly write type generics.

To end with a more positive observation: The test catch a lot of edge cases. While it was a frustrating experience to keep fixing bugs to get each test passing, it is very clear that the test have heavily improved my initial implementation.

Imran-imtiaz48

This comment was marked as spam.

@niknetniko niknetniko linked an issue Aug 23, 2024 that may be closed by this pull request
@jorg-vr jorg-vr marked this pull request as ready for review February 28, 2025 13:06
Copy link
Member

@niknetniko niknetniko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not run the code locally (NixOS 👍 ), but some comments nonetheless.

For custom oracles written in c++ an object evaluator is expected. This is different to c# and java where static class methods are used. But because in c++ static class methods use a different operator :: instead of ->. And :: is currently not supported as TESTed doesn't make this differentiation.

  • The problem with static stuff probably deserves its own issue so we can think about it in the future. Besides that, the reason Java and C# use static methods is that they do not have (real) top-level functions, but C++ does, so I am wondering why the language specific evaluator needs to be an object? Can it not be a function, like in C?

Providing better support for this would require changesto the internal workings of TESTed. (which I tried to avoid in this PR) Mostly keeping track of types in more details would be required.

  • If you still remember the issues you encountered or the missing data, this is probably also worth putting into an issue.

The generators.py is badly organized. There seems to be some kind of method naming convention, as all generators.py files seem to be largely copied from each other. This results in lots of duplicate code. Because there is no clear API to implement, it is also not always clear which cases should be implemented. This part was the hardest part to write, but is mostly skimmed over in the documentation. I think this part of the code could really benefit with some types of abstraction, such as an abstract class with methods to implement. Maybe with some different abstract subclasses for typed and untyped languages, which could also reduce code duplication.

While I agree that the situation is probably not ideal, a big reason is that introducing an API/base class for this is not that simple. While al lot of languages look very similar, there are often subtle differences, and you also do not want to create generation methods with a lot of parameters, as some languages (e.g. Java) are difficult enough as-is, with the generics stuff.

There are a few things I would change about the current implementation:

  • By inheriting from C, it is very easy to break C++ generation by changing stuff in C, e.g. in the define_write_funtions method.
  • Some code is re-used, but I feel like should be overridden in the C++ implementation to generate C++ code with more best practices:
    • I would also override the methods that generate C-style includes
    • I would also override the methods that generate code with NULL

When taking that into account, the amount of code that is actually not overridden in the C++ implementation is not that big. For this reason, I do not think having the generator code be a class with inheritance to be a good solution to this problem. That is not to say there are no possible improvements, e.g. #564 could maybe improve the code here. The main goal with the generators has always been (for me at least) to make it as easy as possible to follow along in the code to easily debug, e.g. if some type was not correctly generated (which is why this was originally done using mako templates, but those are slow).

Similarly, in general I like to keep the language implementation independent of each other (unless explicitly with helper functions), since both the language config and the code generation are logically independent things, e.g. it might be that some language follow the same logic for generating arguments, but to me that is more a coincidence than really the same behaviour (however, there are some cases where the visitors or more helper functions could indeed make things a bit cleaner).

Construct.ASSIGNMENTS,
Construct.GLOBAL_VARIABLES,
Construct.OBJECTS,
Construct.HETEROGENEOUS_COLLECTIONS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "Support for lists, maps and tuples with varying types of content is limited. TESTed tries to define one type that matches all values." mean exactly? Because we could also not include this here for now, and then we do not have to worry about that case.

from tested.languages.utils import executable_name


class CPP(C):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this class overrides a lot of the parent class, I feel like it might be better to not inherit from C, but instead move the code that is actually common to both to some utility methods, similar to how there are a few in the Java/Kotlin classes (these are in https://github.com/dodona-edu/universal-judge/blob/master/tested/languages/utils.py, but perhaps they can go into a new file like c_utils or something?

If the implementation is just a few lines, I would probably even just put it in here as well, since that way they do not depend on each other.

@jorg-vr jorg-vr marked this pull request as draft March 3, 2025 09:09
@jorg-vr
Copy link
Contributor Author

jorg-vr commented Mar 3, 2025

The problem with static stuff probably deserves its own issue so we can think about it in the future. Besides that, the reason Java and C# use static methods is that they do not have (real) top-level functions, but C++ does, so I am wondering why the language specific evaluator needs to be an object? Can it not be a function, like in C?

@niknetniko TESTed provides me with a function with a namespace evaluator (the filename).
c doesn't support namespaces at all, so this is just ignored in all cases.
But in the cpp I have no good way based on which I can decide to ignore the namespace. The only way I see is passing around evaluator_names from the PreparedExecutionUnit and doing a string compare to not write out the namespace in that case. Which does not feel like a good solution.

A better solution would be telling TESTed based on config that no namespace is needed for oracles in this language, and TESTed not generating it in its internal representation in that case.
Or having a separate types for the generated Oracle statements, so languages can treat them differently.
But I would suggest doing that in a separate issue and pr.

@jorg-vr
Copy link
Contributor Author

jorg-vr commented Mar 3, 2025

TODO:

  • add stacktrace cleaner test
  • add jinja test
  • verify need for parallel test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add C++ support
3 participants