Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C extension (speeds up simpleion dump/load) #149

Merged
merged 5 commits into from
May 29, 2021

Conversation

cheqianh
Copy link
Contributor

@cheqianh cheqianh commented Apr 22, 2021

Description:

PR of C extension. (both coding and configuration parts).

Distribution part may be changed but I'll create a separate PRs for them as needed.

I added comment for myself from the feedback last time. (e.g. not use force-python-impl, check c skip list etc.)

Suggested Review Order

C extension consists of three parts: simpleion dump/load, c extension (e.g. for mac it's ionc.so) and ion-c binaries. We will go through each of them below.

C extension --- setup.py:

Highlights:
We add c extension information here, setup.py will build c extension during the build process. we can also manually build extension by python setup.py build_ext.

 

Ion-c binaries --- install.py:

Then we look into install.py where we prepare ion-c binaries and headers for us. it starts from here.

Highlights:

  1. The high level goal of install.py is: 1. checkout ion-c, 2. build ion-c binaries, 3. move required output for ionc and decNumber into ion-c-build. 4. copy ion-c-build to amazon/ion for distribution.
  2. We currently support Mac, Windows and Linux. There are still some issues need to be improve. I added comment regarding these issues. (e.g. in windows, Check if VS is available in virtual machine; In linux, the wheel is initialized with an unexpected name, but simply rename it solves issue, need more research about this. etc.)

 

Python --- simpleion.py:

Now We look into simpleion.py, how we connect ion-python and c extension.

Highlights:

  1. In order to not use force_python_impl, I rename the previous dump/load to dump_original/load_original. New dump method wraps dump_original and dump_extension (c extension dump API). So does load.

 

Distribute all required files --- MANIFEST.in:

After preparing all three parts, the next step is letting PYPI know what files we are going to distribute.

We added some required files into MANIFEST.in for distribution: ion-c binaries in ion-c-build, _ioncmodule.h header file for c module and installing script install.py. Now when we distribute ion-python, both c extension and ion-c will be included.

 
 

Next let's look into ion-c module and their test:

ion-c module --- ioncmodule.c and _ioncmodule.h

For dump (write), it starts from here, this method initialize required variable and parse the python argument to c types. It will eventually call ionc_write_value to write value. ionc_write_value will check what type the value is and use the corresponding ion-c api to write them.

Likely for load, it starts from here and eventually will call ionc_read_value to read files.

 

After knowing how the c module looks like, we move on to unit tests:

tests/test_cookbook.py
tests/test_simpleion.py
tests/tests/test_vectors.py
Refer to comments for more details.

 

Last file .github/workflows/python-publish.yml is minor,

Don't need to review anymore, going to find another way to automate distribution.

I'll temporary keep this script until we find a way to distribute.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amazon/ion/ioncmodule.c Show resolved Hide resolved
IONCHECK(c_string_from_py(decimal_str, &decimal_c_str, &decimal_c_str_len));

ION_DECIMAL decimal_value;
IONCHECK(ion_decimal_from_string(&decimal_value, decimal_c_str, &dec_context));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced previous decQuadFromString to ion_decimal_from_string.

Refer to here for previous code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as with int. Can we use math to translate between the types instead of roundtripping via string?

Copy link
Contributor Author

@cheqianh cheqianh May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we only support 34 decimal digits. There is a test with more than 34 digits throws a NUMBER_FLOW exception.

_IT.DECIMAL: (
(_D('1.1999999999999999555910790149937383830547332763671875e0'),   # <- value
 b'1.1999999999999999555910790149937383830547332763671875'))           # <- expected value

dump/write decimal function here.
load/read decimal function here.

amazon/ion/ioncmodule.c Show resolved Hide resolved
amazon/ion/ioncmodule.c Outdated Show resolved Hide resolved
amazon/ion/simpleion.py Outdated Show resolved Hide resolved
install.py Show resolved Hide resolved
setup.py Show resolved Hide resolved
tests/test_simpleion.py Show resolved Hide resolved
tests/test_vectors.py Show resolved Hide resolved
amazon/ion/simpleion.py Outdated Show resolved Hide resolved
@cheqianh
Copy link
Contributor Author

The Travis failed due to wheel build. Will look into this later.

@cheqianh cheqianh requested a review from tgregg April 22, 2021 18:48
Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's coming along well. I'm excited for the performance improvements.

Once this review is finished, we should read/write some large sample files and try to detect memory leaks using, e.g. Valgrind or whatever the best tool is for memory analysis with Python C extensions. The manual memory management in the C extension is tricky and it's possible we missed something.

amazon/ion/ioncmodule.c Show resolved Hide resolved
amazon/ion/ioncmodule.c Show resolved Hide resolved
amazon/ion/ioncmodule.c Show resolved Hide resolved
amazon/ion/ioncmodule.c Outdated Show resolved Hide resolved
IONCHECK(c_string_from_py(decimal_str, &decimal_c_str, &decimal_c_str_len));

ION_DECIMAL decimal_value;
IONCHECK(ion_decimal_from_string(&decimal_value, decimal_c_str, &dec_context));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as with int. Can we use math to translate between the types instead of roundtripping via string?

install.py Show resolved Hide resolved
install.py Show resolved Hide resolved
tests/test_cookbook.py Outdated Show resolved Hide resolved
tests/test_simpleion.py Show resolved Hide resolved
tests/test_vectors.py Show resolved Hide resolved
@cheqianh
Copy link
Contributor Author

cheqianh commented May 13, 2021

1. What I did:

  • Avoid using string to improve performance. Refer to here for details.
  • Changed Unit test that able to test both ion-python and ion-c spec. Refer to here for details.
  • Removed try..catch branch for simple-ion dump/load. Refer to here for details.
  • One case failed python test but not in C skip-list. Refer to here for details.
  • Other comments.
    1. Py decimal accept string. Refer to here
    2. Null field name. Refer to here
    3. Unknown text symbol. Refer to here

2. After finishing automation part:

  • Run large files and do a memory leak check.
  • Travis CI fails.
  • Some TODOs in the code.
  • Fix an ion-c issue and more issues as time allows.

Let me know if there is anything need to be prioritized!

Comment on lines 472 to 475
if (PyObject_RichCompareBool(temp, PyLong_FromLong(0), Py_LT) == 1) {
ion_int_value._signum = -1;
temp = PyNumber_Negative(temp);
} else if (PyObject_RichCompareBool(temp, PyLong_FromLong(0), Py_GT) == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should PyLong_FromLong(0) be stored in a variable to avoid recomputing it? Or is 0 a special case that requires no computation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

// Python equivalence: digit = temp / 2^31, temp = temp % 2^31
res = PyNumber_Divmod(temp, pow_value);
py_digit = PyNumber_Long(PyTuple_GetItem(res, 0));
py_reminder = PyTuple_GetItem(res, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

py_reminder -> py_remainder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@cheqianh
Copy link
Contributor Author

cheqianh commented May 26, 2021

Summary:

  1. Avoid using string for ion/python decimal conversion: Refer to here
  2. Unit test for different order symbol IDs: Refer to here
  3. One vector not in ion-c skip list fails c extension unit test: Refer to here
  4. Unicode and raw bytes representation in unit test: Refer to here
  5. Other changes. such as typo remainder, re-use PyLong_FromLong(0) etc.

Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Do you know why the Travis builds are failing?

@cheqianh
Copy link
Contributor Author

cheqianh commented May 28, 2021

highlights:

  1. Travis failed because it builds c extension without initializing ion-c binaries.
  2. This fixes highlights 1 above: Now all the setup work can be done in setup.py and no need to run install.py separately.
  3. Travis is still failing due to two kinds of tests: decimal with more than 34 digits and different order SID with a symbol table. Working on this part right now.

@cheqianh cheqianh changed the base branch from master to c-extension-beta May 29, 2021 08:17
@cheqianh
Copy link
Contributor Author

cheqianh commented May 29, 2021

The travis failed due to known unit test failures, and manually checked that they are all equivalent. As discussed offline, I'll push it to c-extension-beta branch first and finish all the left work in a new Pull Request.

@cheqianh cheqianh merged commit 3529e2a into amazon-ion:c-extension-beta May 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants