Merge pull request #148 from sfiligoi/igor_default_fp32
Change default precision to fp32 and add explicit fp64 functions
sfiligoi authored Dec 14, 2022
2 parents 6ebb012 + 6194526 commit 0e61d5a
Showing 8 changed files with 884 additions and 166 deletions.
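Note for readers skimming the diff: the behavioral change sits in `unifrac/_api.pyx`, where the fp32 code path is now taken unless the method name contains `_fp64`. The following is a minimal pure-Python sketch of that dispatch rule, not the package's code. `pick_precision_old` and `pick_precision_new` are hypothetical helper names used only for illustration, and the sketch assumes the branch not shown in the diff is the fp64 path.

```python
def pick_precision_old(unifrac_method: str) -> str:
    """Pre-change rule: fp32 only when the name asks for it explicitly."""
    # Assumption: the else branch (not visible in the diff) is the fp64 path.
    return "fp32" if "_fp32" in unifrac_method else "fp64"


def pick_precision_new(unifrac_method: str) -> str:
    """Post-change rule: fp32 unless the name asks for fp64 explicitly."""
    return "fp32" if "_fp64" not in unifrac_method else "fp64"


# Compare both rules for an unsuffixed name and the two explicit suffixes.
for method in ("unweighted", "unweighted_fp32", "unweighted_fp64"):
    print(method, pick_precision_old(method), pick_precision_new(method))
```

Under this reading, only unsuffixed names such as `unweighted` change behavior; callers that need the previous default would switch to the new `_fp64` variants.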
14 changes: 8 additions & 6 deletions .github/workflows/main.yml
@@ -29,13 +29,14 @@ jobs:
needs: lint
strategy:
matrix:
python-version: ['3.7', '3.8', '3.9', '3.10']
os: [ubuntu-latest, macos-latest]
python-version: ['3.8', '3.9', '3.10']
os: [ubuntu-latest, macos-latest, linux-gpu-cuda]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: conda-incubator/setup-miniconda@v2
with:
with:
miniconda-version: "latest"
auto-update-conda: true
python-version: ${{ matrix.python-version }}
- name: Install
@@ -59,9 +60,8 @@ jobs:
else
conda install --yes -c conda-forge -c bioconda clangxx_osx-64
fi
conda install --yes -c conda-forge -c bioconda unifrac-binaries
# TEMP HACK: Use older version of scipy to work around scikit-bio problem
conda install --yes -c conda-forge -c bioconda cython "scipy<1.9" "hdf5<1.12.1" biom-format numpy "h5py<3.0.0 | >3.3.0" "scikit-bio>=0.5.7" nose
conda install --yes -c conda-forge -c bioconda "unifrac-binaries>=1.2"
conda install --yes -c conda-forge -c bioconda cython scipy hdf5 biom-format numpy "h5py>3.3.0" "scikit-bio>=0.5.8" nose
echo "$(uname -s)"
if [[ "$(uname -s)" == "Linux" ]];
then
@@ -80,13 +80,15 @@ jobs:
shell: bash -l {0}
run: |
conda activate unifrac
export UNIFRAC_GPU_INFO=Y
ls -lrt $CONDA_PREFIX/lib/libhdf5_cpp*
nosetests
- name: Sanity checks
shell: bash -l {0}
run: |
conda activate unifrac
export UNIFRAC_GPU_INFO=Y
set -e
ssu -i unifrac/tests/data/crawford.biom -t unifrac/tests/data/crawford.tre -o ci/test.dm -m unweighted
python -c "import skbio; dm = skbio.DistanceMatrix.read('ci/test.dm')"
104 changes: 60 additions & 44 deletions README.md
@@ -135,22 +135,22 @@ To use Stacked Faith through QIIME2, given similar artifacts, you can use:
The library can be accessed directly from within Python. If operating in this mode, the API methods are expecting a filepath to a BIOM-Format V2.1.0 table, and a filepath to a Newick formatted phylogeny.

$ python
Python 3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58)
[GCC 9.3.0] on linux
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unifrac
>>> dir(unifrac)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__',
'__package__', '__path__', '__spec__', '__version__', '_api', '_meta', '_methods',
'faith_pd',
'generalized', 'generalized_fp32', 'generalized_fp32_to_file', 'generalized_to_file',
'h5pcoa', 'h5unifrac', 'meta', 'pkg_resources', 'ssu', 'ssu_to_file',
'unweighted', 'unweighted_fp32', 'unweighted_fp32_to_file', 'unweighted_to_file',
'weighted_normalized', 'weighted_normalized_fp32', 'weighted_normalized_fp32_to_file', 'weighted_normalized_to_file',
'weighted_unnormalized', 'weighted_unnormalized_fp32', 'weighted_unnormalized_fp32_to_file', 'weighted_unnormalized_to_file']
>>> print(unifrac.unweighted_fp32.__doc__)
Compute Unweighted UniFrac using fp32 math

['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__',
'__path__', '__spec__', '__version__', '_api', '_meta', '_methods', 'faith_pd',
'generalized', 'generalized_fp32', 'generalized_fp32_to_file', 'generalized_fp64', 'generalized_fp64_to_file', 'generalized_to_file',
'h5pcoa', 'h5unifrac', 'meta', 'pkg_resources', 'ssu', 'ssu_fast', 'ssu_inmem', 'ssu_to_file',
'unweighted', 'unweighted_fp32', 'unweighted_fp32_to_file', 'unweighted_fp64', 'unweighted_fp64_to_file', 'unweighted_to_file',
'weighted_normalized', 'weighted_normalized_fp32', 'weighted_normalized_fp32_to_file',
'weighted_normalized_fp64', 'weighted_normalized_fp64_to_file', 'weighted_normalized_to_file',
'weighted_unnormalized', 'weighted_unnormalized_fp32', 'weighted_unnormalized_fp32_to_file',
'weighted_unnormalized_fp64', 'weighted_unnormalized_fp64_to_file', 'weighted_unnormalized_to_file']
>>> print(unifrac.unweighted.__doc__)
Compute Unweighted UniFrac

Parameters
----------
table : str
@@ -166,12 +166,12 @@ The library can be accessed directly from within Python. If operating in this mo
by about 50%, but is an approximation.
n_substeps : int, optional
Internally split the problem in substeps for reduced memory footprint.

Returns
-------
skbio.DistanceMatrix
The resulting distance matrix.

Raises
------
IOError
@@ -180,7 +180,7 @@ The library can be accessed directly from within Python. If operating in this mo
ValueError
If the table does not appear to be BIOM-Format v2.1.
If the phylogeny does not appear to be in Newick format.

Environment variables
---------------------
OMP_NUM_THREADS
@@ -189,14 +189,14 @@ The library can be accessed directly from within Python. If operating in this mo
Enable or disable GPU offload. If not defined, autodetect.
ACC_DEVICE_NUM
The GPU to use. If not defined, the first GPU will be used.

Notes
-----
Unweighted UniFrac was originally described in [1]_. Variance Adjusted
UniFrac was originally described in [2]_, and while its application to
Unweighted UniFrac was not described, factoring in the variance adjustment
is still feasible and so it is exposed.

References
----------
.. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -205,10 +205,10 @@ The library can be accessed directly from within Python. If operating in this mo
.. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
powerful beta diversity measure for comparing communities based on
phylogeny. BMC Bioinformatics 12:118 (2011).

>>> print(unifrac.unweighted_fp32_to_file.__doc__)
Compute Unweighted UniFrac using fp32 math and write to file

>>> print(unifrac.unweighted_to_file.__doc__)
Compute Unweighted UniFrac and write to file
Parameters
----------
table : str
@@ -235,12 +235,12 @@ The library can be accessed directly from within Python. If operating in this mo
can be used to reduce the amount of memory needed.
n_substeps : int, optional
Internally split the problem in substeps for reduced memory footprint.

Returns
-------
str
A filepath to the output file.

Raises
------
IOError
@@ -250,7 +250,7 @@ The library can be accessed directly from within Python. If operating in this mo
ValueError
If the table does not appear to be BIOM-Format v2.1.
If the phylogeny does not appear to be in Newick format.

Environment variables
---------------------
OMP_NUM_THREADS
@@ -259,14 +259,14 @@ The library can be accessed directly from within Python. If operating in this mo
Enable or disable GPU offload. If not defined, autodetect.
ACC_DEVICE_NUM
The GPU to use. If not defined, the first GPU will be used.

Notes
-----
Unweighted UniFrac was originally described in [1]_. Variance Adjusted
UniFrac was originally described in [2]_, and while its application to
Unweighted UniFrac was not described, factoring in the variance adjustment
is still feasible and so it is exposed.

References
----------
.. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -275,27 +275,27 @@ The library can be accessed directly from within Python. If operating in this mo
.. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
powerful beta diversity measure for comparing communities based on
phylogeny. BMC Bioinformatics 12:118 (2011).

>>> print(unifrac.h5unifrac.__doc__)
Read UniFrac from a hdf5 file

Parameters
----------
h5file : str
A filepath to a hdf5 file.

Returns
-------
skbio.DistanceMatrix
The distance matrix.

Raises
------
OSError
If the hdf5 file is not found
KeyError
If the hdf5 does not have the necessary fields

References
----------
.. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -304,7 +304,7 @@ The library can be accessed directly from within Python. If operating in this mo
.. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
powerful beta diversity measure for comparing communities based on
phylogeny. BMC Bioinformatics 12:118 (2011).

>>> print(unifrac.faith_pd.__doc__)
Execute a call to the Stacked Faith API in the UniFrac package

@@ -402,14 +402,30 @@ The methods can also be used directly through the command line after install:
## Minor test dataset

A small test `.biom` and `.tre` can be found in `sucpp/`. An example with expected output is below, and should execute in 10s of milliseconds:

$ ssu -i sucpp/test.biom -t sucpp/test.tre -m unweighted -o test.out
$ cat test.out
Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Sample1 0 0.2 0.5714285714285714 0.6 0.5 0.2
Sample2 0.2 0 0.4285714285714285 0.6666666666666666 0.6 0.3333333333333333
Sample3 0.5714285714285714 0.4285714285714285 0 0.7142857142857143 0.8571428571428571 0.4285714285714285
Sample4 0.6 0.6666666666666666 0.7142857142857143 0 0.3333333333333333 0.4
Sample5 0.5 0.6 0.8571428571428571 0.3333333333333333 0 0.6
Sample6 0.2 0.3333333333333333 0.4285714285714285 0.4 0.6 0
A small test `.biom` and `.tre` can be found in `unifrac/tests/data/`. An example with expected output is below, and should execute in 10s of milliseconds:

$ python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unifrac
>>> d=unifrac.unweighted('unifrac/tests/data/crawford.biom','unifrac/tests/data/crawford.tre')
>>> d.data
array([[0. , 0.71836066, 0.7131736 , 0.6974604 , 0.6258721 ,
0.7282667 , 0.72065896, 0.7264058 , 0.7360605 ],
[0.71836066, 0. , 0.7030297 , 0.734073 , 0.6548042 ,
0.71547383, 0.7839781 , 0.723184 , 0.7613893 ],
[0.7131736 , 0.7030297 , 0. , 0.6104128 , 0.623313 ,
0.71848303, 0.7041634 , 0.75258476, 0.7924903 ],
[0.6974604 , 0.734073 , 0.6104128 , 0. , 0.6439278 ,
0.7005273 , 0.6983272 , 0.77818936, 0.72959894],
[0.6258721 , 0.6548042 , 0.623313 , 0.6439278 , 0. ,
0.75782686, 0.7100514 , 0.75065047, 0.7894437 ],
[0.7282667 , 0.71547383, 0.71848303, 0.7005273 , 0.75782686,
0. , 0.63593644, 0.71283615, 0.5831464 ],
[0.72065896, 0.7839781 , 0.7041634 , 0.6983272 , 0.7100514 ,
0.63593644, 0. , 0.6920076 , 0.6897206 ],
[0.7264058 , 0.723184 , 0.75258476, 0.77818936, 0.75065047,
0.71283615, 0.6920076 , 0. , 0.7151408 ],
[0.7360605 , 0.7613893 , 0.7924903 , 0.72959894, 0.7894437 ,
0.5831464 , 0.6897206 , 0.7151408 , 0. ]], dtype=float32)

2 changes: 1 addition & 1 deletion ci/linux-64.txt
@@ -3,4 +3,4 @@ flake8
nose
scikit-bio
biom-format
h5py==2.7.0
h5py
6 changes: 3 additions & 3 deletions setup.py
@@ -17,8 +17,8 @@

PREFIX = os.environ.get('PREFIX', "")

base = ["cython >= 0.26", "biom-format", "numpy", "h5py >= 2.7.0",
"scikit-bio >= 0.5.1", "iow"]
base = ["cython >= 0.26", "biom-format", "numpy", "h5py >= 3.3.0",
"scikit-bio >= 0.5.8", "iow"]

test = ["nose", "flake8"]

@@ -92,7 +92,7 @@ def run_compile_ssu(self):

setup(
name="unifrac",
version="1.0.0",
version="1.2.0",
packages=find_packages(),
author="Daniel McDonald",
license='BSD-3-Clause',
18 changes: 16 additions & 2 deletions unifrac/__init__.py
@@ -12,6 +12,10 @@
weighted_normalized,
weighted_unnormalized,
generalized,
unweighted_fp64,
weighted_normalized_fp64,
weighted_unnormalized_fp64,
generalized_fp64,
unweighted_fp32,
weighted_normalized_fp32,
weighted_unnormalized_fp32,
@@ -20,6 +24,10 @@
weighted_normalized_to_file,
weighted_unnormalized_to_file,
generalized_to_file,
unweighted_fp64_to_file,
weighted_normalized_fp64_to_file,
weighted_unnormalized_fp64_to_file,
generalized_fp64_to_file,
unweighted_fp32_to_file,
weighted_normalized_fp32_to_file,
weighted_unnormalized_fp32_to_file,
@@ -32,12 +40,18 @@

__version__ = pkg_resources.get_distribution('unifrac').version
__all__ = ['unweighted', 'weighted_normalized', 'weighted_unnormalized',
'generalized', 'unweighted_fp32', 'weighted_normalized_fp32',
'generalized', 'unweighted_fp64', 'weighted_normalized_fp64',
'weighted_unnormalized_fp64', 'generalized_fp64',
'unweighted_fp32', 'weighted_normalized_fp32',
'weighted_unnormalized_fp32', 'generalized_fp32',
'meta',
'unweighted_to_file', 'weighted_normalized_to_file',
'weighted_unnormalized_to_file',
'generalized_to_file', 'unweighted_fp32_to_file',
'generalized_to_file', 'unweighted_fp64_to_file',
'weighted_normalized_fp64_to_file',
'weighted_unnormalized_fp64_to_file',
'generalized_fp64_to_file',
'unweighted_fp32_to_file',
'weighted_normalized_fp32_to_file',
'weighted_unnormalized_fp32_to_file',
'generalized_fp32_to_file',
12 changes: 10 additions & 2 deletions unifrac/_api.pyx
@@ -41,6 +41,8 @@ def ssu_inmem(object table, object tree,
unifrac_method : str
The requested UniFrac method, one of {unweighted,
weighted_normalized, weighted_unnormalized, generalized,
unweighted_fp64, weighted_normalized_fp64,
weighted_unnormalized_fp64, generalized_fp64,
unweighted_fp32, weighted_normalized_fp32,
weighted_unnormalized_fp32, generalized_fp32}
variance_adjust : bool
@@ -83,7 +85,7 @@ def ssu_inmem(object table, object tree,
met_py_bytes = unifrac_method.encode()
met_c_string = met_py_bytes

if '_fp32' in unifrac_method:
if '_fp64' not in unifrac_method:
numpy_arr_fp32 = _ssu_inmem_fp32(inmem_biom, inmem_tree, met_c_string,
variance_adjust, alpha, bypass_tips,
n_substeps)
@@ -196,6 +198,8 @@ def ssu_fast(str biom_filename, str tree_filename, object ids,
unifrac_method : str
The requested UniFrac method, one of {unweighted,
weighted_normalized, weighted_unnormalized, generalized,
unweighted_fp64, weighted_normalized_fp64,
weighted_unnormalized_fp64, generalized_fp64,
unweighted_fp32, weighted_normalized_fp32,
weighted_unnormalized_fp32, generalized_fp32}
variance_adjust : bool
@@ -241,7 +245,7 @@ def ssu_fast(str biom_filename, str tree_filename, object ids,
tree_c_string = tree_py_bytes
met_c_string = met_py_bytes

if '_fp32' in unifrac_method:
if '_fp64' not in unifrac_method:
numpy_arr_fp32 = _ssu_fast_fp32(biom_c_string, tree_c_string,
ids.__len__(), met_c_string,
variance_adjust, alpha, bypass_tips,
@@ -365,6 +369,8 @@ def ssu(str biom_filename, str tree_filename,
unifrac_method : str
The requested UniFrac method, one of {unweighted,
weighted_normalized, weighted_unnormalized, generalized,
unweighted_fp64, weighted_normalized_fp64,
weighted_unnormalized_fp64, generalized_fp64,
unweighted_fp32, weighted_normalized_fp32,
weighted_unnormalized_fp32, generalized_fp32}
variance_adjust : bool
@@ -529,6 +535,8 @@ def ssu_to_file(str biom_filename, str tree_filename, str out_filename,
unifrac_method : str
The requested UniFrac method, one of {unweighted,
weighted_normalized, weighted_unnormalized, generalized,
unweighted_fp64, weighted_normalized_fp64,
weighted_unnormalized_fp64, generalized_fp64,
unweighted_fp32, weighted_normalized_fp32,
weighted_unnormalized_fp32, generalized_fp32}
variance_adjust : bool