yuxi-TJU/Farthest-Point-Sampling-in-Chemical-Feature-Space

Farthest Point Sampling in Chemical Feature Space

We introduce a farthest point sampling (FPS) strategy that operates in targeted chemical feature spaces to generate well-distributed training datasets. Because FPS selects points that are spread as far apart as possible, it increases the diversity of the training data within the chosen feature space, which in turn improves model performance. We evaluated the strategy across several ML models, including artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), among others, using datasets that cover key physicochemical properties. Models trained on FPS-selected data markedly outperform those trained on randomly sampled (RS) data in predictive accuracy and robustness, and they overfit notably less, especially when the training set is small.
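As a rough illustration of the idea, here is a minimal sketch of greedy farthest point sampling over a feature matrix. It is not the repository's implementation: the function name `farthest_point_sampling`, the `start_index` parameter, the use of Euclidean distance, and the random feature matrix in the usage lines are all assumptions made for the example.

```python
import numpy as np


def farthest_point_sampling(features, n_samples, start_index=0):
    """Greedy FPS sketch (hypothetical helper, Euclidean distance assumed).

    Starting from `start_index`, repeatedly add the point whose distance
    to its nearest already-selected point is largest.
    """
    features = np.asarray(features, dtype=float)
    n_points = features.shape[0]
    if not 0 < n_samples <= n_points:
        raise ValueError("n_samples must be between 1 and the number of points")

    selected = [start_index]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(features - features[start_index], axis=1)

    for _ in range(n_samples - 1):
        # The farthest point from the current selection is added next.
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        dist_to_new = np.linalg.norm(features - features[next_index], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)

    return np.array(selected)


# Usage: pick 20 training molecules out of 200 using 5 made-up descriptors.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 5))  # placeholder for real chemical features
train_idx = farthest_point_sampling(descriptors, n_samples=20)
print(train_idx.shape)
```

In practice the descriptors would usually be scaled to comparable ranges first, since Euclidean distance is otherwise dominated by the features with the largest numeric spread. A random-sampling baseline for comparison could be drawn with `rng.choice(len(descriptors), size=20, replace=False)`.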

Fig 1

A graphical illustration of farthest point sampling in chemical space

Fig 2

Comparison of MSE between FPS and random sampling (RS)

Fig 3

Comparison of MSE when sampling in different chemical spaces

Fig 4

Heatmap of MSE for different machine learning models

Fig 5

MSE for different physicochemical datasets

Fig 6

t-SNE distributions for FPS and RS
