Farthest Point Sampling in Chemical Feature Space

Our research introduces a farthest point sampling (FPS) strategy that operates in targeted chemical feature spaces to generate well-distributed training datasets. By selecting points that are far apart from one another, FPS increases the diversity of the training data within the chosen feature space, which in turn improves model performance. We evaluated the strategy across a range of ML models, including artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), using datasets that cover key physicochemical properties. Models trained on FPS-selected data are markedly more accurate and robust than those trained on randomly sampled data, and they overfit less, especially when the training set is small.
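The selection step described above can be sketched in a few lines of Python. This is a minimal illustration of the standard greedy FPS procedure, not the repository's own implementation: the function name `farthest_point_sampling`, the Euclidean metric, the fixed starting index, and the toy two-dimensional descriptors are all assumptions made for the example.

```python
import math


def farthest_point_sampling(points, n_samples, start_index=0):
    """Greedy FPS sketch: return indices of n_samples mutually distant points.

    Assumes Euclidean distance and distinct feature vectors (hypothetical
    choices for this example; the original work may use other settings).
    """
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")

    selected = [start_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]

    while len(selected) < n_samples:
        # Choose the point that is farthest from everything selected so far.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        # Refresh nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 2-D "chemical descriptors": four points clustered near the origin
# plus two outliers. FPS picks the outliers instead of oversampling the cluster.
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (10.0, 0.0), (0.1, 0.1)]
picked = farthest_point_sampling(toy_features, 3)
print(picked)  # [0, 4, 3]
```

Starting from index 0, the sketch first adds the point farthest from the origin, then the point farthest from both selected points, so the resulting training subset spans the feature space rather than concentrating in the dense cluster. Random sampling of three indices from the same toy set would most often draw two or more points from that cluster.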

Fig 1

Graphical illustration of farthest point sampling in chemical space

Fig 2

Comparison of mean squared error (MSE) between FPS and random sampling (RS)

Fig 3

MSE comparison for sampling in different chemical spaces

Fig 4

Heatmap of MSE for different machine learning models

Fig 5

MSE for different physicochemical datasets

Fig 6

t-SNE distributions for FPS and RS