Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision

Xin Juan, Kaixiong Zhou, Ninghao Liu, Tianlong Chen, Xin Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 308-318

Abstract


The premise for the great advancement of molecular machine learning is dependent on a considerable amount of labeled data. In many real-world scenarios the labeled molecules are limited in quantity or laborious to derive. Recent pseudo-labeling methods are usually designed based on a single domain knowledge thereby failing to understand the comprehensive molecular configurations and limiting their adaptability to generalize across diverse biochemical context. To this end we introduce an innovative paradigm for dealing with the molecule pseudo-labeling named as Molecular Data Programming (MDP). In particular we adopt systematic supervision sources via crafting multiple graph labeling functions which covers various molecular structural knowledge of graph kernels molecular fingerprints and topological features. Each of them creates an uncertain and biased labels for the unlabeled molecules. To address the decision conflicts among the diverse pseudo-labels we design a label synchronizer to differentiably model confidences and correlations between the labeling functions which yields probabilistic molecular labels to adapt for specific applications. These probabilistic molecular labels are used to train a molecular classifier for improving its generalization capability. On eight benchmark datasets we empirically demonstrate the effectiveness of MDP on the weakly supervised molecule classification tasks.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Juan_2024_CVPR, author = {Juan, Xin and Zhou, Kaixiong and Liu, Ninghao and Chen, Tianlong and Wang, Xin}, title = {Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {308-318} }