Scoring Your Prediction on Unseen Data
The performance of deep neural networks can vary substantially when they are evaluated on datasets that differ from the training data. This makes it challenging to evaluate models on unseen data without access to labels. Previous methods compute a single model-based indicator at the dataset level and use regression to predict performance. To evaluate models more accurately, we propose a sample-level, label-free model evaluation method named Scoring Your Prediction (SYP). Specifically, SYP introduces low-level image-based features (e.g., blurriness) to model image quality, which is important for classification. We combine these image-based indicators with model-based indicators so that the two complement each other and enhance the sample representation. We then predict the probability that each sample is correctly classified using a neural network that we call the oracle model. The proposed method outperforms existing methods on 40 unlabeled datasets transformed from CIFAR-10. In particular, SYP lowers RMSE by 1.83-3.97 for ResNet-56 evaluation and by 2.32-9.74 for RepVGG-A0 evaluation compared with the latest methods. Our scheme won the championship in the DataCV Challenge at CVPR 2023. Source code is available at https://github.com/megvii-research/SYP.
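To make the sample-level idea concrete, the following is a minimal sketch and not the released SYP implementation. It assumes two model-based indicators (maximum softmax confidence and prediction entropy), one image-based indicator (blurriness, approximated here by the negative variance of a Laplacian response), and a logistic model standing in for the neural-network oracle. Averaging the per-sample probabilities into a dataset-level accuracy is also an assumption made for illustration. All function names and parameters are hypothetical.

```python
import numpy as np


def model_indicators(probs):
    """Model-based indicators from softmax outputs of shape (N, C).

    Assumption: max confidence and entropy are used as examples only.
    """
    confidence = probs.max(axis=1)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.stack([confidence, entropy], axis=1)


def blurriness(images):
    """Image-based indicator for grayscale images of shape (N, H, W).

    Uses the negative variance of a 4-neighbour Laplacian response,
    so a higher value means a blurrier (less detailed) image.
    """
    lap = (images[:, :-2, 1:-1] + images[:, 2:, 1:-1]
           + images[:, 1:-1, :-2] + images[:, 1:-1, 2:]
           - 4.0 * images[:, 1:-1, 1:-1])
    return -lap.reshape(len(images), -1).var(axis=1)


def sample_features(probs, images):
    """Concatenate model-based and image-based indicators per sample."""
    return np.concatenate(
        [model_indicators(probs), blurriness(images)[:, None]], axis=1
    )


def oracle_predict(features, weights, bias):
    """Stand-in oracle (assumption): a logistic model, not a neural network.

    Returns the estimated probability that each sample is classified correctly.
    """
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


def estimate_accuracy(features, weights, bias):
    """Dataset-level accuracy estimate in percent (assumed aggregation: mean)."""
    return 100.0 * oracle_predict(features, weights, bias).mean()


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true accuracies."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 10))          # fake classifier outputs
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    images = rng.random(size=(8, 32, 32))      # fake grayscale images
    feats = sample_features(probs, images)
    # Hypothetical, untrained oracle parameters for illustration only.
    print(estimate_accuracy(feats, weights=np.zeros(3), bias=0.0))
```

In practice the oracle would be trained on labeled samples for which correctness is known, and `rmse` would then compare the estimated accuracies with the true accuracies across the unlabeled test sets.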