Abstract: Pre-trained code models are essential for various code intelligence tasks. However, their effectiveness depends heavily on the quality of the pre-training dataset, particularly ...