Using a Chemical Language Transformer Model for Molecular Property Prediction Regression Tasks and Visualizing Attention Weights: Part 1

I came across [ChemBERTa-77M-MTR](https://huggingface.co/DeepChem/ChemBERTa-77M-MTR) on Hugging Face; it appears to be pre-trained on 77M molecules. [ChemBERTa](https://huggingface.co/DeepChem/ChemBERTa-77M-MTR) is a large-scale pre-trained molecular transformer based on the BERT architecture, designed for tasks in chemistry, drug discovery, and materials science. The model can be fine-tuned for specific tasks such as property prediction and molecular generation.

After experimenting with this fine-tuning for some time, varying max_length, batch size, and the number of epochs, I obtained good scores: the fine-tuned models compare well with traditional descriptor-based models and often perform better. Below are results on several datasets from TDC (Therapeutics Data Commons), which suggest these fine-tuned models could be leaders on the TDC benchmark tables. I am quite impressed with the results I have obtained, and this approach certainly has good potential as one of the methods for SA prediction. For fine-tuning, I merged the train and valid splits together; the results shown are on the test set.

[**Read More**](https://pharmanalytics.medium.com/using-chemical-language-transformer-model-for-molecular-property-prediction-regression-tasks-5d617aba4639)
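To make the setup concrete, here is a minimal sketch of fine-tuning ChemBERTa-77M-MTR for a single-target regression task with the Hugging Face `Trainer`, including the train+valid merge mentioned above. The dataset choice (`Solubility_AqSolDB`), the hyperparameter values, and the `SmilesDataset` helper are illustrative assumptions, not the exact settings behind the reported results.

```python
import pandas as pd
import torch
from tdc.single_pred import ADME
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "DeepChem/ChemBERTa-77M-MTR"

# Load a TDC regression dataset (illustrative choice) and, as described in
# the text, merge the train and valid splits for fine-tuning.
split = ADME(name="Solubility_AqSolDB").get_split()
train_df = pd.concat([split["train"], split["valid"]], ignore_index=True)
test_df = split["test"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

class SmilesDataset(torch.utils.data.Dataset):
    """Tokenized SMILES strings with float labels for regression."""
    def __init__(self, df, max_length=128):  # max_length is an assumed value
        self.enc = tokenizer(df["Drug"].tolist(), truncation=True,
                             padding="max_length", max_length=max_length)
        self.labels = df["Y"].astype("float32").tolist()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# num_labels=1 with problem_type="regression" attaches a single-output
# head trained with MSE loss on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression")

args = TrainingArguments(
    output_dir="chemberta-regression",
    num_train_epochs=10,              # assumed; tune along with batch size
    per_device_train_batch_size=32,   # assumed
    learning_rate=2e-5,               # assumed
    logging_steps=50,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=SmilesDataset(train_df))
trainer.train()

# Predict on the held-out TDC test split.
preds = trainer.predict(SmilesDataset(test_df)).predictions.squeeze()
```

From `preds` and `test_df["Y"]` one can then compute whichever metric the corresponding TDC benchmark uses (e.g. MAE or Spearman correlation) to compare against the leaderboard entries.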