Foundation models in biological and chemical domains

With hundreds of large biological and chemical models already developed, it may seem that the field has achieved a lot. However, as the review points out, this area is still at a nascent stage. The authors highlight several challenges: the lack of large-scale, high-quality training data; the difficulty of integrating domain-specific information into model architectures; and the need for reliable computational and experimental evaluation. The lack of high-quality data is probably the most critical issue. After all, your results are only as good as your data.

By the way, the review unfortunately misses one important data type: RNA sequences (or transcriptomic data). Many RNA foundation models are listed in the RNA-FM GitHub repository. These models could be very useful for understanding gene function, identifying drug targets, predicting RNA structures and RNA-protein interactions, and designing RNA-based therapeutics.
