The Falcon LLM Landing in the Snowflake Data Cloud

<p><strong><em>Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer. I am using features that were in Private Preview at the time of writing.</em></strong></p> <p>With the launch of&nbsp;<a href="https://www.snowflake.com/blog/snowpark-container-services-deploy-genai-full-stack-apps/" rel="noopener ugc nofollow" target="_blank">Snowpark Container Services</a>&nbsp;in Private Preview, any code or software that can be containerized can run within the Snowflake Data Cloud. This includes using NVIDIA GPUs to run and fine-tune LLMs.</p> <p><img alt="Snowflake Platform for LLMs" src="https://miro.medium.com/v2/resize:fit:630/1*Hx39zGePpgpBHDHjvK0yVw.png" style="height:346px; width:700px" /></p> <p>Snowflake Platform for LLMs</p> <p>The rise of LLMs has heightened concerns about prompting or fine-tuning them with business-specific data that may leave the enterprise boundary in the process. At the same time, more and more open-source LLMs are becoming available that can be fine-tuned for specific domains. Combining these base models with a company&#39;s own data opens up many new possibilities.</p> <p><a href="https://huggingface.co/blog/falcon" rel="noopener ugc nofollow" target="_blank"><strong>Falcon</strong></a>&nbsp;has been made available via Hugging Face and is therefore now usable within Snowflake. This open model brings capabilities that may compete with closed-source models. There are two base models, Falcon-40B and Falcon-7B, which we will use in this post.</p> <p>My colleague Justin showed how to use&nbsp;<a href="https://www.snowflake.com/blog/running-llama-llm-snowpark/" rel="noopener ugc nofollow" target="_blank">Llama v2</a>&nbsp;in a similar way, and Karuna how to use&nbsp;<a href="https://medium.com/snowflake/training-tuning-and-running-nvidia-nemo-llms-on-snowflake-c2616bfbb1bc" rel="noopener">NeMo</a>&nbsp;within Snowflake.
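<p>Because Falcon is published on Hugging Face, the usual <code>transformers</code> loading pattern applies. The sketch below is illustrative, not the article's exact code: the model id <code>tiiuae/falcon-7b</code> is Falcon-7B's public Hugging Face repository, and the generation settings (<code>bfloat16</code>, <code>device_map="auto"</code>) are common defaults for a single-GPU setup, not values taken from this post.</p>

```python
# Hedged sketch: building a text-generation pipeline for a Falcon base model.
# The heavy imports and the (multi-GB) model download happen only when the
# function is called, so this module stays cheap to import.

def build_falcon_pipeline(model_id: str = "tiiuae/falcon-7b"):
    """Return a transformers text-generation pipeline for a Falcon model.

    Requires the `transformers` and `torch` packages and enough GPU memory
    for the chosen model (Falcon-7B fits comfortably in bfloat16 on an
    A100-class GPU; Falcon-40B needs multiple GPUs or quantization).
    """
    import torch
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return pipeline(
        "text-generation",
        model=model_id,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,  # half-precision variant with fp32-like range
        device_map="auto",           # let accelerate place layers on available GPUs
    )
```

<p>Swapping the default for <code>tiiuae/falcon-40b</code> selects the larger base model, at a correspondingly larger GPU footprint.</p>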
This article focuses on Falcon: how to fine-tune it within Snowflake, and how to save and register the fine-tuned model for reuse.</p> <p>Snowpark Container Services enables the creation of compute pools with GPUs from different families. We are going to create a pool using the GPU_7 family for this Falcon LLM fine-tuning:</p>
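<p>As a sketch of that step, the snippet below composes the <code>CREATE COMPUTE POOL</code> statement and shows how it would be submitted through a Snowpark session. The option names reflect the Private Preview syntax and the <code>GPU_7</code> family mentioned above; the pool name <code>FALCON_POOL</code> and the node counts are illustrative assumptions, not values from the article.</p>

```python
# Hedged sketch: composing the SQL that creates a GPU compute pool for
# Snowpark Container Services. GPU_7 is the instance family used in this
# article; exact option names may differ outside Private Preview.

def create_compute_pool_sql(
    name: str,
    instance_family: str = "GPU_7",
    min_nodes: int = 1,
    max_nodes: int = 1,
) -> str:
    """Build a CREATE COMPUTE POOL statement for the given instance family."""
    return (
        f"CREATE COMPUTE POOL {name} "
        f"MIN_NODES = {min_nodes} "
        f"MAX_NODES = {max_nodes} "
        f"INSTANCE_FAMILY = {instance_family};"
    )

# In practice the statement would be run through a Snowpark session, e.g.:
#   session.sql(create_compute_pool_sql("FALCON_POOL")).collect()
```

<p>A single-node pool is enough for Falcon-7B experiments; raising <code>max_nodes</code> lets the pool scale out for heavier workloads.</p>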