This README contains instructions on how to use SkyPilot to finetune Falcon-7B and Falcon-40B, open-source LLMs that rival many current closed-source models, including ChatGPT.
Install the latest SkyPilot and check your setup of the cloud credentials:

```bash
pip install git+https://github.com/skypilot-org/skypilot.git
sky check
```
See the Falcon SkyPilot YAML for training. Serving is currently a work in progress, and a YAML for it will be provided soon! We are also working on adding an evaluation step to compare your finetuned model against the base model.
Finetuning Falcon-7B and Falcon-40B requires GPUs with 80GB memory, but Falcon-7B-sharded requires only 40GB. Thus, we use `ybelkada/falcon-7b-sharded-bf16`, `tiiuae/falcon-7b`, and `tiiuae/falcon-40b`. Try `sky show-gpus --all` for supported GPUs.
We can start finetuning the Falcon model on Open Assistant's Guanaco dataset with a single command. It will automatically find the cheapest available VM on any cloud.
To finetune using different data, simply replace the path `timdettmers/openassistant-guanaco` with any other Hugging Face dataset.
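For illustration only, such a swap might look like the following hypothetical fragment of a SkyPilot task YAML. The `run` command shape and the `--dataset_name` flag are assumptions about how `train.py` consumes the dataset, not copied from this repo, and the dataset path is a placeholder:

```yaml
# Hypothetical fragment -- the real falcon.yaml may wire the dataset differently.
run: |
  python train.py \
    --dataset_name your-org/your-dataset  # replaces timdettmers/openassistant-guanaco
```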
Steps for training on your cloud(s):

1. In falcon.yaml, set the following variables in `envs`:
   - `OUTPUT_BUCKET_NAME`: a unique name. SkyPilot will create this bucket for you to store the model weights.
   - `WANDB_API_KEY`: your own key.
   - `MODEL_NAME`: your desired base model.

2. Training the Falcon model using spot instances:
```bash
sky spot launch -n falcon falcon.yaml
```
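For reference, a minimal sketch of what the `envs` section in falcon.yaml might contain. Only the variable names come from this README; the values are placeholders you would replace with your own:

```yaml
# Sketch of the envs section in falcon.yaml -- values are placeholders.
envs:
  OUTPUT_BUCKET_NAME: my-unique-falcon-weights-bucket  # must be globally unique
  WANDB_API_KEY: <your-wandb-key>
  MODEL_NAME: ybelkada/falcon-7b-sharded-bf16  # or tiiuae/falcon-7b / tiiuae/falcon-40b
```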
Currently, such `A100-80GB:1` spot instances are only available on AWS and GCP.
[Optional] To use on-demand `A100-80GB:1` instances, which are currently available on Lambda Cloud, Azure, and GCP:

```bash
sky launch -c falcon -s falcon.yaml --no-use-spot
```
For reference, below is a loss graph you may expect to see, along with the approximate time and cost of finetuning each of the models over 500 epochs (assuming a spot A100 rate of $1.10/hour and an A100-80GB rate of $1.61/hour):

- `ybelkada/falcon-7b-sharded-bf16`: 2.5 to 3 hours using 1 A100 spot GPU; total cost ≈ $3.30.
- `tiiuae/falcon-7b`: 2.5 to 3 hours using 1 A100 spot GPU; total cost ≈ $3.30.
- `tiiuae/falcon-40b`: 10 hours using 1 A100-80GB spot GPU; total cost ≈ $16.10.
Q: I see some bucket permission errors (`sky.exceptions.StorageBucketGetError`) when running the above:

```
...
sky.exceptions.StorageBucketGetError: Failed to connect to an existing bucket 'YOUR_OWN_BUCKET_NAME'.
Please check if:
1. the bucket name is taken and/or
2. the bucket permissions are not setup correctly. To debug, consider using gsutil ls gs://YOUR_OWN_BUCKET_NAME.
```
A: You need to replace the bucket name with your own globally unique name, and rerun the commands. New private buckets will be automatically created under your cloud account.