gpu_count parameter. If you are using Hugging Face Transformers, you then only need to set `device_map` to `"auto"`, and the model will be distributed automatically across all available GPUs.
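As a minimal sketch, loading a model this way might look like the following. The model name is just a placeholder, and the `torch_dtype` choice is an assumption; `device_map="auto"` is the part that triggers automatic sharding across GPUs.

```python
def load_model_multi_gpu(model_name: str):
    # Imports are inside the function so the sketch can be read
    # (and defined) even where these libraries are not installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # device_map="auto" lets the Accelerate backend place the model's
    # layers across every visible GPU, spilling to CPU if needed.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # half precision to reduce per-GPU memory
        device_map="auto",
    )
    return tokenizer, model

# Example usage (downloads weights, so not run here):
# tokenizer, model = load_model_multi_gpu("meta-llama/Llama-2-13b-hf")
# model.hf_device_map then shows which layers landed on which device.
```

After loading, you can inspect `model.hf_device_map` to see how the layers were split across the GPUs.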
While you can select any GPU type for a multi-GPU deployment, for the best possible performance we recommend A100 GPUs, as they are connected with NVLink. This lets the GPUs exchange data with each other much faster than GPUs connected over a standard PCIe link.
You can look at an example of a multi-GPU deployment here.