Quantize and deploy Llama 2 70B on cost-effective NVIDIA A10 Tensor Core GPUs in OCI Data Science

NVIDIA A10 GPUs have been around for a couple of years. They are much cheaper than the newer A100 and H100, yet they remain very capable of running AI workloads, and their price point makes them cost-effective. By quantizing the model weights down to 4 bits, even the powerful Llama 2 70B model can be deployed on two A10 GPUs. In this blog post, we show how to quantize the foundation model and then how to deploy it.
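As a rough illustration of the quantized-loading step (this sketch is not taken from the post itself), the snippet below loads Llama 2 70B with 4-bit weights using Hugging Face transformers and bitsandbytes, letting device_map="auto" shard the layers across both A10s. The model ID, prompt, and generation settings are placeholder assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical model ID; the Llama 2 weights are gated on Hugging Face
model_id = "meta-llama/Llama-2-70b-chat-hf"

# 4-bit NF4 quantization so the 70B weights fit within 2 x 24 GB of A10 memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers automatically across both A10 GPUs
)

prompt = "Explain 4-bit quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This only illustrates the loading side; the post itself covers how the quantized model is then packaged and served in OCI Data Science.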
