
Running PyTorch distributed data parallel jobs on OCI GPU cluster

Blog: Oracle BPM

Oracle Cloud Infrastructure (OCI) superclusters, built from powerful NVIDIA GPUs connected by low-latency, high-bandwidth RoCE v2 networks, provide an ideal platform for high-performance computing (HPC) and machine learning (ML) workloads. In this blog, we show how easy and versatile it is to run PyTorch distributed data parallel (DDP) jobs on GPU instances using the SLURM workload manager that comes preinstalled with the OCI cluster network solution.
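The helper below is a minimal sketch of one common way to bridge SLURM and PyTorch DDP. It assumes the job is launched with `srun` using one task per GPU. It maps SLURM's per-task environment variables (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`, `SLURM_JOB_NODELIST`) onto the `env://` variables that `torch.distributed` reads (`MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`). It also sets `LOCAL_RANK`, following the torchrun convention.

The function names `slurm_to_torch_env` and `first_host` and the default port 29500 are hypothetical choices for illustration. They are not taken from the original post.

```python
from typing import Dict, Mapping


def first_host(nodelist: str) -> str:
    """Return the first hostname from a compressed SLURM hostlist (simplified).

    Handles forms such as "gpu-[1-4]", "a,b", "node[07-09,12]-ib" and
    "rack[1-2]-node[3-4]"; it is not a full hostlist parser.
    """
    # Cut at the first comma that is not inside a [...] range group.
    depth, end = 0, len(nodelist)
    for i, ch in enumerate(nodelist):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            end = i
            break
    entry = nodelist[:end]
    if "[" not in entry:
        return entry
    # Expand only the first value of the first bracket group, then recurse
    # on the remainder in case it contains another bracket group.
    prefix, rest = entry.split("[", 1)
    inner, suffix = rest.split("]", 1)
    start = inner.split(",")[0].split("-")[0]
    return prefix + start + first_host(suffix)


def slurm_to_torch_env(env: Mapping[str, str], master_port: int = 29500) -> Dict[str, str]:
    """Map SLURM task variables to the variables read by torch.distributed's env:// init."""
    return {
        "MASTER_ADDR": first_host(env["SLURM_JOB_NODELIST"]),  # rank 0 runs on the first node
        "MASTER_PORT": str(master_port),  # hypothetical default; any free port works
        "RANK": env["SLURM_PROCID"],  # global rank of this task
        "WORLD_SIZE": env["SLURM_NTASKS"],  # total number of tasks (one per GPU assumed)
        "LOCAL_RANK": env["SLURM_LOCALID"],  # task index on this node -> GPU index
    }


if __name__ == "__main__":
    example = {
        "SLURM_JOB_NODELIST": "gpu-[1-2]",
        "SLURM_PROCID": "3",
        "SLURM_NTASKS": "16",
        "SLURM_LOCALID": "3",
    }
    print(slurm_to_torch_env(example))
```

In a real training script you would call `os.environ.update(slurm_to_torch_env(os.environ))` and then `torch.distributed.init_process_group(backend="nccl")`, which defaults to the `env://` init method. The simplified nodelist parser covers only the hostlist forms listed in its docstring. On an actual cluster, `scontrol show hostnames "$SLURM_JOB_NODELIST"` is a more robust way to obtain the first host.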
