Boosting ORC file reading performance in OCI Data Flow and OCI Big Data Service using Spark

Blog: Oracle BPM

Apache Spark has become the go-to big data processing engine for many workloads, including reading and processing ORC files stored in OCI Object Storage from services such as OCI Data Flow and OCI Big Data Service. Performance problems can arise when Spark reads large ORC files from cloud object storage. In this post, we explain how we tracked down one such ORC read performance issue and significantly improved it by tuning specific Spark configuration settings.