Why Oracle Brings Hundreds of Its Machine Learning Experts Together Each Year
Blog: Oracle BPM
The machine learning expertise at Oracle isn’t concentrated in a single department or R&D effort; it’s dispersed all over the world. That was made clear recently when more than 250 of the company’s experts spent three days together at Oracle headquarters for a machine learning summit. It took an hour just to get through a lightning round of introductions at one minute per team.
“We represent the oldest profession in the world,” joked one engineer introducing his team. “I’m talking about construction, the tools that helped the pharaohs build the pyramids.” Seemingly every vertical market, from healthcare to hospitality, was represented, a sign that machine learning is increasingly woven into software rather than treated as a standalone capability. People described machine learning efforts in products for measuring household electricity consumption (OPower) and personalized banking. And p…
Are Businesses Ready for ML?
The mission of building a machine learning community dates back to a 2011 request from Oracle Chief Architect Edward Screven, said Stephen Green, d… Earlier summits focused on search and natural language processing. Today the industry has moved beyond retrieving information; now the focus is interpretation. When Green meets new data scientists, he asks how they are “operationalizing” ML: How do they annotate the data? How do they build, deploy, measure, and update their models? What tools do they use? Above all, though, his discussions are less about tools and tactics than about what new problems teams are solving with ML.
Green said, “We don’t want to play Go really well, because playing Go is not important to our customers.”
Below is just a taste of the topics and experiences Oracle’s machine learning experts shared at this year’s summit.
Safe Data Practices
As machine learning becomes more central to every type of application, developers and data scientists need data, and lots of it, to test algorithms at scale. There’s a risk, however: those records must be obtained and used with appropriate permissions. There are open source data options, but these often come with reciprocal licenses such as copyleft, which can require derivative works to be released under the same open terms. Web scraping is another popular approach, but it too must be done with care: scrapers must not violate terms of service (Twitter forbids scraping, for example), circumvent paywalls or robots.txt instructions, or overload site servers with traffic.
“Data today is a lot like how open source software was when it first came out, and everyone thought, ‘This is great, free software. Let’s use it,’” one speaker said. “That’s where data is headed. It may appear to be free, but it’s not actually free.”
That’s why, more than five years ago, Oracle Labs created an in-house research data repository that teams can use to build models. Under the leadership and authority of Craig Stephen, head of Oracle Labs, this specially secured repository contains large data sets suitable for a variety of challenging ML projects. Every data set in the repository is tracked and managed through a rigorous process that includes vetting for IP and privacy concerns, and is made available internally for uses consistent with those requirements.
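The robots.txt and server-load cautions above can be made concrete with a small check that runs before each fetch. What follows is a minimal Java sketch under stated simplifications: the RobotsCheck class and its methods are hypothetical, it honors only Disallow lines in the "User-agent: *" group using plain prefix matching, and it ignores the Allow rules, wildcards, and agent-specific groups that the full Robots Exclusion Protocol (RFC 9309) defines. It does nothing about terms of service or paywalls, which still need a human review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified robots.txt check (assumption: only "User-agent: *" groups and
// prefix-matched Disallow rules are honored; real parsers do much more).
final class RobotsCheck {

    // Collects the Disallow path prefixes that apply to all user agents.
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            // Drop comments and surrounding whitespace.
            String line = rawLine.replaceFirst("#.*", "").strip();
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).strip().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).strip();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                prefixes.add(value); // an empty Disallow means "allow everything"
            }
        }
        return prefixes;
    }

    // True if no wildcard Disallow prefix matches the requested path.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt)) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    // How long to wait before the next request to the same host so that
    // requests are spaced at least minIntervalMillis apart.
    static long millisToWait(long lastRequestAt, long now, long minIntervalMillis) {
        long elapsed = now - lastRequestAt;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    public static void main(String[] args) {
        String robots = """
            User-agent: *
            Disallow: /private/
            Disallow: /search
            """;
        System.out.println(isAllowed(robots, "/articles/ml"));  // prints true
        System.out.println(isAllowed(robots, "/private/data")); // prints false
        System.out.println(millisToWait(1_000, 1_400, 1_000));  // prints 600
    }
}
```

A real crawler would fetch robots.txt once per host, cache it, and sleep for the computed delay before each request; the one-second interval in the example is an arbitrary choice, not a standard.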
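The repository's rule that data sets are released only for uses consistent with their IP and privacy vetting can be modeled as an access check. This is a hypothetical Java sketch, not a description of Oracle Labs' actual system: the DatasetRegistry class, the Use categories, and the two boolean review flags are all assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of a vetted in-house data repository: each data set records
// whether it passed IP and privacy review and which uses it was cleared for.
final class DatasetRegistry {

    enum Use { RESEARCH, MODEL_TRAINING, PRODUCT_EVALUATION }

    record Dataset(String name, boolean ipReviewed, boolean privacyReviewed, Set<Use> approvedUses) {
        Dataset {
            approvedUses = Set.copyOf(approvedUses); // defensive, immutable copy
        }

        boolean vetted() {
            return ipReviewed && privacyReviewed;
        }
    }

    private final Map<String, Dataset> datasets = new HashMap<>();

    void register(Dataset d) {
        datasets.put(d.name(), d);
    }

    // Grants access only if the data set exists, has cleared both reviews,
    // and the requested use is one it was approved for.
    Optional<Dataset> checkout(String name, Use intendedUse) {
        Dataset d = datasets.get(name);
        if (d == null || !d.vetted() || !d.approvedUses().contains(intendedUse)) {
            return Optional.empty();
        }
        return Optional.of(d);
    }

    public static void main(String[] args) {
        DatasetRegistry registry = new DatasetRegistry();
        registry.register(new Dataset("support-tickets", true, true,
                EnumSet.of(Use.RESEARCH, Use.MODEL_TRAINING)));
        registry.register(new Dataset("clickstream-sample", true, false,
                EnumSet.of(Use.RESEARCH)));
        System.out.println(registry.checkout("support-tickets", Use.MODEL_TRAINING).isPresent());     // prints true
        System.out.println(registry.checkout("support-tickets", Use.PRODUCT_EVALUATION).isPresent()); // prints false
        System.out.println(registry.checkout("clickstream-sample", Use.RESEARCH).isPresent());        // prints false
    }
}
```

Returning an Optional rather than throwing keeps the refusal path explicit at the call site; a production system might also log who requested which data set and for what purpose.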
Java for ML
“Python is primarily used in machine learning as a way to drive libraries that are written in native code,” said Mark Reinhold, chief architect of the Java Platform Group at Oracle. “As soon as you need to do something for which you don’t have a good native language library, you’re stuck. You have to buckle down and write native functions in C or C++ or assembly code.”
That’s about to change, Reinhold said, thanks to four ongoing projects in the OpenJDK Community: Panama, Loom, Amber, and Valhalla. Panama makes it easier to call native functions and access native data from Java programs. Loom simplifies thread-based concurrent programming, a signature Java strength, by introducing fast, low-footprint “fibers.” Amber brings pattern matching, among other features. And Valhalla aims to let Java programmers create data structures just as efficient as those they can build in native languages such as C and C++.
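Two of these projects have since shipped final features: Loom's fibers arrived as virtual threads, and Amber's pattern matching now extends to record patterns in switch, both final in JDK 21. The sketch below assumes JDK 21 or later and uses them together on a toy example; the Layer, Dense, and Dropout types are hypothetical stand-ins for model layers, not part of any Oracle library. Panama's Foreign Function & Memory API (final in JDK 22) and Valhalla are left out to keep the example small and runnable on JDK 21.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Requires JDK 21 or later: virtual threads (Project Loom) and record patterns
// in switch (Project Amber) are both final features there.
final class LoomAmberDemo {

    // A tiny sealed hierarchy of hypothetical model-layer descriptions.
    sealed interface Layer permits Dense, Dropout {}
    record Dense(int inputs, int outputs) implements Layer {}
    record Dropout(double rate) implements Layer {}

    // Deconstructs each record in the switch; no default branch is needed
    // because the compiler knows Layer has exactly these two implementations.
    static long parameterCount(Layer layer) {
        return switch (layer) {
            case Dense(int in, int out) -> (long) in * out + out; // weights + biases
            case Dropout d -> 0L;                                // no trainable parameters
        };
    }

    // Submits one task per layer, each on its own virtual thread, and sums the results.
    static long totalParameters(List<Layer> layers) {
        List<Future<Long>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (Layer layer : layers) {
                futures.add(executor.submit(() -> parameterCount(layer)));
            }
        } // close() blocks until every submitted task has finished
        long total = 0;
        for (Future<Long> f : futures) {
            total += f.resultNow(); // safe: all tasks completed before close() returned
        }
        return total;
    }

    public static void main(String[] args) {
        List<Layer> model = List.of(new Dense(784, 128), new Dropout(0.2), new Dense(128, 10));
        System.out.println(totalParameters(model)); // prints 101770
    }
}
```

One virtual thread per task is cheap enough that this style scales to many thousands of tasks, whereas with platform threads a bounded pool would normally be used instead. For a three-layer toy model the concurrency is unnecessary and is there only to show the API.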
“Java is a broad-spectrum language for the working programmer, and it’s already moving toward being a better foundation for ML than Python,” he said.
A Never-Ending Story
A fun exercise for future summits: Add up the decades of experience present in the room. Reinhold told stories of his early days working on Java and some of the design choices that were made, while Craig Stephen reminded attendees, “We have been doing this for a long time; we’ve been learning from data for decades.” He pointed to Kenny Gross, who has a … and published a paper on power plant monitoring and fault detection in the late 1990s. Gross still works at Oracle, where he …
The goal for the three-day event was simple—make connections—and based on the hallway conversations and poster sessions, it was met. That pleased Stephen, who reminded everyone that technology doesn’t evolve without talent: “What’s the most expensive thing about this conference? It’s your time.”