Security, privacy, and cloud: 3 examples of why research matters to IT
When you’re busy running around putting out fires, it’s easy to dismiss “research” as something that may be interesting for university professors and their students but doesn’t exactly merit bandwidth from a busy IT professional. While it’s almost certainly true that it shouldn’t be a primary focus, I hope to convince you that it deserves at least a little bit of your attention.
Previously, I’ve written about why quantum computing in general and quantum-resistant cryptography in particular, even in their early stages, are of more than academic interest to anyone charting the future course of a technology-focused organization. Here, I’m going to take you through a few of the forward-looking topics covered in the newest Red Hat Research Quarterly issue and connect them to challenges that IT professionals face today.
1. Security usability
The cryptography that underpins much of software security is critical and is the subject of a great deal of ongoing research. The issue even contains an article by Vojtěch Polášek that describes research into transforming easy-to-remember passwords into secure cryptographic keys using key derivation functions. Of perhaps more immediate interest to IT pros, however, is Martin Ukrop’s usability research.
For the past few years, Ukrop, a PhD candidate at the Centre for Research on Cryptography and Security at Masaryk University in the Czech Republic, has conducted experiments at the DevConf.cz open source event. These experiments revolve around X.509 certificates: how they are generated, validated, and understood. Ukrop explains this focus: “Nowadays, most developers need secure network connections somewhere in their products. Today, that mostly means using TLS [Transport Layer Security], which, in turn, most likely means validating the authenticity of the server by validating its certificate. Furthermore, it turns out that understanding all the various quirks and corners of certificate validation is far from straightforward. OpenSSL, one of the most widely used libraries for TLS, has almost 80 distinct error states related only to certificate validation.”
One experiment, conducted in 2018 and likely relevant to many developers, investigated how much developers trust flawed TLS certificates. Participants were presented with certificate validation errors and asked to investigate the issue, assess the connection’s trustworthiness, and describe the problem in their own words. Ukrop’s conclusion was that some flawed certificates were overtrusted. For example, about 20 percent of the participants rated both a self-signed certificate and one with violated name constraints as “looking OK” or better; most security professionals would disagree.
Ukrop’s work aims to improve security usability for developers; the work in progress can be found at https://x509errors.org. In the meantime, his findings suggest that training developers to handle certain types of certificate errors could pay off.
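To make the point about validation errors concrete, here is a minimal sketch of how a developer might surface a certificate error instead of silencing it. It uses only Python’s standard ssl and socket modules; the function names strict_tls_context and check_certificate are hypothetical and chosen for illustration, and nothing here comes from Ukrop’s experiments.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Return a client context that verifies the chain and the hostname."""
    context = ssl.create_default_context()
    # Both settings are already the defaults; they are stated explicitly
    # so that nobody relaxes them by accident.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def check_certificate(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and describe the outcome (hypothetical helper)."""
    context = strict_tls_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "certificate validated"
    except ssl.SSLCertVerificationError as err:
        # verify_code and verify_message carry the specific validation
        # error reported by the underlying TLS library.
        return f"validation failed ({err.verify_code}): {err.verify_message}"
    except OSError as err:
        return f"connection failed: {err}"
```

Calling check_certificate with a host name returns a short description of what happened. The design choice worth noting is that the error is reported with its specific code and message rather than worked around by disabling verification, which is the habit that tends to lead to overtrusted certificates.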
2. Data sharing and privacy preservation
Another area of interest to IT leaders, which I’ve written about previously, is the difficulty of balancing data sharing needs with privacy protection. That was the topic of an interview that Sherard Griffin, a director at Red Hat in the AI Center of Excellence, conducted with James Honaker and Mercè Crosas of Harvard University. Honaker is a researcher at the Harvard John A. Paulson School of Engineering and Applied Sciences, while Crosas is Chief Data Science and Technology Officer at Harvard’s Institute for Quantitative Social Science.
Griffin lays out a common challenge faced by many organizations, including his own: “The datasets we needed from a partner to create certain machine learning models had to have a fair amount of information. Unfortunately, the vendor had challenges sharing that data, because it had sensitive information in it.” Harvard faces the same challenge with Dataverse, which Crosas describes as “a software platform enabling us to build a real data repository to share research datasets. The emphasis is on publishing datasets associated with research that is already published. Another use of the platform is to create datasets that could be useful for research and making them available more openly to our research communities.”
Harvard’s approach to guaranteeing individual privacy when datasets in a shared repository like Dataverse are exposed to researchers is to use differential privacy. It’s a relatively new technique, which came out of work primarily by Cynthia Dwork in 2006, but it is starting to see widespread use, including by the US Census Bureau in 2020. So it’s certainly no longer of just academic interest.
Differential privacy works by adding a small amount of noise, just enough to drown out the contribution of any one individual in the dataset. Making it harder to tease out individual data points from an aggregated set isn’t new, of course. The difference is that differential privacy approaches privacy guarantees in a mathematically rigorous way.
As Honaker puts it: “The point is to balance that noise exactly [between making the data useless and exposing individual data points]; that’s why the ability to reason formally about these algorithms is so important. There’s a tuning parameter called Epsilon. If an adversary, for example, has infinite computational power, knows algorithmic tricks that haven’t even been discovered yet, Epsilon tells you the worst case leakage of information from a query.” Some of the ongoing research in this area involves the tuning of that parameter and dealing with cases where that parameter can get “used up” by repeated queries.
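The role of Epsilon, and the way it gets “used up,” can be illustrated with a minimal sketch of the Laplace mechanism, a standard textbook way to answer counting queries under differential privacy. This is an assumption-laden illustration, not the implementation used by Harvard or Dataverse: the class name PrivateCounter and its methods are hypothetical, and a hardened implementation would need more careful noise sampling than plain floating-point arithmetic.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise and tracks an epsilon budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining_epsilon = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent exponential draws with mean
        # `scale` follows a Laplace distribution centered on 0.
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining_epsilon:
            raise RuntimeError("privacy budget exhausted")
        # Sequential composition: the epsilons of successive queries add up,
        # so each query is charged against the total budget.
        self.remaining_epsilon -= epsilon
        true_count = sum(1 for record in self._records if predicate(record))
        # Adding or removing one person changes a count by at most 1, so the
        # sensitivity is 1 and the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / epsilon)


# Hypothetical data: 100 ages, a total budget of 1.0, two queries at 0.5 each.
ages = [20 + (i % 50) for i in range(100)]
counter = PrivateCounter(ages, total_epsilon=1.0, seed=42)
first = counter.noisy_count(lambda age: age >= 40, epsilon=0.5)
second = counter.noisy_count(lambda age: age < 30, epsilon=0.5)
# A third query would now raise RuntimeError because the budget is spent.
```

A smaller epsilon means more noise and stronger privacy; a larger one means more accurate answers and more potential leakage. Once the budget is spent, the sketch refuses further queries, which is the “used up” behavior mentioned above.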
3. Open source cloud operations
The final topic that I’ll touch on here is AIOps, which Red Hat’s Marcel Hild researches in the Office of the CTO. This emerging area recognizes that open source code is only part of what’s needed to implement and operate services based on that code. Hild argues: “We need to open up what it takes to stand up and operate a production-grade cloud. This must not only include architecture documents, installation, and configuration files, but all the data that is being produced in that procedure: metrics, logs, and tickets. You’ve probably heard the AI mantra that ‘data is the new gold’ multiple times, and there is some deep truth about it. Software is no longer the differentiating factor: it’s the data.”
Hild acknowledges that the term “AIOps” can be a bit nebulous. But he sees it as meaning “to augment IT operations with the tools of AI, which can happen on all levels, starting with data exploration. If a DevOps person uses a Jupyter notebook to cluster some metrics, I would call it an AIOps technique.” He adds that “the road to the self-driving cluster is paved with a lot of data — labeled data.”
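As an illustration of the notebook-style clustering Hild mentions, here is a minimal k-means sketch in plain Python. The metric samples are made up for the example, the function name kmeans is hypothetical, and seeding the centroids with the first k points is a simplification chosen to keep the result deterministic; a real notebook would more likely use a library implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical (cpu_percent, memory_percent) samples from six nodes.
samples = [(12, 30), (85, 78), (15, 28), (90, 82), (10, 35), (88, 75)]
labels, centroids = kmeans(samples, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]: idle nodes in one cluster, busy nodes in the other
```

Even a grouping this simple separates lightly loaded nodes from heavily loaded ones, which is the kind of data exploration Hild counts as an AIOps technique.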
Fittingly, much of this research is itself taking place in the open, such as with the evolving open cloud community at the Mass Open Cloud. “All discussions happen in public meetings and, even better, are tracked in a Git repository, so we can involve all parties early in the process and trace back how we came to a certain decision. That’s key, since the decision process is as important as the final outcome. All operational data will be accessible, and it will be easy to run a workload there and to get access to backend data,” writes Hild.
To read more about these examples, browse back issues, or sign up for a complimentary subscription (print or digital), see Red Hat Research Quarterly.