Exploitation and Exploration
Blog: Strategic Structures
Allocation of resources
You have discovered a gold deposit. How do you allocate your limited resources? How much should you invest in exploiting the deposit you have, and how much in looking for new ones? You don’t know how big the lode is or how much gold you can extract from the current mine, and you don’t know whether you’ll ever find another. If you put all your effort and resources into exploiting the current deposit, you might not have enough left to look for new ones when this one is exhausted, and the next one you find might be bigger or of higher quality. However, it is just as possible that you invest a lot in exploring and find no other deposit at all, in which case, unlike with exploitation, all of that investment is wasted.
Companies face this dilemma all the time. For example, when there are a few successful markets (clients, consumer groups or regions), should the company come up with new products and services to sell to the existing markets, or explore new territories with current (or new) offerings? Should it improve the current technologies or explore new ones? This dilemma is present not only in marketing and production strategy but in almost every investment decision.
Thus, at its most basic, exploitation-exploration can be understood as an optimisation problem. And indeed, as such, it is heavily researched. In probability theory, it is known as the multi-armed bandit problem, and many optimisation strategies and computer algorithms for it were developed in the second half of the last century. The applications are wide-ranging, from clinical trials and financial portfolio design to machine learning. Going through all of them is beyond the scope of this post. It is sufficient to say that studying them is useful, as a lot of thinking, mathematics, modelling and experimenting have been invested with impressive results. The discovered patterns, algorithms and strategies can be useful to some extent in certain organizational contexts. But overall, all these studies rely on many assumptions and simplifications which don’t hold in real-life situations.
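To make the bandit framing concrete, here is a minimal sketch of epsilon-greedy, one of the simplest textbook strategies for the multi-armed bandit problem. It is an illustration only, not a method this post prescribes. The function name, the three payout probabilities (think of them as three deposits of unknown richness) and the values of epsilon are all assumptions made up for the example.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    true_means: hypothetical payout probability of each arm (hidden from the agent).
    epsilon:    probability of exploring a random arm at each step.
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # how often each arm has been tried
    estimates = [0.0] * n_arms  # running average reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try any arm, regardless of what we believe so far.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate (ties go to the first).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        # Incremental update of the average reward for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return total_reward, pulls


if __name__ == "__main__":
    deposits = [0.2, 0.5, 0.8]  # assumed values, chosen only for illustration
    for eps in (0.0, 0.1, 1.0):
        total, pulls = run_epsilon_greedy(deposits, eps, steps=1000)
        print(f"epsilon={eps}: reward={total}, pulls per arm={pulls}")
```

In this sketch, epsilon = 0 is pure exploitation: the agent never leaves the first arm it tries, whatever the other arms might have paid. Epsilon = 1 is pure exploration: it keeps sampling at random and never cashes in on what it has learned. Values in between trade one against the other. The sketch also shows the simplifications mentioned above: a fixed set of options, payouts that never change and immediate feedback after every choice.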
A broader understanding
The exploitation-exploration dilemma is present everywhere in organizations: in strategy, marketing, sales, research, operations and projects. It appears in different guises and is communicated in various terms. Yet it is rarely discussed explicitly in meetings.
As James March wrote in 1991,
“Exploration includes things captured by terms such as search, variation, risk taking, experimentation, play, flexibility, discovery, innovation. Exploitation includes such things as refinement, choice, production, efficiency, selection, implementation, execution. Adaptive systems that engage in exploration to the exclusion of exploitation are likely to find that they suffer the costs of experimentation without gaining many of its benefits. They exhibit too many undeveloped new ideas and too little distinctive competence. Conversely, systems that engage in exploitation to the exclusion of exploration are likely to find themselves trapped in suboptimal stable equilibria. As a result, maintaining an appropriate balance between exploration and exploitation is a primary factor in system survival and prosperity.”
Exploration and exploitation compete for resources and so organizations have to make choices. Some of them are explicit, but most are implicit. Explicit choices are visible as decisions made by comparing alternatives; investment decisions are a typical example. In comparison, implicit choices are – as March put it – “buried in many features of organizational forms and customs, for example, in organizational procedures for accumulating and reducing slack, in search rules and practices, in the ways in which targets are set and changed, and in incentive systems”.
Working with the exploration-exploitation balance does more than help us see the exploitation and exploration patterns in organizational communications and decisions. It also shifts our attitude to what’s going on, away from what is accepted as rational or intuitive.
For example, a quick-learning new employee starts contributing to the organization sooner, because she absorbs the organizational knowledge in a shorter time. That may be good for her and for the organization in the short term, but it might be bad in the long run. When a slow learner joins the organization, it takes longer for him to fit in, but that might actually improve the organizational knowledge and norms, because his differing views persist long enough for the organization to learn from them. And the same person, once well established, will be slow to absorb new knowledge. This would often be a healthy conservatism, as it reduces the risk of investing in fads, as March pointed out.
Exploration and exploitation in time
Exploitation follows exploration. We first explore the menu, select, order and then consume what we’ve ordered (or what we think we’ve ordered). A pharmaceutical company carries out a lot of experiments to come up with a new formula which will work against a certain disease. These experiments may be futile or fruitful or, in fact, come up with something that does not treat the intended disease but turns out to be useful against another. In the first case, that exploratory path does not end in exploitation, but in the other two, it does, in an expected or an unexpected way. In any case, first comes exploration and then exploitation.
Yet we can see the exploration and exploitation in such a sequence only if we focus on a particular element like choosing the food in a restaurant. But things are connected and they interact all the time. If I go to explore a forest, I can do that by exploiting my shoes. The pharmaceutical company is carrying out experiments exploiting laboratory equipment. Spacecraft explore the universe by exploiting various technologies.
In these examples, exploration and exploitation run in parallel and are coordinated, but their objects are different. I’m exploring the forest, but while doing that I’m exploiting my shoes, not the forest. The pharmaceutical company and the spacecraft also have different objects of exploration and exploitation. There are some cases, however, when exploration and exploitation can apply to the same object at the same time. And this can work pretty well.
One such case is Twitter. Sharing a tweet with a so-called “re-tweet” was neither designed nor planned as a feature in the initial releases of Twitter. People simply started reposting others’ tweets, adding “RT” for re-tweet, and the practice caught on and went viral. Then Twitter, along with apps and services in the Twitter ecosystem, added a lot of capabilities around RT. It evolved this way because people were exploring Twitter at the same time as they were exploiting it. That exploration also produced many other ways of using it which did not catch on, or at least not to the point of becoming one of the essential capabilities of the service.
Something similar happens in the mobile app markets. By the end of 2018, there were over two million apps in each of the two biggest stores. Making it easy for app writers to release new versions, and simple for app users to install and update them, created an ecosystem quite different from the traditional software world. When releasing new apps and features, app writers explore the market while at the same time exploiting it. Each app and each feature works as an almost unbiased market survey. At the same time, they are actual business, actual exploitation, with revenue generated either by ads or by selling the app.
It’s similar for the users. They don’t know what will fit their needs and preferences. While trying out new apps and features, app users are also using them, in this way exploring and exploiting at the same time.
Depending on the perspective (users or providers) and the zoom level (features, apps, market participants), we can see different dynamics. At the level of a single app, this period of parallel exploration and exploitation evolves into exploitation only, but if we zoom out, we’ll see that the two keep running in parallel. While accustomed to certain apps, users keep exploring the app market. By trying new apps (a level up) or new features (a level down), they co-explore with the app writers.
App writers, both individuals and companies, compete by improving existing capabilities and releasing new ones. Some of them develop new apps, entering again into a mode of parallel, compressed exploration and exploitation. The balance can be seen at the next level as well, the level of the developers (entrepreneurs). New ones arrive; some grow, others leave. The invisible hand of the market produces entrepreneurs who try out new things, stay (exploit) if they are successful and leave if not.
Going to jazz festivals taught me a lesson about the balance between exploration and exploitation. Observing app markets shows what it is to explore while exploiting. But for an example of the latter, I could’ve just stayed at the jazz festival. Jazz improvisation is where exploitation and exploration run in parallel to form a compressed and precarious balance. With classical music, exploration and exploitation are separated in time and space. A composer explores by trying out different harmonies and melodies, and later on an orchestra exploits the composed, fixed sequence of notes with prescribed length and manner of playing. In a jazz band, composing and performing happen at the same time. Each musician is exploring and exploiting the territory marked by the main theme, their own ideas and others’ inventions and provocations. It takes a long time and hard work to reach that level of awareness, intuition and skill. Musicians put in many years of practice to produce just a few minutes of good unprepared music. The years of preparation supply jazz musicians both with an arsenal of patterns to use when short of ideas (exploit) and with the skills necessary to break out of them (explore).
For organizations, over-exploiting would either exhaust the resource they have or make them slow to adapt and less competitive, until they are eventually driven out of the market. Over-exploring, on the other hand, would exhaust their own resources. Maintaining the balance is always necessary for survival. However, depending on the level of complexity, merely maintaining it might not be sufficient: how the balance is kept becomes crucial.
In biological evolution, sight might be considered one of the most impressive achievements. For species that excel in seeing, perception (exploitation) happens together with, and is complemented by, eye movement (exploration). The evolutionary advantage of not just balancing exploration and exploitation but bringing them closer together applies to organizations as well, as the examples of Twitter and app markets show.
The more uncertain the environment, the more exploration and exploitation should follow each other in shorter cycles, and the more organizations need to invent new business models where exploration and exploitation go together or are both characteristics of one and the same activity.
This is an excerpt from the chapter “Exploration and Exploitation” from the forthcoming book “Essential Balances in Organizations”.
Thanks to Rob Worth for reviewing and improving the text.