Software Product Line or Not?
Many climate models are designed to be configured in different ways in order to support the different scientific requirements of a wide range of researchers. Because of this, climate models are akin to software product lines (SPLs), which can produce a range of individual products from the same codebase. (This observation, along with other very interesting differences between the software engineering of climate models and that of other kinds of software, can be found in a post on Steve Easterbrook’s blog.)
But how deep does this analogy go? In recent conversations with developers of CESM, it was pointed out to me that they did not consider their software to be a product line, at least not in the formal sense. Why is this? Well, there are some good reasons to consider a model like CESM a product line, but also a few places where the SPL analogy breaks down. Where am I going with this? To the degree that current thinking about SPLs jibes with the software processes of climate modelers, SPL methodologies should be exploited. But where the formal methodologies do not mesh, we need to re-think how SPLs are designed and implemented if we wish to make them useful tools for computational scientists.
In many respects, state-of-the-practice climate models have indeed evolved to resemble software product lines:
- A key motivation for developing software product lines is the savings achieved through increased software reuse: instead of writing software from scratch, new systems are produced from a basis of core assets in a preplanned way. This is clearly the case for climate models, which may be used by hundreds of scientists running different kinds of experiments with the same software.
- Product development is the activity of turning out product family members that meet a given set of requirements. The codebase of a single climate model may be used in many different configurations. In most cases, complex (and often complicated) shell scripts are used to configure the core assets into an executable simulation that meets the scientist’s requirements.
- Software product lines are often not created in a linear fashion. That is, during product development of individual family members, there is a strong feedback loop leading to modification of the core assets and possibly creation of new core assets. Likewise, as climate model developers code and validate new science into the models, the changes are eventually incorporated back into the master codebase so they are available to a wider community. The codebase may be extended with small changes to existing components or with entirely new components.
There are some aspects of SPLs that are “on the fence” with respect to their applicability to climate models:
- Software product lines include a set of core assets, primarily in the form of reusable software components that may be customized and assembled into multiple configurations. Climate models today are built by coupling components that represent geophysical processes, and many of these components have been designed for reuse in multiple contexts. For example, many components that participate in coupled simulations may also be run in standalone mode: the CESM atmospheric component, CAM, may be run in a coupled mode or as an independent application. In addition to the major coupled components, new features added to the codebase are introduced in such a way that they can be “turned off” so that previous configurations can be reproduced. What is not clear is whether components (or whatever we call the units of reuse) have been (or should be?!) designed in the same robust manner as the core assets in a traditional SPL. If, for example, a scientist is testing a new theory, then it seems wasteful to spend much time up front designing the very robust, loosely coupled components that you might find in a mature SPL.
- SPLs require a common software architecture that will adequately support the requirements of the individual products that will come from the product line. One source of architecture in climate models is the coupling framework (e.g., OASIS or ESMF), which supplies at least a coarse-grained architecture for the major components. It remains to be seen whether the coupling framework provides (or should provide) the architecture for more fine-grained types of variation in the models. For example, can the same variation mechanism be used for selecting dynamical cores as is used for selecting microphysics, or for selecting which chemical species to track? Returning to Easterbrook’s blog, we see that (1) the software architecture of climate models to a large degree mimics the “real-life” architecture of the interconnected geophysical processes being modeled, and (2) the complexity of interactions, the need for tight coupling, and performance concerns make it impractical to define the highly decoupled, modular architecture that seems most natural for an SPL.
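To make the question about a shared variation mechanism concrete, here is a minimal Python sketch of what a single mechanism might look like. Everything in it is an assumption made for illustration: the registry, the `register` and `select` helpers, and the option names are hypothetical and are not taken from CESM, CAM, OASIS, or ESMF.

```python
from typing import Callable, Dict, List

# Hypothetical registry: variation point -> option name -> factory.
# One table serves every kind of variation, coarse or fine grained.
_REGISTRY: Dict[str, Dict[str, Callable[[], str]]] = {}


def register(point: str, name: str):
    """Decorator that records a factory under a named variation point."""
    def wrap(factory: Callable[[], str]) -> Callable[[], str]:
        _REGISTRY.setdefault(point, {})[name] = factory
        return factory
    return wrap


# Illustrative options only; real models expose many more choices.
@register("dycore", "finite_volume")
def finite_volume() -> str:
    return "finite-volume dynamical core"


@register("dycore", "spectral")
def spectral() -> str:
    return "spectral dynamical core"


@register("microphysics", "single_moment")
def single_moment() -> str:
    return "single-moment microphysics"


def options(point: str) -> List[str]:
    """List the registered choices for one variation point."""
    return sorted(_REGISTRY.get(point, {}))


def select(point: str, name: str) -> str:
    """Resolve one choice through the same mechanism for every point."""
    try:
        return _REGISTRY[point][name]()
    except KeyError:
        raise ValueError(
            f"unknown {point} option {name!r}; choices: {options(point)}"
        ) from None
```

Calling `select("dycore", "spectral")` and `select("microphysics", "single_moment")` goes through identical code. Note what the sketch leaves out: choosing a set of chemical species is a multi-valued selection rather than a one-of-many choice, and nothing here addresses tight coupling or performance, which is exactly where a uniform mechanism may stop fitting.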
There are at least a couple of other distinctions between formal SPLs and state-of-the-practice climate models:
- Product line scoping is the process of defining which products the product line is capable of producing. The scope may be defined in terms of the features that are common to all products, the features that may vary, and how they may vary. On the one hand, precisely scoping scientific models may actually be a disadvantage, because the nature of computational science is such that all of the software requirements often cannot be enumerated up front. Scoping a climate model may therefore prematurely limit its possible uses. On the other hand, there is an advantage to explicitly defining the current scope of an existing climate model product line: it provides a plain, unambiguous, declarative listing of which features can vary among all the possible members of the product line. Without an explicit scope, determining the set of variation points requires carefully studying the model’s configuration files, scripts, and documentation (if any). (In a previous post, I wrote about using feature modeling to visualize the many configuration options of CAM.)
- Formal software product lines include a production plan that describes how to build concrete products from the core assets. The production plan serves as a guide that describes how variations in individual products should be achieved, for example, by selecting certain components or by parameterizing components in some way. Designing such a plan for an ever-evolving research tool seems daunting at best, especially if components are designed by different labs with different plans for the components. If development of a formal production plan were undertaken, it might change so frequently as to bring its usefulness into question.
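As a rough illustration of the “plain, unambiguous, declarative listing” mentioned in the scoping point above, the following Python sketch writes a tiny scope down as data and checks configurations against it. The feature names, their allowed values, and the cross-feature rule are all hypothetical; they are not actual CAM or CESM options.

```python
from itertools import product

# Hypothetical scope: each variation point and its allowed values.
SCOPE = {
    "dycore": ["finite_volume", "spectral"],
    "chemistry": ["none", "simple", "full"],
    "coupled_ocean": [False, True],
}


def satisfies_constraints(cfg):
    """Hypothetical cross-feature rule, made up for this example:
    full chemistry is only offered with the finite_volume dycore."""
    return not (cfg["chemistry"] == "full" and cfg["dycore"] == "spectral")


def validate(cfg):
    """Return a list of reasons why cfg falls outside the scope."""
    problems = []
    for feature, choices in SCOPE.items():
        if feature not in cfg:
            problems.append(f"missing feature: {feature}")
        elif cfg[feature] not in choices:
            problems.append(f"{feature}={cfg[feature]!r} is outside the scope")
    for feature in cfg:
        if feature not in SCOPE:
            problems.append(f"unknown feature: {feature}")
    # Only check cross-feature rules once every feature is present and valid.
    if not problems and not satisfies_constraints(cfg):
        problems.append("full chemistry requires the finite_volume dycore")
    return problems


def all_products():
    """Enumerate every configuration the scope admits."""
    names = list(SCOPE)
    for values in product(*(SCOPE[name] for name in names)):
        cfg = dict(zip(names, values))
        if satisfies_constraints(cfg):
            yield cfg
```

With a listing like this, the set of variation points can be read directly from `SCOPE` rather than reverse engineered from scripts, and `validate` can reject an unsupported combination before a build is attempted. The trade-off discussed above still applies: a new scientific requirement that falls outside `SCOPE` means editing the scope itself, and a real model has far too many options to enumerate exhaustively the way `all_products` does here.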
I think the bottom line is that textbook SPL methodologies cannot be applied to climate model development as they stand, although we would like to reap the benefits of SPL-like thinking in the climate modeling community. What kinds of re-thinking do we need to do before we can realize those benefits?