Why coupling should help with climate model verification but may not in reality
In this politically charged climate, much has been said recently about climate model verification and validation. The two terms are from the software engineering community. The pithy explanation of the difference between the two is usually described in the form of two questions: “Are we building the right system?” (validation) and “Are we building the system right?” (verification).
In the context of coupled climate models, software engineers (at least the ones who are non-scientists) are better equipped to address verification than validation. Typically, verification is considered the more objective process because it has to do with determining whether a system’s implementation actually matches the specification. In other words, verification does not seek to evaluate the correctness or quality of the specification itself, but to ensure that the specification is correctly implemented. Verification can be done by analyzing the code itself and comparing it to the specification. Sometimes this is done manually via careful code inspection. To the degree that the specification is formalized (e.g., written in a machine-processable way), the verification step can be automated. For example, a style checker can verify that coding conventions are met. Unit tests and integration tests are also a form of verification. These involve testing individual parts of the code in isolation, as well as combinations of coding units, to verify that the behavior (often in terms of inputs and expected outputs) is as expected.
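To make the idea concrete, here is a minimal sketch of verification via unit testing. The interpolation routine is hypothetical (not from any real model), but it stands in for the kind of small numerical utility a coupler contains: the test checks that the implementation matches its specification, and says nothing about whether linear interpolation is the scientifically right choice.

```python
def interpolate(x0, y0, x1, y1, x):
    """Specification: return the value at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Verification, not validation: we confirm the code does what the
# specification says, not that the specification is good science.
def test_interpolate():
    assert interpolate(0.0, 0.0, 1.0, 10.0, 0.5) == 5.0   # midpoint
    assert interpolate(0.0, 0.0, 1.0, 10.0, 0.0) == 0.0   # left endpoint
    assert interpolate(0.0, 0.0, 1.0, 10.0, 1.0) == 10.0  # right endpoint

test_interpolate()
```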
Validation, on the other hand, asks whether the specification itself (through the software) adequately meets the need for which it was written. In the context of climate models, validation is, in many respects, a synonym for “doing science”—that is, in pronouncing a model valid you are saying “the science is good.” The software specification, in this case, is in the form of partial differential equations or parameterizations that describe physical processes. The climate model, then, is an experimental tool allowing the scientist to judge the validity of the specification. I will not cover the gamut of validation techniques here (there are many), but they are all tasks requiring significant scientific expertise. This is not a far cry from validation of “typical” software systems in which the software contractor must rely on the customer to say whether or not the software really addresses the need for which it was built (i.e., user acceptance testing).
With this distinction in mind, most evaluative efforts today are focused on validation (perhaps rightly so) while verification of codes seems to be the more elusive—or at least more neglected—task. (For example, check out chapter 8 of the latest IPCC assessment report on evaluation of climate models. The evaluation techniques are overwhelmingly concerned with validation.)
Well, is this an issue? Isn’t validation enough? I argue no. The basic issue is that without verification, you don’t really know what you are validating. So, both kinds of evaluative techniques are needed. Verification on its own does not say anything about whether the model has scientific validity—only that the model actually implements what the programmer intended. On the other hand, validation by itself is not satisfactory. It raises our confidence level in the model’s skill, but makes it difficult to trace variation in output back to specific parts of the model. This is because if verification is left out, it can be unclear how the original specification of the science is represented in the code.
Okay, before you jump on me for that last paragraph, let me clarify. In fact, verification of climate models is a regular part of the software development process. Pope and Davies (2002) describe some current techniques used to evaluate atmospheric models. In the article, they are all described as validation techniques, but I would argue that many actually fall under verification instead of validation. (I freely admit that the distinction between the two can be fuzzy.) Some evaluation techniques mentioned are: reducing 3D equations of motion to their 2D counterparts and checking against analytical or reference solutions, isolating a single column of the atmosphere to see the effects of physical parameterizations without complex feedbacks to horizontal motions, isolating the dynamics component by prescribing idealized physical forcings, and running idealized “aquaplanet” tests with simplified sea surface boundary conditions.
However, what I notice is that all of these are black-box testing techniques. In other words—does the model return expected outputs based on the given inputs? This is in contrast to white-box testing in which knowledge about the internals of a program unit are exploited to design the tests. Why the focus on black-box testing? The full answer to this is likely a complex one. But, I think a large part of it comes down to this: the sheer size and complexity of climate models makes them very difficult to analyze and comprehend. White-box testing requires intimate knowledge of the code structures, data flow, and control flow of a program. The more complex the program, the harder it is to analyze. Of course, automated tools can be used to analyze the code, but those tools do not have an understanding of the encoded science.
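The contrast can be sketched with a toy example. The piecewise function below is purely illustrative (it is not a real parameterization): the black-box tests know only inputs and expected outputs, while the white-box tests are written with knowledge of the internal branch structure and deliberately probe the threshold boundary.

```python
def precip_rate(humidity):
    """Toy stand-in for a physical parameterization: rain ramps up above a threshold."""
    THRESHOLD = 0.8
    if humidity <= THRESHOLD:
        return 0.0                            # subsaturated branch
    return 10.0 * (humidity - THRESHOLD)      # saturated branch

# Black-box tests: inputs and expected outputs only, internals unknown.
assert precip_rate(0.5) == 0.0
assert abs(precip_rate(0.9) - 1.0) < 1e-9

# White-box tests: designed from the code's branch structure,
# exercising the boundary between the two branches.
assert precip_rate(0.8) == 0.0
assert precip_rate(0.8 + 1e-9) > 0.0
```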
In a very interesting paper, Johannes Lenhard and Eric Winsberg argue that climate models face a form of confirmation holism that makes an analytical understanding of climate models “extremely difficult or even impossible.” My interpretation of this is that the size and complexity of climate model source code is the number one enemy of verification.
On his blog, Steve Easterbrook relates a recent conversation with David Randall at Colorado State in which Randall points out that 1) there is a lack of people who understand how to build climate models and 2) that “climate models are now so complex that nobody really understands the entire model” (quotation from Easterbrook). To back this up, see a previous post about the kinds of questions asked in the CESM forum.
Frankly, not being able to understand an entire model is not that surprising. From my perspective, the inability to have a complete understanding of the entire model might not really be a problem anyway. After all, we probably could not find a single person smart enough to design the wings of an airplane, program the flight controller software, and also design the rubber manufacturing process for the landing gear. Nonetheless, we are generally pretty good at putting together airplanes. Why does this work? Because while no one understands all parts of an airplane in intimate detail, the points of interface can be abstracted and comprehended. The flight controller developer might not understand the intimate details of the fuel system, but as long as adequate abstractions are provided, she has no trouble programming the controller.
What can be done to address complexity of climate model source code? Well, believe it or not, I think at least part of the answer lies in leveraging the fact that climate models are built on the notion of coupling. The coupling points represent the points of interaction among the software modules in a climate model. Just as the flight controller designer has an abstracted understanding of the other systems connected to the controller software, the coupling points should represent abstracted interfaces that are semantically clear and help to promote understanding of the model as a whole.
The good news is that most of today’s General Circulation Models (GCMs) and Earth System Models (ESMs) are built in a modular fashion. The modules, which typically represent separate geophysical processes, are composed (coupled) into a single logical application. From the perspective of a software engineer (who likes clean, organized code) this is excellent news! Modular code has a lot of advantages over non-modular code. It is typically easier to maintain. It promotes separation of concerns (grouping related pieces of code together) and is therefore easier to understand. It promotes the definition of interfaces so that modules have well-defined interaction points. For all of these reasons, verifying modular code is much easier than verifying non-modular code. So, the logical conclusion here is that the modular nature of ESM codes should lead to improved ability to do verification.
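A minimal sketch of the point, assuming a drastically simplified two-component model (the class names, method signatures, and numbers are illustrative, not taken from any real ESM): because each module interacts with the other only through a narrow, well-defined interface, each can be verified in isolation against that interface.

```python
class Atmosphere:
    def step(self, sst):
        """Advance one timestep given sea surface temperature; return surface fluxes."""
        return {"heat_flux": 0.1 * (300.0 - sst)}  # toy relaxation flux

class Ocean:
    def __init__(self):
        self.sst = 290.0
    def step(self, fluxes):
        """Advance one timestep given surface fluxes; return updated SST."""
        self.sst += 0.01 * fluxes["heat_flux"]
        return self.sst

# The entire interaction surface is the two method signatures above.
# A simple driver loop composes the modules:
atm, ocn = Atmosphere(), Ocean()
sst = ocn.sst
for _ in range(3):
    fluxes = atm.step(sst)   # atmosphere sees only the SST
    sst = ocn.step(fluxes)   # ocean sees only the fluxes
```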
But, here’s the rub. Coupling interfaces themselves are very complex. Often, it is hard to maintain a completely clean separation between modules. For this reason, the coupling interface is often handled by an entire software component (the coupler) which is itself highly complex (i.e., there is a lot going on inside—data transfer, parallel communications, interpolation, etc.). Furthermore, in any given model there are multiple coupling interfaces, and not all coupling interfaces are the same. They have different properties based on the two modules that are interacting. For example, the interface between the atmosphere and the ocean is one in which each component provides a boundary condition for the other. On the other hand, within the atmospheric component, the interface between the “physics” and “dynamics” components exhibits a different kind of complexity—the dynamics component is primarily concerned with horizontal motions while the physics component deals with parameterizations in the vertical. Furthermore, coupling protocols are often multi-phased even within a single timestep. This is due to complex data dependencies between the coupled components.
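The multi-phase structure can be sketched as follows. This is an illustrative sketch only, assuming toy components and an identity "regrid"; none of the names come from a real coupler's API. The point is that the ordering of phases within a single timestep encodes the data dependencies between the coupled components.

```python
class Coupler:
    def regrid(self, field, target):
        """Stand-in for interpolation between grids; a real coupler also handles
        data transfer, parallel communication, conservation, etc."""
        return field  # identity mapping between (identical) toy grids

class Atm:
    def step(self, sst):
        return {"heat_flux": 0.1 * (300.0 - sst)}  # toy surface flux

class Ocn:
    def __init__(self):
        self.sst = 290.0
    def export_sst(self):
        return self.sst
    def step(self, fluxes):
        self.sst += 0.01 * fluxes["heat_flux"]

def coupled_timestep(atm, ocn, coupler):
    # Phase 1: ocean exports its surface state, mapped to the atmosphere grid.
    sst_on_atm_grid = coupler.regrid(ocn.export_sst(), target="atm")
    # Phase 2: atmosphere advances using the ocean-supplied boundary condition.
    fluxes = atm.step(sst_on_atm_grid)
    # Phase 3: fluxes are mapped back and the ocean advances in turn.
    ocn.step(coupler.regrid(fluxes, target="ocn"))
```

Phase 2 cannot run before phase 1, and phase 3 cannot run before phase 2; real coupling protocols have many more such ordering constraints, which is exactly what makes the coupler hard to analyze.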
So, the challenge is the following. Verification and comprehension of climate models is intimately tied to our ability to represent complex coupling interactions in a manner that can be analyzed for correctness and comprehended readily by model developers. At the same time, performance cannot be sacrificed or codes will be too slow to be useful. We want abstraction and efficiency. We want our cake and we want to eat it, too. Am I asking too much?