De-coupling the Science and Infrastructure in Coupled Models
Coupled models are created to test and explore scientific theories. However, in the process of building coupled models, much code must be written (or borrowed) that does not directly implement the underlying science, but instead implements technical capabilities in support of the science. These two kinds of code are often called the “science” and the “infrastructure,” and there is a high degree of interdependence between the two. You could say the two are tightly coupled.
Here’s a thought experiment: How do we separate the science code from the infrastructure code? Why should we do this? Well, because we want scientists to focus as much as possible on their field of expertise, not on the accidental aspects of writing code. Can this really be done? Maybe, maybe not.
But for now, let’s assume we could distinguish the infrastructure from the science and actually separate them to a large degree. One way to approach this is to generate the infrastructure and let the scientists write the rest. There are at least two ways to think about doing this:
- Write “infrastructure-neutral” code using only plain programming language constructs, then wrap and/or adapt that code to whatever infrastructure is desired. This approach is appealing because it minimizes the software dependencies of the scientific code–ideally, the only dependency would be a standard compiler. (Note the contrast with the SIDL/Babel work, which attempts to divorce the science from the programming language.) While this approach does not automatically guarantee interoperability, it does at least ensure that two potentially coupled models do not have incompatible dependencies (e.g., one uses coupling framework A, another uses coupling framework B, and there are no readily available adapters that translate between A and B). A pithy, but not entirely accurate, statement made at the recent coupling workshop sums it up this way: “the programming language is the coupling framework.” External metadata plays an important role here because the code generator needs to know how to interact with the existing code–that is, how the infrastructure should be woven in around the science code.
- Write code using high level abstractions that can be mapped onto specific infrastructure implementations. The high level abstractions could be formalized in a domain specific language. The language constructs must be general enough to apply to all target infrastructure implementations. (Could Jay Larson’s paper on principles for coupling be a start for defining the set of constructs?)
A major distinction between the above two approaches is what is formalized. For the “infrastructure-neutral” approach, the external configuration metadata must be formalized. For the “high level abstractions” approach, the abstractions themselves must be formalized. At some point the two actually blur together. As pointed out in my previous post, the first approach corresponds largely with the philosophy behind BFG. The second approach, to my knowledge, has not been tried.
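To make the “infrastructure-neutral” approach a little more tangible, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration: the `ocean_step` function, the `OCEAN_METADATA` dictionary, the field names, and the `generate_wrapper` helper are my own assumptions, not part of BFG or any other tool. The point is only that the science function has no framework imports, while the external metadata tells a generic wrapper how to weave it into a shared field store.

```python
# Science code: plain Python, no coupling-framework imports.
def ocean_step(sst, heat_flux, dt):
    """Toy mixed-layer update (illustrative only, not real physics)."""
    heat_capacity = 4.0e6 * 50  # assumed volumetric heat capacity times layer depth
    return [t + dt * q / heat_capacity for t, q in zip(sst, heat_flux)]


# External metadata (hypothetical format): describes how to call the science code.
OCEAN_METADATA = {
    "entry_point": ocean_step,
    "inputs": {
        "sst": "sea_surface_temperature",
        "heat_flux": "net_surface_heat_flux",
    },
    "output": "sea_surface_temperature",
    "timestep_arg": "dt",
}


def generate_wrapper(metadata):
    """Build a callable that connects the science code to a shared field store."""
    func = metadata["entry_point"]

    def wrapped(fields, dt):
        # Look up each argument in the field store by its coupling-level name.
        kwargs = {arg: fields[name] for arg, name in metadata["inputs"].items()}
        kwargs[metadata["timestep_arg"]] = dt
        # Store the result under the declared output name.
        fields[metadata["output"]] = func(**kwargs)

    return wrapped
```

A driver would call `generate_wrapper(OCEAN_METADATA)` once and then invoke the returned function with a dictionary of fields at each coupling step. Swapping in a different infrastructure would mean writing a different `generate_wrapper`, not touching `ocean_step`.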
Kinds of Infrastructure Code
It is helpful to divide up what kinds of code might be considered “infrastructure.” A high level breakdown is the Superstructure/Infrastructure distinction from ESMF and FMS. In this case the term “Infrastructure” takes on a more specific meaning than how I have been using it previously, so I’ll use a capital “I” when speaking of this more specific kind of infrastructure.
The Superstructure represents the larger architectural pieces that are incorporated into a coupled ESM. These are the high level structures that we expect the ESM users and developers to refer to. These are the structures that have names, are connected together, and have interfaces. This is closely related to the software architecture, although it will be helpful to think of the architectural pieces as having some domain-level semantics, not just arbitrary software modules. We want the Superstructure to have high visibility.
The Infrastructure represents capabilities commonly required when building coupled ESMs. This includes things like grid-to-grid interpolation, parallel domain decomposition, and time management. Interestingly, these are things that we want to have low visibility–that is, we want them to be there working in the background, providing utility functionality, but they should not be too prominent or consume a substantial amount of development time. To be sure, they are highly important, but they are a distraction from the real work.
What is the relationship between the two? I think it’s a hard question to answer, but here’s one take. The Superstructure can be viewed as an information-broker for the Infrastructure. In other words, the Superstructure provides the information that the Infrastructure needs to do its job. Let me be more concrete here by giving an example. One of the primary functions of ESM couplers is to communicate data from one model to another. Of course this may be handled in a number of different ways (memory-to-memory copies, message passing, argument passing). The data communication utility is part of the Infrastructure. However, this function must first know something about where the data is coming from and where it is going. Where the source data is located and how it is encapsulated is part of the Superstructure, as is the location of the target and how the delivered data should be encapsulated.
How is information communicated from the Superstructure to the Infrastructure? Either the Infrastructure queries the Superstructure (pull) or the information is provided to the Infrastructure (push) when the Infrastructure is invoked. If the Superstructure and Infrastructure are part of the same package, then the relationship between the two can be hard coded into a reusable framework (i.e., the Infrastructure can assume that a certain Superstructure is there). In this case, the Infrastructure can pull architectural information from the Superstructure. On the other hand, if the Superstructure is separate from the Infrastructure (e.g., because the code has been architected separately), then the knowledge required by the Infrastructure must be parameterized and passed (pushed) to the Infrastructure.
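The pull/push distinction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Superstructure`, `PullTransfer`, and `push_transfer` names are hypothetical, and the “transfer” is a simple in-memory copy standing in for whatever mechanism a real coupler would use.

```python
class Superstructure:
    """Holds architectural knowledge: where each component's fields live."""

    def __init__(self):
        self._fields = {}

    def register(self, component, field, data):
        self._fields[(component, field)] = data

    def locate(self, component, field):
        return self._fields[(component, field)]


class PullTransfer:
    """Infrastructure that assumes a known Superstructure and queries it (pull)."""

    def __init__(self, superstructure):
        self.superstructure = superstructure

    def transfer(self, source_component, target_component, field):
        source = self.superstructure.locate(source_component, field)
        target = self.superstructure.locate(target_component, field)
        target[:] = source  # copy in place so the target's owner sees the update


def push_transfer(source, target):
    """Infrastructure that knows nothing about the architecture (push).

    The caller must supply everything the transfer needs as parameters.
    """
    target[:] = source
```

In the pull version, the caller names components and fields and the Infrastructure finds the data itself. In the push version, the caller has to resolve the locations first and hand them over, which is the parameterization described above.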
To generalize the notion of Superstructure, we can identify a coupling technology as either an architecture provider or architecture neutral. Architecture providers inform and/or constrain high-level structural aspects of the coupled model. These technologies are an example of architectural reuse. How does this reuse occur? A common method is for the coupling technology to provide a set of abstract classes with interfaces that the user implements. The architecture, therefore, is encoded into the abstract classes and their predetermined interactions (this is the basic idea behind object-oriented frameworks).
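As a rough sketch of what an architecture provider looks like from the user's side, consider the following Python. The `Component` and `Driver` classes and the initialize/run/finalize method names are assumptions chosen for illustration; they do not reproduce the API of any particular framework.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Abstract class supplied by the (hypothetical) coupling technology."""

    @abstractmethod
    def initialize(self):
        ...

    @abstractmethod
    def run(self, dt):
        ...

    @abstractmethod
    def finalize(self):
        ...


class Driver:
    """Framework code that encodes the predetermined interactions."""

    def __init__(self, components):
        self.components = list(components)

    def execute(self, n_steps, dt):
        for component in self.components:
            component.initialize()
        for _ in range(n_steps):
            for component in self.components:
                component.run(dt)
        for component in self.components:
            component.finalize()


class ToyAtmosphere(Component):
    """User code: fills in the interface the framework dictates."""

    def initialize(self):
        self.time = 0.0
        self.done = False

    def run(self, dt):
        self.time += dt

    def finalize(self):
        self.done = True
```

The user never writes the time loop. The architecture (what a component is, and when its methods are called) is reused wholesale from the framework, and the user's freedom is limited to the bodies of the three methods.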
Architecture neutral coupling technologies say little about what the high-level components should be and what kinds of interfaces they should have. This means that the coupling technology cannot make architectural assumptions about the constituent models, but must instead be informed of relevant architectural characteristics using some external mechanism.
Another way to break down infrastructure code is by what requirement or capability the infrastructure code fulfills. The coupling technologies feature model communicates the kinds of features that coupling infrastructures support. (Arguably, “infrastructure” is much broader than “coupling” although at the current time there is a blurring of the two–for example, the coupling technology may provide a haloing operation which is used within the bounds of a single model. It is not a “coupling” task per se, but handling the halo operation here is natural because some of the same abstractions used for data transfer can also be exploited for haloing.) Using the feature model as a guide, we can see several kinds of infrastructure capabilities:
- Data transfer — via message passing, subroutine arguments, shared memory, etc.
- Interpolation (regridding) — these all have to do with the fact that models often have different grids. This includes weight generation and global conservation.
- Accumulation and averaging in time — because coupling frequency differs from model timesteps
- Domain discretization — breaking up the physical domain into grid cells
- Domain decomposition — distributing those grid cells onto computing resources
- Driving — moving the whole coupled model forward in time, scheduling execution order, handling concurrency
- Setup and configuration — preparing for a run
- I/O — generally considered separate from the coupling technology, but clearly a part of the infrastructure
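To give a feel for how small some of these capabilities are in isolation, here is a minimal Python sketch of accumulation and averaging in time. The `TimeAverager` class is hypothetical and assumes fields are plain lists of numbers of fixed length; a real implementation would work on distributed arrays.

```python
class TimeAverager:
    """Accumulates a field every model timestep and averages at coupling time."""

    def __init__(self, n_points):
        self._sum = [0.0] * n_points
        self._count = 0

    def accumulate(self, field):
        """Call once per model timestep with the current field values."""
        if len(field) != len(self._sum):
            raise ValueError("field length does not match accumulator")
        for i, value in enumerate(field):
            self._sum[i] += value
        self._count += 1

    def average_and_reset(self):
        """Call once per coupling interval; returns the time mean."""
        if self._count == 0:
            raise ValueError("no samples accumulated")
        mean = [total / self._count for total in self._sum]
        self._sum = [0.0] * len(self._sum)
        self._count = 0
        return mean
```

The logic is trivial, but notice where the calls have to go: `accumulate` sits inside the model's own timestep loop, while `average_and_reset` sits at the coupling boundary. That placement is part of what makes this capability awkward to separate from the science code.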
What is involved in generating these parts of the infrastructure? To answer this, we must first ask how each of these is manifested in a coupled model. It would also be nice to know which ones depend on which others. How localized is the capability (e.g., can the capability be implemented in one location in the code, or is it dispersed throughout the code)? What is the relationship of the science code to each of these? If that relationship is deep, then it will be harder to tease out the infrastructure. For example, scientific constraints may determine the sequencing of the components. But, since defining the sequence is a common requirement among all coupled models, we’d like to pull out as much of it as possible (perhaps representing it declaratively) while still respecting that the sequence is determined by the science–it’s not an arbitrary technical decision. Domain discretization is also a common requirement in coupled models, so we’d like to offload it as much as possible to reduce the number of custom implementations. However, the field calculations at each timestep are completely dependent on the discretization method. Herein we see the tightly coupled nature of the infrastructure and science parts of the code.
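As one possible reading of “representing it declaratively,” the sequence could live in a small data structure that a generic driver interprets. The sketch below is hypothetical: the `SEQUENCE` format, the `every` key, and the `run_schedule` function are my own inventions for illustration, and the component names and frequencies are arbitrary.

```python
# Declarative description of the run sequence (hypothetical format).
# The scientist decides the order and frequency; the driver only interprets it.
SEQUENCE = [
    {"component": "atmosphere", "every": 1},  # runs every step
    {"component": "ocean", "every": 4},       # runs every fourth step
]


def run_schedule(sequence, components, n_steps):
    """Generic driver: executes components in the declared order.

    `components` maps a component name to a callable taking the step number.
    Returns a log of (step, component name) pairs in execution order.
    """
    log = []
    for step in range(1, n_steps + 1):
        for entry in sequence:
            if step % entry["every"] == 0:
                components[entry["component"]](step)
                log.append((step, entry["component"]))
    return log
```

The scientific decision (atmosphere before ocean, ocean four times less often) stays visible and editable in `SEQUENCE`, while the loop that carries it out is infrastructure that could be generated or shared.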
To get a better handle on how to de-couple the infrastructure from the science, I’d like to do an analysis of how these various infrastructure pieces are represented in real models and characterize the relationship between the infrastructure and the science parts of the code. This is a beast of a task, frankly, because the models are so large and complex. I’m not sure if the analysis could be automated in any way. However, the results could be very interesting and might open the door to better separation of concerns and improved composition of coupled models.