Coupling Workshop – First Round Table
Here I want to distill my notes from the first round table discussion at the recent coupling workshop at CERFACS. The official topic on the program is “How do the different coupling technologies fit the different application needs and constraints?” but most of the discussion focussed on interoperability and the scope of couplers. From my perspective it was a very interesting discussion and many important points were made. The discussion oscillated between deeply technical points and broadly philosophical questions–the two playing off each other. My notes in raw form are available on the coupling workshop wiki. Here, I want to distill them into several main discussion areas.
Interoperability and Interfaces
Much of the discussion focussed on interoperability. This is a natural question due to the proliferation of coupling technologies and philosophies out there. To what degree do these technologies interact with each other? How much effort is required to link components that use different coupling technologies? During the discussions, it was widely recognized that achieving interoperability is about defining interfaces. The word “interface” is a funny word. Like other words used in this community (e.g., “loose” vs. “tight” coupling), it has dual meanings: scientific interface and technical interface. There have been efforts to standardize scientific interfaces–that is, the fields that are exchanged between model components (and what they mean) and some decisions about which model components are doing which calculations and how. Initially, the PRISM project wanted to standardize scientific (or “physical”) interfaces for each of the model components. However, it was later found that while standardization of the technical (software) interfaces was possible, it was difficult for the geoscience community to settle on standardized scientific interfaces. Now, standardization at the scientific level is not a high priority within PRISM because there is widespread recognition that those interfaces will eventually change.
OpenMI is another example of a technology focussed on standardizing technical interfaces but not scientific interfaces (see workshop talk by Stef Hummel). This standardization has been achieved by requiring “minimal” interfaces–that is, OpenMI is not really concerned with the content of information exchanged between components. It is general enough that it can be adopted by a fairly wide range of applications. The tradeoff here is that components sometimes require certain a priori knowledge about each other that is not specified as part of the interface (e.g., a way to specify grid staggering is not part of the OpenMI standard). Technically speaking, components should only have explicit context dependencies (to increase compositionality), but a practical choice was made by OpenMI to simplify the interfaces and encourage greater adoption. To borrow concepts from the Earth System Modeling Framework, OpenMI provides a “superstructure” (i.e., overall component architecture with standardized interfaces) but does not provide an “infrastructure” (i.e., tools for interpolation, time management, domain decomposition, etc.). OpenMI does not target HPC applications, and models built with it typically employ much less parallelism than the big geoscience models.
The notion of “plug & play” was also central to the discussion. Although no one formally defined it, there was general understanding that this is the ability to exchange one model component for another (similar) component with minimal or no changes to the rest of the application. By and large, OpenMI has achieved plug & play with the two caveats already mentioned: support for less complex models and the need for components to have extra knowledge about each other. Even for the “big” HPC-style models, there was agreement (though I don’t think consensus) that a kind of plug & play could be achieved at the level of the technical interface. In fact, existing technologies such as OASIS already do this. However, the real issues with plug & play crop up at the scientific level. An example was given concerning a chemistry module coupled to an atmospheric component. In this case an explicit coupler is not necessarily used. This kind of coupling is “high sensitivity,” meaning that even if a new atmosphere were easily “plugged in,” a substantial amount of recalibration would be required before the model produced good results. So, while plug & play capability is a respectable goal, the hard work still remains in making sure the resulting synthesis forms a valid scientific whole. Whether or not scientific “plug & playability” is achievable is still an open question. At the very least, making progress toward this goal would require making scientific choices more explicit in the models.
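To make the distinction concrete, here is a minimal sketch of what plug & play at the *technical* level looks like. Everything here is invented for illustration (the `Component` interface, the toy “atmospheres,” the field names); it is not the API of any real coupler. The point is that the driver sees only the shared interface, so either atmosphere can be swapped in without touching the driver–while the scientific caveat (recalibration) is precisely what such a sketch cannot capture.

```python
class Component:
    """A hypothetical common technical interface."""
    def initialize(self): ...
    def run(self, fields): ...
    def finalize(self): ...

class AtmosphereA(Component):
    def run(self, fields):
        # toy "physics": warm the surface by 1.0 K per coupling step
        return {"surface_temp": fields["sst"] + 1.0}

class AtmosphereB(Component):
    def run(self, fields):
        # a rival atmosphere with different (toy) physics
        return {"surface_temp": fields["sst"] + 0.5}

def drive(atmosphere, steps=2):
    """The driver codes only against the shared interface,
    so swapping atmospheres requires no changes here."""
    atmosphere.initialize()
    fields = {"sst": 290.0}
    for _ in range(steps):
        out = atmosphere.run(fields)
        fields["sst"] = out["surface_temp"]
    atmosphere.finalize()
    return fields["sst"]
```

Calling `drive(AtmosphereA())` and `drive(AtmosphereB())` exercises the same driver with interchangeable components–technical plug & play–yet the two runs give scientifically different answers, which is exactly where the recalibration burden begins.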
If defining explicit scientific interfaces for model components is beyond the state of the art, perhaps other possibilities exist to ease the burden of future, unanticipated couplings. Mick Carter from the UK Met Office pointed out that coding conventions and best practices might be a “halfway house” to enabling interoperability. This is not a new idea, and may be a more viable approach than trying to write general purpose tools for scientific software with constantly changing requirements. On the other hand, many existing conventions are more akin to “common sense” software engineering than to specific guidelines based on the nature of the underlying scientific requirements. The convention of dividing code into initialize, run, and finalize methods has been helpful, but even this basic guideline is not suitable for many European models that leave control within each model component and use implicit sequencing to synchronize and exchange data between model components. It was further pointed out that for interfacial couplings, it is perhaps easier to get away with good coding conventions, but, in situations when volumetric coupling is required (e.g., chemistry + atmosphere), the large amounts of data that must be communicated (or at least shared) require more precise standards to enable interoperability of components. This means, for example, that the specific kinds of data structures used and their access paths would be agreed upon up front (e.g., what the array indices mean, shared memory locations, etc.).
The Scope of Couplers
At this point, the discussion took a rather philosophical turn. Some in the room disagreed that the chemistry + atmosphere example even qualifies as coupling. Often the two are so tightly intertwined that their interactions are highly localized and they share data structures. So, if this does not qualify as coupling, what does? How do we draw the line? Or, what should be the scope of couplers and coupling technologies?
Others pointed out that the chemistry + atmosphere example could in fact be counted as coupling depending on how you modularize the code. Furthermore, it was argued that the fact that you have to load balance the chemistry makes it a coupling problem. Also, in making it a subroutine (instead of a separate component) you immediately tie it to a single atmosphere. Rupert Ford and Graham Riley of BFG then pointed out that the decision on exactly how to modularize and interface scientific parts of the code does not have to be made hastily. A scientific function could be represented as a subroutine or a separate executable if the relevant infrastructure code could be generated and wrapped around the scientific code. An ideal technical solution should support both “loose” and “tight” kinds of coupling in a flexible way. However, it was pointed out that for tight interactions, an intermediate layer is not required or desired (presumably due to potential performance issues). Finally, a comment was made regarding the existence of a coupling “spectrum.” This was not expounded, but presumably the extremes of the spectrum are “loose coupling” and “tight coupling.” There may be other ways to define the extremes, though. While there was no consensus on whether the chemistry + atmosphere example should actually be called “coupling,” there was general agreement that at a minimum coupling involves data exchange, data structure translation, and grid interpolation.
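The BFG-style idea that the modularization decision can be deferred can be sketched as follows. This is my own toy illustration, not BFG’s actual machinery: the same scientific kernel is deployed either “tight” (a direct subroutine call sharing memory) or “loose” (data exchanged through an intermediate layer), with only the generated wrapper differing.

```python
import queue
import threading

def chemistry_step(concentration, rate):
    """The scientific code itself: a toy decay step.
    Note it knows nothing about how it is coupled."""
    return concentration * (1.0 - rate)

def tight_coupling(conc):
    # subroutine style: direct call, data shared in memory
    return chemistry_step(conc, 0.1)

def loose_coupling(conc):
    # component style: the kernel runs in its own thread of control
    # and data crosses an explicit exchange layer (queues here)
    inbox, outbox = queue.Queue(), queue.Queue()

    def component():
        c = inbox.get()
        outbox.put(chemistry_step(c, 0.1))

    t = threading.Thread(target=component)
    t.start()
    inbox.put(conc)
    result = outbox.get()
    t.join()
    return result
```

Both deployments produce identical science, which is the point: the scientific code is written once, and the coupling style becomes a configuration decision rather than a hasty, baked-in commitment–though, as noted above, for genuinely tight interactions the intermediate layer may cost performance you cannot afford.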
With respect to creating a flexible framework that would support both ends of the coupling spectrum, Mick Carter pointed out that there are other design considerations besides just how quickly the code can be produced. Other concerns such as code maintenance, modularization, abstraction, and cleanliness should also take priority. This is near and dear to my heart, as I view increased complexity as an enemy of model verification and a significant roadblock to building good, intentional scientific software.
Interoperability and Metadata
Bob Oehmke of ESMF rightly pointed out that in general interoperability can be achieved by “description” (i.e., providing enough metadata to do the translation) or “restriction” (i.e., creating limited, precise technical interfaces). BFG is an example of the former approach–as much as possible is described outside of the application as metadata (e.g., how the communication is done, the sequencing of components, initialization of variables). OASIS sits in the middle because the field hookups are described in metadata, but other parts remain within the models themselves (e.g., the PUTs and GETs are explicitly coded into the model at certain locations). ESMF could be considered a “registration-based” framework because everything is registered with the framework at runtime instead of being described outside the code as metadata (e.g., registration of init, run, and finalize methods, domain decomposition, grids, fields, etc.).
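Interoperability “by description” is worth a small sketch. The metadata schema below is invented for illustration–loosely in the spirit of describing field hookups outside the code, as OASIS’s configuration does, but not in any real coupler’s actual format. The payoff of the approach is visible even at toy scale: re-plumbing the coupling means editing the description, not the models.

```python
# Field hookups live in metadata, outside any model's source code.
# Each entry says: copy this source field to that destination field,
# applying a unit conversion. (Schema and field names are hypothetical.)
HOOKUPS = [
    {"src": ("ocean", "sst"),  "dst": ("atmos", "surface_temp"), "scale": 1.0},
    {"src": ("atmos", "rain"), "dst": ("ocean", "freshwater"),   "scale": 0.001},
]

def exchange(states, hookups):
    """A generic exchange engine driven entirely by the description."""
    for h in hookups:
        src_comp, src_field = h["src"]
        dst_comp, dst_field = h["dst"]
        states[dst_comp][dst_field] = states[src_comp][src_field] * h["scale"]
    return states

# Toy component states before one coupling exchange.
states = {
    "ocean": {"sst": 288.0, "freshwater": 0.0},
    "atmos": {"rain": 5.0, "surface_temp": 0.0},
}
exchange(states, HOOKUPS)
```

The exchange engine never changes; only `HOOKUPS` does. The “restriction” alternative would instead bake the two transfers into narrow, precisely typed interfaces on the components themselves, and a “registration” style would have each component hand its fields to the framework at runtime.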
The Way Forward?
The final portion of the round table dealt with the question of how to move forward in light of the many different kinds of couplers out there and the proliferation of coupling methodologies. There was general agreement that coupling technologies and frameworks are good because they provide an external layer that constrains the models and prevents monolithic coding styles (lots of custom code jumbled together into one big model). Furthermore, there is no need to have a hundred different interpolation engines–much of the infrastructure and utility code can be reused. Coding standards and best practices were seen as a start with the recognition that you have to start with the more general standards and move to the more specific. Intentionality was recognized as important–even if plug & play cannot be fully realized, at the very least you should be able to make sense of the code. Important baby steps.
It was also suggested that reducing the number of couplers would be advantageous. There are too many players in the arena now. Someone suggested maybe four separate couplers would be sufficient and much of the underlying infrastructure could be shared (e.g., interpolation, I/O).
Future work should be focussed less on the communication technique itself and more on the content of the communication. Wrappers can easily be used to adapt different communication architectures to each other. To me this is a call to think a bit more about how coupling technologies can be made more science-aware. I’m not talking about hard-coding science into the infrastructure, but looking for abstractions that might serve as higher-level building blocks.
Looking forward, what should the role of the coupling layer be? Most agreed on the basics such as interpolation and data transfer. However, there are still different philosophies regarding the “driving” aspects of coupling. Who has responsibility for time keeping? Is there a separate driver or should the models remain autonomous? How should components call each other to get needed data? The implicit sequencing employed by OASIS means that the user must ensure that the models run for the same amount of time and that field connections have been set up correctly in the code (deadlock is possible with OASIS, just as it is for MPI). The Community Earth System Model has a top-level driver that manages the time stepping of the coupled model. As models grow in complexity, the user should not be expected to maintain timestep consistency, so the driver does it for them. The tradeoff here is that it is harder to integrate new modules that have their own time management code. On the other hand, CESM argues that the resulting model is less error-prone.
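The driver-owns-the-clock philosophy can be sketched in a few lines. This is a hypothetical illustration of the pattern, not CESM’s actual driver: the `ToyModel` class, its `step` method, and the lockstep loop are all invented here. The key property is that components may substep internally with their own timesteps, but the driver alone decides how far simulated time advances, so the user never has to keep per-model run lengths consistent by hand.

```python
def run_coupled(components, coupling_dt, end_time):
    """A top-level driver that owns the clock: every component is
    advanced by the same coupling interval, in lockstep."""
    t = 0.0
    while t < end_time:
        for comp in components:
            comp.step(coupling_dt)
        t += coupling_dt
    return t

class ToyModel:
    """A component with its own internal timestep."""
    def __init__(self, dt):
        self.dt = dt
        self.time = 0.0

    def step(self, coupling_dt):
        # substep internally, landing exactly on the coupling interval
        n = round(coupling_dt / self.dt)
        for _ in range(n):
            self.time += self.dt
```

Running an ocean with a 1800 s internal step and an atmosphere with a 600 s step under a 3600 s coupling interval, both arrive at the same simulated time at every exchange–by construction, not by user discipline. The implicit-sequencing alternative would instead have each model loop independently and trust the user to make the run lengths agree.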
Finally, the comment was made that each model producer must decide the level of expertise expected of its users. Is the model “expert friendly,” or suitable for graduate students, policymakers, etc.? This choice determines a lot about how you design the code.
From my perspective, the round table was extremely satisfying. This was perhaps the first time so many experts on building coupled models were in the same room. It further solidified for me that building coupled models is a very complex enterprise and there are many right ways of doing it depending on where you start and what design goals you have in mind. Furthermore, the many pieces that go into building a coupled model (science, numerics, infrastructure) are themselves tightly coupled and very difficult to separate cleanly from each other.