The Costs and Benefits of Modularity in Climate Models

The Many Dimensions of a Climate Model

Separation of concerns has long been recognized as an important goal when building software. In a well-cited paper, Peri Tarr et al. point out that most software formalisms limit our ability to truly separate out all concerns due to what they call the “tyranny of the dominant decomposition” [1]. In other words, a single dominant dimension is typically selected as a decomposition rule, such as the object structure most evident in the application’s domain. Tarr et al. argue that single-dimension separation is not ideal because the concerns of software systems almost always overlap; that is, single objects or modules are often responsible for implementing multiple concerns. The elements present in the dominant decomposition therefore end up handling multiple concerns, so separation of concerns is not really achieved. They propose an abstraction called a “hyperslice,” which is a kind of module that encapsulates a concern other than the dominant one. Hyperslices are powerful because a given programming element (e.g., class, method) may appear in multiple hyperslices. Hyperslices are combined using composition rules to form the complete system.
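To make the idea a little more tangible, here is a loose Python sketch of elements that belong to more than one slice. It is not the formalism from [1]: the element names, the tagging scheme, and the union-based composition rule are all assumptions made up for illustration.

```python
# Hypothetical program elements, each tagged with every concern it touches.
# The first tag is the "dominant" one (geophysical domain); the rest cut across it.
ELEMENTS = {
    "atmosphere.step": {"atmosphere", "driving"},
    "atmosphere.write_restart": {"atmosphere", "io"},
    "ocean.step": {"ocean", "driving"},
    "ocean.write_restart": {"ocean", "io"},
    "ocean.regrid_sst": {"ocean", "interpolation", "coupling"},
}


def hyperslice(concern):
    """Return the names of all elements that take part in one concern."""
    return {name for name, concerns in ELEMENTS.items() if concern in concerns}


def compose(*slices):
    """A deliberately trivial composition rule: the union of the slices."""
    return set().union(*slices)


io_slice = hyperslice("io")
ocean_slice = hyperslice("ocean")

# The same element shows up in two slices at once.
shared = io_slice & ocean_slice
```

Here `ocean.write_restart` lands in both the I/O slice and the ocean slice, which is exactly what a single dominant decomposition cannot express.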

I do not find it hard to argue that climate model software development has indeed fallen prey to the “tyranny of the dominant decomposition.” Informal architecture diagrams of climate model software almost always depict a single kind of functional decomposition: by geophysical domain, such as atmosphere, ocean, and land. To see this, check out some excellent work comparing climate model architectures. (Take a look at the linked diagrams and notice what each of the bubbles represents, regardless of the particular climate model in view.)

To be sure, it is hard to argue against decomposition by geophysical domain: it divides up the code nicely based on scientific expertise (i.e., along the lines of those contributing to the code), it encapsulates the data fields related to each of the domains, and scientists often want to run one of those components as a standalone application (e.g., to isolate the atmospheric model for sensitivity analysis). The dominance of this decomposition rule can also clearly be seen by studying the reusable coupling technologies out there; to my knowledge, every one of them assumes a functional decomposition based on geophysical domain.

Traditionally, one-dimensional separation of concerns is seen as a Bad Thing. The issue is that the other concerns are diffused throughout the primary decomposition structure. This diminishes the ability to locate, comprehend, verify, and reuse the diffused concerns.

To make the discussion concrete, what “concerns” do we see in climate models that are candidates for isolation? Answering this requires some degree of expert input or at least careful consideration of the software requirements of a climate model. Some general-purpose guidelines for selecting concerns can be found in the literature (e.g., definitive membership, finite domain, etc.), but these do not really help to identify the concerns of a particular domain or software system. Based on my experience thus far studying climate model software, here are some possibilities:

  • grids / numerics / domain discretization
  • interpolation / accumulation / averaging
  • coupling
  • domain decomposition
  • driving / execution schedule
  • configuration
  • I/O
  • error handling
  • logging

Not included in the list above are the myriad scientific choices involved. Is the ocean dynamic? Are cities modeled? What kind of dynamical core is used to drive the atmosphere? All of these scientific choices are indeed software concerns because they have to be implemented in software and their implementations are deeply interconnected.

Different Meanings of Modularity

At this point, I should mention a recent discussion paper regarding modularity of features in software systems by Kästner, Apel, and Ostermann [2]. The paper describes an ongoing debate among members of software engineering and programming language communities about the costs and benefits of modularizing software features. (For now, let’s just assume that software “features” and the “concerns” I mentioned above are roughly equivalent, although traditionally a “feature” is a more user-centric term while a concern is a more developer-centric term. To fully appreciate the paper, read up on Feature Oriented Domain Analysis first.)

Part of the debate revolves around two different notions of modularity. Let’s call the first type informal modularity. The paper describes this kind of modularity as meaning cohesion and locality. In other words, all code artifacts related to implementing a particular feature are placed into a separate structure, such as its own class, file, folder, or module. This kind of informal modularity seeks to provide many of the practical benefits (ease of maintenance, comprehension, evolution, etc.) without incurring the costs of an overly burdensome interface mechanism. As an example of this type of modularity, imagine gathering all the #ifdefs related to a particular configuration option into a single place instead of leaving them scattered throughout the codebase.
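As a rough illustration of cohesion and locality, the sketch below gathers every decision that depends on two options into one function. Python has no preprocessor, so plain boolean options stand in for #ifdefs here; the option names and component names are hypothetical and not taken from any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildOptions:
    """Hypothetical configuration switches, standing in for #ifdef flags."""
    use_dynamic_ocean: bool = False
    model_cities: bool = False


def active_components(opts):
    """The single place where the options above influence the model setup.

    Nothing is hidden behind an interface; the benefit is only that a reader
    who wants to know what an option does has one function to look at.
    """
    components = ["atmosphere", "land"]
    components.append("dynamic_ocean" if opts.use_dynamic_ocean else "static_ocean")
    if opts.model_cities:
        components.append("urban")
    return components


default_setup = active_components(BuildOptions())
full_setup = active_components(BuildOptions(use_dynamic_ocean=True, model_cities=True))
```

Note that nothing stops other code from reading `BuildOptions` directly, which is the sense in which this kind of modularity is informal.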

The second kind of modularity, which we’ll call formal modularity, seeks to achieve true information hiding by separating modules into an internal part and an external part. The external part, called the interface, establishes a contract between the module and the rest of the system. Details about what’s behind the interface are hidden. As pointed out in [2], this more formal modularity makes modular reasoning possible, including automated type checking (i.e., ensuring that module compositions are safe and correct) and separate compilation (ensuring that the module itself is at least syntactically valid). If semantic interfaces are available, more advanced kinds of reasoning can be used to test properties of composed modules.
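A minimal sketch of the formal flavor, assuming a made-up `Component` contract: the interface is declared with `typing.Protocol`, so a static type checker can verify a composition without looking inside the module. `ToyOcean`, its fields, and its warming rate are invented for illustration only.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Component(Protocol):
    """The external part: a contract the rest of the system relies on."""

    def step(self, dt_seconds: float) -> None: ...

    def export_field(self, name: str) -> List[float]: ...


class ToyOcean:
    """The internal part: state and numerics hidden behind the contract."""

    def __init__(self) -> None:
        self._sst = [280.0, 281.0]  # hidden internal state (kelvin)

    def step(self, dt_seconds: float) -> None:
        # Arbitrary toy update, not real physics.
        self._sst = [t + 1e-6 * dt_seconds for t in self._sst]

    def export_field(self, name: str) -> List[float]:
        if name != "sst":
            raise KeyError(name)
        return list(self._sst)  # hand out a copy, never the internal list


def run(component: Component, steps: int, dt_seconds: float) -> None:
    """A driver that depends only on the interface, not on ToyOcean."""
    for _ in range(steps):
        component.step(dt_seconds)


ocean = ToyOcean()
run(ocean, steps=10, dt_seconds=100.0)
```

The `runtime_checkable` decorator only checks that the methods exist; the stronger guarantees discussed in [2] come from running a static checker over the annotations.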

There are tradeoffs involved with these two views of feature modularity. The informal approach provides practical development benefits, but lacks the rigor required for automated reasoning. The formal approach provides many benefits such as reuse (e.g., an open-world view) and independent development and testing [2]. The benefits of formal modularity have to be weighed against the costs: Kästner et al. point out that granular and/or crosscutting feature modules may end up with interfaces that essentially contain the whole of the feature implementation (so that nothing is hidden). Furthermore, a feature may have a lot of interactions with other features, requiring a large number of micro-modules to encapsulate pairwise (or higher cardinality) feature interactions.
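To get a feel for how quickly those micro-modules could pile up, here is a back-of-the-envelope count. It assumes the worst case, in which every combination of features interacts; real systems will have far fewer interactions, so treat the numbers as an upper bound rather than a prediction.

```python
from math import comb


def interaction_modules(n_features, max_order=2):
    """Upper bound on interaction modules for n_features.

    Counts every combination of 2..max_order features, i.e. the sum of
    C(n, k) for k = 2..max_order. Assumes every combination interacts.
    """
    return sum(comb(n_features, k) for k in range(2, max_order + 1))


# The candidate-concern list earlier in this post has nine entries.
pairwise = interaction_modules(9)                 # 36
up_to_triples = interaction_modules(9, max_order=3)  # 36 + 84 = 120
```

Even with only nine concerns, allowing three-way interactions pushes the bound past a hundred modules.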

To Modularize or Not?

Some questions then:

  • Would climate science benefit from modularity of the kinds mentioned above?
  • If yes, which definition of modularity is most appropriate: modularity as cohesion and locality or modularity as true information hiding?
  • What would a feature-modularized climate model look like? What climate model architecture would support multi-dimensional separation of concerns?

Some community discussions need to take place on these issues. Probably the issue of most general interest is that of climate model verification. In a previous entry, I pointed out that software complexity is the enemy of climate model verification. As more sophisticated science is introduced into the model, the problem will only get worse. Now is the time to make some decisions about how to tame model complexity, and I argue that improved modularity is perhaps the most important step the climate modeling community can take to increase verifiability of the models.

There is very interesting software engineering research on whether the cohesion/locality or information hiding viewpoint is most appropriate. In a previous experiment, I attempted to separate out the coupling interface from the rest of the model in the style of Robert DeLine’s Flexible Packaging. This was only a very simple case, but in reflecting on it I realize that much of the difficulty lies in the fact that the scientific parts of the model are highly dependent on the coupling infrastructure. In other words, were we to define a formal “information hiding” type of interface for the coupling infrastructure, it would be a large interface with many parameters. One way to characterize this would be to count the number of API parameters to coupling technologies like ESMF and OASIS/PSMILe.
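As a sketch of what such a count might look like, the snippet below measures the “width” of an interface as the total number of parameters across its entry points. The three `couple_*` functions are hypothetical stand-ins I made up for this example; they are not the ESMF or OASIS/PSMILe APIs, whose real signatures would have to be inspected separately.

```python
import inspect


# Hypothetical coupling entry points (names and parameters are invented).
def couple_init(comm, grid, decomposition, calendar, start_time, stop_time, coupling_interval): ...
def couple_put(field_name, data, time, mask=None): ...
def couple_get(field_name, time): ...


def interface_width(functions):
    """Total number of parameters across all functions in an interface."""
    return sum(len(inspect.signature(f).parameters) for f in functions)


width = interface_width([couple_init, couple_put, couple_get])  # 7 + 4 + 2 = 13
```

A raw parameter count is a crude metric, since it ignores how much structure hides inside each argument, but it gives a first number to compare across coupling technologies.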

For now, there are many open questions about how to best realize the benefits of modularity in climate models while still negotiating the costs of modular implementations. More to come on this topic!

[1] Peri Tarr, Harold Ossher, William Harrison. “N degrees of separation: multi-dimensional separation of concerns.” ICSE 1999.

[2] Christian Kästner, Sven Apel, Klaus Ostermann. “The Road to Feature Modularity?” SPLC 2011.

About rsdunlapiv

Computer science PhD student at Georgia Tech
