Ph.D. Thesis Proposal

Principled Composition in Coupled Earth System Models

Designing and implementing coupled Earth System Models (ESMs) is a challenge for climate scientists and software engineers alike. Coupled models incorporate two or more independent numerical models into a single application, allowing for the simulation of complex feedback effects. As ESMs increase in fidelity and sophistication, model developers are increasingly faced with the issue of software complexity. Furthermore, the desire to add more components to ESMs means that the complexity problem will likely become worse unless steps are taken now to improve how independent models are coupled.

Although coupling models is fundamentally a scientific endeavor, a substantial software engineering effort is required to compose the underlying software components into a single application that is at once efficient, numerically correct, and scientifically valid. Much of the difficulty comes from the fact that every coupling is unique—each coupling is intimately dependent on properties of the constituent models that are to be linked together. Therefore, when a model developer begins work on coupling models, there is a tendency to create a context-specific, ad-hoc solution. Unfortunately, the literature describing geophysical coupled models and their challenges says little about how one should go about designing and implementing couplers—the software components used to link together constituent models. The lack of coupler design principles reflects the relatively immature state of the ESM coupling domain and results in duplication of effort and continuous re-learning of the same design principles.

The large amount of legacy code, the wide range of coding conventions, and the breadth of architectural variability among numerical model software mean that a completely automated solution to coupling is well beyond the state of the art. However, in lieu of complete automation, a set of rigorous design heuristics can serve as a kind of handbook for building couplers that are likely to meet the desired set of functional and performance requirements. We propose to perform an architectural analysis of state-of-the-art ESM couplers and encode the expert knowledge found therein into a resource for coupled model developers.

To achieve this goal, we first identify a coupler design space, a multidimensional space of functional and structural design choices relevant for devising and constructing ESM couplers. Formulating the design space itself is a non-trivial task due to the range of different coupling philosophies and architectures currently employed in production-quality ESMs. Furthermore, the dimensions of the design space are meant to capture the design choices of primary importance when building couplers, so identifying the “right” dimensions is essential to providing the best possible guidance in coupler design. A point in the design space represents a particular instance of a coupler design.
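
As a rough illustration of what a point in such a design space might look like, the sketch below models each dimension as an enumeration and a coupler design as one choice per dimension. The dimension names and their values (`Topology`, `Regridding`, `Scheduling`) are hypothetical placeholders chosen for the example; they are not the dimensions this work will ultimately identify.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical design dimensions, used for illustration only.
class Topology(Enum):
    HUB_AND_SPOKE = "hub_and_spoke"   # models exchange data via a central coupler
    DIRECT = "direct"                 # models exchange data with each other


class Regridding(Enum):
    IN_COUPLER = "in_coupler"         # grid interpolation happens in the coupler
    IN_MODEL = "in_model"             # each model interpolates its own inputs


class Scheduling(Enum):
    SEQUENTIAL = "sequential"         # models run one after another
    CONCURRENT = "concurrent"         # models run at the same time


@dataclass(frozen=True)
class CouplerDesign:
    """One point in the design space: one choice along each dimension."""
    topology: Topology
    regridding: Regridding
    scheduling: Scheduling


def enumerate_design_space():
    """Return every combination of choices across the three dimensions."""
    return [
        CouplerDesign(t, r, s)
        for t, r, s in product(Topology, Regridding, Scheduling)
    ]
```

With three two-valued dimensions this toy space has eight points; a realistic design space would have more dimensions, and some combinations might not be viable.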

Secondly, we perform an architectural tradeoff analysis of ESM couplers to assess software qualities of couplers within the design space. Although architectural analyses can be used to evaluate a wide range of software qualities, this work focuses on two fundamental qualities that are often antagonistic: performance and modularity. High performance translates to higher resolution simulations, increased capacity for multi-model ensemble runs, and decreased time to dataset analyses and publication of scientific results. Modularity is a proxy for other software qualities that are hard to measure directly, such as modifiability, flexibility, reusability, and intentionality. Although modularity in software systems is linked to a number of advantages, modular designs carry performance costs due to potential efficiency losses at module interfaces. The antagonistic relationship between modularity and performance has been recognized as a general principle in many kinds of engineering systems, and it is a fundamental tradeoff that must be considered when designing coupler architecture.

Thirdly, the results of the architectural analysis form the foundation of a set of coupler design guidelines. The design guidelines are an extrapolation from the architectural tradeoff analysis that can be used by model developers to predict the effects of making architectural choices when designing ESM couplers. They are presented in a practical and problem-oriented form that can be applied quickly by model developers who are faced with a particular coupling problem.

The design guidelines are evaluated by encoding them in a code generator capable of automatically generating ESM couplers with desired functional and non-functional properties. Generated couplers can be fed back into the architectural analysis allowing for iterative refinement of the design guidelines themselves. On a practical level, the automated coupler generator serves as a valuable tool for the geophysical modeling community by reducing development costs and time-to-solution for building couplers.
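
To make the idea of generating a coupler concrete, here is a minimal sketch in which simple template substitution stands in for whatever generation mechanism is eventually chosen. Everything in it is an assumption made for illustration: the function names (`generate_coupler`, `load_coupler`, `run_coupled`) and the model interface (`step`, `get`, `receive`) are hypothetical, and the generated code covers only a sequential, one-way exchange of named fields.

```python
from string import Template

# Hypothetical template for a sequential, one-way coupling loop.
# $fields is replaced with a tuple literal of field names.
COUPLER_TEMPLATE = Template(
    "def run_coupled(source, target, n_steps):\n"
    "    for _ in range(n_steps):\n"
    "        source.step()\n"
    "        exported = {name: source.get(name) for name in $fields}\n"
    "        target.receive(exported)\n"
    "        target.step()\n"
)


def generate_coupler(fields):
    """Return Python source for a coupler that passes the given fields."""
    return COUPLER_TEMPLATE.substitute(fields=repr(tuple(fields)))


def load_coupler(source_code):
    """Compile generated source and return its run_coupled function."""
    namespace = {}
    exec(source_code, namespace)
    return namespace["run_coupled"]
```

Calling `load_coupler(generate_coupler(["sst", "wind"]))` yields a function that can drive any pair of objects exposing the assumed interface. A real generator would also need to handle grids, parallel data distribution, and timing, none of which is attempted here.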


About rsdunlapiv

Computer science PhD student at Georgia Tech

3 responses to “Ph.D. Thesis Proposal”

  1. Robert Muetzelfeldt says :

    1. It’s a good idea to have a title: it helps the reader, and it’s a good discipline in its own right, to help focus your goals.

    2. You say (para 2):
    “Unfortunately, the literature describing geophysical coupled models and their challenges says little about how one should go about designing and implementing couplers—the software components used to link together constituent models.”
    I’m not sure why you say this, given that you have elsewhere blogged on the various coupling systems available, the value of the recent CERFACS meeting in getting different coupling approaches together, etc. Are these each not written up in the literature?

    3. If it’s *your* thesis proposal, I think you should eschew the use of the ‘we’ and instead use the ‘I’ (e.g. the start of paras 4 and 5). I have never liked the use of the ‘we’ in scientific writing, esp in PhD theses, and I think it’s time we move on to more accurate and less confusing terminology.

    4. There is a discussion to be had about your various assertions relating to modularity, esp with respect to the trade-offs on performance, too long to start here. Hopefully you’ll be blogging on this as your work progresses, and I can pick up points then.

    5. Your final para has a resonance with BFG2, as you recognise in other postings. It might be worthwhile actually addressing this directly in your proposal, saying whether you will be developing that approach, or exploring an alternative way of achieving its ‘framework generator’ capabilities.

  2. rsdunlapiv says :


    Thanks for your comments. I added a title as you suggested and here are some additional thoughts below.

    Regarding the lack of literature about “designing and implementing couplers”–my point is that if you do a literature search about coupled Earth System Models and zero in on the text specific to coupling, it is almost always a description of the science (e.g., we integrate model A for N hours then pass fields X, Y, and Z to model B….). This says nothing about how the underlying software is actually implemented–that is, there are myriad ways to “pass field X, Y, and Z to model B.” From a software engineering/software architecture perspective, those descriptions are highly unsatisfying and they do not help the model developers with the nitty-gritty of actually coding the coupled model.

    But your point is well taken: the technical documentation for the different couplers out there does talk about how to build a system using their software. However, most of this documentation is more akin to a user manual than a discussion about the viable approaches to coupling and the trade-offs therein. As you point out, the coupling workshop report will begin to fill this gap.

    Per your point on BFG2, that work is by far the most closely related work. I think there is at least moderate interest in the climate community in doing some program generation when it saves time and resources. However, there is difficulty in determining how the generated code interacts with the science. At this point, it seems that infrastructure code is the primary target of generation, not the science itself. I know this will not completely satisfy you as you seek to push communities towards more declarative approaches…

    My work has many of the same goals as BFG2, but I’m taking a different tack on how to do the code generation itself. They are using a series of XSLT transformations. From my experience, this works well for smaller generation tasks, but it can get ugly really quickly as the complexity grows. I’m considering what other mechanisms are out there (e.g., code weaving, Flexible Packaging, architectural connectors) that could be used for automatically generating couplers.

    • Robert Muetzelfeldt says :

      “At this point, it seems that infrastructure code is the primary target of generation – not the science itself.”
      This is probably why it was suggested that there was not much point in me coming to the CERFACS meeting…

      In any case, I have very little feel for how feasible it would be to represent the science in Earth System models declaratively. My guess is that it is feasible (basically it’s just maths and physics, right?), but it could take a fair bit of work to come up with the appropriate notation.

      My point about BFG2 was to encourage you to mention it in your proposal, just because it is so closely related and the goals are so similar, then to draw out how your approach would differ (as you have done, above). Otherwise the reaction of other people might be: Why not just use BFG2?
