CESM Forum Analysis – Scientific Configuration

The CESM website hosts several forums for questions of a scientific and/or technical nature.  At the top level, the forums are organized by component, with forums for the Coupled Model, CAM (atmosphere model), CLM (land model), POP2 (ocean model), and CICE (sea ice model).  Additional forums cover Data Models, the Coupler, Biogeochemistry Modeling, and Whole Atmosphere Modeling with WACCM.

The forum posts contain issues and questions affecting users of CESM (and its predecessor, CCSM).  By necessity, an analysis of the forums is somewhat informal.  I do not claim that what we see on the forums is a representative cross section of the entire CESM user base.  Users who work in larger labs may have colleagues nearby who can answer questions about the software.  Expert users are likely in touch with some of the model developers and may email questions to them directly.  These observations suggest that the forums disproportionately represent novice users who are not already tied into the CESM user community by some other means.  Furthermore, in my experience, posting to a forum is often a last resort, used only after other ways of finding information have been exhausted.

With these caveats freely admitted, an analysis of forum posts nonetheless reveals real issues and questions that users are dealing with, so we see significant value in this undertaking.  So far, I have looked at only a single forum: the General Discussion forum of the Coupled Model.  As of October 18, 2010, it contains 124 threads (266 postings in total), the most recent dated September 28, 2010 and the earliest dated October 28, 2004.  For this analysis, I considered only threads that dealt specifically with configuration.  For now, let me define configuration as the process of setting up the software to meet the user’s resource, numerical, and/or scientific requirements.  That said, I have defined configuration rather broadly, including threads that mention any of the XML configuration files, namelists, how variables are set in source files, modifying code to achieve scientific requirements, turning certain features on or off, migrating configuration changes into new contexts, input files, what configuration options mean, and where to set certain configuration choices.  Despite this broad definition, I excluded some threads that would traditionally be classified as dealing with configuration: issues of a highly technical nature that do not directly relate to setting up the software to achieve scientific goals, such as compiler issues, software dependencies, and bug reports.  The goal here is to tease out the configuration issues that are specific to computational scientists; I do not want to focus on the kinds of configuration issues that are common to all software (although there is inevitably some gray area here).

For this particular forum, 66 of the 124 threads (~53%) had configuration-related content.  I hope this analysis will lead to insights about how CESM users view the configuration process, what kinds of issues and questions are common, and how those issues are resolved.

To get a handle on the 66 configuration-related threads, I assigned each thread to one of 10 categories.  The categories emerged as I read through the threads and began to see common themes among them.  In many cases a thread could fall into multiple categories, but I forced myself to choose the single best fit.

Category: Where to make change?
Description: Unclear where to make changes to meet a scientific requirement.
Example posts:
1. How to turn off the extra chemistry parameterization CHM
2. How to change the ocean model timestep
3. How to change the start/stop years

Category: Is feature supported?
Description: Unclear whether a certain feature is supported by the software.
Example posts:
1. Is there support for dynamic/transient CO2 fluxes instead of fixed or ramping?
2. Is there a way to turn off flux adjustments?
3. Has anyone modeled solute transport through soil in CLM?
4. Will CLM automatically detect time intervals in atmospheric forcing data input files and interpolate?
5. Can I disallow freezing water in a CAM + CLM coupled configuration?

Category: Purpose of configuration option?
Description: The purpose or effect of making a particular configuration choice is unclear.  The user may just be seeking information about a configuration option, or may be confused about how configuration options interact with each other.
Example posts:
1. Do I need to modify co2vmr_rad to use time-varying CO2 concentrations for the transient simulations?
2. What is the meaning of the CCSM_BGC options in the env_conf.xml file?

Category: How to set configuration option?
Description: The location and general purpose of a configuration option are known, but it is unclear how it should be set.
Example posts:
1. Do I use the “restart” or “hybrid” start option (without an exact restart) to reset the start date when initializing several runs from different years?

Category: Unexpected behavior
Description: Making a particular configuration choice did not have the expected effect.  In some cases the user encounters a build or runtime error; in others there is no error, but the user’s requirements were not met.
Example posts:
1. I set several compile-time parameters to use an offline dynamics reanalysis with 42 levels; now I get a runtime error.
2. After setting the STOP_OPTION and STOP_N parameters, the model does not run for the expected length of time.
3. After switching to a different grid, I get an error that says the model is “blowing up.”

Category: Need scientific explanation
Description: More information is requested to determine whether a scientific requirement is met.
Example posts:
1. Is this temperature output from the SRES B1 scenario of CCSM3 adjusted for elevation in the region of the Alps?  If so, how is this done (which formula is used)?

Category: Is this configuration valid?
Description: More information is requested to determine whether a particular configuration or change is scientifically valid, is supported, and/or is the “best” way to accomplish a given requirement.  In some cases the user is extending the software’s behavior; in others, the user is asking whether a certain configuration has been validated.
Example posts:
1. After running CCSM with CSIM successfully on one PE, I read that the configuration is not supported.  Are my results valid?
2. I want to apply time-varying tropospheric aerosol forcing and I have the data.  Should radiative forcing be applied as a negative deviation from the solar constant at first?
3. I’d like to vary ice albedo parameters during a run by changing them upon restarting; if I change them in the namelist, will they be applied correctly on each restart?

Category: Is configuration choice applied?
Description: More information is requested to determine whether a particular configuration choice has actually been applied.
Example posts:
1. I’m not sure whether the solar forcing is applied because the atm.log* file does not contain any information about it.  How can I confirm that the solar forcing is indeed added to the model?

Category: Need technical explanation
Description: More information of a technical (non-scientific) nature is requested.
Example posts:
1. Where is the variable cam_landfrac initialized in the land model?
2. Cannot find file map_T42_to_gx1v3_aave_da_010709.nc when building the coupler.

Category: Saving / migrating / generating configurations
Description: How to save, migrate, or generate configurations.  These questions are at the “meta” level because they treat configurations as first-class objects with identities: moving a configuration from one context into another, generating configurations automatically, and saving configurations for later retrieval.
Example posts:
1. How do I perform perturbation growth tests (currently documented for CAM) for the coupled CCSM3?
2. I have been running CAM3.1.4 standalone and made code modifications; now I want to port my changes over to a fully coupled CCSM3.

The following table shows how many threads fell into each category, sorted in descending order of thread count.

Category                                      Number of threads
Unexpected behavior                           14
Need technical explanation                    12
Need scientific explanation                   11
Where to make change?                         8
Is feature supported?                         6
Saving/migrating/generating configurations    5
How to set configuration option?              4
Is this configuration valid?                  3
Purpose of configuration option?              2
Is configuration choice applied?              1

The top three categories, accounting for over half of the configuration-related threads (37 of 66), are “Unexpected behavior,” “Need technical explanation,” and “Need scientific explanation,” with 14, 12, and 11 threads respectively.  The three are closely related: users who experience unexpected behavior are looking for explanations of why the model did not perform as expected.
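As a quick cross-check of the percentages quoted in this analysis, the short Python sketch below re-tallies the per-category counts from the table.  It is only an arithmetic aid: the dictionary is a hand transcription of the table, and the variable names are illustrative rather than part of any CESM tooling.

```python
from collections import Counter

# Per-category thread counts, transcribed by hand from the table above.
category_counts = Counter({
    "Unexpected behavior": 14,
    "Need technical explanation": 12,
    "Need scientific explanation": 11,
    "Where to make change?": 8,
    "Is feature supported?": 6,
    "Saving/migrating/generating configurations": 5,
    "How to set configuration option?": 4,
    "Is this configuration valid?": 3,
    "Purpose of configuration option?": 2,
    "Is configuration choice applied?": 1,
})

# All threads in the Coupled Model General Discussion forum as of October 18, 2010.
TOTAL_THREADS = 124

config_threads = sum(category_counts.values())
top_three = category_counts.most_common(3)  # the three largest categories
top_three_total = sum(count for _, count in top_three)

print(f"Categories: {len(category_counts)}")
print(f"Configuration-related threads: {config_threads} of {TOTAL_THREADS} "
      f"({config_threads / TOTAL_THREADS:.0%})")
print(f"Top three categories: {top_three_total} threads "
      f"({top_three_total / config_threads:.0%} of configuration-related threads)")
```

Running it reports 10 categories, 66 of 124 threads (53%), and 37 threads (56%) in the top three categories, which matches the figures given in the text.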

The overwhelming majority of the “Unexpected behavior” threads deal not with unexpected scientific results but with unexpected technical behavior.  For example, the model did not run for the expected length of time, did not restart correctly from the restart files, failed to assign processor resources as expected, or produced a runtime error.  In the “Need technical explanation” category, a large number of threads dealt with I/O issues, such as missing input and restart files and how to write output files containing the desired variables.  Two threads dealt with generating input files (e.g., interpolation weights using SCRIP).  One thread concerned performance, and one asked where a certain variable in the source code is initialized.

In the “Need scientific explanation” category, many threads contained basic questions about the scientific properties of the model and the forcings used.  What is the reference CO2 level for control runs?  What solar constant is used?  What are the trace gas concentrations?  How high does the atmosphere model extend, in terms of pressure?  What kind of ocean (dynamical, swamp, data) is used for certain control runs?  One question asked for the identity of a particular unnamed coefficient used in an equation in the code.  Two slightly more sophisticated questions asked about the stability of the SST after a temporary perturbation is introduced and how long the climate takes to settle back to equilibrium.

The next most frequent category is “Where to make change?”  The majority of these threads asked how to make fairly basic numerical modifications, such as changing the model’s timestep, start/stop date, or grid resolution.  Of a scientific nature, one question dealt with turning off an extra chemistry parameterization, one with adding a tracer capability, and another with an error encountered when modifying the implementation of the saturation vapor pressure equation.

The “Is feature supported?” category consists almost entirely of science-related questions:  Is there support for a transient CO2 flux from the land model?  Is there a way to use constant flux adjustments?  Is there a switch to disallow freezing water?  Has anyone modeled solute transport through soil in the land model?  Does the model support a paleoclimate mode?

The “Saving/migrating/generating configurations” category deals with meta-level questions about the configuration process itself.  One thread deals with creating a custom compset.  Two threads ask about porting changes from a standalone CAM setting into a coupled CCSM setting.  One thread asks whether there is an automated script for generating a series of branch runs.  The last thread requests a change to a namelist parameter in order to turn off a certain ocean computation for paleoclimate runs.

The “How to set configuration option?” category contains four threads.  Two deal with assigning processor resources.  One asks for more information about the difference between a “restart” and a “hybrid” run.  The final thread asks for advice on setting several parameters for a custom paleoclimate configuration that is producing negative salinity values.

The “Is this configuration valid?” category contains three threads.  Two are technical in nature: one asks whether a particular run on a single processor should be considered valid, and the other asks whether a certain namelist value will be re-read from the namelist on every restart.  The third thread is scientific in nature and asks how a time-varying tropospheric aerosol forcing should be applied.

Two threads ask about the purpose of a configuration option.  One asks about a dependency between two parameters; the other simply asks for a basic explanation.

There is a single thread in the “Is configuration choice applied?” category.  The user asks how to confirm that the ramped greenhouse gas setting has actually been applied because there is no indication in the log output.

I plan to do a similar analysis of (at least) one of the model-specific forums in the near future for comparison.  I will hold off on any observations or conclusions until then.

About rsdunlapiv

Computer science PhD student at Georgia Tech
