Massively Parallel Collaboration

The face of science is changing as more and more experiments are moved out of the lab and onto the Grid. As the number of processors available for computation increases, scientists are able to simulate physical phenomena with higher spatial and temporal resolutions. But what is to become of all the data produced by computational Grids around the world? While much effort has been put into parallelization of computations for generating scientific data, there is much work left to be done on the other side of the fence where the data is analyzed.

Along with science, the rest of the world is changing, too. The Internet is becoming more dynamic than ever (see my Web 2.0 post) and the Web has become the place for social interactions. Folks such as James Surowiecki, author of “The Wisdom of Crowds,” have noticed the power and intelligence of large groups that, given the right set of circumstances, are able to solve problems, make decisions, and even predict the future much more accurately than an individual could.

You need not look far to find examples of the wisdom of crowds on the Web. A fairly obvious one is Wikipedia. This site is enormously popular for finding information on just about any topic, but it is not centrally maintained like a traditional encyclopedia. In fact, anyone can edit an entry as they please. And, maybe surprisingly, the result of thousands of people contributing in their own independent, unsupervised way is a very useful resource! Other sites such as Flickr, YouTube, del.icio.us, and Facebook also show the trend toward online collaboration among literally millions of people.

The question for e-science is: how do we leverage this technological and cultural trend toward massive collaboration? One possibility is to move much of the scientific analysis done by individual scientists out into Web space. As it stands today, the steps required to find some new trend in the data or some interesting plot are almost always carried out by a single scientist working at their own machine. The final results are published in a scientific journal, or presented at a conference for many to see, but by and large, the analysis itself is done by an individual or a very small group of individuals.

There is another way to think about scientific analysis. Consider a recent site I stumbled upon called Many Eyes. The idea of the site is simple: you upload your own data (in tabular format) and it can be visualized by anyone on the web using a large number of visualization types (bar chart, scatterplot, world map, pie chart, etc.). According to the Many Eyes website, the goal of the site is to “bet on the power of human visual intelligence to find patterns… to ‘democratize’ visualization and to enable a new social kind of data analysis.”

Check it out. Here are two examples of visualizations that I was able to create in a matter of minutes. The first one shows the number and total valuation of residential building permits issued for Boulder, Colorado from 1993 to 2003. This visualization uses a standard bar chart.

The second visualization is a tag cloud of all the content currently on my blog.

Once a dataset is uploaded, it is public. Users can view existing visualizations of datasets (like the ones I created) or they can create entirely new visualizations. The philosophy of Many Eyes is that you can tap into the “wisdom of crowds” by allowing many people to create their own kinds of visualizations of the same dataset. Users can elect to “watch” datasets or visualizations to be notified of new activity. Additionally, users can post public comments about datasets and visualizations.

Can this type of massively parallel collaboration be harnessed for sophisticated scientific analyses? I think so, and I think this is where we are heading. I had a conversation today with two students attending the Numerical Techniques for Global Atmospheric Models workshop at NCAR. When I proposed to them the idea of social scientific data analysis, they were very interested. In particular, I explained to them the Many Eyes concept and asked if such a site would be useful to atmospheric modelers if the site supported netCDF (a popular data format for atmospheric data) and more sophisticated visualizations. They agreed that such a site would be helpful if it could actually work over the Web. One of the students commented that a big win for such a site would be the ability for scientists to easily find and repeat the post-processing steps of another scientist. (See the El Nino scenario here.)
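To make the student's point about repeating another scientist's post-processing a little more concrete, here is a minimal sketch of what a shareable "recipe" might look like. Everything in it is an assumption for illustration: the step names (`select_region`, `to_anomaly`), the record fields, the `nino34` label, and the toy temperature values are all made up, and plain Python dictionaries stand in for what would really be netCDF data. The only idea it tries to show is that if the steps are stored as data (JSON here), anyone can replay them on the same dataset.

```python
import json
from statistics import mean

# Hypothetical post-processing steps; a real site would offer many more.
def select_region(records, region):
    """Keep only the records belonging to the given region."""
    return [r for r in records if r["region"] == region]

def to_anomaly(records, field):
    """Subtract the mean of `field` so values become departures from it."""
    baseline = mean(r[field] for r in records)
    return [{**r, field: r[field] - baseline} for r in records]

# Registry mapping step names in a recipe to the functions above.
STEPS = {"select_region": select_region, "to_anomaly": to_anomaly}

def replay(recipe_json, records):
    """Apply each step of a JSON recipe, in order, to the records."""
    for step in json.loads(recipe_json):
        records = STEPS[step["op"]](records, **step["args"])
    return records

# A recipe one scientist might publish alongside a visualization.
recipe = json.dumps([
    {"op": "select_region", "args": {"region": "nino34"}},
    {"op": "to_anomaly", "args": {"field": "sst"}},
])

# Toy data (invented values), standing in for an uploaded dataset.
data = [
    {"region": "nino34", "month": 1, "sst": 26.0},
    {"region": "nino34", "month": 2, "sst": 28.0},
    {"region": "other", "month": 1, "sst": 15.0},
]

result = replay(recipe, data)
```

Because the recipe is just text, it could be posted, commented on, and modified by others in the same way Many Eyes visualizations are, though whether this scales to real model output is exactly the open question.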

Surely, many questions remain to be answered. Will the Web infrastructure support sophisticated scientific analyses? Does the sheer size of datasets prevent scientists from working in online Web spaces? What are the cultural impacts of massively parallel collaborations? Would scientists even care to participate for fear of someone else “stealing” their discoveries?

At the end of the day, though, it is clear that the Web has enabled a whole new level of socialization and collaboration that was previously impossible. It’s up to us to determine whether science will embrace this cultural shift and the “wisdom of crowds.”

About rsdunlapiv

Computer science PhD student at Georgia Tech
