Rocky Dunlap’s Weblog

Entries tagged as ‘cloud computing’

Who owns your Facebook profile?

February 3, 2009 · 3 Comments

Increasingly, our online presences define who we are.  Our lives have a sort of virtual counterpart as we report on what happens in our “real world” lives to the rest of the world in online social forums such as Facebook, MySpace, and blogs.  As our lives become increasingly exposed to the online world, you can’t help but wonder which is more important:  your real life, or the life the world sees through your online presence.  Just as your credit score (a number), not your real-life financial habits, is the primary mechanism for determining your creditworthiness, your online presence is, for better or worse, in many contexts the true essence of who you are and the side of you that matters most.  For example, to what degree do hiring managers use your LinkedIn network to decide whether or not you are a good fit for the company where you are seeking employment?

However much influence we think our online profiles have, most of us would at least say that the content we create and post online is an important part of our lives and is content that we wish to keep and control.  But, as important as our online profiles are, we are generally happy to give up rights to our data and transfer control of it to third parties.  As an example, let’s take a look at the Facebook Terms of Use.

When you post User Content to the Site, you authorize and direct us to make such copies thereof as we deem necessary in order to facilitate the posting and storage of the User Content on the Site. By posting User Content to any part of the Site, you automatically grant, and you represent and warrant that you have the right to grant, to the Company an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to use, copy, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part) and distribute such User Content for any purpose, commercial, advertising, or otherwise, on or in connection with the Site or the promotion thereof, to prepare derivative works of, or incorporate into other works, such User Content, and to grant and authorize sublicenses of the foregoing. You may remove your User Content from the Site at any time. If you choose to remove your User Content, the license granted above will automatically expire, however you acknowledge that the Company may retain archived copies of your User Content. Facebook does not assert any ownership over your User Content; rather, as between us and you, subject to the rights granted to us in these Terms, you retain full ownership of all of your User Content and any intellectual property rights or other proprietary rights associated with your User Content.  (http://www.facebook.com/terms.php).

According to this, “Facebook does not assert any ownership over your User Content.”  While this in many respects takes care of the legal side of things, it does not really address the practical issues of data ownership and control.  Legally I “own” my Facebook profile, but how do I “get” it?  How do I “save” it?  If Facebook servers went down tomorrow (perhaps an unlikely scenario), would I be able to retrieve my profile?  What about the hundreds of pictures that I have uploaded?  Or my messages?  So, while I do retain ownership of content I create, Facebook does not guarantee anything about accessing it.  On the other side of the coin, what if I want to remove some or all of my profile?  Let’s say I log in and delete some of my messages.  Are they really gone?  How many backup copies exist on Facebook servers?

Before going on, allow me to interject a couple of things.  First, I realize that the last paragraph is starting to sound a little conspiracy-theory-esque.  I do not think that Facebook is out to get us or that anyone is planning on using our profiles against us.  Nor is this intended to be a rant against Facebook; I do not have any issues with the way Facebook has handled my own content.  On the contrary, I imagine that the original creators of Facebook had no idea that its user base would become so massive, and questions of data ownership and control probably seemed relatively inconsequential in its early phases of development.  Further, the privacy controls of Facebook seem quite reasonable insofar as you can decide which people get to see what content.  Second, the data policies of Facebook are in line with the data policies of almost every other service that hosts user-generated content.  In fact, you can make the same observations of many other sites, such as LinkedIn or your favorite blog site.

The underlying issue here is bigger than just control over your social networking profile.  What I am exploring here is whether we need a technological and cultural shift in the way we think about user-generated data–including who owns it, who controls it, how it is accessed, and where it is stored.  The typical approach for architecting a site that delivers user-generated content is for the site to host both the application and the data.  The reasons for this are many.  For one, there is much technological inertia in that direction.  It fits the typical design pattern for building a web site:  get a web server, get a database server, get them to talk, and presto–you are ready to go.  Having the data close to the application is perhaps the basic premise for ensuring efficiency of data operations.  Consider the fact that Facebook serves over 15 billion images per day.  On average, that’s over 170,000 images per second.  You absolutely have to have the data close at hand to get that kind of throughput.  Also, most users are not really interested in managing their own data to begin with.  And, if site developers wish to make a change to the application (such as adding a new field to the profile) they can do so with ease because they have control over both the application and the data schema.  So, there is clearly good reason for sites like Facebook to manage the data for you.
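As a quick check on the throughput figure above, here is the arithmetic in a few lines of Python.  The only input is the 15 billion images per day quoted in the post; the variable names are mine.

```python
# Average image throughput implied by the figure quoted in the post.
IMAGES_PER_DAY = 15_000_000_000   # "over 15 billion images per day"
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY
print(f"{images_per_second:,.0f} images per second")  # 173,611 images per second
```

This is only an average; peak load would presumably be higher, which makes the case for keeping data close to the application even stronger.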

But, let’s imagine another scenario.  Let’s say you are signing up for a new Facebook account.  After putting in some basic information, you are presented with a prompt:  “Where would you like to store your profile information and other user-generated content?”  You are then given a couple of choices:  1.  Have Facebook maintain my profile data.  2.  Allow Facebook to access my personal “cloud” storage area.  You select option 2.  At this point you provide Facebook with credentials to access part of your personal storage area “in the cloud.”  Facebook would then access your storage area and configure it as required for the application.  All of  your Facebook user data would be stored there and accessed by Facebook as needed.  To be clear, the user experience on the site would be no different than if Facebook stored all of your data locally.  But, in fact, your data is now sitting inside a storage area that you own and control.
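To make the scenario a little more concrete, here is a minimal sketch of how an application might be written so that it does not care which of the two options the user picked.  Everything in it is hypothetical: the class names, the token string, and the in-memory dictionaries standing in for real storage are illustrations of the idea, not anything Facebook or a cloud provider actually offers.

```python
from abc import ABC, abstractmethod


class ProfileStore(ABC):
    """Hypothetical interface: where a user's profile data lives."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class SiteHostedStore(ProfileStore):
    """Option 1: the site keeps the data on its own servers."""

    def __init__(self) -> None:
        self._rows: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._rows[key] = value

    def get(self, key: str) -> bytes:
        return self._rows[key]


class PersonalCloud:
    """Stand-in for a storage area the user owns and controls."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self._valid_tokens: set[str] = set()

    def grant(self, token: str) -> None:
        self._valid_tokens.add(token)

    def revoke(self, token: str) -> None:
        self._valid_tokens.discard(token)

    def check(self, token: str) -> None:
        if token not in self._valid_tokens:
            raise PermissionError("access to this storage area is not granted")


class UserCloudStore(ProfileStore):
    """Option 2: the site reads and writes the user's own storage area."""

    def __init__(self, cloud: PersonalCloud, token: str) -> None:
        self._cloud = cloud
        self._token = token

    def put(self, key: str, value: bytes) -> None:
        self._cloud.check(self._token)
        self._cloud.objects[key] = value

    def get(self, key: str) -> bytes:
        self._cloud.check(self._token)
        return self._cloud.objects[key]


# Application code is written against ProfileStore only, so the user
# experience is the same whichever option was chosen at sign-up.
def save_status(store: ProfileStore, user: str, text: str) -> None:
    store.put(f"{user}/status", text.encode("utf-8"))


def load_status(store: ProfileStore, user: str) -> str:
    return store.get(f"{user}/status").decode("utf-8")


cloud = PersonalCloud()
cloud.grant("token-for-site")            # the user hands the site a credential
store = UserCloudStore(cloud, "token-for-site")
save_status(store, "alice", "Thinking about data ownership")
print(load_status(store, "alice"))
print(list(cloud.objects))               # the data sits in the user's own area
```

The point of the sketch is the last few lines:  the data ends up in an object the user holds, and calling `cloud.revoke("token-for-site")` would cut the site off without the user having to ask the site for anything.  It says nothing about the hard parts, such as latency or schema changes.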

Is such a thing technically possible?  Would Facebook ever agree to it?  Is there really a need or a demand for this?  I have much more to say on this subject, but let’s leave it here for now.

Some related links:

http://www.eweek.com/c/a/Enterprise-Applications/Who-Owns-Your-Social-Data-You-Do-Sort-of/

http://www.dataportability.org/

Categories: Everything Else · Research

Will cloud computing change the face of e-science?

November 21, 2008 · 2 Comments

First, a bit about cloud computing, and then some extrapolative thinking on what its impact will be on e-science.

Cloud computing is a buzzword that we are hearing more and more.  It’s one of those terms that people latch onto because they know there is really something lurking there, but they can’t quite put their finger on what it actually is.  Wikipedia says cloud computing is “a style of computing in which IT-related capabilities are provided ‘as a service’, allowing users to access technology-enabled services from the Internet … without knowledge of, expertise with, or control over the technology infrastructure that supports them.”  The term cloud is presumably used as a metaphor for the Internet since it is usually depicted that way on network diagrams.

While I’m not sure that definition would jibe with everyone, it’s in line with Amazon’s Elastic Compute Cloud offering (EC2).  Amazon describes EC2 as “a web service that provides resizable compute capacity in the cloud.”  Essentially, you can design the computational architecture that you want and Amazon will provide it to you as a service on a pay-as-you-go basis.  Need 100 Linux nodes, but only for a week?  No problem–you only pay for what you use, and when you are done, just terminate your nodes and forget about them.  You choose the machine image that you want, the software, the memory size, and the required storage capacity.  Apparently, it can be configured very quickly, so you can scale your computational capacity at a very small incremental cost.  I admit the EC2 model is very impressive if it works as they state on the home page.

Assuming cloud computing services such as this come into the mainstream, there will be huge impacts in many domains that rely on IT infrastructure.  E-science is one area that might be radically transformed.

Much of what impedes scientific progress is the set of computational and technical issues involved in conducting large-scale simulations.  Incompatibilities among computational environments hinder the sharing of experiments and results.  Repeatability, a key tenet of the scientific method, is nearly impossible with respect to e-science computations (at least repeatability by other scientists in other labs using a different computational environment).  Portability is therefore a prerequisite to repeatability in the realm of e-science.  The cloud provides a needed layer of abstraction so that scientists can think about science and not about computer science.

In almost all domains of e-science, results are disseminated by scientific publications in conferences, journals, and the like.  While many journals have moved to an electronic format, the underlying paradigm is still the same:  results are presented in a summarized format (e.g., plots and averages), but little information is provided on how to reproduce the computations that led up to the results.  And this is understandable.  It might take literally months of tweaking configurations followed by months of processor time followed by months of post-processing and analysis before the results are finally in.  How could you possibly provide enough information for someone else to reproduce the same experiment?  And even if you could, how do you get around the fact that everyone’s computational environment is different and your code might not even run on another platform?

The cloud computing platform sees the computational environment (e.g., operating system + compiler + processor + software + …) as a first-class object that can be created, registered, shared, searched, and otherwise manipulated.  For example, Amazon’s EC2 service provides a registry of “Amazon Machine Images” (AMIs) that anyone can access and instantiate.  Custom AMIs can be added to the registry.  This is a paradigmatic shift because what used to be the “infrastructure” has been ripped out and parameterized.  (Imagine being able to change the foundation of a building with ease.)  The computational environment becomes another configuration parameter to set along with your experiment’s scientific parameters.  In this sense, the cloud computing platform can be viewed as the “meta-infrastructure.”  Sure, it is an infrastructure at the same time, but for the first time it is an infrastructure that we can safely ignore.
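Here is a rough sketch of what “the environment as just another parameter” might look like in code.  This is not EC2’s API or anyone’s actual experiment format; the field names, the machine image identifier, and the scientific parameters are all made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class Environment:
    """The computational environment, treated as plain data."""
    machine_image: str   # placeholder identifier, not a real AMI
    compiler: str
    memory_gb: int


@dataclass(frozen=True)
class Experiment:
    """An experiment = scientific parameters + the environment to run in."""
    name: str
    environment: Environment
    parameters: dict = field(default_factory=dict)


def describe(experiment: Experiment) -> str:
    """Serialize the whole experiment so it can be shared and re-run."""
    return json.dumps(asdict(experiment), sort_keys=True, indent=2)


baseline = Experiment(
    name="ocean-mixing-run",
    environment=Environment("ami-EXAMPLE", "gfortran", 8),
    parameters={"resolution_km": 25, "timestep_s": 600},
)

# Changing the "foundation" is the same kind of edit as changing a
# scientific parameter: copy the description and swap one value.
rerun = replace(baseline, environment=replace(baseline.environment, memory_gb=16))

print(describe(rerun))
```

The shared artifact here is the description itself.  Actually launching the described environment would be the cloud platform’s job, and that part is deliberately left out of the sketch.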

The advantage to e-science?  With a parameterized infrastructure afforded by the cloud platform, we are well on the road to sharing much more than just scientific results.  Instead, we will share the experiments themselves–descriptions of scientific computations that anyone can execute and examine to validate results and extend them.  Admittedly, we have much work to do before this vision becomes a reality.  But, maturing cloud computing services like those offered by Amazon are a big first step toward a better way of doing science.

Categories: Research