Cloud computing and the issue of privacy
Could Google’s Achilles’ heel be privacy? Cloud computing asks an enterprise to cede more control to the vendor. That applies not just to Google but also to Amazon, AT&T or Pageflakes—any company offering cloud-based services. A licensee must believe the vendor who says, "We don’t keep track of any personal or proscribed information."
I’m a trusting soul, but I have worked in and around online systems for more than 30 years. Sitting not 10 feet from me is a person who can write a script and suck content from any system to which she has, or can get, access. Privacy, just like security, is only as good as its weakest link. Cloud computing increases the risk that a breach could occur. Note, please, that I am not saying a breach will occur. We’re dealing with risk here and with one’s definition of acceptable risk.
Consider the muddled information in the April 21, 2008, Financial Times story "Google Resolve Crumbles on ‘Cookies’ Pledge." The journalist, Richard Waters in San Francisco, points out that Google is not able to resolve some of the tracking and monitoring issues associated with the small text files placed on a user’s computer. Those "cookies" make it trivial for a cloud-based vendor to know who does what, when and how, and even where the user navigates when leaving one site to visit an unrelated URL (uniform resource locator). My colleague, the one who can script dance with the best of them, said, "You can do lots of interesting tricks with cookies like sucking data off the user’s computer. Snort. Snort. Giggle."
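To make the cookie mechanics concrete, here is a minimal sketch in Python of how one identifier stored in a cookie can tie together visits to unrelated sites. It is an illustration only, not a description of Google's systems: the Tracker class, the cookie name "vid" and the example URLs are all hypothetical, and a real tracker would sit behind a Web server rather than being called directly.

```python
from http.cookies import SimpleCookie
from itertools import count


class Tracker:
    """Hypothetical third-party tracker embedded on several unrelated sites."""

    def __init__(self):
        self._ids = count(1)
        self.log = []  # (visitor_id, page_url) pairs, in the order seen

    def handle(self, cookie_header, page_url):
        """Record one page view and return the cookie string to send back."""
        cookie = SimpleCookie(cookie_header)
        if "vid" in cookie:
            # Returning browser: reuse the identifier it presented.
            visitor_id = cookie["vid"].value
        else:
            # First visit: mint a new identifier (assumed naming scheme).
            visitor_id = f"v{next(self._ids)}"
        self.log.append((visitor_id, page_url))
        reply = SimpleCookie()
        reply["vid"] = visitor_id
        return reply["vid"].OutputString()

    def trail(self, visitor_id):
        """Return every page recorded for one identifier."""
        return [url for vid, url in self.log if vid == visitor_id]


tracker = Tracker()
# The browser has no cookie on the first site, then presents it on the second.
header = tracker.handle("", "https://news.example/story")
header = tracker.handle(header, "https://shop.example/cart")
print(tracker.trail("v1"))
```

The point of the sketch is that the two sites never need to share anything with each other. The browser hands the same identifier back to the tracker each time, and the tracker's log does the rest.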
Privacy and Google
One newspaper story doesn’t mean Google is going to take any chances with its enterprise customers. I think Google is much better about privacy than some other firms with which I am familiar. But Google faces some pushback on privacy from the European Commission. Google’s public policy Web log does a good job of keeping me current on Google’s policies. The Web log is here: http://googlepublic
In an April 7, 2008, post on its public policy Web log under the headline "The European Commission’s Data Protection Findings," Google said:
"The European Commission’s Article 29 Data Protection Working Party—named after the rules they are monitoring–-has been conducting a lengthy inquiry into the question of online privacy. While the working party has welcomed our decision to anonymise [sic] data logs after 18 months as a positive privacy protective step, it suggested in findings released today that this period might still be too long. We believe that data retention requirements have to take into account the need to provide quality products and services for users, like accurate search results, as well as system security and integrity concerns. We have recently discussed some of the many ways that using this data helps improve users’ experience, from making our products safe, to preventing fraud, to building language models to improve search results. This perspective—the ways in which data is used to improve consumers’ experience on the Web—is unfortunately sometimes lacking in discussions about online privacy."
One piece of information that is not widely known, and that I find helpful when thinking about cloud-based computing and privacy, is the mundane data schema for a user’s behavior. In Google’s patent and patent application corpus, I identified more than a dozen public documents that explain the systems and methods disclosed by the company for usage tracking. The first patent application is dated 2002, US 2002/0123988. Work on patent applications typically consumes several months, so it’s reasonable to conclude that usage tracking was of interest to Google in its earlier days, before this patent application’s filing in March 2001. The most helpful information to me on the issue of privacy is the diagram on page 15, KMWorld, Vol 17, Issue 7, which appears in Google US 2006/0224583, titled "Analyzing a User’s Web History."
[This data model, which appears in the Google US 2006/0224583 document available at uspto.gov, shows fine-grained tracking of a wide range of user actions. The record structure keeps track of the user by a "user identifier."]
It’s clear that this record structure keeps track of the user by a "user identifier." That record structure is also designed to monitor advertising. Of particular interest to me were the two column heads "Derived Data" and "Additional Data." It occurred to me that with a little editing, that record structure would be useful in tracking the activities of a user of other types of Google services—for example, enterprise cloud-based services. I don’t think that will happen, but it is interesting to look at that fine-grained data model in the context of the other Google inventions that track, manipulate, aggregate and analyze stateful and stateless user actions.
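A rough sketch in Python shows what "a little editing" might look like. Only the "user identifier," "Derived Data" and "Additional Data" labels come from the patent document's diagram. Every other field name, the "service" field and the sample values are my assumptions for illustration, and none of this represents Google's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsageRecord:
    user_identifier: str   # label taken from the patent document
    event_type: str        # hypothetical: kind of action recorded
    target: str            # hypothetical: query text, URL or file name
    derived_data: dict = field(default_factory=dict)     # column head in the diagram
    additional_data: dict = field(default_factory=dict)  # column head in the diagram
    service: str = "web_search"  # hypothetical field: the "little editing"


def actions_by_user(records):
    """Group (service, event_type) pairs under each user identifier."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.user_identifier].append((record.service, record.event_type))
    return dict(grouped)


records = [
    UsageRecord("u-001", "query", "enterprise search"),
    UsageRecord("u-001", "ad_click", "https://ads.example/42",
                derived_data={"topic": "software"}),
    UsageRecord("u-001", "document_open", "budget.xls", service="enterprise_docs"),
]
print(actions_by_user(records))
```

With one extra field, the same record that logs a search query or an ad click can log an action inside an enterprise service, all keyed to the same user identifier.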
My thought is that as Google moves into enterprise cloud-based services in a more significant way than it has to date, Google will want to reassure its enterprise customers that the concerns about "crumbling cookies," the European Union’s nagging, and the powerful usage tracking inventions disclosed in Google’s patent documents are lined up like soldiers, polished and outfitted for battle in the enterprise wars ahead. IBM, Microsoft, Oracle and SAP may find that Google’s own policies, procedures, statements and inventions are sometimes at sixes and sevens.