Privacy and Data Protection in the Cloud [For CloudCon 2011]

This Wednesday I’ll speak in CloudCon 2011, instead of a regulatory lecture, I decided to focus about a technological solution to a legal problem, which I believe might be elegant. I’d appreciate it if you could join me at CloudCon or just come over to say hi.

0. The Cloud and Your Information.
On the verge of the Age of Intelligent Machines, Cloud Computing brings a new era for data processing. The Cloud holds more and more information, where data owners and data subjects lose physical control over it. If the old-world model was that data was about the end-user was held by the service provider, which processed and brought the data to the end-user, the cloud model allows the service provider to hold the information for the end-user at the quarters of 3rd parties. For this brief lecture, we’ll use Dropbox as an example, but when Dropbox’s examples fail, we’ll move on to others. In brief, Dropbox is a storage service which remotely backups your information on Amazon’s S3 Servers automatically. When you Install Dropbox, you use at least one more CSP (Cloud Service Provider) and are subject to its terms.

1. Shared Hosting, Shared Computing, Shared Control [meaning: The Problem];
Now, who has control over your information? Dropbox’s privacy policy suggests that “Dropbox cooperates with government and law enforcement officials and private parties to enforce and comply with the law. We will disclose any information about you to government or law enforcement officials or private parties as we, in our sole discretion, believe necessary or appropriate to respond to claims and legal process“; also, Amazon S3’s privacy policy which states that “We release account and other personal information when we believe release is appropriate to comply with the law; enforce or apply our Conditions of Use and other agreements“. Meaning, both Amazon and Dropbox shall abide to law enforcement requests and provide information if a court says so. Generally speaking, this is a good thing.

Let’s take this into proportions, however: Let’s say that I produce Lemonade and have a trade secret: the recipe; I store it in my Dropbox folder, as i need to provide access to several employees and I want it to be backed up securely. Now, my biggest competitor wants to access my Lemonade recipe. He goes to court, and with a good attorney gets an Anton Piller Order (an order allowing him to seize my assets held by a third party before any legal process is in progress); the order is based on his claims that I stole the recipe and the court rules, ex-parte that Dropbox should grant him access to my files. This is done because my competitor’s claim was that Dropbox itself holds the files. Dropbox receives the order and does not know how to treat it: it is unable to understand whether I am the actual owner of the file or stole it, and has to provide the files to my competitor: an order is an order.

There are two material differences that come to mind between cases where I hold the information and where the ISP holds it, and such difference explain the problems of using cloud storage for such sensitive information: (1) If I held the material, the execution of each order had to be with knowledge of such order because the files were stored at my quarters and under my control [see, for example, RCA 1810/10 PCIC v. Kaplan, where a shared hosting provided was requested to reveal the email accounts of one of its users without their knowledge]; (2) The CSP has a rational indifference as to disclosing my information, as if it does not, it might incur liability. Israeli Courts ruled in several cases that active participation and interest in not removing content even after knowledge of infringement may incur liability [For example, C 176992/09 Eti Abramov v. Aviv Frenkel, C 32986/03 Buschmitz v. Refuah]. Therefore, the when you post information on the cloud, you are at risk that your information might be sought by other parties.

The question is whether it is technically possible to do so? meaning, could CSPs access your files? let’s say that, legally, Dropbox’s terms allow such use, and that other CSPs (such as google as providing email services) already ordered to reveal a user’s IP address (C 4854/07 Berlomenfeld v. Google) and disabled access to other accounts. Moreover, Dropbox (and let’s see Dropbox as an example) designed the architecture, it has the ability to recover my files and to recover my password, meaning that it can always bypass its internal security mechanisms.

2. Loss of Centralization;

Now, as we see it, when we discuss CSPs, we know that the control has to move from one centralized user to many distributed players, where each has the ability to disclose the information. At least prima facia, the CSP is considered as a 3rd party that either retains the information or processes it. In such cases, the Israeli Law, Technology and Information Authority has issued a draft set of regulations regarding processing by 3rd parties or outsourcing services.

Now, if I hold sensitive information on 3rd parties, and some of it is held on the cloud, then I have to make sure that my CSPs adhere to a privacy policy that protects my information. For example, if I am a lawyer, I have to notify Dropbox that I am one and that all my information is protected under an attorney-client privilege so that when they receive such Anton-Piller orders, they’ll refuse and defend me. Moreover, I have to make sure that my CSP shall not divulge any personal, private or sensitive information to any 3rd party either with or without my consent.

3. Protecting Yourself from Your CSP;
How can one protect himself from his CSP? Theoretically, there are a few suggestions for Encrypted Cloud Storage (for example, Kamara et al, “Cryptographic Cloud Storage“) which offer theoretical, yet to be implemented, method of encrypting information on the cloud. Generally speaking, their proposal is that “Before uploading data to the cloud, Alice uses the data processor to encrypt and encode the documents along with their metadata (tags, time, size, etc.), then she sends them into the cloud. When she wants to download some documents, Alice uses the TG to generate a token and a decryption key“.

Another technological option is to encrypt the virtual machine’s drive or to use encrypted file systems on cloud storage. Another option is to use an encryption software, such as TrueCrypt on your cloud storage service (such as Dropbox); however, such a solution may be problematic as Dropbox cannot access your filesystem and might have to back up your entire folder each time you change each and every one of your files.

A different approach may be to establish a secret sharing mechanism where the information may be distributed on several different clouds, each holding only a portion of the information (such as in Parakh et al, Recursive Secret Sharing for Distributed Storage and Information Hiding).

However,  these solutions are theoretical and have yet to be implemented by organizations or storage services as an integral part of their scope of services (maybe, apart from this one).

4. Solution[s];

Let’s discuss solutions as well. We need to form a strict set of rules of how to define a cloud system as privacy enabled. Our requirements are that the CSP shall allow: (1) seamless access to the set of files; (2) indexing and searching; (3) sharing parts of the information with 3rd parties; (4) reporting on each authorized and unauthorized access.

Mounting an encrypted virtual filesystem allows three out of the four: access, indexing and reporting. However, in order to share the information with 3rd parties, access to the filesystem has to be granted to the CSP (especially in order to allow sharing, see Y unqi Ye et al, Dependable and High Performance Cloud Storage). The other option is to encrypt each file differently (with different symmetric keys for each file so that no problems with sharing the files exist); however, such option shall not allow search and indexing (or require a central key database), therefore allowing three out of the four conditions.

Even if we assume that the encryption is symmetric, and that each sharespace between  users receives different symmetric keys, then we cannot define the solution as seamless, as in order to convert files from a privatespace to sharespace a client-side conversion of the files is required, as well as when files are copied from a private folder to the shared folder (also, a keyserver is required).

Let’s take, for the solution, Adi Shamir‘s secret sharing mechanism (Shamir, How to share a secret) and for the purpose of this solution define our efficient threshold as one (1) user. In such case, we define the shared folders with at least three cryptographic keys (one for the folder, to be shared with anyone, and one for each user) in such way, each user could read or write to the folder seamlessly, he could also index and search using his key (and the shared key), share the information with others (by adding another key).

Implementing secret sharing in such a case (which was yet to be tested) may allow enhanced privacy with the flexibility of sharing the information through networks and users.

5. Conclusions.

We have yet to implement a technological solution to a legal problem we might face in the near future. The much unrequired loss of control over data stored in the cloud, especially sensitive information, is inevitable nowadays due to current architecture, CPU and bandwidth limits and other problems.

However, theoretically and with a little hassle, an encryption based model may be implemented in order to allow storage of information on remote servers (i.e cloud) where the CSP cannot access the files but the end user may share such files with 3rd parties of his choice.

4 thoughts on “Privacy and Data Protection in the Cloud [For CloudCon 2011]

  1. For our solution to the problem raised here :
    http://www.sentry-com.net/CloudComputing.html.
    It addresses all the issues raised here , namely :
    1. Protecting yourself from CSP.
    2. Seamless access to files.
    3. File Sharing
    4. Audit.
    One extra thing : it is essential that no decryption will occur on the cloud itself , since no information may be deleted completely.

Leave a Reply

Your email address will not be published. Required fields are marked *