Obscuring data from Service Providers

While thinking about how to enable people to own their content, I stumbled upon the problem of limiting Service Providers’ ability to cross-reference users’ data. In the current situation, Service Providers are able to generate statistics from users’ data. This enables them to sell contextualized ads, check relations between people, analyze photos, etc. Users have absolutely no way to prevent that from happening. Even if people store content outside those sites, they lose control of their data once they delegate it to a given site.

Some users might accept that exploitation if they got services or direct earnings in return. As of today, websites like Facebook get it for free.

One can argue it is precisely their business model to do so, but I would not agree. Let’s take a metaphor to compare that situation with other businesses. Imagine you are a large petrol company and you want to exploit an oil field you discovered abroad. Could you imagine exploiting it without paying any license fees to the state it belongs to? Could you imagine not paying fees or taxes related to the volume you extract from that field?

If you answered No to both questions, then you have understood my point, and yes, you are presently an open oil field on which absolutely no fees or taxes are paid. It is in fact an unfair, unbalanced business, and it gives a simple explanation for the current leap forward in profit of those large service providers. In a way, users could ultimately be considered unpaid employees, but no, guess what, you are only users :).

I hope the reason I am thinking about that is clear to you now (even if the digression was a bit too long for most of you…).

One can ask: how much is my data worth? I would answer: not much if you take yours alone, but a lot when all users’ data is put together. Some can answer that question precisely: your online service providers and ad broadcasters.

So now I can come to my idea for today: content-filtering browsers.

Imagine for a second the whole Internet suddenly speaking Latin, with blank media everywhere!? A stupid browser would only display that fake content to you, but a smart one would retrieve much more information. It could query your content provider to retrieve your real content using metadata provided in the fake content (Latin and blank media).

Yes, the idea is that service providers would only store fake content containing metadata, and only people with the right identity could retrieve the real content from your personal store.
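To make the reading side concrete, here is a minimal sketch in Python of what a smart browser could do with a piece of fake content. Everything specific in it is an assumption made for illustration: the `[[...]]` marker that carries the short URL, the `safe.example` address, and the in-memory dictionary standing in for a personal content store.

```python
import re

# Hypothetical in-memory stand-in for a personal content store.
# Keys are short URLs; each entry holds the real text and the
# identities that are allowed to read it.
CONTENT_STORE = {
    "https://safe.example/a1b2": {
        "content": "Dinner at my place on Friday?",
        "allowed": {"alice", "bob"},
    },
}

# Assumed convention: the fake content embeds the short URL between [[ and ]].
MARKER = re.compile(r"\[\[(https://safe\.example/\w+)\]\]")


def render(placeholder: str, viewer: str) -> str:
    """Return the real content if the viewer is authorized, else the fake content."""
    match = MARKER.search(placeholder)
    if match is None:
        return placeholder  # ordinary content, nothing to resolve
    entry = CONTENT_STORE.get(match.group(1))
    if entry is None or viewer not in entry["allowed"]:
        return placeholder  # unauthorized viewers only ever see the Latin
    return entry["content"]


fake = "Lorem ipsum dolor sit amet [[https://safe.example/a1b2]]"
print(render(fake, "alice"))    # the real message
print(render(fake, "mallory"))  # the Latin placeholder, unchanged
```

The service provider only ever stores the value of `fake`; the real text lives in the store and is released according to the viewer’s identity.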

What difference does it make?

The basic answer is: by putting your data under the control of a content service, you are able to control who accesses it. You may put authorizations on it, and you are even able to monetize it (by logging who requests it and relying on a definite contract with the requester). This monetization could end in real money or in any other kind of currency that you may spend on services. You may authorize your current service providers (like Facebook) to access those real resources, but you are also able to remove the authorization (answering with fake content, or with an error if you are fair).
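As an illustration only, the following Python sketch shows one way a content service could combine authorization, revocation and an access log that a contract could later be billed against. The class and method names (`ContentService`, `authorize`, `revoke`, `fetch`) are hypothetical and not part of any existing protocol.

```python
from dataclasses import dataclass, field


@dataclass
class ContentService:
    """Hypothetical content service: stores assets, checks grants, logs requests."""
    assets: dict = field(default_factory=dict)      # asset id -> real content
    grants: dict = field(default_factory=dict)      # asset id -> set of requesters
    access_log: list = field(default_factory=list)  # (requester, asset id, granted)

    def put(self, asset_id, content):
        self.assets[asset_id] = content
        self.grants.setdefault(asset_id, set())

    def authorize(self, asset_id, requester):
        self.grants[asset_id].add(requester)

    def revoke(self, asset_id, requester):
        self.grants[asset_id].discard(requester)

    def fetch(self, asset_id, requester):
        granted = requester in self.grants.get(asset_id, set())
        # Every request is logged, whether or not it is granted.
        self.access_log.append((requester, asset_id, granted))
        if not granted:
            # The "fair" answer: an explicit error rather than fake content.
            raise PermissionError(f"{requester} may not read {asset_id}")
        return self.assets[asset_id]


service = ContentService()
service.put("photo-1", b"real image bytes")
service.authorize("photo-1", "facebook")
first = service.fetch("photo-1", "facebook")   # granted and logged

service.revoke("photo-1", "facebook")
try:
    service.fetch("photo-1", "facebook")
    denied = False
except PermissionError:
    denied = True                              # refused and logged

# Granted requests are what a contract could count for monetization.
billable = sum(1 for _, _, granted in service.access_log if granted)
```

The log is deliberately kept separate from the grants, so that removing an authorization does not erase the record of what was served while it was in force.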

How could we do that?

Simple: remember how you publish content on Twitter? Yes, short URLs. In our case, those short URLs would point to your real media or textual content (possibly zipped) using a content provider URL.

At the editing stage, your browser should provide a function to type or upload data when you edit content on your service provider. It should then submit your real content to your content provider and retrieve a short URL in return. It will then incorporate the short URL into a basic Latin text or into the metadata of a fake image or media file that you upload. The good news is that even when you are editing complex content, it is basically contained in the browser, so the browser is always able to get it and replace it with fake content (even in Google Apps…).
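The editing flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `ContentProvider` class, the `safe.example` base URL and the `[[...]]` wrapper around the short URL are all hypothetical, and the "provider" is just an in-memory dictionary.

```python
import secrets

# Filler text that the service provider will store instead of the real content.
LATIN = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


class ContentProvider:
    """Hypothetical content provider that hands out short URLs for stored content."""

    def __init__(self, base_url="https://safe.example/"):
        self.base_url = base_url
        self.store = {}  # short URL -> real content

    def submit(self, real_content: str) -> str:
        # An unguessable random token plays the role of the short URL path.
        short_url = self.base_url + secrets.token_urlsafe(6)
        self.store[short_url] = real_content
        return short_url


def make_placeholder(real_content: str, provider: ContentProvider) -> str:
    """What the browser would send to the service provider instead of the real text."""
    short_url = provider.submit(real_content)
    # Assumed convention: the short URL travels inside [[ ]] after the Latin filler.
    return f"{LATIN} [[{short_url}]]"


provider = ContentProvider()
placeholder = make_placeholder("My real status update", provider)
print(placeholder)
```

Only `placeholder` leaves the browser for the service provider; the real text goes to the content provider alone.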

Why do you require a change in browsers?

The basic answer is: the only way to prevent others, and your service providers in particular, from reading your data without your permission is that they never receive the real content. There is no other way than extending the browser to achieve that…

How much work does that require?

  • Define a new standard for data indirection: simple.
  • Implement open-source libraries providing client and server reference implementations: easy.
  • Upgrade the four main browsers with the new protocol for media and data editing: medium to complex. Open-source browsers like Firefox should come first… Extensions may be developed to demonstrate the principles.
  • Bring the standard content services into the infrastructure: new service providers will be happy to run a new business (even current ones would like that).

Provided that the new protocol lets you export your data whenever you want and import it into another content service provider, it enables people to take control of their content and of the service provider they choose to trust.

For a presentation of the content service provider idea, please read my OpenSafe article.

Free your profile and your data: maturity in the web of services (SaaS)

Nowadays social websites monopolize the Internet’s efforts around user expression and social networking. The main problem users face now is that their personal data and identity are owned by the social websites they choose to register with. A main concern for people is now to recover their freedom from those distributed local “prisons” and to capitalize on their data and profiles independently of the service provider websites. To date, only the goodwill of service providers enables any transversality between them, and that is not freedom.

This article promotes a vision of the web where Identity, Data and Services are clearly separated. As users, we really need to break the current trend toward large corporations and the ‘one size fits all’ model that we progressively face on the Internet. The willingness of existing service providers to integrate with one another is not sufficient, because it will simply recreate the plates of spaghetti that we face inside our local systems, and we will all lose the benefits of the SaaS paradigm when it comes to integration. The vision I promote offers an opportunity to do much better than that, and it can be done with simple solutions.

We should consider this separation (Identity, Data, Services) as a new state of maturity for the Internet of services (SaaS). This vision may take time to achieve, so we promote a solution that offers SaaS providers a progressive path of transformation. It should help them shift smoothly. This transformation will bring real gains for everybody. Those gains will be of a different nature depending on the actor:

  • Users: Freedom of ownership; pervasive accessibility and control of information and identity; syndication of their content across websites; ability to shift from one service provider to another; ability to exchange data between service providers and make them cooperate; ability to own their data.
  • Service Providers: Virtuous transparency around user’s data and identity; clean contracts with users based on an identity that is managed elsewhere; progressive standardization of the information; interoperability and composition of services with limited integration; reduced infrastructure investment (data & identity may be stored elsewhere).
  • Identity Providers: Standardization of their responsibility; extension of their role as a safe place where users store their identity description and that points to their contracts with service providers and to their personal data store. Users should be able to have a unique identity across service providers if they want to. I think that OpenID is on the right path on this matter, but it should narrow its ambition to a definite role. Metadata are in fact data, and I doubt that duplicating responsibilities with data providers is a good idea. Only bootstrap problems should be addressed with metadata stored inside the Identity Providers’ infrastructure.
  • Data Safe providers: a global new business model implying trust and openness by nature. Those providers keep data on behalf of their users and respond to authorized access from Service Providers. To help people understand the shift, I will use a simple metaphor: the historical experience of personal money management. In the good old days, people kept their money under their pillow; then banks appeared, and everybody understood the value of a bank compared to their own ability to keep money safe. They gained security and services around it (I know many may currently argue on that point :)). It is exactly the same principle with data. People need trusted intermediaries that enable them to use their data as they want, with simple connectivity to Service Providers, long-term availability and strong security. In the end, offering Data Safe Provider connectivity should be an obligation for Service Providers, and using it should remain an option for users. Data Safe Providers should have absolutely no intelligence about the data themselves: they store, they serve and they enable syndication. As Tim Berners-Lee has taught us, on the Internet we live in a domain of representation. Data can always be exported: for media, we have defined MIME types; for structured data, we have RDF triples. Which service provider could argue that it cannot export your data? The point is not the formats (maturity on that matter will come from the community); it is the protocol and a true willingness to implement it. Enabling dynamic publication and notification about data is not a fundamental problem, because it can be done asynchronously (no performance problem). If we expand our vision to the enterprise space, being able to share information between service providers is a fundamental problem to solve (as it is for Identity, by the way). Even if the current enterprise trend seems to be turning from SaaS to virtual clouds (virtualization is simpler than replacing existing applications), enterprises still need simple solutions to expose information.

In this model, we should define two things as progressively enforced standards:

  • The Identity Provider: One standard should be available for every Service Provider and Data Provider. It will enable users to manage their profile across the providers they choose to work with and to trust. Identity Providers should manage two kinds of information: the identity itself and some standard metadata associated with the identity. Those metadata should have precise characteristics to ensure user privacy and to enable other providers to work with them. As previously mentioned, OpenID seems to be on the right path for that. Identity Providers should not offer other services directly. If a corporation wants to offer other services, it should provide other websites and another infrastructure with absolutely no relation to its Identity Provider activity (strong regulation should take place here to ensure a progressive separation of concerns).
  • The Data Safe Provider: one standard should be available to manage contracts of trust with Service Providers and to describe a protocol giving access to, and notifications about, data in the store. If not, data providers will never exist, because no service provider will accept losing control and integrating with all declared data providers using specific protocols. Data Providers should not offer other services directly. If a corporation wants to offer other services, it should provide other websites and another infrastructure with absolutely no relation to its Data Provider activity (strong regulation should take place here to ensure a progressive separation of concerns).

For the time being, I am working on the Data Provider standard. I call it “OpenSafe” because we all need a personal strongbox that is open and independent of service providers. The principles I have defined around it are the following:

  • Our content is our property.
  • We may delegate its use to multiple websites provided that we stay aware of its life cycle (CRUDMG: create, read, update, delete, metadata, upgrade format).
  • Content use is negotiated with each service provider, and we share a contract to trace that.
  • OpenSafe delegates operations to Service Providers and offers no other services than content life cycle operations and notification services with definite ACL.
  • Each content asset will be associated with only one Lifecycle Manager, which will be responsible for its whole lifecycle. It could be either the OpenSafe Provider or one definite Service Provider, called the Main Service Provider in that case.
  • The Lifecycle Manager must offer all asset operations and a lifecycle notification service. Those operations will be made available externally only to the OpenSafe Provider.
  • Service Providers that are not the Lifecycle Manager of an asset must use the OpenSafe Provider to perform operations and stay aware of the content lifecycle. The OpenSafe Provider plays the role of a proxy to the Lifecycle Manager, and it ensures separation of concerns and traceability. It filters access according to the policies negotiated within each Service Provider’s contract.
  • As a consequence, OpenSafe ensures syndication of content and content lifecycle notification (asynchronously) to all Service Providers but the Lifecycle Manager.
  • OpenSafe may store a shared representation that could be expressed differently inside each Service Provider’s infrastructure.
  • Service Providers may keep copies of an asset but must offer life cycle aware services to keep track of content changes.
  • When OpenSafe delegates asset storage to the Main Service Provider, it must be kept aware of the asset lifecycle and get a definite URL from which to retrieve the content’s standard representation at any moment.
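To illustrate how the proxy role, the contracts and the lifecycle notifications listed above could fit together, here is a rough Python sketch. It is not the OpenSafe specification: the class names, the method signatures and the reduction of the life cycle to four operations are assumptions made for the example, and notifications are delivered synchronously here, whereas the principles call for asynchronous delivery.

```python
from collections import defaultdict

# Simplified subset of the life cycle operations (assumption for this sketch).
OPERATIONS = {"create", "read", "update", "delete"}


class LifecycleManager:
    """Owns the whole lifecycle of the assets it manages (hypothetical interface)."""

    def __init__(self):
        self.assets = {}

    def create(self, asset_id, content):
        self.assets[asset_id] = content

    def read(self, asset_id):
        return self.assets[asset_id]

    def update(self, asset_id, content):
        self.assets[asset_id] = content

    def delete(self, asset_id):
        del self.assets[asset_id]


class OpenSafeProvider:
    """Proxy in front of Lifecycle Managers: filters, traces and notifies."""

    def __init__(self):
        self.managers = {}                    # asset id -> its single Lifecycle Manager
        self.contracts = defaultdict(set)     # (provider, asset id) -> allowed operations
        self.subscribers = defaultdict(list)  # asset id -> notification callbacks
        self.trace = []                       # (provider, operation, asset id, allowed)

    def register(self, asset_id, manager):
        self.managers[asset_id] = manager

    def grant(self, provider, asset_id, *operations):
        self.contracts[(provider, asset_id)].update(operations)

    def subscribe(self, asset_id, callback):
        self.subscribers[asset_id].append(callback)

    def perform(self, provider, operation, asset_id, *args):
        allowed = (operation in OPERATIONS
                   and operation in self.contracts[(provider, asset_id)])
        self.trace.append((provider, operation, asset_id, allowed))
        if not allowed:
            raise PermissionError(f"{provider} may not {operation} {asset_id}")
        result = getattr(self.managers[asset_id], operation)(asset_id, *args)
        if operation != "read":
            # Synchronous here for brevity; a real service would queue these.
            for callback in self.subscribers[asset_id]:
                callback(operation, asset_id)
        return result


manager = LifecycleManager()
manager.create("post-1", "first draft")

safe = OpenSafeProvider()
safe.register("post-1", manager)
safe.grant("blog-site", "post-1", "read", "update")
safe.grant("photo-site", "post-1", "read")

events = []
safe.subscribe("post-1", lambda op, asset_id: events.append((op, asset_id)))

safe.perform("blog-site", "update", "post-1", "second draft")
current = safe.perform("photo-site", "read", "post-1")

try:
    safe.perform("photo-site", "delete", "post-1")
    denied = False
except PermissionError:
    denied = True
```

Service Providers never call the Lifecycle Manager directly in this sketch; every operation goes through `perform`, which is where the contract check and the trace live.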

I will publish updates regarding that matter in the coming weeks.

I would be pleased to get your thoughts on the present description.

Data providers should be long-term trustees, Identity providers should be long-term trustees, but service providers need not be long-term trustees.

Identity Providers must be long-term trustees because they represent the fundamental link between users, their data and all the services they have registered with (contracts). Who is suited to this job? Large companies under specific regulation, national administrations, … All may have a role, but it is a difficult matter and, for the time being, I have not spent sufficient time on it to offer a considered point of view (suggestions are welcome!).

Data Providers are also special, long-term trustees, in particular because they should ensure long-term storage. With most current service providers, we have absolutely no way to trust the availability of our data in the long term. The fact that Service Providers comply with a definite data protocol ensures that users may shift from one to another. It is important to note that this should also be true between Data Providers themselves!

Regarding data, I think that the banking and insurance sectors should keep an eye on this. They may have a role as traditional long-term trustees, despite the fact that it is a fundamentally new business.

Regarding contract management with all those providers, we have fundamental problems. Each day, individuals accept conditions from service providers over which they have very little influence or legal recourse. Only communities are able to put pressure on providers today. Resorting to international law is an unbearable route for an individual, and that is not a good thing when you come to think of tomorrow (more and more services delivered over the Internet). Something has to be done here, something deep around principles. I think there will be two parts to that: contracts around things related only to the Internet and its users (sharing information in particular), and contracts around things that have side effects in the real world (classical regulation would partially fit on that side).

I fundamentally think that the Internet should by default be a space of freedom of choice and freedom to act. As in the real world, part of the space is regulated and the rest relies on personal behavior and community pressure on the way things should happen. Freedom enables creation; regulation organizes trust when and where it is needed. If regulation is mandatory everywhere, you lose freedom and the capacity to create (decision-making becomes centralized)… If you let things go with no rules, you never get organized, and people face private service providers as individuals, with great difficulty freeing their data and profile once they have signed up for a service. We have to find an equilibrium.

Multiple sources of influence can be reviewed to find a path to that transformation. Let’s examine the various alternatives available today:

  • Free Software: This paradigm would promote the ability to own your own infrastructure and software for identity, data and services. It would enable people to be independent. The problem is that Free Software does not help people keep their independence if they want or need to use private-sector services that lock them in. Free Software will simply offer technical alternatives and make the end user responsible for integrating them and making them work. It is not a real choice… In the web of services, the true value of Free Software may lie more in the community (people, their ideas and their ability to put pressure) than in the software.
  • Open Source: Software coming from Open Source is a strong means of putting pressure on existing service providers by enabling new ones to compete. It brings a momentum of transformation. To put it simply, Open Source, and Free Software too, should bring the innovation and standards, and the private sector will bring finished products and industrial power. By choosing the right license model, we will also enable people to own their infrastructure if they want.
  • User communities: For the time being, when something goes wrong with a service provider, people may rely on the law to solve the problem, but using the community is usually a stronger way to put pressure. If communities decide to get their data and identity back using our principles, they can bring a tremendous power of influence. With data outside the Service Providers’ infrastructure, people could shift to another Service Provider in the blink of an eye, and everybody should appreciate that power of choice and the flexibility to share. User communities seem to be the ideal force to influence their service providers. Will they want to be free? That is a philosophical question I would love to test!

To achieve those gains at the scale of the Internet, we need a way to influence the current business. The fundamental shift can surely be made a reality through three main cornerstones, plus a fourth one that will hopefully come in time: communities that will put pressure on existing service providers, standards that will formalize the principles, software that will prove the concepts and enable newcomers, and, after that, legal regulation. We may even hope for international regulation on this matter, but it will take some time.

I hope those ideas will make their way to communities and organizations, even country leaders, because we now need to organize the Internet as people live more and more with and within it. The real world is the basis of our behavior – we come from it and it has forged the way we act – and it should be used as an inspiration for what we want on the Internet: freedom and the ability to organize. But the Internet is different from the real world, so different rules must be found. Its computational nature is a potential threat to human freedom if not managed.