The Orange Pill Storage System: A point-by-point and exhaustively detailed answer to "What Decentralization Requires"

DoctorAjayKumar
Posts: 10
Topic starter
(@doctorajaykumar)
Active Member
Joined: 4 years ago

This is collaborative work between myself and Craig Everett (@zxq9).

Introduction

Hi, my name is Craig Everett. I am a distributed systems engineer.

Currently I am working with @DoctorAjayKumar on software to solve the Big Tech censorship problem. That software is called the Orange Pill Storage System (OPSS).

This is a detailed response to Larry's article called What Decentralization Requires.

In brief, I agree with most of what Dr. Kumar wrote in his response. This post contains more or less the same points, with more reference to technical details.

Dr. Kumar wrote the following:

By analogy, I came up with the following: that article is proposing a set of traffic laws for flying cars. It's not possible for me to meaningfully agree or disagree with it, because I don't know what constraints flying car technology imposes, because flying car technology doesn't exist yet.

Postulating about high-level rules for what a decentralized internet might look like is not a useful exercise until we know what constraints the low-level technology imposes. That insight cannot come until the technology exists and is widely used.

The basic technology required to create a truly decentralized internet does not currently exist.

Many of my responses to each bullet point include the phrase "application-level issue", which I believe is what Dr. Kumar meant by "traffic laws for flying cars". OPSS is that low-level basic technology that Dr. Kumar wrote about.

"Application-specific issues" are issues which may be important, but that we can't discuss meaningfully at this point. We either do not have adequate knowledge about constraints of the low-level technology, or the issue is something that is decided on an application-by-application basis. OPSS is infrastructure, it can't and shouldn't solve application-specific issues.

All this being said, I have a lot of domain knowledge about what constraints OPSS is likely to impose, and which constraints are independent of OPSS.

So without further ado,

Bullet points from the article

Here I will address each bullet point laid out in that article, and later circle around to re-articulate the central point.

Principles
  • Self-ownership. Each user owns his own identity in the network.

OPSS achieves this by tying presence on the network to a node, and each node is owned by a user (it is their "home" on the network).

  • Data ownership. You own your own data; you control your own data, within the bounds of controlling law.

All data has a canonical origin at a specific node.

  • Platform-independent following. You control your friend/follower list independently of all platforms. Hence, once a friend follows you on one platform, he should follow you forever everywhere until he unfollows you or you block him (or there is a lawful government order compelling a change).

This is an application-level issue, not an infrastructure issue. The critical thing is that the infrastructure layer supports the creation of applications that have this trait. No current distributed system, or set of distributed systems, provides the features required to implement applications that can do this, but OPSS does.

  • Platform-agnostic posting. Posting on one platform means posting the same thing on all platforms that are part of one big decentralized network.

This depends on the definition of "platform". In the case of OPSS-based applications, "posting" means posting to a node, and as the base case is for nodes to equate to a user's (or users') network presence, this criterion can be said to be satisfied by OPSS-based applications.

  • Decentralized monetization. Content monetization, which is ultimately an absolute requirement, cannot be performed by a single, central, controlling body or system, providing identical outcomes. So it, too, must be decentralized.

I misread "moderation" as "monetization" on my first pass through the article, while writing this up. However, it is an interesting point to cover, so I have included it anyway.

This is also an application-level issue, not really an infrastructure-layer one (in the same way that TCP and UDP are not monetized, but plenty of things built atop them are). OPSS is flexible enough to permit any sort of monetization scheme one might decide to set up, including directly tacking on existing web-based advertising, proof-of-work systems (Ethereum contracts would be relatively easy to tie in), or whatever else one might come up with.

  • Decentralized moderation. Content moderation, which is ultimately an absolute requirement, cannot be performed by a single, central, controlling body or system, providing identical outcomes. So it, too, must be decentralized.

This is an issue you don't have if everyone owns their own data.

  • Single conversation. Therefore, there is one giant integrated conversation, but parts of it are not shown to people who don’t want to see it (or in places it’s literally illegal). Of course, it is still legal for people to run closed, walled gardens; but they’re not for general broadcast.

This is the opposite of decentralization.

  • Anti-monopoly. Therefore, also, no corporation has anything like a monopoly over the means of social media broadcasting, as at present.

The beta version of OPSS would still rely on, at the very least, a node making contact with a central registry or a handful of canonical registry nodes before discovering the rest of the network. Workarounds are possible, and a complete version of OPSS would suffer far less from this initial-reference problem. All peer networks suffer from it, and it represents the single point of potential vulnerability in the system. The good news is that any number of alternative registries can be set up in parallel, so it is not possible for a central authority to knock the canonical registry offline and break the network, and for the same reason it is not possible for the central registry to become a rent-seeking monopoly.
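The fallback idea above can be sketched in a few lines. This is a hypothetical illustration, not real OPSS code: the registry hostnames are invented, and a real node would speak a discovery protocol rather than merely opening a TCP connection. The point is that one reachable registry out of any number of independent ones is enough to bootstrap.

```python
import socket

# Hypothetical list of independent registry endpoints; these names are
# illustrative, not real OPSS infrastructure.
REGISTRIES = [
    ("registry-a.example.net", 4000),
    ("registry-b.example.net", 4000),
    ("registry-c.example.net", 4000),
]

def bootstrap(registries, timeout=2.0):
    """Try each known registry in turn. One reachable registry is
    enough to discover the rest of the network, so no single registry
    is a point of failure or a rent-seeking chokepoint."""
    for host, port in registries:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                # A real node would now run the OPSS discovery protocol;
                # here we just report which endpoint answered.
                return (host, port)
        except OSError:
            continue  # registry down or blocked; try the next one
    raise ConnectionError("no registry reachable")
```

Anyone can publish an alternative registry list, so censoring the bootstrap step means censoring every list simultaneously.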

Requirements
  • User exportability. Platforms should permit users to export a complete and unadulterated copy of their user data from the platform and host it elsewhere. Moreover, public user data that is edited by the user in one place must be brought current with all other copies made elsewhere as well, in a timely fashion.

In OPSS the originating node owns all data, so all data is already physically held by its owner. Exporting the data then becomes an application-level issue, in the same way that exporting one's email user data as CSV or JSON is an application-level issue. Public data (whether about a user or not; again, the concept of users is itself an application-level issue) is optionally subject to a versioning scheme. So while data cannot be magically synced across all nodes (consistency), OPSS defines a category of data that must check with the originating node to retrieve the current version before initiating a torrent of the public data stored in the distributed cache.
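The "check the origin, then fetch from the cache" flow can be modeled in miniature. Everything here is a hypothetical sketch (the class and method names are invented, not OPSS APIs): the origin node is only asked for a tiny piece of metadata, the current version number, and the bulk payload comes from the distributed cache.

```python
# Illustrative model: the origin answers "what is current?", the cache
# serves the bytes. A stale cache entry is simply a miss to be refilled.

class OriginNode:
    """The canonical owner of a piece of public data."""
    def __init__(self):
        self.versions = {}          # address -> latest version number

    def publish(self, address, version):
        self.versions[address] = version

    def current_version(self, address):
        return self.versions[address]

class DistributedCache:
    """Peers holding possibly-stale copies of public data."""
    def __init__(self):
        self.store = {}             # (address, version) -> bytes

    def put(self, address, version, blob):
        self.store[(address, version)] = blob

    def fetch(self, address, origin):
        # Ask the origin which version is current *before* pulling the
        # (possibly large) payload from the swarm.
        v = origin.current_version(address)
        return self.store.get((address, v))  # None => miss, refill from origin
```

This is how an edit made at the owner's node propagates "in a timely fashion" without requiring global consistency: stale copies stop being served the moment the version number moves.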

  • Data exportability. The user’s data must be easily exportable in a common, easily machine-readable format, according to a widely-used standard. This is an absolute minimum. Not many actually support this yet. This isn’t enough, though, because you need to be able to export your followers, too, and to do that:

At the level of OPSS this is the same issue as the above: an application implementation detail. The storage layer itself cannot know anything about this or else the system will be conflating concepts. This point (as well as the user exportability above) is a requirement that applies to an application or applications suite but not the data layer itself.

  • Interoperability. The social media platform must be made as interoperable as possible (at the user’s option). So I should be able to subscribe and follow someone who is posting on his own blog, or Mastodon, or Gab, or Parler. I should be able to post and read from any of these networks, and the data should appear in a timely fashion in all the rest.

This is purely an application-level issue, but a nice goal. I would be careful about calling this a requirement for any distributed system: interoperability has been the siren song of many a dead project, and it is simply not possible in any pure form, because computers do not know the meaning of the bits they manipulate (humans give them meaning; computers lack any knowledge of "self" -- the central problem of strong AI). Having a standard is good, but in practical terms it is more important to encourage application authors to write data converters that can do things like request data from the Gab or Minds web API and handle that data in a sensible way.
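The "data converter" approach can be sketched as a small adapter interface. This is purely illustrative: the class names are invented, and the toy adapter reads from an in-memory dict where a real one would call a platform's web API.

```python
from abc import ABC, abstractmethod

# Hypothetical converter interface: application authors write one
# adapter per external platform, rather than chasing a universal format.
class PlatformAdapter(ABC):
    @abstractmethod
    def fetch_posts(self, handle):
        """Return the user's posts as a list of plain dicts."""

    @abstractmethod
    def to_common(self, raw):
        """Normalize one platform-specific record into a shared shape."""

class ExampleAdapter(PlatformAdapter):
    """Toy adapter over an in-memory 'platform'; a real one would make
    HTTP requests to, e.g., the Gab or Minds web API here."""
    def __init__(self, data):
        self.data = data

    def fetch_posts(self, handle):
        return [self.to_common(r) for r in self.data.get(handle, [])]

    def to_common(self, raw):
        return {"author": raw["user"], "body": raw["text"]}
```

Each adapter absorbs one platform's quirks at the edge, so the application core only ever sees the common shape.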

  • Data inalienability. If the user’s data is not actually served from outside of a platform—which should be possible—then it is treated by the platform as if it were. The platform is merely holding the data on behalf of the user, as a service. The platform must not treat the data as “theirs.” This is still a rather vague requirement, but it has specific consequences. One of them would be that the platform is absolutely not permitted to delete or edit a post from your data, although they can of course opt not to post it on the platform. Twitter and Facebook violate this principle when they fail to retain copies of posts that they delete.

Each node holds its own data. Nodes could, of course, be rented out to users in a sort of paid VPS scheme. In either case it becomes necessary to determine whether the responsibility for data export falls on OPSS itself or on a given application built on OPSS. My feeling is that both are true: an OPSS node should be able to clone its data as an opaque bundle that can be immediately read in as current state by any other OPSS node (OPSS can only treat the actual contents of data as opaque; it is only aware of the metadata needed to make the data storage, access, and retrieval functions work), and application authors should be encouraged to make a variety of export options available that deal with the data at a semantic level.
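The "opaque bundle" half of that split is simple to picture. A minimal sketch, assuming the node keeps its state in a plain directory (the function names and layout are hypothetical, not OPSS's actual format): the node archives its data directory without interpreting any of it, and another node reads it back in wholesale.

```python
import tarfile

# Sketch of the opaque-bundle idea: tar up the node's data directory
# byte-for-byte. OPSS never looks inside the files it is bundling.

def export_bundle(data_dir, bundle_path):
    with tarfile.open(bundle_path, "w:gz") as tar:
        tar.add(data_dir, arcname="opss-data")

def import_bundle(bundle_path, dest_dir):
    with tarfile.open(bundle_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

Semantic exports (JSON feeds, contact lists, and so on) sit above this, in application code that does understand the contents.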

  • Moderation. Individual users, or whole platforms (if users should wish to use them), should be able to select their own moderators. Moderation data, or metadata—such as that a certain user should be blocked, or that a certain post should be hidden or flagged in some way—should be shared in a way similar to how the user data and content itself is served (so, across the network in a decentralized way), and independently of the user’s canonical copy of the data.

This is entirely an application issue. OPSS itself, much like a filesystem, cannot have any awareness of the concept of moderation, but applications certainly would. That said, each node would be responsible for policing its content, so per-node moderation would most likely become the model that applications adopt.

  • Text representation. The user’s public data must be syndicated in a lo-tech text-based (more human-friendly) format such as JSON or XML, even if they have an API (maybe I don’t want to be forced to use their API, maybe because it’s too restrictive). The purpose of this is to enable the user to more easily exert control over the source or original version of his own tweets. This text stream, if it still exists and the author’s control can be proven, becomes the user’s personal assertion or attestation as to how the state of his personal feed should be represented; this human-friendly data representation of the content becomes the controlling, “canonical” version of the data. No other representation, in no other data medium (blockchain, IPFS, bittorrent, or otherwise), is to be regarded legally or operationally as “the canonical version.”

This is also an application level issue. OPSS knows how to get bits from A to B and determine that user or node X is allowed to retrieve the data at or keys needed to access whatever is referenced by address Y, but the concept of users themselves does not exist at the OPSS level (much like how the concept of users is built atop a filesystem and set of kernel primitives, but is not a concept native to either by themselves).

  • Permanence (or uncensorability). By network policy, the user’s public data must also be able to be made available forever (so a particular platform couldn’t delete it on behalf of everyone else, even if they wanted to) via bittorrent or IPFS or the like. Maybe the blockchain is OK, but frankly due to the financial complexities involved in blockchain, I don’t trust blockchains as bittorrent-type “decentralized public cloud” storage.

Data in the public cache could become "permanent" if it is requested frequently (and therefore refreshed constantly), but permanence cannot be claimed as a core feature. Nor is it desirable: it tends to produce a Freenet-type situation where the space is saturated early by content most people consider unfavorable (typically content commonly censored for moral reasons in various jurisdictions, "morality" itself being a highly variable concept), which discourages adoption by the majority as the system itself gets a bad name. That said, as long as a node can contact the network, and given that nodes are the physical canonical origin for any given piece of data, it is not possible (or at least not easy) to censor or remove content its publisher does not want removed. That is, a 2nd or 3rd party cannot arbitrarily remove content from the network, though law enforcement possibly could by tracing the originating node (and this is a feature, not a bug: accountability for what is posted ultimately lies with the poster, not with any 3rd party, amplifying node, or search engine).
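"Permanence by popularity" behaves like an LRU cache, which is easy to model. This is only a toy sketch of the retention dynamic described above, not the OPSS cache implementation: each request refreshes an item, and items nobody asks for eventually fall out and must be refetched from their origin node.

```python
from collections import OrderedDict

# Toy model: cached data stays alive only as long as it keeps being
# requested. Nothing in the cache is guaranteed to live forever.
class PopularityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # address -> blob, in LRU order

    def put(self, address, blob):
        self.items[address] = blob
        self.items.move_to_end(address)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

    def get(self, address):
        if address in self.items:
            self.items.move_to_end(address)  # a request "refreshes" it
            return self.items[address]
        return None                          # fell out; refetch from origin
```

Popular data is effectively permanent while it remains popular; unpopular data quietly ages out rather than squatting in the shared space forever.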

Circling back

The most important of the principles is data ownership. Enforcing it forces all of the other design choices.

Here's the data-ownership principle, as far as OPSS is concerned:

If you like your data, you can keep your data. But as soon as you put your data on someone else's computer, it's no longer your data.

Now let us walk through some examples to illustrate how this forces design choices.

Usage case 1: sharing a graduation video

Your kid is graduating from high school. You don't want to make the video public, but you do want to share it with your family.

At the moment, the only way to do that is to put the video on a social media site like YouTube or Facebook and then share the link. That means Facebook or YouTube owns the data.

From a technological standpoint, there is currently no way to share data without surrendering ownership of it. We can't completely fix that, but we can certainly improve the situation a lot.

What we can do is give you a way to stream-torrent the video on demand to your friends, with access control, and without ever having to upload it to some central system like YouTube or Facebook. A distributed caching system keeps your home internet connection from getting overloaded.

We can use a pretty straightforward cryptography scheme to implement access control. So even if someone intercepts the stream, if they don't have the correct key, they can't read it.
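To make the shape of such a scheme concrete, here is a deliberately toy sketch. This is NOT production cryptography and not OPSS's actual scheme (a real system would use an authenticated cipher such as AES-GCM or ChaCha20-Poly1305); it only illustrates the structure: a random content key encrypts the stream, and only authorized viewers are handed that key.

```python
import hashlib, secrets

# TOY stream cipher built from SHA-256 in counter mode, for illustration
# only. Real deployments should use a vetted AEAD construction instead.

def _keystream(key, nonce, length):
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)   # fresh nonce per message
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return nonce + ct

def decrypt(key, blob):
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))
```

An eavesdropper who intercepts the stream sees only ciphertext; without the content key, the bytes are noise. Distributing that key to family members (and only them) is the access-control problem, and it is small and well understood compared to running a central video platform.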

We can't stop people on the receiving end from recording the stream, or compel them to delete the stream. Nor should we attempt to, as the only way to do so is to implement decentralized DRM.

There's a ton of issues in making a system like this user-friendly, such as letting your audience know that the video exists. But it's doable.

Usage case 2: Joe Rogan Podcast Stream

Joe Rogan wants to stream his podcast, and he doesn't want to worry about censorship.

In this case, what OPSS can offer is streamtorrenting. Rogan streams from his lair. Viewers get data both from Rogan's lair and from each other.

Usage case 3: The Decentralizers Forum

This case also covers something like Twitter or Facebook.

These types of situations are more challenging, because it's not as clear who should own what data. There are tradeoffs to be made. One goal of OPSS is to make these tradeoffs plainly visible, and allow an application developer to cleanly choose which side he wants to take.

The tradeoff here is performance versus privacy. Here are the two extreme ends of the tradeoff:

  • Performance side (make things fast). You would basically copy what WordPress or a BBS does: keep all the data on one central node. Users don't own the data they put into the system. Maybe you use OPSS to make the website load faster and to solve the Slashdot effect.

  • Privacy side. All posts by $user are served by $user's home machine. So if his power is out, or he turns his machine off, his posts aren't visible. The central server stores the comment tree data structure, which merely points to addresses where people's comments come from, but doesn't store the comments. Users can delete/edit posts or go dark simply by turning their computer off (plus or minus cache delay).
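The privacy-side design above can be sketched as a data structure. This is an illustrative model, not OPSS code: the server's comment tree holds only *addresses*, and each comment body is resolved against its author's home node at render time.

```python
from dataclasses import dataclass, field

# Sketch: the forum server stores the shape of the conversation, but
# the comment bodies live on the posters' own nodes.

@dataclass
class CommentRef:
    address: str                      # where the comment body lives
    replies: list = field(default_factory=list)

def render(node, resolve):
    """Walk the tree, resolving each address against its home node.
    Unreachable nodes (owner offline or gone dark) show as unavailable."""
    body = resolve(node.address) or "[owner offline]"
    return {"body": body, "replies": [render(r, resolve) for r in node.replies]}
```

Deleting a post or going dark is then just the owner's node refusing to serve the address; the forum's tree still hangs together, but the body is gone.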

What these all have in common (the problem that OPSS solves)

At the core of all of these systems are some very basic problems:

  1. Where do bits live?
  2. How do you get bits from point A to point B?
  3. How do you handle the Slashdot effect (servers melt from going viral)?
  4. How do you implement access control?
  5. How do you make the system user-friendly?
  6. How do you implement social network features like aggregation, content recommendation, and "you might like this person/channel/account"?

OPSS solves issues 1-4. No existing open-source distributed system can make that claim. Issues 5 and 6 need to be solved on an application-by-application basis.

The best OPSS can do on #5 is to make OPSS as invisible to end users as possible, and make itself as easy to use by application developers as possible.

Issues 1-4 have the following properties:

  1. They are difficult to do correctly
  2. Most developers find them boring
  3. Making a mistake is very costly
  4. The requirements are exactly the same in every application
  5. Everyone using the same solution is superadditive: the whole is greater than the sum of its parts.

Property (5) is true of OPSS for the same reason it's true of BitTorrent: the more people who are torrenting something, the faster it is to torrent. The more people who are using OPSS, the faster it is. It would be quite a waste for each application to implement its own distributed data store, because it's faster if they all use the same system.

There's a lot of ways to skin a cat. We're selling knives.

3 Replies
zxq9
Posts: 23
(@zxq9)
Eminent Member
Joined: 4 years ago

Bingo! Good job integrating it all into a single document. Nice.

Posts: 15
(@blademccool)
Active Member
Joined: 4 years ago

Regarding the manifesto claiming that IPFS is deficient in canonical origin of data and access control, I am not sure this is really the case. It seems trivial to sign data with the private key that the peerID/public key is derived from. If the content claims to be from peerid xyz, is signed by id xyz, and is referenced by the result of looking up the xyz IPNS name ... then is that not satisfactory? Access control could be as simple as putting a frontend in front of the node that denies pin access to anyone unauthorized. Or am I missing something?

Good luck to you and hope to see a link to try out whatever frontend ui you end up putting out on a test server. 🙂

zxq9
(@zxq9)
Joined: 4 years ago

Eminent Member
Posts: 23

@blademccool Identifying who put the data up is not what "canonical origin" means. Canonical origin means being able to locate the *network* origin of the data, not just whose key signed what. Without canonical origin you cannot write applications; you can only distribute data. Data is only 1/3 of an application platform.
