Pulling At A Federated Thread

July 7, 2023

Perhaps the most consistent experience of the average Internet user is the growth of our favorite platforms into “all-encompassing” services. Facebook grew from a place of status updates and vacation pictures to a single-service provider for everything you could need on the Internet. News, entertainment, video streaming, art portfolios, chat rooms, community organizations, yellow pages, dating service, classifieds, and even financial services. At this point, if you can do it on the Internet, you can probably do it on one of Meta’s services. Meta makes video games, develops hardware, integrates into auto infotainment systems and is now developing AI. And if you can do it with Meta, you better believe you can do it with a Google service or an Apple service, or, if recent news is to be believed, soon a Twitter or Reddit service.

The walled garden, centralized services are merely extensions of the mega-conglomeration. In its lifetime, General Electric sold everything from light bulbs to power generators, from microwaves to general-purpose computers. The tech conglomerates of today develop or acquire each of these diverse services under one roof, generating what we call “platforms”. They do so to extract data from users to sell advertising and AI services to other businesses. The best method for extraction, goes the current wisdom, is to get more and more users and keep users on your platform generating data only you own. The platform, therefore, is the combination of the extracted, visible user data (i.e. the user-generated content), the hidden data (i.e. metadata and information generated while users interact with the platform, called behavioral surplus), and the tools which limit how users interact or generate data. I call this last element the viewport, as this element of the platform dictates how the user views, interacts with, and generates new data for the platform.

Unfortunately for the tech giants, the conventional wisdom is showing cracks. Most tech platforms, even the largest, are not profitable in their existing models. Some platforms have started pushing users who generate limited value off their platform or into higher-value, premium plans, such as Twitter driving users to their premium service or Google threatening to delete “inactive” accounts. At the same time, platforms like Twitter and Reddit are bleeding users due to their efforts to cut costs and increase revenues, clashing with the fundamental philosophies that grew the platforms. As investment funds dry up, and users chafe at new revenue-generating schemes, the largest platforms are looking for novel ways to generate new, valuable data streams with limited investment costs.

Lucky for the platforms, the Fediverse is ascendant.

The Fediverse Unravels Data From Viewport

If you are reading this article, shared on the Fediverse and published on a Fediverse blogging platform, you likely do not need much explanation about the fundamentals of decentralized, federated networks. As a brief, high-level explanation, the Fediverse represents large pools of data generated by a variety of interoperable services deployed in instances by thousands of independent people, organizations, and businesses. Each instance can connect to other instances through a common protocol such as ActivityPub which allows data generated on one instance or service to be communicated with other, connected instances and services. These connections form networks called federations. Instances are free to federate or defederate with each other, which can create independent federations, and thus, independent data pools.
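To make the idea of independent data pools concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which a federation is any group of instances reachable from one another through federation links; real ActivityPub delivery is more selective than that. The instance names and the federations function are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def federations(instances, links):
    """Group instances into federations.

    Simplifying assumption: a federation is a set of instances that can
    reach each other through one or more federation links.
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for start in instances:
        if start in seen:
            continue
        # Walk outward from this instance to collect everything it can reach.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical instances, chained together by federation links.
instances = ["alpha.example", "beta.example", "gamma.example", "delta.example"]
links = {
    ("alpha.example", "beta.example"),
    ("beta.example", "gamma.example"),
    ("gamma.example", "delta.example"),
}

before = federations(instances, links)

# beta.example and gamma.example defederate from each other.
after = federations(instances, links - {("beta.example", "gamma.example")})

print(len(before), "federation(s) before the split")
print(len(after), "federation(s) after the split")
```

In this toy model a single defederation decision is enough to split one shared data pool into two, which is the trade-off that comes up again later when deciding how to treat Threads.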

To be clear, with the ActivityPub system, the data is stored and duplicated across the federation. This is distinct from the AT Protocol employed by Bluesky, where data is stored in centralized pools distinct from the user viewports.

This federated model presents a paradigm shift in user interaction. In the days of chat rooms and web forums, each room or forum represented its own, walled platform. Users could only interact with other users of the same room or forum and data portability between forums or chat rooms rarely, if ever, happened. As such, communities rapidly grew and collapsed and few communities were able to grow to stable sizes.

With the growth of the current social media environment, the platforms developed one-directional data portability, in which users outside the platform could direct data to the platform. For example, the Digg “Digg It” button allowed users to submit links to Digg straight from a blog or news website. The Facebook “like” button functioned much the same way. Twitter developed APIs to embed tweets in blog posts, giving users a route back into the platform to interact with the tweets.

Such one-directional data portability helped the platforms expand by invading the territory of other websites, but did not allow for meaningful interactions across platforms. If you share a tweet on Facebook, your friends will either be able to leave comments on your Facebook post or go to Twitter to reply to your Tweet, but not both. YouTube comments are not automatically added to the Instagram comments when a creator shares in both locations. Users are effectively isolated by the high walls of the platform and any “interoperability” built into the platform is designed to increase engagement on the platform and decrease reasons for the user to leave.

This String Goes Both Ways!

The Fediverse offers true bidirectional communication. This is key to creating a viewport agnostic approach. A user can post a toot on a Mastodon instance and another user can view and interact with that post on a Kbin instance. Users can comment on PeerTube videos using Lemmy (or may soon be able to as the platforms continue to develop). Different Mastodon servers can also provide wildly different experiences based on content moderation, server defaults, and federation decisions, without impacting the underlying data generated. In essence, the data the users generate is distinct from the viewport in which the users generate and interact with the data. The viewport reproduces, arranges, or filters the data generated by users in ways that are meaningful to the users of a particular service; however, the same data can be reproduced, arranged, or filtered in different and unique ways on another instance or another service.
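The separation of data from viewport can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Mastodon, Kbin, or any real service is implemented: the Post fields, the sample posts, and the two viewport functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    instance: str
    text: str
    published: str  # ISO 8601 timestamp; these sort correctly as strings
    boosts: int


# One shared pool of federated data (hypothetical sample posts).
POSTS = (
    Post("ana", "mastodon.example", "Hello from the timeline", "2023-07-05T09:00:00Z", 2),
    Post("ben", "kbin.example", "Link worth discussing", "2023-07-06T12:30:00Z", 40),
    Post("cy", "threads.example", "First post", "2023-07-07T08:15:00Z", 7),
)


def timeline_viewport(posts, blocked_instances=frozenset()):
    """Microblog-style view: newest first, minus defederated instances."""
    visible = [p for p in posts if p.instance not in blocked_instances]
    return sorted(visible, key=lambda p: p.published, reverse=True)


def ranked_viewport(posts):
    """Link-aggregator-style view: most boosted first."""
    return sorted(posts, key=lambda p: p.boosts, reverse=True)


# Two viewports over the same data; neither changes the underlying pool.
timeline = timeline_viewport(POSTS, blocked_instances={"threads.example"})
ranked = ranked_viewport(POSTS)

print([p.author for p in timeline])
print([p.author for p in ranked])
```

Both functions read the same tuple of posts and return a new arrangement of it. The moderation choice (blocking an instance) and the ordering choice live entirely in the viewport, while the data itself stays the same for everyone.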

This interoperability opens the world of user interaction to an incredible scale. With a viewport agnostic approach, users can choose a viewport they enjoy and interact with the world at large with limited need to go elsewhere. All of the content I’m interested in brought into one place. Consume my videos, share my photos, chat with my friends, and browse the recent global conversations, all in one viewport on one server managed by real people who share a sense of community with me.

It is a nice dream, but it comes at a cost. This interoperability goes both ways, both for the good of the user, and for the good of the surveillance capitalist.

The Threads Are Loose

Imagine this: you run a multi-billion dollar company that largely exists to extract data from users so you can analyze it and provide predictive analytics to advertisers. Your platform is ubiquitous, and your expenses are legion. You have lost a lot of money on silly projects like virtual reality or streaming video games or implementing a video platform from scratch or taking on billions of dollars of debt because a billionaire thought it would be funny. You need to increase revenue or cut costs. What do you do?

If you are Google, you declare manifest destiny and claim ownership over all the data in the world. This conquest will explode costs for Google, but they are gambling that becoming the sole emperor of Internet data will pay dividends. And such an undertaking requires Google data storage and compute capabilities. You are just a small, multi-billion dollar platform. Well, you could drive unpaying users away, by shutting down free access to the website, and try to maximize profits from your existing users while simultaneously firing all of your employees or community moderators that keep your platform running. I understand that to be working wonders for Reddit and Twitter.

Or you do something truly radical. You develop a plan to access an untapped reserve of user data, and you don’t need to care if they are on your platform or not. You tap into the Fediverse.

Meta’s Threads launched this week to explosive results. Users are encouraged to essentially port their accounts, content, and networks from Instagram to Threads. Within a day, Threads had 30 million sign-ups. At some point in the future, Threads will join the Fediverse. When it does, it will be the largest viewport in federated space.

When it joins, Meta will have easy access to every instance that chooses to federate with Threads, and, more importantly, easy access to the data of non-Threads users. As more people are able to engage on the Fediverse, the Fediverse will likely grow, increasing the pool of data Meta can mine. People may not stay on Threads or will not want to engage with a Meta product, so they will go to other services. This grows the scope of the Fediverse, hopefully developing a massive and thriving network of independent, but interoperable, services and instances. Whether they stay with Meta or go on to other services, the federated data pools will grow, or so Meta is betting.

It won’t matter to Meta if you use Threads or not, as they will get the data regardless. Just by accessing the Fediverse, Meta will be able to mine an ever-growing pool of data, and more easily than with their prior methods of mining non-Meta user data. Meta has a history of tracking people outside of the platform, even those who do not use a Meta product. While tracking cookies and other fingerprinting strategies may get limited by Google and other browser developers, Meta will need new avenues to access untapped data streams. Data that may have been otherwise difficult to get. With the Fediverse, the door to those data streams is wide open.

Closing the Loop

So what does this mean for the average user? Should we abandon the Fediverse now and simply remain in the walled gardens of before? I don’t think so. Before the Fediverse, everything you did or created on the Internet was scraped, collated, and analyzed by the big tech companies. They created tools to make it easier for you to give them your data, but they were going to access your data in any way they could. Simply because big tech companies will access Fediverse data through the interoperable approaches does not materially change our role as data raw materials.

What does change, however, is our ability to limit how we interact with the tech platforms. If I can access the viewport agnostic data pool, I do not have to choose to use the tech platform’s data-extraction-maximizing viewport. For example, I can stay on my Mastodon instance and, assuming my instance does not defederate from Threads, eventually access the content through my chosen tools. I will not have to use Meta’s viewport with almost more mandatory permissions than can fit on the screen. Nor will I have to subject myself to Meta’s advertising-heavy and algorithmically dependent viewport. I can access the data in the way that best suits my needs.

Or I can join an instance that defederates from Threads entirely. This is truly the nuclear option, and carries some downsides. Communities only thrive when people want to be in them and are willing to make them work. If we aggressively isolate instances in the Fediverse, we may crack the network into separate federations, limiting the effectiveness for everyone. If the smaller federations are too damaged, they will wither as users migrate to other, larger instances, and communities can suffer.

And either way, Meta can scrape the data themselves.

With federation, we have more options on how we engage online. We are no longer beholden to the moderation and management decisions of a large platform and can instead form smaller, closer-knit, and healthier communities. The decision to federate or defederate from Threads when the option comes will be a difficult one and should be taken with care.

For what it is worth, I believe the benefits of the Fediverse outweigh the potential harms of the tech platforms joining. I’ve waffled on this issue many times. I want the Fediverse to grow and to do that, it needs users. That said, I do not wish to be on a platform that makes it too easy for Meta, or any tech company, to exploit or claim ownership over data that is not freely given to the platform. We need better data ownership controls in the Fediverse, such that Meta cannot profit off my content without my consent, but such controls would come from a legal rather than technical solution.

Given the risks and benefits of federation, I’m not sure how any one instance should act. Each instance should evaluate what is in the best interest of the community members, and go from there. For my two cents, I hope my instance chooses not to federate, at least not at first. We need more information about how Meta and Threads users will act in the Fediverse, and to protect what has been built so far, I think a default position of nonfederation gives users time to understand Meta’s actions and impacts on the Fediverse, without initially risking the health of the community.

In any event, the future will be many things, but most of all, I believe it will be federated.