The Matrix protocol and its implementations do not provide the privacy you expect from them (matrix notes by Anarcat) (anarc.at)

submitted 2 years ago* (last edited 2 years ago) by astramist@lemmy.sdf.org to c/privacy@lemmy.ml


Some interesting notes on the Matrix protocol, its limitations and comparison with IRC.

A few crucial quotes, as the article itself is voluminous (but very exhaustive!):

Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. On encrypted rooms those messages are encrypted, but not their metadata.
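To make the fan-out in the quote above concrete, here is a toy model in Python. It is only a sketch of the behaviour the paragraph describes, not how Synapse or any real homeserver is implemented: the `HomeServer` class, the `send_message` function, and the server and user names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Toy stand-in for a homeserver: just a name and a message store."""
    name: str
    stored_events: list = field(default_factory=list)


def send_message(origin, room_servers, sender, body, encrypted=False):
    """Store an event on the origin server, then copy it to every other
    server participating in the room (hypothetical helper)."""
    event = {
        "sender": sender,       # metadata: readable even in encrypted rooms
        "origin": origin.name,  # metadata: readable even in encrypted rooms
        "body": "<ciphertext>" if encrypted else body,
    }
    origin.stored_events.append(event)
    for server in room_servers:
        if server is not origin:
            # Each remote server keeps its own copy of the event.
            server.stored_events.append(dict(event))
    return event


# Hypothetical servers and user, for illustration only.
a = HomeServer("a.example")
b = HomeServer("b.example")
c = HomeServer("c.example")

send_message(a, [a, b, c], "@alice:a.example", "hello", encrypted=True)

for server in (a, b, c):
    print(server.name, server.stored_events)
```

Every server in the room ends up holding a copy, and while the body is hidden in the encrypted case, the sender and origin fields are not, which is the point the quote is making.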

In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:

Enumerate all the users that ever joined the room while you were there

Discover all their home servers

Start a GDPR procedure against all those servers
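The second step in the list above can be sketched in a few lines of Python, since a Matrix user ID has the form `@localpart:server_name`. This is a minimal sketch that assumes you already have the list of user IDs from step one; the `home_servers` helper and the example IDs are hypothetical, and getting a complete member history as well as the legal procedure itself remain manual work.

```python
def home_servers(user_ids):
    """Return the distinct server names found in a list of Matrix user IDs."""
    servers = set()
    for user_id in user_ids:
        if not user_id.startswith("@") or ":" not in user_id:
            raise ValueError(f"not a Matrix user ID: {user_id!r}")
        # Split on the first colon only: the server name may carry a port.
        _localpart, server_name = user_id[1:].split(":", 1)
        servers.add(server_name)
    return sorted(servers)


# Hypothetical member list, for illustration only.
members = ["@alice:a.example", "@bob:b.example:8448", "@carol:a.example"]
print(home_servers(members))  # ['a.example', 'b.example:8448']
```

Even in this tiny example, three members mean two separate server operators to contact, and the number grows with every server that ever had a user in the room.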

Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with whom, when and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind.

This is a known issue (opened in 2019) in Synapse, but this is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep join/leave events for all rooms, which gives clear-text information about who is talking to whom. Synapse logs may also contain personally identifiable information that home server admins might not be aware of in the first place. Log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin.

Combine this with the federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those ideas kind of get thrown out because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation.

So while you can work around a home server going down at the room level, there's no such thing at the home server level, for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported.

As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out if your specific client has implemented it.

Just taking the latest weekly Matrix report, you find three new MSCs proposed, just last week! There's even a graph that shows the number of MSCs is progressing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150.

I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating: IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and are all still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation.

IRC had numerous conflicts and forks, both at the technical level and at the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it.

top 33 comments


[-] drkt@feddit.dk 16 points 2 years ago* (last edited 2 years ago)

The history of federated services is really fascinating: IRC, FTP, HTTP, and SMTP were all created in the early days of the internet

Except SMTP in a roundabout way, I don't understand how those are considered federated?

[-] cstine@lemmy.uncomfortable.business 25 points 2 years ago

IRC is extremely federated: building a network of linked servers sharing the same channels was done pretty early in its existence.

If anything, IRC is more decentralized than ActivityPub-based services, because there's no 'home' server for a given IRC channel, and thus if a server goes down, you don't lose all the channels that were created on it.

[-] drkt@feddit.dk 9 points 2 years ago

I had no idea IRC channels could live on multiple servers. That's cool

[-] DmMacniel@feddit.de 15 points 2 years ago

The magic word is netsplit

[-] poVoq@slrpnk.net 11 points 2 years ago

IRC used to be fully federated in the early days, but for various reasons, which are eerily similar to some of the much more recent discussions around AP, this deteriorated over time, and these days IRCds have various incompatible s2s protocols that are used more or less only for load-balancing.

I like IRC, but this is a bit of a cautionary tale of what not to do.

[-] astramist@lemmy.sdf.org 6 points 2 years ago

The author's explanation using HTTP as an example:

HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing.

[-] drkt@feddit.dk 7 points 2 years ago

Have I misunderstood federation or doesn't it seem a little generous to call anything that can be hosted by yourself federated?

[-] astramist@lemmy.sdf.org 12 points 2 years ago

Here, the author refers to the protocol as federated, not the application. That is, he is comparing Matrix, IRC, SMTP, ActivityPub, etc. If a protocol can be used to develop an application that is decentralized and distributed, then such a protocol can be called a federated protocol. I agree with you that labeling HTTP and FTP as federated is bizarre. But the author compares them because they are all from the same OSI model layer, the application layer.

I'm not the author, just trying to give an explanation of how he was thinking (and I'm most likely wrong 😄).

[-] SkyNTP@lemmy.ml 3 points 2 years ago* (last edited 2 years ago)

Whether servers talk to each other or not is a technical detail related to the nature of the service (a chat application exclusively relays data between two end users, HTTP and FTP applications relay data between you and the host). This detail matters little to end users. What matters to end users is whether the service they are using is controlled by a single entity, or is instead controlled by multiple entities, which enables competition and user choice.

This is what "federation" has come to mean. HTTP is "federated" because users can use different browsers and can talk to different servers operated by different people and organizations, although it's essentially, from the user's perspective, one service with different, interchangeable service providers.

It didn't have to be this way, though. Websites could all be hosted on Google servers, with the only way to view those pages being to pay Google a subscription to "browse channels".

We (technical people) need to understand and accept what end users actually care about.

[-] XTL@sopuli.xyz 3 points 2 years ago

Maybe the writer imagines The Web as a single service.

[-] Harry_h0udini@lemmy.dbzer0.com 2 points 2 years ago

Thoughts on Session?

[-] lengsel@latte.isnot.coffee 2 points 2 years ago

Can anyone name a federated service that has built-in encrypted messaging enabled for privacy?

[-] poVoq@slrpnk.net 9 points 2 years ago

Some XMPP clients can be configured to have e2ee enabled by default.

[-] astramist@lemmy.sdf.org 4 points 2 years ago

Like Snikket

[-] lengsel@latte.isnot.coffee -5 points 2 years ago

XMPP is decentralized but it is not federated.

[-] poVoq@slrpnk.net 6 points 2 years ago

? This is blatantly wrong. It was one of the earliest federated protocols.

[-] wildbus8979@sh.itjust.works 1 points 2 years ago

SMTP is like twenty years older....

[-] wildbus8979@sh.itjust.works 4 points 2 years ago

Definitely wrong.

[-] honk@feddit.de 1 points 2 years ago

What do you mean by built in?

Built in to the protocol? Built in to the application?

There are a couple of XMPP clients that implement OMEMO and/or OTR encryption.

Matrix supports encryption at the protocol level. But it's relatively flawed.

[-] possiblylinux127@lemmy.zip 1 points 2 years ago

Session?

[-] astramist@lemmy.sdf.org 1 points 2 years ago* (last edited 2 years ago)

I think it's unlikely this kind of service exists or is going to appear. There's a blog post by developers of the present implementations of XMPP. It explains the difference between decentralized services and centralized ones, and why the Signal messenger is more popular than all other messengers. A must-read.

[-] lengsel@latte.isnot.coffee -3 points 2 years ago

XMPP is decentralized but XMPP has never been federated. I'm a fan of OMEMO but it's decentralized.

Anybody looking for privacy from a federated service will never find it. It seems SimpleX is implementing more decentralized capabilities and it has superior privacy over anything else.

While Signal is the gold standard, it is not at all the best app or service for privacy.

[-] wildbus8979@sh.itjust.works 6 points 2 years ago* (last edited 2 years ago)

XMPP is decentralized but XMPP has never been federated.

That is completely wrong though. Anyone can run an XMPP server and talk to any user on any other server. XMPP is fully federated.

[-] astramist@lemmy.sdf.org 1 points 2 years ago

Agreed, it's a contradiction to be private and federated at the same time. The federated protocol helps the network to be fault-tolerant and cooperative. In other words, it's easier for us to find each other, and afterward it's harder to lose each other. It obviously isn't conducive to privacy 😄

[-] poVoq@slrpnk.net 2 points 2 years ago* (last edited 2 years ago)

This is not true. In a federated network like XMPP, your server anonymizes a lot of the metadata that is generated by you connecting to the server, since it is not passed on to other servers. Of course more metadata is shared than in a system that doesn't talk to other servers at all, but it is definitely less than in a system that relies on direct p2p connections or multiple relays.

[-] astramist@lemmy.sdf.org 1 points 2 years ago

If a server is hosting our data, albeit in encrypted form, there is always the risk of the server being compromised. You know the history of PGP and why OpenPGP was created, don't you?

One of the options, where every user device is a server, is a blockchain. But I think you'll also agree that this scheme doesn't give complete privacy.

The issue of privacy in this case is a convenience issue. To me, federation is not a checkbox-type property that is either there or not; it's a spectrum: some protocols are more federated, some less so. We could design a fully privacy-aware protocol and service that can only partially be considered federated. You may disagree with me, but I haven't seen a clear definition with a complete list of federated protocol properties 😉

[-] poVoq@slrpnk.net 3 points 2 years ago

I think ultimately it is a trust issue. There is no such thing as trust-less communication and you need to carefully consider who you trust with relaying your communication data.

In the classical federation model that XMPP uses, you need to place a fairly high amount of trust in the server you have an account on, but then that server can shield you from most of the privacy violations and you can interact fairly privately using a pseudonym that cannot be easily linked to your real identity by anyone other than that trusted server.

If you move to more p2p or relay-based models, then you directly share a lot more metadata with more 3rd parties, and those 3rd parties are usually completely opaque about why and how they participate in the network. If you are an infosec expert, you can in theory optimize such networks to get a high level of privacy, but it is full of hidden footguns that can easily make you more vulnerable to privacy-invasive tracking of metadata like IP addresses that can be easily linked to your real identity.

On the other end of the spectrum you have centralized services like Signal, that (if you trust them) can also offer a high level of privacy, but due to their centralisation they have a large target painted on their back and there are a lot of hidden incentives/forces for such centralized service providers to compromise your privacy often without you even realizing it.

[-] astramist@lemmy.sdf.org 1 points 2 years ago

Sounds reasonable! 👍

[-] possiblylinux127@lemmy.zip 1 points 2 years ago

The thing Matrix has going for it is its interoperability


this post was submitted on 29 Jul 2023

71 points (92.8% liked)
