WEBVTT

00:00.000 --> 00:12.360
Next up, we have David Roetzel from the Mastodon project, who's going to be talking about

00:12.360 --> 00:17.200
the Fediscovery work they've been doing with the Mastodon project.

00:17.200 --> 00:18.760
Thank you very much for having me.

00:18.760 --> 00:20.080
Welcome, everyone.

00:20.080 --> 00:21.080
My name is David.

00:21.080 --> 00:27.560
I work as a web developer at Mastodon gGmbH, which is the non-profit organization

00:27.560 --> 00:31.240
that oversees the development of the Mastodon software.

00:31.240 --> 00:34.080
And actually, I'm a longtime FOSDEM fan.

00:34.080 --> 00:38.440
It's my first FOSDEM in 10 years, for various reasons.

00:38.440 --> 00:44.840
But before that, I came here all the time, as evidenced by the stack of t-shirts I had

00:44.840 --> 00:47.400
to go through a couple of days ago.

00:47.400 --> 00:52.120
I never imagined that I would one day stand here in front of all of you, so this is really

00:52.120 --> 00:57.520
amazing, but I'm also very nervous, so please bear with me.

00:57.760 --> 01:02.960
I'd like to talk about search and discovery on the Fediverse, and with this topic, there

01:02.960 --> 01:10.240
is one huge problem that we have, and this problem can easily be illustrated.

01:10.240 --> 01:15.240
Let's say you are technically inclined, and you would like to set up a Mastodon

01:15.240 --> 01:17.120
server for your friends and family.

01:17.120 --> 01:21.520
By the way, I'm using Mastodon as an example here, just because that's what

01:21.560 --> 01:28.400
I am familiar with, but similar things will happen with all the different ActivityPub-based

01:28.400 --> 01:31.560
software projects.

01:31.560 --> 01:35.720
So you have this new Mastodon server, and the first time you log in, you will

01:35.720 --> 01:38.400
probably see something like this.

01:38.400 --> 01:40.160
This is an empty timeline.

01:40.160 --> 01:44.800
This is not nice, but maybe, I mean, you are technically inclined.

01:44.800 --> 01:48.280
Maybe this is not totally unexpected.

01:48.280 --> 01:52.600
So the first thing you will probably try to do is find people to follow.

01:52.600 --> 01:55.760
And you might have heard about this Gargron guy.

01:55.760 --> 02:00.640
He seems to be very popular on Mastodon, so you want to follow him.

02:00.640 --> 02:06.760
And the obvious thing to do is just enter the name in the search bar, and if you do that,

02:06.760 --> 02:11.760
you will see this here: no joy.

02:11.760 --> 02:14.640
But honestly, you don't care so much.

02:14.640 --> 02:19.840
You only came to Mastodon because you were promised cute cat photos.

02:19.840 --> 02:23.080
So the obvious thing to do is search for cat photos.

02:23.080 --> 02:32.160
And if you do that, you will see this here: again, no joy, sorry.

02:32.160 --> 02:39.680
This is an extreme example, of course, but it's one that everyone

02:39.680 --> 02:45.960
on small and even mid-sized instances will be very familiar with.

02:45.960 --> 02:50.800
And even on mastodon.social, which is the largest instance of Mastodon there is, it's

02:50.800 --> 02:52.240
the one we operate.

02:52.240 --> 02:58.560
We regularly get reports of people leaving the platform because they couldn't find the

02:58.560 --> 03:01.360
people and content they were looking for.

03:01.360 --> 03:05.560
Even though it's clearly there.

03:05.560 --> 03:11.800
So the reason this happens is pretty clear: while all the Fediverse servers happily

03:11.800 --> 03:17.800
federate with each other using the shared protocol, ActivityPub, when it comes to search

03:17.800 --> 03:26.800
and discovery, every server is on its own.

03:26.800 --> 03:35.320
Before I introduce you to our idea of what could improve this situation, I need to take a

03:35.320 --> 03:39.960
little detour, and talk about service providers for a minute.

03:39.960 --> 03:45.440
Many Fediverse servers today already use one or more external service providers.

03:45.440 --> 03:51.280
The most obvious example being storage providers, something like S3, or compatible services

03:51.280 --> 03:55.880
to store media files.

03:55.880 --> 03:57.080
But keep that idea in mind.

03:57.080 --> 04:01.080
Imagine we had external service providers helping out with search and discovery.

04:01.080 --> 04:08.000
Once we have that, a Fediverse server could use one of those external service providers.

04:08.000 --> 04:14.160
Some servers might use a single one; servers could also use more than one

04:14.160 --> 04:19.520
external search provider to help with search and discovery.

04:19.520 --> 04:26.960
And while every single server only has a very narrow view of the full Fediverse, as soon

04:26.960 --> 04:34.360
as two or more separate servers use the same search provider, this search provider has a chance

04:34.360 --> 04:41.480
to get a much broader view of the larger Fediverse.
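
NOTE
Editor's sketch, not from the talk: a minimal Python illustration of why a provider shared by several servers sees more than any single server does. The server names and account handles are made up, and modelling a server's "view" as a set of known accounts is an assumption for illustration only.
```python
# Hypothetical data: each server only knows the accounts it has already federated with.
server_views = {
    "small.example": {"alice@a.example", "bob@b.example"},
    "family.example": {"bob@b.example", "carol@c.example"},
}
def provider_view(views: dict[str, set[str]]) -> set[str]:
    """Union of everything the participating servers have told the provider about."""
    combined: set[str] = set()
    for known_accounts in views.values():
        combined |= known_accounts
    return combined
combined_view = provider_view(server_views)
print(sorted(combined_view))
```
With only two participating servers, the combined view is already larger than either individual view.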

04:41.480 --> 04:46.880
And this is an idea that we are working towards.

04:46.880 --> 04:53.360
We called our project Fediscovery, and we have a couple of goals for this project.

04:53.360 --> 05:01.080
First of all, we want to try out this provider idea, the idea of having external service

05:01.080 --> 05:04.160
providers solving problems for the Fediverse.

05:04.160 --> 05:09.040
This is our first proof of concept of this idea.

05:09.040 --> 05:16.480
We don't want to build a project that's only useful for Mastodon.

05:16.480 --> 05:22.760
On the contrary, this project is only successful if it will be useful to others as well.

05:22.760 --> 05:25.880
To that end, we are not just writing software.

05:25.880 --> 05:31.840
We are writing specifications first, open specifications that anyone can implement.

05:31.840 --> 05:39.160
So everyone will be able to write their own provider, and also every developer of Fediverse

05:39.160 --> 05:45.080
software of existing Fediverse software projects will be able to integrate this into their

05:45.080 --> 05:48.440
project.

05:48.440 --> 05:52.240
And to make this all work, we need to work together with other projects.

05:52.240 --> 05:57.920
We've tried to reach out to other projects, but we probably didn't do a very good job of this.

05:57.920 --> 06:04.000
So if you are an implementer of ActivityPub-based Fediverse software, and you think

06:04.000 --> 06:10.800
this idea might be useful to you, please come talk to us.

06:10.800 --> 06:15.560
We will also build an open source reference implementation of this.

06:15.560 --> 06:16.560
And I said it before.

06:16.560 --> 06:22.440
We also have this open specification so everyone can build their own.

06:22.440 --> 06:27.920
We were very lucky to secure funding for this project; the NGI Search organization helped

06:27.920 --> 06:29.880
us out with that.

06:29.880 --> 06:35.320
If you have a grant from one of these organizations, you know that you are working on

06:35.320 --> 06:36.320
a timeline.

06:36.320 --> 06:38.800
We have some very tight deadlines to meet.

06:38.800 --> 06:42.120
The very next one is actually at the end of this month.

06:42.120 --> 06:48.200
So I said please reach out to us, but I want to apologize in advance if I'm not as responsive

06:48.200 --> 06:53.920
as I should be in the coming four weeks, because that's the next deadline.

06:53.920 --> 06:57.680
The end of this project is scheduled for June.

06:57.680 --> 07:02.360
But the important thing here is we won't stop working on this just because this grant

07:02.360 --> 07:03.960
has ended.

07:03.960 --> 07:10.120
We are in this for the long run, and we hope to build something useful together with others.

07:10.120 --> 07:16.920
I said we have specifications, and we put them on GitHub, or rather the first drafts

07:16.920 --> 07:18.760
are on GitHub.

07:18.760 --> 07:22.000
This repository here, it's this very long name again.

07:22.000 --> 07:26.120
So you can't miss the repository if you look for it.

07:26.120 --> 07:32.520
In this repository, you will find a couple of drafts, and the first one is concerned with

07:32.520 --> 07:37.760
general interaction between providers and Fediverse servers, things like registration, authentication,

07:37.760 --> 07:39.240
and so on.

07:39.240 --> 07:42.960
These can be reused for other purposes.

07:42.960 --> 07:50.040
Then we have data sharing, which I will talk about in a minute, and we have an open pull

07:50.040 --> 07:57.360
request for the first user-facing specification, which is Trends.

07:57.360 --> 08:02.280
Very soon you will find a draft for account search as well, and until the end of June

08:02.280 --> 08:06.040
we will also have account recommendations.

08:06.040 --> 08:13.400
What's not in this NGI Search project is the big one, post search, but it's on our internal

08:13.400 --> 08:14.880
roadmap anyway.

08:14.880 --> 08:20.200
We needed to cut the scope of the NGI Search project, so this was the one that we had

08:20.200 --> 08:26.240
to leave out, but we will work on this after June.

08:26.240 --> 08:32.320
So data sharing, this is probably the big one, because for the search provider to be able

08:32.320 --> 08:38.440
to return results, it needs to index content from the Fediverse.

08:38.440 --> 08:40.320
And this is of course a difficult topic.

08:40.320 --> 08:47.600
There are certain privacy expectations on the Fediverse, and we are very well aware of them.

08:47.600 --> 08:54.600
So the concept we came up with, and that we want to run with a little bit, is the following.

08:54.600 --> 08:59.760
This Mastodon server on top here is not using a search provider, but it is using

08:59.760 --> 09:03.080
ActivityPub, so it's federating with other servers.

09:03.080 --> 09:11.480
This generic ActivityPub server on the right is a server that is using a search

09:11.480 --> 09:12.480
provider.

09:12.480 --> 09:18.720
And once new content arrives, and it knows of a new post made on this Mastodon server,

09:18.720 --> 09:23.760
it will notify its search provider of that new content.

09:23.760 --> 09:27.320
But it will not send the actual content.

09:27.320 --> 09:33.680
It will only send the URI, the ID of the ActivityPub object.

09:33.680 --> 09:39.080
So the search provider is then responsible for actually fetching that content.

09:39.080 --> 09:43.760
And it will do so using a signed request.

09:43.760 --> 09:51.080
If the Mastodon server on top wants to check the signature, the search provider will actually

09:51.080 --> 09:59.760
pose as an ActivityPub actor, so the signatures can be verified.
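
NOTE
Editor's sketch, not from the talk or the draft specifications: a small Python simulation of the flow described here, where the federating server shares only a URI and the provider fetches the object itself with a signed request. All names (PROVIDER_ACTOR, ORIGIN_OBJECTS, origin_fetch, SearchProvider) are hypothetical, there is no real HTTP, and the signature check is reduced to a placeholder.
```python
from dataclasses import dataclass, field
PROVIDER_ACTOR = "https://provider.example/actor"  # hypothetical actor URI
# Stand-in for the origin server: maps object URIs to objects (no real HTTP here).
ORIGIN_OBJECTS = {
    "https://origin.example/posts/1": {"id": "https://origin.example/posts/1", "content": "Cute cat photo"},
}
def origin_fetch(uri: str, signed_by: str) -> dict:
    """Simulates a signed GET; a real server would verify an HTTP signature for signed_by."""
    if not signed_by:
        raise PermissionError("unsigned request")
    return ORIGIN_OBJECTS[uri]
@dataclass
class SearchProvider:
    index: dict = field(default_factory=dict)
    def notify(self, uri: str) -> None:
        """Receives only the URI, then fetches the object from its origin."""
        obj = origin_fetch(uri, signed_by=PROVIDER_ACTOR)
        self.index[uri] = obj["content"]
provider = SearchProvider()
# The federating server shares the URI only, never the post body.
provider.notify("https://origin.example/posts/1")
print(provider.index)
```
Because the provider has to ask the origin server for the content, the origin server keeps the final say over whether the content is handed out.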

09:59.760 --> 10:04.080
We take privacy and consent very, very seriously.

10:04.080 --> 10:09.680
And I'd like to point out, first of all, that we will only ever index public content.

10:09.680 --> 10:14.600
And if you know a bit about Mastodon, we actually have different levels of public content.

10:14.600 --> 10:16.640
We have something called quiet public.

10:16.640 --> 10:20.840
It used to be called unlisted, where you say, hey, you want to publish something on

10:20.840 --> 10:24.640
the web, but you don't want to announce it in any way.

10:24.640 --> 10:27.800
And we will not index those posts.

10:27.800 --> 10:31.080
This is only for public public content.

10:31.080 --> 10:39.760
And the Fediverse server is responsible for only sending public public content, but also the

10:39.760 --> 10:43.760
search provider needs to double check that this is really the case.

10:44.200 --> 10:46.320
The same goes for consent.

10:46.320 --> 10:53.480
If you want to check if an author of content has opted in to being discovered and being

10:53.480 --> 10:57.560
indexed, we already have that in Mastodon, actually.

10:57.560 --> 11:01.920
We have these two flags on the actor called discoverable and indexable.

11:01.920 --> 11:03.720
And they exist today.

11:03.720 --> 11:09.760
And I know of other Fediverse software projects that have implemented them as well.

11:09.840 --> 11:16.560
Discoverable means you as a person or an actor have opted into being discovered on the

11:16.560 --> 11:17.560
Fediverse.

11:17.560 --> 11:26.560
While indexable means you have opted into your content being indexed, so it can be found.

11:26.560 --> 11:30.360
And this is there today, and we will of course respect that.

11:30.360 --> 11:38.000
And again, the Fediverse server that shares URIs needs to make sure that both of these are

11:38.000 --> 11:46.960
set to true, and the receiving search provider will have to double check this again.
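
NOTE
Editor's sketch, not from the talk or the draft specifications: one way the "public public plus consent" check could look in Python, run once by the sharing server and again by the provider. It assumes the common convention that a fully public post lists the ActivityStreams Public collection in "to", while a quiet public (unlisted) post lists it only in "cc", and that "to" and "cc" are lists. The function name is_indexable is hypothetical.
```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
def is_indexable(post: dict, actor: dict) -> bool:
    """True only for fully public posts whose author opted in via both flags."""
    fully_public = PUBLIC in post.get("to", [])  # quiet public posts carry Public in "cc" only
    opted_in = actor.get("discoverable") is True and actor.get("indexable") is True
    return fully_public and opted_in
actor = {"discoverable": True, "indexable": True}
public_post = {"to": [PUBLIC], "cc": []}
quiet_post = {"to": ["https://origin.example/users/a/followers"], "cc": [PUBLIC]}
print(is_indexable(public_post, actor), is_indexable(quiet_post, actor))
```
Missing flags are treated as "not opted in", so an actor without these properties is never indexed in this sketch.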

11:46.960 --> 11:51.080
You might remember that I said a search provider would do signed requests.

11:51.080 --> 11:56.920
And this means that Fediverse servers that support this can check the signature, check

11:56.920 --> 12:03.000
the actor, and that means those requests can be blocked.

12:03.000 --> 12:10.880
You can either have a server level block or even an individual user could decide to block

12:10.880 --> 12:12.720
search providers.
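
NOTE
Editor's sketch, not from the talk: a minimal Python version of the two blocking layers mentioned here. It assumes the server has already verified the request signature and resolved it to an actor URI; may_fetch and the block sets are hypothetical names.
```python
def may_fetch(signing_actor: str, server_blocks: set[str], user_blocks: set[str]) -> bool:
    """Refuse the request if the signing actor is blocked at either level."""
    return signing_actor not in server_blocks and signing_actor not in user_blocks
provider_actor = "https://provider.example/actor"
# An individual user has blocked the provider, so their content is not handed out.
print(may_fetch(provider_actor, server_blocks=set(), user_blocks={provider_actor}))
```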

12:12.720 --> 12:18.280
So we have different layers of security built into this concept.

12:18.280 --> 12:24.280
We hope that this is enough, but if you can think of any ways we could improve that,

12:24.280 --> 12:27.360
please let us know.

12:27.360 --> 12:31.720
The first user-facing capability of a search provider that we are currently implementing

12:31.720 --> 12:32.720
will be trends.

12:32.720 --> 12:39.360
So we define an API for Fediverse servers to ask a provider what is currently trending

12:39.360 --> 12:46.120
on the Fediverse: which posts, which hashtags, which links are currently trending.

12:46.120 --> 12:50.600
This is still an early draft, but you saw the timeline; we will probably have to implement

12:50.600 --> 12:55.080
what's there today, which doesn't mean this will be finished at the end of the month.

12:55.080 --> 13:03.880
We will continue to improve this, but don't be surprised if we merge this open PR without

13:03.880 --> 13:07.800
much further discussion.

13:07.800 --> 13:13.000
A similar thing goes for account search: the specification is not quite ready yet,

13:13.000 --> 13:19.840
but we expect the first draft to be quite simple, a full text search of all the public

13:19.840 --> 13:25.560
information on actors that describe them.
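
NOTE
Editor's sketch, not from the talk or any draft specification: a naive Python illustration of "full text search over public actor information". The actor data is made up, the choice of the "name" and "summary" fields is an assumption, and a real provider would presumably use a proper search index rather than substring matching.
```python
def search_accounts(actors: list[dict], query: str) -> list[str]:
    """Case-insensitive match of every query word against name and summary."""
    words = query.lower().split()
    hits = []
    for actor in actors:
        text = f"{actor.get('name', '')} {actor.get('summary', '')}".lower()
        if all(word in text for word in words):
            hits.append(actor["id"])
    return hits
# Hypothetical public actor data the provider has indexed.
actors = [
    {"id": "https://a.example/users/gargron", "name": "Gargron", "summary": "Posts about Mastodon"},
    {"id": "https://b.example/users/cats", "name": "Daily Cats", "summary": "Cute cat photos"},
]
print(search_accounts(actors, "cat photos"))
```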

13:25.560 --> 13:32.360
Again, feedback is very welcome, even if we will probably have a first implementation

13:32.360 --> 13:37.840
of this draft, we will continue improving that in the future.

13:37.840 --> 13:44.040
In order to be able to implement this, and we are working on a reference implementation,

13:44.040 --> 13:47.400
I decided to pull out a couple of things.

13:47.400 --> 13:52.360
As you may know, Mastodon is a Ruby shop; we are using Ruby on Rails, and so our reference

13:52.360 --> 13:57.360
implementation will also be based on Ruby on Rails, and if you are familiar with Rails,

13:57.360 --> 14:04.000
you might know that it has a plugin system, and we extracted two plugins from our reference

14:04.000 --> 14:08.240
implementation, so everyone can start their own provider project.

14:08.240 --> 14:13.000
So if you don't want to bother with all the authentication stuff and registration stuff,

14:13.040 --> 14:21.160
we built into this, you can simply use these ready-made plugins if you use Ruby on Rails.

14:21.160 --> 14:26.760
I just opened up this repository two days ago, it's still a little sparse on documentation,

14:26.760 --> 14:30.880
but if you're feeling adventurous, please give it a try.

14:30.880 --> 14:37.960
The reference implementation is not there yet, it will be at this repository, and well,

14:37.960 --> 14:41.720
you saw the timeline, it will be there soon.

14:41.720 --> 14:46.120
Before I finish, I would like to address some questions that we got over the past months.

14:46.120 --> 14:50.920
And the first question, I think I answered this already, but it's the most common one,

14:50.920 --> 14:55.720
is this only for Mastodon? And as I said, no, of course not.

14:55.720 --> 15:03.920
On the contrary, this project will only be successful if it will be useful to others as well.

15:03.920 --> 15:08.800
Another common concern is: doesn't this just lead to centralization? And when I first heard this,

15:08.800 --> 15:15.800
I was actually offended by this, because we are going out of our way to write specifications

15:15.800 --> 15:21.440
to help others implement competing implementations of this.

15:21.440 --> 15:29.480
So we hope to have several different implementations of this and many different installations of this.

15:29.480 --> 15:38.760
But I said it myself, it will start becoming useful once more than one,

15:39.000 --> 15:42.200
server uses a single provider.

15:42.200 --> 15:47.360
So yes, there is some kind of centralizing force, but I, personally, I am not worried.

15:47.360 --> 15:54.280
I got to know a lot of Mastodon admins over the past couple of months, and they are well aware of this problem.

15:54.280 --> 16:03.200
So I don't expect a single player to barge in and say, hey, everyone, please use my central search provider.

16:03.200 --> 16:07.320
I don't see this happening anytime soon.

16:07.360 --> 16:13.040
And another concern is of course privacy, but I think I addressed this.

16:13.040 --> 16:19.520
Just to recap, we will index public content, only public public content to be precise.

16:19.520 --> 16:24.320
We respect consent, and it will be blockable.

16:24.320 --> 16:31.120
The last concern I heard a couple of times is, isn't this just bespoke APIs?

16:31.120 --> 16:36.680
Yes, it is. This is not an extension to the ActivityPub protocol.

16:36.680 --> 16:43.000
And I think some people may be sad that it isn't, but I know others will be very happy that it isn't.

16:43.000 --> 16:51.280
For the moment, this is just a set of APIs we define that we hope will be useful for many different projects.

16:51.280 --> 16:57.480
It might evolve into something else at some point, but for the moment, this is what we're doing.

16:57.480 --> 17:03.200
And I said it a couple of times now, we need others to help us out, to work with us.

17:03.200 --> 17:08.400
And if you would like to contribute, please go to this specification, GitHub repository.

17:08.400 --> 17:13.080
The one with a very long name, you cannot miss it.

17:13.080 --> 17:19.280
And come talk to us directly. If you are here at FOSDEM, the best way to do this is to visit us at our stand.

17:19.280 --> 17:23.520
It's on the ground floor of the H building over there.

17:23.520 --> 17:26.720
So I think I'm still in time. Thank you very much.

17:27.120 --> 17:30.120
APPLAUSE

17:34.600 --> 17:37.240
All right, we have some time for questions.

17:37.240 --> 17:39.240
You want to go first here in my background.

17:39.240 --> 17:41.080
Andy?

17:41.080 --> 17:45.600
Regarding the sort of single central search provider problem:

17:45.600 --> 17:51.320
Are there any plans to allow the providers to federate that information between themselves?

17:51.320 --> 17:54.760
So you end up with a distributed index as well?

17:54.840 --> 17:58.440
Not yet, but the idea is super interesting.

17:58.440 --> 18:05.760
It's out of scope for now, because we had to cut the scope a lot, but certainly interesting, yes.

18:05.760 --> 18:06.760
Yes?

18:06.760 --> 18:07.760
All right.

18:11.360 --> 18:16.360
[inaudible] spidering instances, question mark. I think I addressed this.

18:16.360 --> 18:22.360
We are not going to crawl or spider the web or the Fediverse. On the contrary, we

18:22.360 --> 18:25.200
explicitly do not do this.

18:25.200 --> 18:29.320
I talked about these different levels of security we built into this concept.

18:29.320 --> 18:35.040
And actually, this is one of the things that will be resource-intensive; crawling would be easier,

18:35.040 --> 18:39.160
and we would get a lot more content.

18:39.160 --> 18:40.760
But we will not do that.

18:40.760 --> 18:51.280
Yes, so a question as a website owner.

18:51.280 --> 18:54.320
I'm not very popular on Mastodon, I have four followers.

18:54.320 --> 18:56.720
My wife unfollowed me yesterday.

18:56.720 --> 19:02.560
Don't collect it, anyway.

19:02.560 --> 19:10.720
If I post something on Mastodon, I get

19:10.760 --> 19:14.480
DDoSed by, I think, the well-known DDoS effect.

19:14.480 --> 19:20.600
All the different instances fetch my website for the little thumbnail thingy, and my website

19:20.600 --> 19:26.760
goes gloriously down, it's a heavy server, I'm not very popular, I think this is a big issue.

19:26.760 --> 19:36.080
Anyway, this may help if the kind of metadata is also shared, and all the different instances

19:36.080 --> 19:42.400
do not need to refetch data.

19:42.400 --> 19:48.640
I think out of scope, but will I need to worry about not only being DDoSed by hundreds of

19:48.640 --> 19:53.000
Mastodon instances, but now also by a few search providers?

19:53.000 --> 19:57.880
There will always be a lot fewer search providers than Mastodon instances, so I wouldn't

19:57.880 --> 20:02.280
worry about this just yet.

20:02.360 --> 20:06.840
This will not solve this problem, but we are well aware of the problem, and we will

20:06.840 --> 20:08.840
need to solve this soon.

20:08.840 --> 20:09.840
Okay, thank you.

20:09.840 --> 20:14.400
We're also working on that.

20:14.400 --> 20:19.440
So if I understand this right, the order of magnitude of the data that you have to send

20:19.440 --> 20:24.120
to the search provider is sort of in a similar ballpark as the data that's being posted

20:24.120 --> 20:25.920
to a particular server.

20:25.920 --> 20:33.080
If you aggregate a whole bunch of them, this becomes a very significant additional resource

20:33.080 --> 20:37.600
consumption in terms of the cost of operating a Mastodon instance, and then

20:37.600 --> 20:43.040
if you have a search provider that aggregates 100 different servers and it has to have lots

20:43.040 --> 20:47.920
of them, otherwise it is not particularly useful, that becomes quite a resource-intensive

20:47.920 --> 20:48.920
thing.

20:48.920 --> 20:53.160
On the other hand, that resource-intensive thing is not something that's user-visible,

20:53.160 --> 20:56.480
so you can't just go to the end users and say, hey, I have this cool Mastodon instance

20:56.480 --> 20:59.200
for the XYZ community, you know, help me support it.

20:59.200 --> 21:04.180
It's sort of something that's not really seen, so it's harder to get donations or other

21:04.180 --> 21:06.280
kinds of funding for it.

21:06.280 --> 21:10.760
What are your thoughts around this?

21:10.760 --> 21:13.760
I don't have many thoughts on this yet.

21:13.760 --> 21:16.000
I've said it before.

21:16.040 --> 21:24.200
We don't know exactly how resource-intensive this will be yet, and we will learn a lot about

21:24.200 --> 21:34.000
this in the coming months, so we will have a much better idea in a couple of months' time.

21:34.000 --> 21:38.000
On the other hand, I know a couple of Mastodon admins.

21:38.000 --> 21:44.920
Some of them have a lot of resources at their hands, so for some of them spinning up

21:44.960 --> 21:54.920
more servers isn't really an issue, so I would expect several community servers to step up and offer

21:54.920 --> 21:56.400
their servers.

21:56.400 --> 22:01.560
We as Mastodon might do the same; we don't have any concrete plans on this yet, but of course,

22:01.560 --> 22:10.520
we may do this, I don't know, but yeah, I think we will see if this works, if this

22:10.520 --> 22:16.520
works at all, and we will see a couple of servers.

22:16.520 --> 22:23.520
It's probably not something that a single-user instance admin would run.

22:23.520 --> 22:28.520
Do you have something like a developer room on Matrix for discussion?

22:28.520 --> 22:29.520
Pardon, I didn't get that.

22:29.520 --> 22:34.520
Do you have something like a developer room on Matrix for discussions, a place where

22:34.520 --> 22:35.520
you communicate?

22:35.520 --> 22:38.520
No, we don't have that at the moment.

22:41.520 --> 22:45.520
We're going to take maybe one or two more questions for now.

22:45.520 --> 22:47.520
Do you know?

22:47.520 --> 22:48.520
Okay.

22:48.520 --> 22:51.520
Come to this time that the user's question is coming so smoothly.

22:51.520 --> 22:52.520
Oh, that's great.

22:52.520 --> 22:57.520
So if you have questions about this implementation or about Mastodon,

22:57.520 --> 23:01.520
Mastodon does have a stand, so please come ask questions.

23:01.520 --> 23:02.520
Please visit us.

23:02.520 --> 23:04.520
Maybe one or two more.

23:04.520 --> 23:09.520
Are images going to be indexed by hash also?

23:09.520 --> 23:13.520
I'm sorry.

23:13.520 --> 23:18.520
Are images going to be hashed and searchable, like...

23:18.520 --> 23:19.520
Images.

23:19.520 --> 23:20.520
Yeah.

23:20.520 --> 23:22.520
No, this is totally out of scope for now.

23:22.520 --> 23:23.520
Okay.