Fediverse Schema Observatory Database Snapshots

This project provides occasional snapshots of its database, in the hopes that others can create their own applications building on the data collected here. These snapshots are released under a CC0 license.

Table of Contents

About the database

The database is a SQLite database, and contains the following tables:

CREATE TABLE schemas (hash TEXT PRIMARY KEY, first_seen TEXT, last_seen TEXT, schema TEXT, total BIGINT, notes TEXT);
CREATE TABLE domains (domain TEXT PRIMARY KEY, nodeinfo TEXT, first_seen TEXT, last_seen TEXT, software TEXT, version TEXT);
CREATE TABLE schemasSoftware (schemaHash TEXT, software TEXT, version TEXT, total BIGINT, PRIMARY KEY(schemaHash, software, version));

Table descriptions

The schemas table contains one row for each schema observed. Each row contains the schema along with a hash to uniquely identify the schema, the first_seen and last_seen time of the schema, the total number of times that the schema has been observed, and a notes field which actually is the overall summary of the schema. The schema column contains JSON-formatted text. See the Usage tips section below for tips on how to query JSON directly in SQLite.

The domains table contains one row for each domain that has been observed emitting a schema, along with recently fetched nodeinfo, the first_seen and last_seen time for the domain, and the latest known software and version running at the domain.

The schemasSoftware table associates a given schema with the software that has been observed to emit it. Each row contains a schemaHash which is the hash value for the schema itself, software and version observed emitting that schema, and the total number of times that combination of schema, software, and version has been observed. It is expected that there will be more than one row with the same schemaHash, since the same schema can be emitted by multiple different pieces of software.

Usage tips

The best way to query the database is to use json_extract and other SQLite JSON functions to pull out specific fields from the JSON encoded text field schema. For example, to find all schemas that contain a top-level type of Update, you could use:

SELECT * FROM schemas WHERE json_extract(schema, '$.type') = 'Update';

To find all schemas that contain a top-level type of Update and an object.type of Note, you could use:

SELECT * FROM schemas WHERE json_extract(schema, '$.type') = 'Update' AND json_extract(schema, '$.object.type') = 'Note';

To find all schemas observed to be emitted by a specific software, you can join the schemas and schemasSoftware tables. For example, to find all schemas emitted by servers running WriteFreely, you could use:

SELECT s.* FROM schemas s JOIN schemasSoftware ss ON s.hash = ss.schemaHash WHERE ss.software = 'writefreely';

I recommend becoming familiar with JSONPath syntax, as that is what SQLite uses to identify specific fields within the JSON text. See this introduction to JSONPath for a good introduction to JSONPath. Use this online JSONPath playground to get a feel for it interactively.

Download database snapshots

Again, the following snapshots are released under a CC0 license.