please don't use the bsky "search" endpoints as a generic API or for automation. it is not designed for that, is an expensive waste/abuse of resources, and we will block by IP or user agent. searches should come from direct human queries in the moment
January 19, 2025 at 7:08 PM UTC
Have we yak shaved enough yet on comments? I don’t think we have. What’s one more post among friends?
The problem
As mentioned in Bluesky Comments, my posting workflow doesn’t allow me to know the Bluesky post ahead of time. So I thought, well I’ll just use the search API to find the post URL based on my echofeed post. That worked pretty well, but then I saw this skeet:
My first reaction was “Oh it’ll be fine, this is just limited use, and there was another post that says it’ll be fine for limited use”. It was fine, but about a day after Bryan’s post I started getting CORS errors from my search function. I figured that was that: I needed to do this a better way, and maybe learn something along the way.
The research
I had seen Jetstream in my research prior to now, and figured some sort of database would probably be the best bet, where it’s just a single lookup, rather than every browser of the site having to do the search (even if it wasn’t blocked).
My initial thought was to use TypeScript for a server API, and there are definitely still benefits there, given its robust ecosystem of ATProto packages. However, I wanted to Think Different™, so I decided to write it in Rust, both because I could and because I wanted to learn a bit of Rust “systems” programming, since I’ve been spending a lot of my day in TypeScript.
After doing some very detailed Kagi searches [1], I discovered Rocket and determined it was actually perfect for my needs. Simple declarations of handler functions, and it’s nice and type-safe. What’s not to like?
It’s Rust time
With just a bit of boilerplate code, I can spin up the API handler needed for the metadata info for my front-end.
#[get("/")]
fn index() -> &'static str {
    "at-comments database API server"
}

#[get("/<slug>")]
fn post_meta(slug: &str) -> Result<Json<models::Post>, NotFound<String>> {
    match db::post_meta(slug) {
        Ok(post) => Ok(Json(post)),
        Err(_) => Err(NotFound("Resource was not found.".to_string())),
    }
}

#[launch]
async fn rocket() -> _ {
    // setup websocket stuff
    tokio::spawn(post_listener::subscribe_posts());

    // setup server to respond
    rocket::build()
        .mount("/", routes![index])
        .mount("/", routes![post_meta])
}
There are two important parts here in the main server, which I’ll call A and B: the post_meta handler (A) and the spawned post_listener::subscribe_posts() task (B).
Part A: Metadata endpoint
We’ll say for the sake of argument that somehow the web server has the info for a given post already (since we’ll discuss the Jetstream and DB in part B), so part A is the thing that looks up info for a given post. One of the nice things about Rocket is that it has lots of type guards, so all I need to do is specify that it should be expecting a string input in the path, and it’s going to return a Result<Json<models::Post>, NotFound<String>>. We call out to db::post_meta(slug), which checks the database using the post slug and returns the Post object. Since Post is tagged as a Serialize struct in db.rs, Rocket can just convert it to JSON using library code and safely return it. If it’s not found, then it just returns a 404 and we move on with our day.
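To make the lookup-or-404 flow concrete without pulling in Rocket or a real database, here is a minimal, dependency-free sketch. Everything in it is a stand-in: the Post fields, the in-memory Db type, and the respond function are hypothetical names chosen for illustration, not the code from the actual server.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `models::Post`; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub slug: String,
    pub rkey: String,
}

/// Hypothetical in-memory stand-in for the database, keyed by slug.
pub struct Db {
    posts: HashMap<String, Post>,
}

impl Db {
    pub fn new() -> Self {
        Db { posts: HashMap::new() }
    }

    pub fn insert(&mut self, post: Post) {
        self.posts.insert(post.slug.clone(), post);
    }

    /// Same shape as a `post_meta(slug)` lookup: Ok on a hit, Err on a miss.
    pub fn post_meta(&self, slug: &str) -> Result<Post, String> {
        self.posts
            .get(slug)
            .cloned()
            .ok_or_else(|| format!("no post for slug {}", slug))
    }
}

/// Conceptually what the handler does: turn the lookup result into a
/// status code plus body. The JSON is hand-built here (no escaping), which
/// is only acceptable in a sketch; a real server would use a serializer.
pub fn respond(db: &Db, slug: &str) -> (u16, String) {
    match db.post_meta(slug) {
        Ok(post) => (
            200,
            format!("{{\"slug\":\"{}\",\"rkey\":\"{}\"}}", post.slug, post.rkey),
        ),
        Err(_) => (404, "Resource was not found.".to_string()),
    }
}
```

The point of the Result return type is that both outcomes are spelled out in the signature, so the "not found" case can't be forgotten by accident.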
The comments code now just needs to make a request to the endpoint to get the Bluesky post rkey:
const getPostAndThreadData = async (
  slug: string,
  setThread: (thread: AppBskyFeedDefs.ThreadViewPost) => void,
  setUri: (uri: string) => void,
  setError: (errorString: string, error: string) => void
) => {
  const url = `https://meta.jack.is/${slug}`;
  const meta = (await ky.get(url).json()) as Meta;
After getting the metadata, it just acts like normal.
Part B: Websockets and databases
So how do we get the post info to put in a database for the API route to have what it needs? Bluesky Jetstream of course. Using jetstream-oxide, as well as the patches from cyypherus, we can now listen to the Jetstream and filter for just my posts. This actually ends up being more robust than the old search API, as here I can restrict my filter to “just posts with 📝 as the first character”, which couldn’t easily be done using the search API.
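The filter itself is small enough to sketch in plain Rust. This is not the jetstream-oxide API, just the predicate one might apply to each incoming post event; the function name and the author_did / my_did parameters are assumptions made for the example.

```rust
/// The marker a blog announcement post starts with (from the text above).
const MARKER: char = '📝';

/// Returns true when a post event looks like one of "my" blog announcements:
/// it was written by the expected account and its text begins with the marker.
/// `author_did` and `my_did` are hypothetical parameter names.
pub fn is_blog_post(author_did: &str, my_did: &str, text: &str) -> bool {
    // `starts_with` on a `char` compares whole Unicode scalar values,
    // so the multi-byte emoji is handled correctly.
    author_did == my_did && text.starts_with(MARKER)
}
```

A post that merely mentions 📝 somewhere in the middle is rejected, which is the kind of positional check that is awkward to express as a search query.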
Using Diesel, the websocket loop then inserts a row into a Postgres database with all the info the front-end needs to find the post.
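As a rough idea of what “all the info needed” could look like, here is a sketch of a row holding a slug, the author’s DID, and the post’s rkey, plus helpers that rebuild the AT URI and the bsky.app link from them. The PostRow struct and its field names are hypothetical (the real schema may differ); the URI shapes follow the standard at://<did>/<collection>/<rkey> form.

```rust
/// Hypothetical row shape; the real table may store different columns.
pub struct PostRow {
    pub slug: String,
    pub did: String,
    pub rkey: String,
}

impl PostRow {
    /// AT URI for the post record, e.g. for fetching the reply thread.
    pub fn at_uri(&self) -> String {
        format!("at://{}/app.bsky.feed.post/{}", self.did, self.rkey)
    }

    /// Human-facing link to the same post on bsky.app.
    pub fn web_url(&self) -> String {
        format!("https://bsky.app/profile/{}/post/{}", self.did, self.rkey)
    }
}
```

Storing the DID and rkey rather than a full URL keeps the row small, since both forms of the link can be derived on demand.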
Next steps
There are one or two improvements I’d like to make here, primarily around the database handling. It’s not using connection pooling at the moment, but figuring out how Rocket handles all of that (and passing a connection to an arbitrary tokio function) was out of scope for this MVP.
In case you’re curious, you can find the code here: at-comments-listener. It’s primarily designed for me, so there’s a lot of stuff in there that’s hard coded, but maybe someone will find it interesting.
Footnotes
[1] “rust web server framework”