Backup Brain: Self-Hosted Bookmarking & More

Table of Contents

Overview
Searchable Attributes
- Searching Archives
Tooling
Limitations
- Unofficial / Unclear
- Official Limitations
  - Maximum number of query words
  - Maximum number of words per attribute

Overview

Our searches are powered by Meilisearch, a powerful open source full-text search engine.

When you search your Backup Brain, you’re searching the titles, descriptions, tags, and the extracted text content of each archived page.

Searchable Attributes

For each Bookmark we index the following attributes:

title
description
tags
the latest archive (if any)

Searching Archives

Frequently you remember a piece of information from a page you read, but not what the page was called, or what you might have put in the description.

For this reason, general searches will always consider the text from the most recent archive of a page.

It’s limited to the most recent, for two reasons:

Meilisearch has a limit on the number of words it’ll allow in any single attribute, and multiple archives from a long page could easily surpass this limit.
The most recent archive is what you’re shown by default when you go to the archives page, and it’d be confusing to get a search result that matches content from an old version you weren’t being shown.
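As a rough illustration of the indexing rule described above, here is a minimal Python sketch of how a bookmark might be turned into a search document that carries only its most recent archive. The `Bookmark` and `Archive` classes, their field names, and the `archive_text` key are all hypothetical and chosen for this example; they are not taken from Backup Brain's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Archive:
    # Hypothetical shape: extracted page text plus the capture time.
    text: str
    created_at: datetime


@dataclass
class Bookmark:
    # Hypothetical shape mirroring the indexed attributes listed above.
    id: str
    title: str
    description: str
    tags: list[str]
    archives: list[Archive] = field(default_factory=list)


def latest_archive_text(bookmark: Bookmark) -> Optional[str]:
    """Return the text of the most recent archive, or None if there is none."""
    if not bookmark.archives:
        return None
    return max(bookmark.archives, key=lambda a: a.created_at).text


def to_search_document(bookmark: Bookmark) -> dict:
    """Build the document that would be sent to the search index."""
    return {
        "id": bookmark.id,
        "title": bookmark.title,
        "description": bookmark.description,
        "tags": bookmark.tags,
        # Only the newest archive is included; older ones are left out.
        "archive_text": latest_archive_text(bookmark),
    }
```

Keeping a single archive per document means a search hit always corresponds to the version of the page shown by default.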

Tooling

Meilisearch runs as its own server. It is assumed that you’ll install and run it on the same machine as the Backup Brain server. However, this is not a requirement. The location of your Meilisearch server is defined by the MEILISEARCH_URL environment variable in your .env file.

Backup Brain creates & uses an index named backup_brain_general so as not to interfere with any other indexes you may have on the same Meilisearch instance.
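The configuration above can be sketched in Python as follows. This is only an illustration: the fallback to `http://localhost:7700` (Meilisearch's default port) and the helper names are assumptions made for this example, not documented Backup Brain behavior. The `/indexes/{index}/search` path is Meilisearch's standard search route.

```python
import os
from typing import Mapping

# Index name taken from the text above.
INDEX_NAME = "backup_brain_general"

# Assumption for this sketch: fall back to Meilisearch's default local
# address when MEILISEARCH_URL is not set.
DEFAULT_MEILISEARCH_URL = "http://localhost:7700"


def meilisearch_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured Meilisearch URL, or the local default."""
    url = env.get("MEILISEARCH_URL", "").strip()
    return url or DEFAULT_MEILISEARCH_URL


def search_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Return the search route for the Backup Brain index."""
    base = meilisearch_url(env).rstrip("/")
    return f"{base}/indexes/{INDEX_NAME}/search"
```

Because the index name is fixed and specific to Backup Brain, pointing `MEILISEARCH_URL` at a shared Meilisearch instance should not disturb other indexes on it.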

Limitations

Unofficial / Unclear

“Stemming”¹ and the removal of “stop-words” are language specific, and it’s unclear which languages Meilisearch has added support for. There’s also a long-running discussion about the problems they’ve encountered supporting Chinese, because its words are not always space separated.

While Meilisearch engineers have mentioned “stemming” multiple times in the context of tokenization, it’s unclear which languages stemming is supported in, or what one needs to do to enable it for a given language.

Official Limitations

The following limitations are quoted from Meilisearch’s Known Limitations page. The short version is that searches should be 10 words or fewer, and the end of exceptionally long archives won’t be included.

Maximum number of query words

Limitation: The maximum number of terms taken into account for each search query is 10. If a search query includes more than 10 words, all words after the 10th will be ignored.

Explanation: Queries with many search terms can lead to long response times. This goes against our goal of providing a fast search-as-you-type experience.
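To make the effect of this limit concrete, here is a small Python sketch (not part of the quoted Meilisearch text) that splits a query into the words that would be considered and the words that would be ignored. It assumes words are separated by whitespace, which is a simplification of Meilisearch's real tokenizer; `split_query` is a hypothetical helper name.

```python
# Limit quoted above: only the first 10 query terms are considered.
MAX_QUERY_WORDS = 10


def split_query(query: str, limit: int = MAX_QUERY_WORDS) -> tuple[list[str], list[str]]:
    """Split a query into (considered, ignored) words.

    Assumes whitespace-separated words, which only approximates
    how Meilisearch tokenizes a query.
    """
    words = query.split()
    return words[:limit], words[limit:]


considered, ignored = split_query(
    "how to configure a self hosted search engine for my archived bookmarks"
)
# `considered` holds the first ten words; `ignored` holds "archived" and "bookmarks".
```

A check like this could be used to warn someone that the tail end of a long query will have no effect on the results.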

Maximum number of words per attribute

Limitation: Meilisearch can index a maximum of 65535 positions per attribute. Any words exceeding the 65535 position limit will be silently ignored.

Explanation: This limit is enforced for relevancy reasons. The more words there are in a given attribute, the less relevant the search queries will be.
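As a rough way to tell whether an archive is long enough to be affected, the following Python sketch (again, not part of the quoted Meilisearch text) estimates how many words would fall past the limit. It treats each whitespace-separated word as one position, which is an approximation: Meilisearch's own position counting may differ, so read the result as an estimate only. `words_beyond_limit` is a hypothetical helper name.

```python
# Limit quoted above: positions past this point are silently ignored.
MAX_POSITIONS = 65535


def words_beyond_limit(text: str, limit: int = MAX_POSITIONS) -> int:
    """Estimate how many words of `text` fall past the position limit.

    Approximation: counts one position per whitespace-separated word.
    """
    return max(0, len(text.split()) - limit)
```

A result of zero suggests the whole archive is searchable; anything above zero suggests the end of the page will not be matched by searches.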

¹ [With stemming] the words “drives”, “drove”, and “driven” will be recorded in the index under the single concept word “drive”. - Wikipedia’s Full-text search page