Admittedly, it may not have been my brightest idea to use BigQuery as the search backend for poketto.me. But since Firebase doesn’t have built-in full-text search, I would have needed to add another tool to my stack anyway. I figured BigQuery would be easier than managing something external like Apache Lucene or Elasticsearch. Plus, BigQuery has a built-in SEARCH(...) function, so why not give it a try?

As it turns out, SEARCH(...) is really more of a token-based text analyzer than a true search function:
1️⃣ It splits both the search term and the text into tokens (words).
2️⃣ It matches full tokens only.
3️⃣ It returns just TRUE or FALSE—no offsets, no match lengths.
🤯 And worst of all: It only works with ASCII. Yes. Like it’s 1979.
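To make those semantics concrete, here is a rough Python model of how SEARCH(...) behaves. This is my own illustrative sketch, not BigQuery's actual tokenizer (which is configurable and more sophisticated), but it captures the three limitations above:

```python
import re

def search_like(text: str, term: str) -> bool:
    """Crude model of SEARCH(): tokenize both sides, keep only
    ASCII tokens, then require every query token to appear as a
    *full* token. Returns just True/False, no offsets or lengths."""
    def tokenize(s: str) -> set[str]:
        return {t for t in re.findall(r"\w+", s.lower()) if t.isascii()}
    query = tokenize(term)
    return bool(query) and query <= tokenize(text)

text = "Der Finanzminister reist nach Österreich"
print(search_like(text, "Finanzminister"))  # full token   -> True
print(search_like(text, "Finanzmin"))       # partial word -> False
print(search_like(text, "Österreich"))      # non-ASCII    -> False
```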

With these limitations, my first iteration of full-text search in poketto.me was basically unusable:
😟 Searching for entire words worked (“Finanzminister” returned results).
😟 Partial words didn’t (“Finanzmin” returned nothing).
😟 Non-ASCII characters broke it completely (“Österreich” yielded no results).

So what did I do? I fell back on the bluntest tool in my toolbox: REGEXP_CONTAINS(...).

I know this isn’t a long-term solution, but for now, it works surprisingly well (and fast):
1️⃣ I implemented my own tokenization logic—splitting search terms into words, keeping quoted phrases together.
2️⃣ I construct a long SQL query using REGEXP_CONTAINS to check titles, full text, site names, and bylines:

REGEXP_CONTAINS(LOWER(text), LOWER(@search_term_{idx}))

3️⃣ For each “found” save, I scan the full text again for offsets and lengths to highlight matches in the UI.
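Stitched together, the three steps look roughly like this. It's a simplified sketch: the helper names (`tokenize`, `build_predicates`, `highlight_spans`) and the column list are my stand-ins, not the actual poketto.me code, but the `@search_term_{idx}` parameter pattern matches the snippet above:

```python
import re

def tokenize(query: str) -> list[str]:
    # Step 1: split the search term into words, but keep
    # "quoted phrases" together as a single term.
    return [phrase or word
            for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query)]

def build_predicates(terms: list[str], columns: list[str]) -> tuple[str, dict]:
    # Step 2: one REGEXP_CONTAINS check per term, OR-ed across the
    # columns, AND-ed across the terms. Each term is regex-escaped
    # and passed as a named query parameter, never interpolated.
    params, clauses = {}, []
    for idx, term in enumerate(terms):
        params[f"search_term_{idx}"] = re.escape(term.lower())
        per_column = " OR ".join(
            f"REGEXP_CONTAINS(LOWER({col}), @search_term_{idx})"
            for col in columns)
        clauses.append(f"({per_column})")
    return " AND ".join(clauses), params

def highlight_spans(text: str, terms: list[str]) -> list[tuple[int, int]]:
    # Step 3: rescan the full text of each hit for (offset, length)
    # pairs, which the UI uses to highlight the matches.
    return [(m.start(), len(m.group()))
            for term in terms
            for m in re.finditer(re.escape(term), text, re.IGNORECASE)]

terms = tokenize('Österreich "green budget"')
where_clause, params = build_predicates(
    terms, ["title", "text", "site_name", "byline"])
spans = highlight_spans("Österreich plans a green budget", terms)
```

The predicates slot into the query's WHERE clause, and using BigQuery query parameters keeps user input out of the SQL string itself.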

Clumsy? *Yes.* Sustainable? *Probably not.* But it works—for now—until I replace it with a real full-text search backend.

🔗 https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#search
🔗 https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains