<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Internationalization on Build in Public</title><link>https://build.ralphmayr.com/tags/internationalization/</link><description>Recent content in Internationalization on Build in Public</description><generator>Hugo</generator><language>en-us</language><copyright>©️ Ralph Mayr 2026</copyright><lastBuildDate>Fri, 03 Oct 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://build.ralphmayr.com/tags/internationalization/index.xml" rel="self" type="application/rss+xml"/><item><title>Be careful when counting your whitespace</title><link>https://build.ralphmayr.com/posts/95-be-careful-when-counting-your-whitespace/</link><pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate><guid>https://build.ralphmayr.com/posts/95-be-careful-when-counting-your-whitespace/</guid><description>&lt;p&gt;One of the great things about working on poketto.me is that I'm constantly learning about fascinating linguistic subtleties. For instance, while working on automatic content summaries and extracting key facts and figures, I came across an interesting issue with token counting in Chinese script.&lt;/p&gt;
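&lt;p&gt;The post doesn't show the site's actual code, but the failure mode is easy to reproduce: a naive word count that splits on whitespace collapses a whole Chinese sentence into a single "word". Below is a minimal sketch, where the CJK character range and the counting heuristic are illustrative assumptions, not poketto.me's real implementation:&lt;/p&gt;

```python
import re

def naive_word_count(text: str) -> int:
    # Split on whitespace -- only meaningful for space-delimited languages.
    return len(text.split())

def cjk_aware_count(text: str) -> int:
    # Rough heuristic: count each CJK ideograph as one token, plus
    # whitespace-delimited runs of everything else. Not a real tokenizer.
    cjk = re.findall(r"[\u4e00-\u9fff]", text)
    rest = re.sub(r"[\u4e00-\u9fff]", " ", text).split()
    return len(cjk) + len(rest)

english = "The quick brown fox jumps over the lazy dog"
chinese = "敏捷的棕色狐狸跳过懒狗"

print(naive_word_count(english))  # 9
print(naive_word_count(chinese))  # 1 -- no spaces, so one "word"
print(cjk_aware_count(chinese))   # 11
```

&lt;p&gt;A production system would more likely use a proper segmenter (e.g. Unicode word-boundary rules), but even this crude heuristic shows why a 100-word threshold silently rejects Chinese articles.&lt;/p&gt;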
&lt;p&gt;I had put a safeguard in place so that poketto.me would only attempt to summarize content longer than 100 words. This worked well for German and English content, but when I tested the feature on an article published by Xinhua, a Chinese news agency, my code counted only about 12 words and therefore skipped the summary. That count was obviously wrong: Chinese script doesn't separate words with spaces, so a whitespace-based word count barely registers anything.&lt;/p&gt;</description></item></channel></rss>