Xayn is privacy-safe, personalized mobile web search powered by on-device AIs

As readers of TechCrunch are aware, a significant challenge of the contemporary internet is the exchange of personal privacy for ease of use. Online tracking is how this ‘substantial compromise of personal information’ occurs. Extensive monitoring of users’ online activity underpins the leading positions of Google’s search engine and Facebook’s social network, two prominent examples of advertising-funded businesses.
Verizon, the parent company of TechCrunch, also collects data from various sources – including mobile devices and media outlets such as this one – to enhance its advertising capabilities.
Numerous organizations depend on acquiring user data to generate some form of perceived benefit. Very few of these companies are fully forthcoming regarding the extent and nature of the private information they accumulate – or, indeed, precisely how they utilize it. However, could the internet function differently?
Xayn, a company based in Berlin, aims to alter this situation – beginning with personalized, yet privacy-respecting, web search on smartphones.
Today, they are releasing a search engine application (available on both Android and iOS) that delivers the benefits of customized results without the ‘typical’ practice of data collection. It does this with AI models that learn directly on the user’s device. The key point is that no user data is uploaded (although the trained AI models themselves can be).
The team behind the application, 30% of whom hold PhDs, has been working on the core problem of privacy versus convenience for approximately six years (although the company was only founded in 2017). The work began as an academic research project and led to an open-source framework for masked federated learning, known as XayNet, on which the Xayn app is built.
To date, they have secured €9.5 million in early-stage investment – with funding originating from European venture capital firm Earlybird; Dominik Schiener (co-founder of Iota); and Thales AB, a Swedish company specializing in authentication and payment services.
They are now focused on commercializing their XayNet technology by integrating it into a user-facing search application – targeting a business model described by CEO and co-founder, Dr Leif-Nissen Lundbæk, as similar to that of “Zoom,” the widely used videoconferencing platform which offers both free and paid services.
This means Xayn’s search is not supported by advertising, so users get search results completely free of ads.
The strategy is to use the consumer application as a showcase for a B2B product powered by the same underlying AI technology. The pitch to business and public sector clients is faster, more efficient internal search that does not jeopardize the confidentiality of commercial data.
Lundbæk contends that organizations have a significant need for improved search tools to (securely) apply to their own data, citing studies that indicate search activities account for approximately 18% of global working time. He also references a study conducted by a municipal authority which revealed that employees spent 37% of their work hours searching for documents or other digital content.
“This is a business model that Google has attempted but been unable to successfully implement,” he states, adding: “We are addressing not only a challenge faced by everyday users but also by companies… For them, privacy is not merely desirable; it is essential for any possibility of utilization.”
On the consumer front, there will also be optional premium features available within the application – with the intention of offering it as a freemium download.
Swipe to nudge the algorithm
A significant aspect of Xayn’s recently released web search application is its provision of user control over the relevance of displayed content.
This is achieved through a swipe-based system, similar to the interface used in the Tinder app, allowing users to influence the direction of its personalization algorithm. This functionality begins on the home screen, which features news content tailored to the user’s location, and extends to the pages displaying search results.
The news-centric home screen represents another noteworthy feature. It appears that different types of home screen feeds may be available with premium subscriptions in the future.
The application also lets users switch personalized search results on or off entirely by tapping the brain icon in the upper right corner. When the AI is off, results cannot be swiped, though bookmarking and sharing options remain available.
The app includes a history section that displays searches conducted over the past seven days by default. Users can also view searches from Today, the past 30 days, or their entire history, with an option to clear their search data.
A ‘Collections’ feature is also incorporated, enabling users to create and manage folders for saved bookmarks.
While browsing search results, users can add an item to a Collection by swiping right and then selecting the bookmark icon, which prompts them to choose the desired folder.
The swipe interface is designed to be both familiar and intuitive, although the TestFlight beta version reviewed by TechCrunch experienced some loading delays.
Swiping left on a piece of content displays a bright pink warning symbol marked with an ‘x’. Continuing the swipe will remove the item from view, likely reducing the frequency of similar content in future results.
Conversely, swiping right indicates that a piece of content is useful, causing it to remain visible in the feed with a green outline. (Swiping right also reveals options to bookmark and share the content.)
Although several privacy-focused search engines already exist, such as DuckDuckGo in the US and Qwant in France, Xayn contends that the user experience offered by these alternatives often doesn’t match the relevance of results provided by tracking-based search engines like Google, potentially increasing the time required to find information.
In essence, prioritizing privacy with search engines like DuckDuckGo or Qwant may require more effort to obtain specific answers compared to using Google – representing a ‘convenience cost’ associated with protecting one’s privacy during web searches.
Xayn proposes a third, more intelligent approach that allows users to maintain their privacy while searching online. This involves utilizing AI models that learn directly on the user’s device and can be combined in a privacy-preserving manner, enabling personalized results without compromising data security.
“Privacy is absolutely fundamental… This means, much like other privacy-focused solutions, we do not track anything. No data is transmitted to our servers; we do not store any information, and we do not engage in any tracking whatsoever. Furthermore, we ensure that all connections are secure and prevent any possibility of tracking,” explains Lundbæk, detailing the team’s AI-powered, decentralized/edge-computing methodology.
On-device reranking
Xayn utilizes multiple search index sources, including Microsoft’s Bing, as stated by Lundbæk, and its approach shares similarities with DuckDuckGo, which also employs its own web crawling bots.
However, a key distinction lies in Xayn’s application of its own reranking algorithms to produce privacy-focused, personalized search results. This contrasts with DuckDuckGo’s business model, which relies on contextual advertising based on simple signals like location and search terms, without creating detailed user profiles.
Lundbæk points out a potential drawback of the simpler targeting methods used for advertising: users may encounter a high volume of ads, as businesses increase ad frequency to improve click-through rates. A large number of advertisements within search results can negatively impact the overall search experience.
“We obtain numerous results at the device level and perform some ad hoc indexing, constructing an index on the device and applying our search algorithms to filter and present only the most relevant information, while excluding everything else,” explains Lundbæk, outlining Xayn’s operational process. “Essentially, we may slightly reduce the prominence of less relevant results, but we also prioritize freshness, exploration, and ensuring you aren’t confined to a filter bubble.”
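The reranking step Lundbæk describes can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Result` fields, the scoring formula and its weights are hypothetical and are not taken from Xayn’s code. It only shows the general idea of scoring upstream results locally, favoring fresh items and keeping the best few.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_rank: int    # position returned by the upstream index (0 = top)
    age_days: float   # how old the document is
    interest: float   # hypothetical on-device estimate of user interest, in [0, 1]


def rerank(results, freshness_weight=0.2, half_life_days=7.0, top_k=3):
    """Score results locally and keep the top_k; no data leaves the device."""
    def score(r):
        upstream = 1.0 / (1.0 + r.base_rank)               # keep some trust in upstream order
        freshness = 0.5 ** (r.age_days / half_life_days)   # exponential decay with age
        return r.interest + 0.5 * upstream + freshness_weight * freshness

    return sorted(results, key=score, reverse=True)[:top_k]


results = [
    Result("a.example/old-but-liked", 3, 30.0, 0.9),
    Result("b.example/top-upstream", 0, 10.0, 0.2),
    Result("c.example/fresh", 2, 0.5, 0.5),
    Result("d.example/irrelevant", 1, 60.0, 0.05),
]
ranked = rerank(results)
for r in ranked:
    print(r.url)
```

In this toy run the item the user is likely to care about moves to the top, the fresh item overtakes the upstream favorite, and the stale, low-interest item is dropped.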
Certain aspects of Xayn’s technology fall within the realm of federated learning (FL), a technology Google has been investigating, including a ‘privacy-safe’ proposal for replacing third-party tracking cookies. Nevertheless, Xayn contends that Google’s interests, as a data-driven company, do not align with restricting its access to user data, even if it were to adopt FL for search.
Xayn’s priorities, as a small, privacy-focused German startup, are significantly different. Therefore, the privacy-preserving technology it has developed over several years is genuinely dedicated to protecting user data, according to the company.
“Google actually has [fewer] personnel working on federated learning than our team,” Lundbæk observes, adding: “We have offered substantial criticism of TFF [Google-designed TensorFlow Federated]. While it is a form of federated learning, it lacks actual encryption, and Google has incorporated numerous backdoors into the system.”
“It’s important to understand Google’s objectives. Google aims to replace [tracking] cookies—and particularly, to eliminate the cumbersome process of obtaining user consent. However, they still desire your data. Their goal isn’t to enhance your privacy; rather, they intend to—ultimately—acquire your data even more easily. Purely federated learning does not provide a privacy solution.
“Significant effort is required to make it privacy-preserving. Pure TFF is certainly not privacy-preserving. Consequently, they will likely apply this technology to areas that pose challenges to user experience—such as cookies—but I would be very surprised if they used it directly for search. Even if they did, the numerous backdoors in their system would make it relatively simple to access the data using TFF. Therefore, I believe it’s merely a convenient workaround for them.”
“Data is fundamentally the core of Google’s business model,” he continues. “So, while any steps they take may appear positive, I believe Google is strategically maneuvering without making substantial changes.”
So, how does Xayn’s reranking algorithm function?
The app runs four AI models on each device. Encrypted models from individual devices are combined asynchronously, using homomorphic encryption, into a collective model, which is then sent back to each device to personalize the content it shows, the company states.
The four AI models running on the device are dedicated to natural language processing; grouping interests; analyzing domain preferences; and computing context.
“The knowledge is retained, but the data remains exclusively on your device,” Lundbæk clarifies.
“We can train numerous AI models on your phone and determine whether to combine this knowledge or keep it localized on your device.”
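To make the idea of combining models without exposing any single device’s contribution more concrete, here is a minimal sketch. It is not Xayn’s protocol and it does not use homomorphic encryption: it swaps in simple pairwise additive masking, a different technique with a similar goal. All names and constants (`MODULUS`, `SCALE`, `pairwise_masks` and so on) are hypothetical.

```python
import random

MODULUS = 2**32   # all arithmetic is done modulo this value
SCALE = 10_000    # fixed-point scale used to turn float weights into integers


def quantize(weights):
    """Convert float weights to integers modulo MODULUS."""
    return [round(w * SCALE) % MODULUS for w in weights]


def pairwise_masks(num_devices, dim, seed=0):
    """For each pair (i, j), device i adds a random mask and device j subtracts it."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_devices)]
    for i in range(num_devices):
        for j in range(i + 1, num_devices):
            for k in range(dim):
                m = rng.randrange(MODULUS)
                masks[i][k] = (masks[i][k] + m) % MODULUS
                masks[j][k] = (masks[j][k] - m) % MODULUS
    return masks


def mask_update(weights, mask):
    """What a device would upload: its quantized weights hidden behind its mask."""
    return [(q + m) % MODULUS for q, m in zip(quantize(weights), mask)]


def aggregate(masked_updates):
    """Sum the masked updates; the masks cancel, leaving only the average."""
    n = len(masked_updates)
    dim = len(masked_updates[0])
    average = []
    for k in range(dim):
        total = sum(update[k] for update in masked_updates) % MODULUS
        if total >= MODULUS // 2:   # map back to a signed value
            total -= MODULUS
        average.append(total / SCALE / n)
    return average


local_models = [[0.10, -0.20], [0.30, 0.40], [0.20, 0.10]]
masks = pairwise_masks(len(local_models), dim=2)
masked = [mask_update(w, m) for w, m in zip(local_models, masks)]
global_model = aggregate(masked)
print(global_model)
```

Each masked upload looks like random noise on its own, yet the aggregate recovers the average of the three local models, roughly 0.2 and 0.1 in this toy case. A real system would also have to handle key agreement and devices dropping out, which this sketch ignores.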
“We have created a sophisticated solution involving four distinct AI models that operate in conjunction with each other,” he elaborates, noting that they work to establish “centers of interest and centers of dislikes” for each user—based on swipe interactions—which he says “must be highly efficient—they must be dynamic, evolving with your interests over time”.
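One simple way to picture “centers of interest and centers of dislikes” that shift over time is sketched below. It is a simplification under stated assumptions, not Xayn’s implementation: content is assumed to be represented as small embedding vectors, a single like center and a single dislike center are kept, and each swipe nudges the relevant center with an exponential moving average. The class name and learning rate are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InterestProfile:
    def __init__(self, dim, learning_rate=0.3):
        self.liked = [0.0] * dim
        self.disliked = [0.0] * dim
        self.learning_rate = learning_rate

    def swipe(self, embedding, liked):
        """Move the like or dislike center toward the swiped item."""
        center = self.liked if liked else self.disliked
        lr = self.learning_rate
        for i, x in enumerate(embedding):
            center[i] = (1 - lr) * center[i] + lr * x

    def score(self, embedding):
        """Positive when an item is closer to the likes than to the dislikes."""
        return cosine(embedding, self.liked) - cosine(embedding, self.disliked)


profile = InterestProfile(dim=2)
profile.swipe([1.0, 0.0], liked=True)    # e.g. an article swiped right
profile.swipe([0.0, 1.0], liked=False)   # e.g. an article swiped left
print(profile.score([0.9, 0.1]), profile.score([0.1, 0.9]))
```

Because older swipes decay with every update, the centers drift as the user’s interests change, which is the “dynamic” property Lundbæk emphasizes.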
As the user interacts more with Xayn, its personalization engine becomes more accurate through on-device learning, further enhanced by the ability for users to actively provide like/dislike feedback via swiping.
The level of personalization is highly individualized, which Lundbæk refers to as “hyper personalization.” A tracking search engine like Google, he notes, also analyzes cross-user patterns to determine which results to display, something he asserts Xayn does not do.
Small data vs. big data
Lundbæk explains that their focus is on understanding each individual user, leading to a “small data” challenge rather than attempting to analyze massive datasets. This necessitates exceptionally rapid learning – drawing significant insights from just eight to twenty user interactions. A key consideration in this fast-paced learning process is mitigating the formation of filter bubbles, or biased results, within the search engine.
To counteract the potential for echo chambers and filter bubbles, the Xayn engine operates through two distinct phases: ‘exploration’ and ‘exploitation’ – the latter simply meaning the engine leverages existing knowledge about the user to deliver relevant results.
He emphasizes the importance of continuous discovery, noting that this is the purpose behind one of the four AI models employed – a dynamic contextual multi-armed bandit reinforcement learning algorithm designed for contextual computing.
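The exploration/exploitation trade-off can be illustrated with a very small contextual bandit. This is a sketch, assuming an epsilon-greedy policy with one linear reward estimate per arm. The article does not say which bandit algorithm Xayn uses, and the arm names, context vectors and parameters below are hypothetical.

```python
import random


class EpsilonGreedyContextualBandit:
    def __init__(self, arms, dim, epsilon=0.1, learning_rate=0.1, seed=0):
        self.weights = {arm: [0.0] * dim for arm in arms}
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def predict(self, arm, context):
        """Linear estimate of the reward for showing `arm` in this context."""
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context):
        arms = list(self.weights)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arms)                            # exploration
        return max(arms, key=lambda a: self.predict(a, context))    # exploitation

    def update(self, arm, context, reward):
        """One gradient step on the squared error of the reward estimate."""
        error = reward - self.predict(arm, context)
        w = self.weights[arm]
        for i, x in enumerate(context):
            w[i] += self.learning_rate * error * x


bandit = EpsilonGreedyContextualBandit(["sports", "tech"], dim=2)
morning, evening = [1.0, 0.0], [0.0, 1.0]
for _ in range(20):
    bandit.update("tech", morning, 1.0)     # user liked tech in the morning
    bandit.update("sports", morning, 0.0)
    bandit.update("sports", evening, 1.0)   # and sports in the evening
    bandit.update("tech", evening, 0.0)
print(bandit.choose(morning), bandit.choose(evening))
```

With `epsilon` above zero the policy occasionally shows a topic it does not currently favor, which is one simple way to keep discovering new interests rather than reinforcing a filter bubble.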
Beyond its privacy-focused design, Xayn believes its approach offers several advantages, including the ability to identify clear user interests and avoid the negative impact of tracking services that can discourage users from conducting certain searches.
“Users have direct control over the algorithm’s learning process,” he states. “They can easily indicate their preferences – whether to see more or less of a particular type of result – with a simple swipe gesture, allowing for straightforward system training.”
However, this method does present a potential drawback. The algorithm, when activated, may initiate learning even without explicit user feedback (likes or dislikes).
This places a responsibility on the user to actively swipe in order to get the best search results. That is an active requirement, unlike the passive data collection and profiling practiced by larger tech companies like Google, which comes at the expense of user privacy.
Effectively utilizing the app requires an ongoing interaction cost, or at least a commitment to providing feedback to receive the most relevant results. Users may need to actively signal disinterest in irrelevant results rather than simply scrolling past them.
To maximize usefulness, it may be beneficial to carefully evaluate each item and provide the AI with a clear assessment of its value. (In a competitive digital landscape, minimizing any form of digital friction is crucial.)
Addressing this point, Lundbæk clarifies: “Without swiping, the AI learns from weak positive signals but not from negative ones. Learning still occurs when the AI is enabled, but it is limited and has a minimal impact. Patterns are identified from positive interactions, such as liking something after visiting a website. Furthermore, only one of the four AI models – the domain learning model – learns from simple clicks; the others do not.”
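The distinction between weak and strong signals can be sketched for the domain model as follows. The numeric weights are purely hypothetical, since the article gives none, and the class is an illustration rather than Xayn’s code. It only shows clicks counting as a weak positive while swipes count strongly in either direction.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical signal strengths; the article does not give numeric values.
SIGNAL_WEIGHTS = {"click": 0.1, "swipe_right": 1.0, "swipe_left": -1.0}


class DomainPreferences:
    def __init__(self):
        self.scores = defaultdict(float)

    def observe(self, url, signal):
        """Add the weight of a user signal to the score of the URL's domain."""
        domain = urlparse(url).netloc
        self.scores[domain] += SIGNAL_WEIGHTS[signal]

    def preference(self, url):
        """Current score for the domain of `url` (0.0 if never seen)."""
        return self.scores[urlparse(url).netloc]


prefs = DomainPreferences()
prefs.observe("https://news.example/a", "click")
prefs.observe("https://news.example/b", "click")
prefs.observe("https://blog.example/x", "swipe_right")
prefs.observe("https://spam.example/y", "swipe_left")
print(dict(prefs.scores))
```

Two clicks move a domain only slightly, a single right swipe moves it much further, and only a left swipe can push a score below zero, mirroring the point that negative learning requires explicit feedback.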
Xayn acknowledges the potential for the swiping mechanic to feel burdensome. The team intends to incorporate “some kind of gamification aspect” in the future, transforming the process from a source of friction into an enjoyable experience. The specifics of this implementation remain to be determined.
There is also an inherent delay in using Xayn compared to Google, due to the former’s reliance on on-device AI training, while Google processes data in the cloud using specialized hardware.
“We’ve dedicated over a year to this, with the primary goal of demonstrating its functionality,” Lundbæk admits. “And naturally, it’s slower than Google.”
“Google doesn’t face these on-device processing requirements and has even developed dedicated hardware, like TPUs, specifically for this type of model,” he continues. “Considering the hardware differences, it’s remarkable that we’ve managed to implement on-device AI processing on smartphones. However, it is undeniably slower than Google.”
The team is actively working to improve Xayn’s speed and anticipates further gains as they prioritize optimization, with a version potentially 40 times faster currently in development.
“While the final iteration may not be 40 times faster – as we’ll leverage this increased speed to analyze more content and provide a broader perspective – it will become faster over time,” he adds.
Regarding the accuracy of search results compared to Google, he contends that Google’s ‘network effect’ advantage – where search ranking improves with a larger user base – is not insurmountable, thanks to the capabilities of edge AI working effectively with ‘small data.’
However, Google remains the dominant search standard.
“Currently, we primarily benchmark ourselves against Bing and DuckDuckGo, where we consistently achieve superior results. However, Google is the market leader and employs significant personalization,” he explains when asked about benchmarking data.
“Interestingly, Google utilizes not only personalization but also a network effect. PageRank heavily relies on this effect, where more users lead to better results by tracking click-through rates.
“The key point is that, with advancements in AI technology – such as the approach we’ve taken – the network effect is becoming less significant. In fact, I’d argue that it’s no longer a major factor when competing with pure AI technology. Therefore, we can achieve results comparable to Google now, and potentially even surpass them over time. But we offer a different approach.”
Initial tests of the beta app revealed satisfactory search results for simple queries, with the potential for improvement through continued use. However, the slight processing delay was noticeable compared to established search engines.
This isn’t a critical flaw – merely a reminder that performance expectations in search are high, even with a commitment to user privacy.
A potential shift in the competitive landscape?
Lundbæk proposes that Google’s current dominance, largely due to a network effect, is becoming less pronounced, with increasing numbers of alternative search options emerging. He attributes this shift to growing user concerns regarding data privacy, which is creating a favorable environment for new competition within the search industry.
He emphasizes that the search market differs from social networks like Facebook, where a single platform often holds a monopoly. Lundbæk believes this situation is beneficial, as competition consistently fosters technical advancements and allows for a wider range of customer preferences to be met.
Naturally, any organization aiming to challenge Google’s substantial market share – exceeding 90% in Europe – faces the significant hurdle of attracting users from the established search engine.
Lundbæk explains that the startup is not currently planning large-scale marketing expenditures. Instead, they intend to prioritize sustainable growth by iteratively developing the product alongside a dedicated “tight community” of early adopters. Their strategy relies on collaborative promotion within the privacy-focused technology sector and engagement with influential figures.
He also anticipates that the current level of media attention surrounding privacy issues will contribute to increased visibility.
“We address a particularly timely and important issue,” he states. “Our goal is to demonstrate that privacy-respecting search is achievable, and to showcase that this approach can be applied to various applications.
“There’s a perception that you must choose between the large, data-collecting US companies and smaller, privacy-focused solutions that often compromise on user experience. We aim to disprove this notion and establish alternatives grounded in European principles.”
Indeed, discussions around technological sovereignty are prominent among EU policymakers, despite the continued popularity of major US tech companies among European consumers.
Furthermore, increasing regional data protection regulations are making it more difficult to depend on US-based services for data processing. Adherence to the GDPR data protection framework is an additional consideration for businesses, which is drawing attention to ‘privacy-preserving’ technologies.
According to Lundbæk, Xayn intends to reach a broader audience by expanding its business-to-business offerings, with the expectation that increased usage in the workplace will translate to adoption by individual users – a reversal of the consumerization trend previously driven by smartphones and bring-your-own-device policies.
“Through these methods, we believe we can organically grow our user base and spread awareness without the need for extensive marketing campaigns,” he explains.
While the initial launch focused on mobile applications, a desktop version is also scheduled for release in the first quarter of next year.
The team is opting for a browser extension rather than building a dedicated browser, acknowledging that competing with established browsers like Chrome and Firefox would be a considerable challenge in itself.
“We have built our entire AI using Rust, a secure programming language. Security and safety are paramount to our development process. A key benefit of Rust is its versatility – it can function across various platforms, from embedded systems to mobile devices, and can be compiled into WebAssembly for use as a browser extension in any browser,” he adds. “With the exception of Internet Explorer, naturally.”