BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most Search-Savvy'

Models & Benchmarks · Published: May 23, 2024 · David Kowalski · ~13 min read

Author

David Kowalski · Developer Tools & Agents Editor

Coding agents and IDE workflows tested the way working teams use them.

As developers, we often struggle with AI assistants that hallucinate answers or provide shallow summaries instead of digging into actual sources. Baichuan Intelligence claims to remove this friction with Baixiaoying, an assistant designed specifically to understand search mechanics and use guided questioning.

The seeds planted by Wang Xiaochuan during the search era have blossomed again in the age of large language models.

His startup, Baichuan Intelligence, has just released its first AI application for consumers: Baixiaoying.

At first glance, it appears to be another mainstream AI assistant. However, the company emphasizes that this assistant is unique because it understands search and employs guided questioning techniques.

Indeed, combining “search” with “Wang Xiaochuan” naturally sparks curiosity.

Baixiaoying can answer user questions at any time, rapidly read documents, organize materials, and assist in content creation. It also possesses capabilities such as multi-turn search and targeted search, enabling it to more accurately understand and meet user needs.

Baichuan explained that equipping the model with professional search skills is intended to “provide users with professional, rich knowledge and resources.”

Furthermore, it supports voice interaction.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 1

We have tested the app. Users can now download Baixiaoying from the iOS App Store, Android application markets, or Baichuan Intelligence’s official website. Alternatively, it is available for free use via the Ying.ai web interface.

Behind Baixiaoying lies Baichuan 4, Baichuan Intelligence’s newly unveiled next-generation foundational large model. The ability to interact via voice hints that this new model possesses multimodal capabilities.

Baichuan 4 entered SuperCLUE (a comprehensive Chinese evaluation benchmark for general large models) immediately upon release. It set a new domestic record with a total score of 80.64 and narrowly beat GPT-4-Turbo-0125 by 1.51 points in the comprehensive Chinese ability test.
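To sanity-check the reported margin, here is a small sketch that derives the GPT-4-Turbo-0125 score implied by the two figures above. Only 80.64 and 1.51 come from the article; the function names and the derived values are my own arithmetic, not published SuperCLUE numbers.

```python
from decimal import Decimal

# The only two figures taken from the article.
BAICHUAN4_TOTAL = Decimal("80.64")
MARGIN_OVER_GPT4_TURBO = Decimal("1.51")


def implied_runner_up_score(winner: Decimal, margin: Decimal) -> Decimal:
    """Score the trailing model must have had, given the winner and the margin."""
    return winner - margin


def relative_lead_percent(winner: Decimal, margin: Decimal) -> Decimal:
    """Margin expressed as a percentage of the trailing model's score."""
    runner_up = implied_runner_up_score(winner, margin)
    return (margin / runner_up * 100).quantize(Decimal("0.01"))


gpt4_turbo_score = implied_runner_up_score(BAICHUAN4_TOTAL, MARGIN_OVER_GPT4_TURBO)
lead_percent = relative_lead_percent(BAICHUAN4_TOTAL, MARGIN_OVER_GPT4_TURBO)

print(gpt4_turbo_score)  # 79.13
print(lead_percent)      # 1.91
```

A lead of under two percent is why I treat this as a narrow win rather than a decisive one.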

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 2

By unveiling Baichuan 4 alongside Baixiaoying—an AI assistant that understands search and asks questions—Baichuan Intelligence has taken another step toward its planned super model and super application. This year, the company broke from its previous monthly update rhythm, holding back a major release…

How to Use Baixiaoying?

Baixiaoying marks Baichuan Intelligence’s first AI application since the company launched over a year ago. Like most generalist assistants, it handles long-text reading and multimodal understanding out of the box. But what actually sets it apart is its core design philosophy: it understands search and knows how to ask questions.

This isn’t just about retrieving data; it’s about integrating Baichuan 4’s foundational capabilities with dedicated search technology. The team demonstrated three specific ways Baixiaoying executes this proficiency, starting with targeted searches. When you pose a question, the model identifies the domain and extracts key information directly from authoritative sources to enrich its output. The priority here is speed and accuracy.

For more complex queries, it supports multi-turn searches. Instead of a single shot, Baixiaoying breaks down questions step-by-step to uncover what the user truly seeks before delivering an answer. This approach gathers deeper, professional insights in scenarios like market research or industry analysis, far surpassing the utility of single-turn searches.
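As a builder, this is roughly how I picture a multi-turn search loop. It is a minimal sketch under my own assumptions, not Baichuan's implementation: `TOY_INDEX`, `toy_search`, and `decompose` are hypothetical stand-ins for a real search backend and a model-driven planner.

```python
from typing import Callable

# Hypothetical in-memory index; a real assistant would call a search backend.
TOY_INDEX = {
    "ev market size": "Toy fact: the market grew last year.",
    "ev market competitors": "Toy fact: three vendors lead the segment.",
    "ev market risks": "Toy fact: battery costs remain volatile.",
}


def toy_search(query: str) -> str:
    """Return the stored snippet for a query, or an empty string if none."""
    return TOY_INDEX.get(query, "")


def decompose(question: str) -> list[str]:
    """Hypothetical planner: split a broad request into narrower sub-queries."""
    topic = question.lower().removeprefix("analyze the ").rstrip(".?")
    return [f"{topic} {facet}" for facet in ("size", "competitors", "risks")]


def multi_turn_search(
    question: str, search: Callable[[str], str] = toy_search
) -> list[tuple[str, str]]:
    """Run one search per sub-query and keep every non-empty finding."""
    findings = []
    for sub_query in decompose(question):
        result = search(sub_query)
        if result:
            findings.append((sub_query, result))
    return findings


findings = multi_turn_search("Analyze the EV market")
for sub_query, result in findings:
    print(f"{sub_query}: {result}")
```

The point of the design is that the final answer is assembled from several narrow lookups rather than one broad query.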

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 3

The third differentiator is embedded search results. Unlike other AIs that simply “summarize webpage information” after a single call, Baixiaoying integrates search results directly as viewpoints and arguments within its responses. The team was clear about their stance on this:

“Something like Perplexity is called summarizing search results. We believe that direction should be the work of Search 2.0, which search engine companies can handle themselves; it is not what we aim to do.”
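To make the distinction concrete, here is a small sketch contrasting the two output styles. The `Evidence` type, both functions, and the source labels are hypothetical names I introduced for illustration; the article does not describe how Baixiaoying formats its answers internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    claim: str
    source: str  # hypothetical source label


def summarize(evidence: list[Evidence]) -> str:
    """Summary style: merge all claims into one blurb, sources listed at the end."""
    body = " ".join(item.claim for item in evidence)
    sources = ", ".join(item.source for item in evidence)
    return f"{body} (Sources: {sources})"


def embed_as_arguments(thesis: str, evidence: list[Evidence]) -> str:
    """Embedded style: each search result backs one numbered argument."""
    lines = [thesis]
    for number, item in enumerate(evidence, start=1):
        lines.append(f"{number}. {item.claim} [{item.source}]")
    return "\n".join(lines)


evidence = [
    Evidence("Baichuan 4 scored 80.64 on SuperCLUE.", "source A"),
    Evidence("Baixiaoying supports voice interaction.", "source B"),
]
summary = summarize(evidence)
argued = embed_as_arguments("The launch pairs a new model with a new app.", evidence)
print(summary)
print(argued)
```

In the embedded style every claim stays attached to its own source, which is what makes the answer easier to audit.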

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 4

I think embedding sources as arguments feels more rigorous than generic summarization. As a builder, multi-turn logic is essential for any serious research workflow. Personally, structured tables make scanning complex data much faster.

When outputting information, Baixiaoying emphasizes structured output, presenting key details via description plus tables to highlight main points at a glance.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 5

Empowered by multi-turn, targeted, and embedded search, the assistant genuinely “understands” the mechanics of information retrieval. The team acknowledges that while integrating models with search improves accuracy and reduces hallucinations, technology alone isn’t enough; product design must support it. My initial experience confirms that this approach boosts timeliness and enriches answers with case studies and data, making responses more comprehensive.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 6

Beyond searching, Baixiaoying also focuses on how it “asks questions.” Non-professional users often provide vague descriptions of their needs. To address this, the assistant guides users step-by-step through questioning based on their initial query, helping them clearly articulate requirements.
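Guided questioning resembles what dialogue systems call slot filling, so here is a minimal sketch of that idea. This is my assumption about the general pattern, not a description of Baixiaoying's internals; the slots, the question wording, and the scripted answers are all hypothetical.

```python
from typing import Optional

# Hypothetical requirements for a trip-planning request.
REQUIRED_SLOTS = {
    "destination": "Where would you like to go?",
    "dates": "When are you traveling?",
    "budget": "What is your budget?",
}


def next_missing_slot(known: dict[str, str]) -> Optional[str]:
    """Return the first required slot that has no value yet, if any."""
    for slot in REQUIRED_SLOTS:
        if not known.get(slot):
            return slot
    return None


def guide(
    known: dict[str, str], scripted_answers: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Ask one clarifying question per missing slot until the request is complete."""
    known = dict(known)
    asked = []
    while (slot := next_missing_slot(known)) is not None:
        asked.append(REQUIRED_SLOTS[slot])
        answer = scripted_answers.get(slot)
        if not answer:
            break  # the user gave no answer, so stop asking
        known[slot] = answer
    return known, asked


completed, questions = guide(
    {"destination": "Chengdu"},
    {"dates": "early June", "budget": "5000 RMB"},
)
print(questions)
print(completed)
```

Only the missing details are asked for, so a user who already gave a destination is not asked again.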

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 7

I think guided questioning lowers the barrier for non-technical users. As a builder, good UX design is just as critical as model capability.

These features aim to lower the entry barrier for ordinary people using AI assistants, making them more user-friendly while ultimately delivering useful answers. Baixiaoying handles long-text reading and multimodal understanding with ease. Below are showcases of these capabilities; you can try them yourself (finding bugs and testing limits is perhaps the most anticipated activity in the era of large models). The multimodal test results were quite good; it accurately identified a half-face sculptural bust in a mus


BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 8

Its long-text capability allowed it to pass the test of reading financial reports smoothly:

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 9

I followed the rollout and saw that Baixiaoying covers daily conversation, office work, search, learning, and multimodal recognition. It feels like a broad-scope tool rather than a niche specialist.

Personally, broad scope is useful for general tasks but lacks deep specialization. I think multimodal support matters if I need to process images alongside text. As a builder, office work integration needs to be seamless, not clunky.

However, Baichuan Intelligence’s founder and CEO, Wang Xiaochuan, stated plainly that this is not the “super application” he had previously said the company would launch.

Currently, there are neither super models nor super applications in the market.

In his words, Baixiaoying is currently an AI assistant, an intermediate stage in which user applications transform from “tools” into “partners” in the age of large models. The process is one of gradual development, meeting user needs step by step.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 10

Behind the Scenes: Baichuan 4 Takes the Top Spot in Its Debut

As I mentioned earlier, powering Baixiaoying is Baichuan 4, Baichuan Intelligence’s latest iteration. It marks their first foray into multimodal models since entering the large model arena.

Compared to its predecessor, Baichuan 3 (released late January), Baichuan 4 shows significant improvements across various capabilities. Specifically, instruction following improved by 20%, information understanding by 9%, knowledge Q&A by 15%, creation by 16%, and logical reasoning by 15%. In specialized abilities, mathematics improved by 14% and coding by 9%.

In its debut match on the SuperCLUE comprehensive benchmark, which has long been dominated by OpenAI, Baichuan 4 took first place:

With a total score of 80.64, it surpassed the previous top-ranked model, GPT-4-Turbo-0125, by 1.51 points.

It was indeed a narrow victory…

However, although the margin is small, in the era of large models, even a difference of 0.01 points is considered precious.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 11

How was this achieved?

During training, Baichuan 4 introduced several optimizations, including data filtering and optimization that combine model-based and human-based approaches. It also applied a Scaling Law for positional encoding in long-text modeling, which the team says improves how effectively the model uses its data.

In the alignment phase, the team focused on optimizing Baichuan 4’s Reasoning, Planning, and Instruction Following capabilities. This was achieved through loss-driven data selection and training, multi-stage progressive improvement, and multi-model parameter fusion.

Furthermore, the team proposed a Sequential Preference Optimization (SPO) method during this stage. By sequentially fine-tuning LLMs to align with multiple dimensions of human preference, key metrics and model stability were significantly improved.
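The article gives no algorithmic detail for SPO, so the following is only a toy illustration of the general idea of aligning to preference dimensions one after another. Everything in it is my assumption: a two-weight linear scorer, a logistic (Bradley-Terry style) preference loss, and a pull toward the previous stage's weights so the second stage does not drift away from the first. It is not Baichuan's method and does not fine-tune an LLM.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stage(weights, diffs, anchor=None, strength=0.0, lr=0.1, steps=200):
    """Gradient ascent on the log-likelihood that preferred beats rejected.

    Each item in `diffs` is (preferred features - rejected features).
    If `anchor` is given, a quadratic penalty pulls weights back toward it.
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for d in diffs:
            score = sum(wi * di for wi, di in zip(w, d))
            push = 1.0 - sigmoid(score)  # large when the pair is still misranked
            for i, di in enumerate(d):
                grad[i] += push * di
        if anchor is not None:
            for i in range(len(w)):
                grad[i] -= strength * (w[i] - anchor[i])
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


def accuracy(weights, diffs) -> float:
    """Fraction of pairs where the preferred response scores higher."""
    correct = sum(sum(wi * di for wi, di in zip(weights, d)) > 0 for d in diffs)
    return correct / len(diffs)


# Hypothetical feature differences for two preference dimensions.
helpfulness_pairs = [(1.0, 0.0), (0.8, 0.2)]
harmlessness_pairs = [(0.0, 1.0), (0.2, 0.8)]

stage1 = fit_stage([0.0, 0.0], helpfulness_pairs)
stage2 = fit_stage(stage1, harmlessness_pairs, anchor=stage1, strength=1.0)

print(accuracy(stage2, helpfulness_pairs), accuracy(stage2, harmlessness_pairs))
```

The anchor term is the part that matters: the second stage improves the new dimension while staying close to what the first stage learned.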

They also integrated RLHF and RLAIF through RLxF, their reinforcement learning alignment technique, which the team says greatly enhanced the model’s instruction-following abilities.

Additionally, Baichuan 4 possesses industry-leading multimodal capabilities, performing excellently on evaluation benchmarks such as MMMU, MMBench-EN, CMMMU, MMBench-CN, and MathVista, outperforming multimodal models like Gemini Pro and Claude 3 Sonnet.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 12

However, Baichuan 4 did not take the open-source route this time.

Wang Xiaochuan responded candidly to this decision: “Last year, we took the lead in open-sourcing our models as a pledge of commitment from the Baichuan team entering the large model space. At that time, the domestic open-source environment was very immature. Our open-source initiative made an important contribution to the domestic open-source industry. Now, there are many players competing in the open-source field.”

“Friends, you have to trust the market’s regulating mechanisms,” said Wang Xiaochuan.

Although Baichuan 4 is closed-source, API access remains available.

After opening the new generation of foundational models to the public, they simultaneously released four model APIs: Baichuan 4, Baichuan3-Turbo, Baichuan3-Turbo-128k, and Assistant API.

They are also divided into Flagship and Professional tiers. The Flagship tier fully opens all capabilities of Baichuan 4; the Professional tier offers Baichuan3-Turbo, which is more affordable than the Flagship version, performs better than Baichuan 2, and has been specifically optimized for high-frequency enterprise application scenarios.
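For integration planning, the tier split can be captured in a small lookup. This is a sketch under stated assumptions: the strings are the display names used in the article, the real API's model identifiers may differ, and `pick_tier` is a hypothetical rule of thumb of mine rather than Baichuan guidance.

```python
# Tier-to-model mapping as described in the article. These are display names
# from the text; the actual API model identifiers may be different.
API_TIERS = {
    "flagship": "Baichuan 4",
    "professional": "Baichuan3-Turbo",
}


def model_for_tier(tier: str) -> str:
    """Look up the model offered in a tier, rejecting unknown tier names."""
    key = tier.strip().lower()
    if key not in API_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return API_TIERS[key]


def pick_tier(needs_full_capabilities: bool) -> str:
    """Hypothetical rule: pay for Flagship only when every capability is needed."""
    return "flagship" if needs_full_capabilities else "professional"


print(model_for_tier(pick_tier(needs_full_capabilities=False)))
```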

Interestingly, although the Assistant API is also open for free trial by enterprise users, Baichuan’s stance on the recent intense large model price war was clear:

What? Price wars? We decline.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 13

First, Wang Xiaochuan clarified that their primary focus is on the to-C (consumer-facing) market. Price battles among cloud providers have little impact on Baichuan.

Secondly, he holds a firm stance, believing that intense compet

Personally, closed-source models limit my ability to audit safety or fine-tune for specific internal workflows. I think API pricing tiers matter less if the underlying model doesn’t fit our existing integration stack.


The API-first model that many startups rely on is hitting a wall in China. Wang Xiaochuan argues that while competition is inevitable, the current aggressive tactics are unsustainable for new entrants. “In the Chinese market, providing API services is not a viable path for startup companies.”

He breaks down why the economics don’t add up compared to the US market:

“If we look at it purely from a business perspective, China’s current commercial environment means the To-B market is roughly ten times smaller than the To-C market. Such a disparity does not exist in the United States;

Secondly, when analyzing data, you find that while revenue is collected in RMB, computing power costs are incurred in USD. This highlights another significant difference between the Chinese and American API service markets.”
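The currency point is easy to see with a quick margin calculation. The sketch below is purely illustrative: the revenue, compute cost, and exchange rates are hypothetical numbers I chose, not figures from Baichuan or from the interview.

```python
def gross_margin(revenue_rmb: float, compute_cost_usd: float, rmb_per_usd: float) -> float:
    """Gross margin as a fraction of revenue when costs are billed in USD."""
    if revenue_rmb <= 0:
        raise ValueError("revenue must be positive")
    cost_rmb = compute_cost_usd * rmb_per_usd
    return (revenue_rmb - cost_rmb) / revenue_rmb


# Hypothetical scenario: same revenue and compute bill, two exchange rates.
margin_at_7_0 = gross_margin(revenue_rmb=1000.0, compute_cost_usd=100.0, rmb_per_usd=7.0)
margin_at_7_5 = gross_margin(revenue_rmb=1000.0, compute_cost_usd=100.0, rmb_per_usd=7.5)

print(f"{margin_at_7_0:.0%} -> {margin_at_7_5:.0%}")
```

With nothing else changing, a weaker RMB alone takes five points off the margin in this made-up scenario, which is the exposure the quote is describing.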

For Baichuan Intelligence, there is internal consensus on one point: it is essential to pursue differentiated strategies.

“Simply competing on price might give leading startups an advantage through low-cost models, but relying solely on low prices as a competitive edge is insufficient for market success.”

As a builder, my read is that API arbitrage is dead in China and differentiation is the only way out. Personally, I think the currency mismatch kills margins for To-B AI startups here.


Why does the first To-C product look like this?

I’ve been watching the industry scramble for “super apps” built on large language models, a trend Wang Xiaochuan predicted back in 2024. While many LLM startups rushed to launch consumer offerings, Baichuan Intelligence stayed calm amidst the price wars, opting for a measured approach.

Wang Xiaochuan smiled and said, “I don’t think Baixiaoying was released too late; on the contrary, I believe it was released too early. I think model applications require more time for refinement.”

He noted that an app with millions of Daily Active Users (DAU) is still far from earning the title of “super application.” Previously, companies released apps primarily to showcase their models, but now users often remain unclear about what these apps actually do.

“The entire industry has not yet reached a mature state. Having previously developed input methods, search engines, and browsers, we deeply understand that there is an optimal timing for when an application evolves into a widely used product.”

Therefore, whether Baixiaoying’s debut was early or late is irrelevant; Baichuan Intelligence simply chose the right moment to introduce it to the industry, allowing the team to operate it more concretely.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 15

Before aiming for super applications, Baichuan Intelligence sent Baixiaoying out first to face scrutiny in the market, and there is a logic behind that decision.

As mentioned earlier, Baichuan believes that unlike products in the information age defined by their tool-like attributes, large models create new species.

Transforming AI from a tool into a partner means building an AI assistant based on large models is akin to “creating a human being.”

Just as humans can use tools, think, listen, read, see, and write, AI assistant products should possess corresponding capabilities as model performance continuously improves.

Search serves as the most critical tool for current large models. It not only enables real-time access to the latest information but also effectively mitigates hallucination issues, making it a key technology for LLMs and a primary exploration direction for Baichuan Intelligence. When releasing Baichuan-53B last year, the team already introduced the concept of search enhancement, with RAG (Retrieval-Augmented Generation) technology remaining at the forefront.
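For readers who have not built one, here is a bare-bones sketch of the retrieval-augmented pattern. It is a generic illustration under my own assumptions (keyword-overlap retrieval over a tiny in-memory list, and a hypothetical prompt template), not Baichuan's search enhancement stack, and it stops before any model call.

```python
import re

# Tiny hypothetical document store, using facts stated in this article.
DOCS = [
    "Baichuan 4 scored 80.64 on SuperCLUE.",
    "Baixiaoying supports voice interaction.",
    "Baichuan3-Turbo is the Professional tier model.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share at least one token with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if query_tokens & tokenize(d)]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Place the retrieved passages ahead of the question as numbered sources."""
    context = retrieve(query, docs, k)
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(context, start=1))
    return f"Answer using only the numbered sources.\n{numbered}\nQuestion: {query}"


question = "What did Baichuan 4 score on SuperCLUE?"
prompt = build_prompt(question, DOCS)
print(prompt)
```

Grounding the prompt in retrieved text is what gives the model fresh facts to cite instead of relying on what it memorized in training.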

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 16

Based on this philosophy, Baichuan chose an AI assistant proficient in search to fire the first shot in its To-C scenario.

Thus, the distinct “Baichuan flavor” embedded in Baixiaoying is immediately apparent:

An intermediate state of AI partnership + Baichuan model advantages + Sogou’s deep search expertise + accumulated past product experience.

BaiChuan's New Model Tops Chinese Benchmark; First AI Assistant 'Bai Xiaoying' Launched as 'Most … — figure 17

Regarding the future after the debut of its first application, Wang Xiaochuan hinted at a few details.

The vision, naturally, is to build super models and super applications that are reliable, and these must integrate with search.

The breakthrough point should be enabling AI to act like professionals in various industries, incorporating the data density and cognitive depth specific to those fields to ensure usability.

As for the direction of future iterations, Wang Xiaochuan kept it a secret; no matter how he was pressed, he refused to say more.

However, during the post-launch communication session, he inadvertently let some clues slip!

He mentioned that one reason AI assistants need to ask questions is to accumulate capabilities for future super applications. He gave an example: “If you go to a doctor and say you have a fever, and the AI directly gives you a result, that would certainly be unfeasible.”

He also referenced a recent interview with Geoffrey Hinton, Turing Award winner and mentor of Ilya Sutskever, in which the veteran stated that healthcare is the most promising application area f

I read through the coverage of Baichuan’s latest moves, which center on two main announcements: a new model that tops Chinese benchmarks and the launch of its first AI assistant, Baixiaoying, marketed as the “most search-savvy.” The narrative leans heavily into practical application, particularly in sectors like healthcare.

As one observer noted during the discussion (he suggested everyone watch the interview), the direction feels distinctly aligned with Wang Xiaochuan’s strategic style.

I think search-aware assistants could change how I find answers inside large codebases. As a builder, benchmark wins mean little if the assistant can’t handle my specific repo context. Personally, healthcare applications raise serious privacy questions that generic models often overlook.

For now, let us simply try using Baixiaoying and wait a little longer…

References

I reviewed the source material to verify the benchmark claims and context surrounding this release.

Intensifying Competition in China’s Domestic Large Models: Baichuan Intelligence’s “Baichuan4” Gets First Full-Network Test, Refreshing SuperCLUE Chinese Benchmark with a Total Score of 80.64 — Release of Baichuan4’s SuperCLUE Chinese Benchmark Evaluation Results
