Before Enterprise AI Went Cloud-First, Gerev Bet on Local Search

By Olivia Carter

May 18, 2023

OpsMatters

Five months after ChatGPT entered the consumer market, one of the most urgent questions inside enterprise information-technology departments still has few settled answers: whether the new generation of large-language-model tools can be brought to bear on a company’s internal documents, codebases, and chat archives, and what that means for the company’s data-security posture. The default answer from the venture-backed entrants in workplace AI has been: yes, in our cloud.

Glean, a four-year-old enterprise-search startup founded by former Google engineers, runs its retrieval system as a managed service. It raised a $100 million Series C in May 2022 at a valuation reported above $1 billion. In March, Microsoft announced Copilot, which operates against tenant data inside Microsoft’s cloud. Notion’s AI assistant, which shipped in February, takes a similar cloud-based approach.

These companies are betting that compute, models, and integrations will live in the cloud. The trade-off, which security teams have been increasingly vocal about in recent months, is that the company’s most sensitive internal text may need to make a round trip to someone else’s infrastructure for the system to be useful.

Gerev.ai makes a different argument. The project, which went public on GitHub roughly two months ago, runs entirely on customers’ machines. It connects, through a pluggable architecture of source adapters, to a company’s Slack workspace, Confluence wiki, Google Drive, Jira tracker, Mattermost server, and BookStack pages. Employees write queries in natural language, and the system retrieves and composes answers from the company’s own corpus, with no data crossing the organization’s perimeter to reach a third party.

That architecture is the product’s main claim. Gerev is built around source adapters that allow administrators to add new data sources without rewriting the indexing pipeline. It uses a normalized intermediate representation so the search layer can work across different systems, and it is designed to run model orchestration against on-premises compute rather than a vendor-hosted service.
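For readers who want to see the shape of that idea, the adapter pattern described above can be sketched in a few lines of Python. This is an illustration only and not Gerev's actual code: the names `Document`, `SourceAdapter`, `InMemoryWikiAdapter`, `InMemoryChatAdapter`, `build_index`, and `search` are all hypothetical, the adapters read from in-memory data rather than calling any real service, and the keyword match merely stands in for a real retrieval layer.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Document:
    """Hypothetical normalized record that every source is mapped into."""
    source: str
    doc_id: str
    title: str
    text: str


class SourceAdapter(Protocol):
    """Hypothetical adapter interface: a source only has to yield Documents."""
    name: str

    def fetch(self) -> Iterable[Document]: ...


class InMemoryWikiAdapter:
    """Stand-in for a wiki connector; a real one would call the wiki's API."""
    name = "wiki"

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def fetch(self) -> Iterable[Document]:
        for title, body in self._pages.items():
            yield Document(self.name, f"wiki:{title}", title, body)


class InMemoryChatAdapter:
    """Stand-in for a chat connector, fed with (channel, message) pairs."""
    name = "chat"

    def __init__(self, messages: list[tuple[str, str]]) -> None:
        self._messages = messages

    def fetch(self) -> Iterable[Document]:
        for i, (channel, text) in enumerate(self._messages):
            yield Document(self.name, f"chat:{channel}:{i}", channel, text)


def build_index(adapters: Iterable[SourceAdapter]) -> list[Document]:
    """Indexing step that never needs to know which source a Document came from."""
    return [doc for adapter in adapters for doc in adapter.fetch()]


def search(index: list[Document], query: str) -> list[Document]:
    """Toy keyword match (an assumption) standing in for the real search layer."""
    terms = query.lower().split()
    return [d for d in index if all(t in d.text.lower() for t in terms)]


index = build_index([
    InMemoryWikiAdapter({"Deploys": "Deploy the billing service with the release script."}),
    InMemoryChatAdapter([("ops", "billing deploy failed last night")]),
])
hits = search(index, "billing deploy")
```

In this sketch, adding a new source means writing one more class with a `fetch` method; `build_index` and `search` stay unchanged because they only ever see the normalized `Document` type.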

In roughly two months, Gerev’s repository has accumulated more than two thousand GitHub stars. A community of more than a hundred developers has also formed on Discord, with many experimenting with the system inside their own employers’ environments.

The force behind Gerev is Yuval Steuer, a twenty-three-year-old engineer in Tel Aviv who completed his mandatory military service last year and has been working on the project with another engineer in the months since. Steuer graduated from GAMA, one of the Israeli military’s most competitive technical-selection programs, and later held a cyber-operator role in the IDF Intelligence Corps. That background places him in a technical culture that has produced a large number of Israeli infrastructure and cybersecurity founders. For Gerev, the relevance is direct: the project’s starting assumptions are security, local deployment, and engineering-led adoption.

What stands out, against the backdrop of Glean’s nine-figure war chest and Microsoft’s Copilot rollout, is the kind of organizations whose engineers Steuer says have already pulled the repository into internal work. According to Steuer, engineers at companies including Boeing, Airbus, Bank of America, the Golden State Warriors, Kohl’s, and Adobe have experimented with Gerev or used it in internal workflows. Those are not necessarily vendor relationships or formal deployments, which is part of the point: open-source infrastructure often enters large companies through individual engineers before it appears in procurement systems.

“The way it’s been spreading is the way open source spreads,” Steuer says. “One engineer at a big company pulls the repo, runs it against some of their own data, sees that it works, and shows it to the rest of the team.”

The teams adopting Gerev describe two main uses. Engineering organizations want an internal search tool that can answer questions across company documentation rather than simply surfacing literal keyword matches. Others describe the project as a possible answer to a category of internal request that has historically been difficult to satisfy: a new employee asking how a particular system works, a product manager trying to reconstruct the rationale for a decision made by a team that has since dispersed, or a security investigator trying to trace a piece of information across systems whose maintainers have moved on.

The systems already in place often struggle with those questions because the relevant context is scattered across tools, teams, and time. Gerev’s early pilots suggest that a locally deployed search layer can make some of that institutional memory easier to retrieve.

Steuer acknowledges that Glean, Microsoft, Notion, and a half-dozen smaller venture-backed entrants will keep growing. They have sales teams, go-to-market budgets, and product roadmaps that extend the integrations they have already built into major workplace platforms. But he argues that Gerev has an advantage with the engineers choosing what to pull, deploy, and advocate for inside their companies.

“The implications of sending internal data to a third-party model are still working their way through enterprise security reviews,” Steuer says. “That makes the engineer’s choice unusually consequential. The repositories engineers pull, the projects they recommend, and the architectures they advocate for in design reviews are where a meaningful share of enterprise software decisions begin, long before procurement hears about them.”

Elasticsearch and Redis cut this path through enterprise software more than a decade ago. Both began as open-source projects, spread through engineer-to-engineer adoption inside large organizations, and became standard infrastructure for entire categories.

For enterprise IT leaders, the decision is no longer only which AI-search vendor to buy. It is also where the company’s internal knowledge should live while the system reasons over it. Gerev’s early traction suggests that some engineers want an answer that stays inside the perimeter. Two months into its public life, the project is still closer to a wedge than a company. The question is whether that wedge is wide enough for a small open-source project to matter in a category already crowded with giants.
