Semantic search
Search that ranks results by similarity of meaning (usually computed from embeddings) rather than by exact keyword match, so a query for "React front-end" can surface profiles that only say "UI engineer with hooks".
Michal Juhas · Last reviewed May 2, 2026
Who this is for
Sourcers drowning in title inflation and recruiters building shortlists from noisy databases.
In practice
- Define the vector unit: decide whether you embed a profile chunk, a whole CV, or a job description paragraph.
- Log scores and thresholds so you can answer questions like "why did this row score above 0.72 today?"
- Blend signals: tenure, recent skills, and semantic similarity together where the product allows.
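The three habits above can be sketched in a few lines. This is a minimal illustration, not any vendor's scoring method: the vectors are toy numbers standing in for real embeddings, and the `Candidate` fields, the weights, the 10-year and 36-month caps, and the 0.72 threshold are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Candidate:
    # Hypothetical record; the vector unit here is assumed to be one profile chunk.
    candidate_id: str
    vector: list[float]
    tenure_years: float
    months_since_skill_used: int


def score_row(query_vec, cand, w_sem=0.6, w_tenure=0.2, w_recency=0.2):
    """Blend semantic similarity with tenure and recency (assumed weights)."""
    semantic = cosine(query_vec, cand.vector)
    tenure = min(cand.tenure_years / 10.0, 1.0)  # assumed cap at 10 years
    recency = max(0.0, 1.0 - cand.months_since_skill_used / 36.0)  # assumed 36-month decay
    total = w_sem * semantic + w_tenure * tenure + w_recency * recency
    return {
        "candidate_id": cand.candidate_id,
        "semantic": round(semantic, 4),
        "tenure": round(tenure, 4),
        "recency": round(recency, 4),
        "total": round(total, 4),
    }


def log_line(row, threshold=0.72):
    """One audit line that records each component next to the threshold."""
    verdict = "above" if row["total"] >= threshold else "below"
    return (
        f"{row['candidate_id']} {verdict} {threshold}: "
        f"semantic={row['semantic']} tenure={row['tenure']} "
        f"recency={row['recency']} total={row['total']}"
    )
```

Storing the components alongside the total is the design choice that matters: when a score moves, you can see whether the embedding similarity or one of the structured signals caused it.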
Where it breaks
Thin profiles, internal-mobility titles, and stealth-mode companies give the model too little text to work with, so the resulting vectors carry little meaning. Semantic search amplifies whatever text exists, including empty buzzwords.
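One way to act on this is to flag weak text before it is embedded. The sketch below is an assumption-heavy illustration: the `BUZZWORDS` set, the `text_quality` helper, and both cutoffs are hypothetical and would need tuning for your own market and language.

```python
import re

# Hypothetical buzzword list; replace with terms that are noise in your data.
BUZZWORDS = {
    "passionate", "dynamic", "results-driven",
    "innovative", "motivated", "synergy",
}


def text_quality(text, min_tokens=30, max_buzz_ratio=0.3):
    """Label profile text as 'too_thin', 'buzzword_heavy', or 'ok'."""
    # Keep characters such as + and # so tokens like "c++" or "c#" survive.
    tokens = re.findall(r"[a-z0-9][a-z0-9+#-]*", text.lower())
    if not tokens or len(tokens) < min_tokens:
        return "too_thin"
    buzz = sum(1 for t in tokens if t in BUZZWORDS)
    if buzz / len(tokens) > max_buzz_ratio:
        return "buzzword_heavy"
    return "ok"
```

Profiles labeled `too_thin` or `buzzword_heavy` could be routed to manual review or structured filters instead of being ranked by similarity.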
From recent workshops
Workshop conversations on sourcing automation often compare APIs with "just prompting" for discovery. Semantic search sits in the API-heavy world: you need clean inputs, stable identifiers, and monitoring for when a provider changes its ranking behavior.
Literal Boolean versus semantic
| Need | Prefer |
|---|---|
| Exact cert or employer string | Boolean |
| Synonyms and adjacent skills | Semantic |
| Shortlist you must explain to legal | Boolean slice + human read |
Frequently asked questions
When should semantic search lead and when should Boolean lead?
Let Boolean or structured filters handle must-haves (location, authorization, level band). Use semantic ranking inside that slice to float similar wording. Sourcing automation workshops stress that order so you do not rank noise you should have excluded.
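The filter-then-rank order can be sketched as below. The profile dictionary keys (`id`, `location`, `work_authorized`, `level`, `vector`) and the `shortlist` function are hypothetical names for illustration, and the vectors are toy stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(profiles, query_vec, location, allowed_levels, top_k=10):
    """Apply must-have filters first, then rank the survivors semantically."""
    # Step 1: hard filters remove anyone who fails a must-have.
    eligible = [
        p for p in profiles
        if p["location"] == location
        and p["work_authorized"]
        and p["level"] in allowed_levels
    ]
    # Step 2: similarity only orders the people who passed step 1.
    ranked = sorted(
        eligible,
        key=lambda p: cosine(query_vec, p["vector"]),
        reverse=True,
    )
    return [p["id"] for p in ranked[:top_k]]
```

Because the filters run first, a profile with a near-perfect similarity score but the wrong location or no work authorization never reaches the ranked list.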
How is this different from asking ChatGPT to "find similar profiles"?
Productized semantic search uses embeddings and indexes built for scale, with reproducible scores. Ad hoc chat is great for exploration but weak for audit trails and repeatability unless you log prompts and sources.
What are the main quality risks?
False positives from generic buzzwords, domain ambiguity ("Python" the snake versus the language), and English-centric embeddings in multilingual markets. Always spot-check the tail of results.
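A spot-check is easier to keep up if the sample is drawn the same way every time. The helper below is a hypothetical sketch: `spot_check_sample`, its default sizes, and the fixed seed are assumptions, and it expects a list of IDs already sorted from best to worst score.

```python
import random


def spot_check_sample(ranked_ids, head_size=5, tail_size=10, tail_samples=5, seed=0):
    """Return (head_ids, tail_ids) for manual review of a ranked list.

    head_ids: the top results, always reviewed.
    tail_ids: a reproducible random sample from the last `tail_size` results.
    """
    head_ids = ranked_ids[:head_size]
    # Never let the tail pool overlap with the head on short lists.
    tail_start = max(head_size, len(ranked_ids) - tail_size)
    tail_pool = ranked_ids[tail_start:]
    rng = random.Random(seed)  # fixed seed so the same list yields the same sample
    k = min(tail_samples, len(tail_pool))
    tail_ids = rng.sample(tail_pool, k)
    return head_ids, tail_ids
```

Logging the seed with the sample lets a second reviewer pull exactly the same profiles later.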
Can semantic search replace reading profiles?
No. It only sets the reading order. Humans still decide fit, outreach tone, and compliance. Pair it with hallucination hygiene when models summarize what they found.
How does this relate to RAG?
Both use embeddings, but semantic search ranks candidates or documents for discovery; RAG retrieves chunks to answer a question with citations. Many stacks use both.
What should we read next?
Start with "Boolean search vs AI sourcing" and "AI sourcing tools for recruiters". Browse tools before you buy another vendor layer.
Do we need to store embeddings in the EU?
Treat embeddings like personal data if they represent people. Your DPO or counsel should align vendor regions, retention, and purpose limitation. Do not invent legal comfort; document decisions.