Vector Search
Graphmind includes a built-in HNSW (Hierarchical Navigable Small World) vector index for approximate nearest neighbor search. This lets you store embeddings alongside graph data and combine vector similarity with graph traversal.
Creating a Vector Index
Create an index on a specific label and property, specifying the vector dimensionality and similarity metric:
CREATE VECTOR INDEX myIdx FOR (n:Document) ON (n.embedding) OPTIONS {dimensions: 384, similarity: 'cosine'}
- index name -- a unique identifier for the index (e.g., myIdx)
- FOR (variable:Label) -- the node label to index
- ON (variable.property) -- the property containing vector embeddings
- OPTIONS -- dimensions (integer) and similarity ('cosine' or 'l2')
Supported similarity metrics:
- cosine -- cosine similarity (most common for text embeddings)
- l2 -- Euclidean distance
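To make the two metrics concrete, here is a minimal pure-Python sketch of how cosine similarity and Euclidean (L2) distance are computed for a pair of vectors. The function names are illustrative and are not part of Graphmind; the index computes these values internally, and its exact implementation may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Straight-line (Euclidean) distance between a and b (0.0 means identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity ignores vector length, while L2 distance does not: for [1.0, 0.0] and [2.0, 0.0] the cosine similarity is 1.0, but the L2 distance is 1.0 rather than 0.0.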
Listing Vector Indexes
-- Show only vector indexes (with dimensions, similarity, and vector count)
SHOW VECTOR INDEXES
-- Show all indexes (property + vector)
SHOW INDEXES
SHOW VECTOR INDEXES returns: name, label, property, dimensions, similarity, vectors, type.
Via SDK (Embedded Mode)
Python:
client.create_vector_index("Document", "embedding", dimensions=384, metric="cosine")
Rust:
use graphmind_sdk::{EmbeddedClient, VectorClient, DistanceMetric};
let client = EmbeddedClient::new();
client.create_vector_index("Document", "embedding", 384, DistanceMetric::Cosine).await?;
Inserting Vectors
After creating nodes, add vectors to them:
Via SDK
Python:
# Create a node first
client.query('CREATE (d:Document {title: "Graph Databases"})')
# Get the node ID
result = client.query_readonly('MATCH (d:Document {title: "Graph Databases"}) RETURN id(d)')
node_id = result.records[0][0]
# Add the vector
embedding = [0.1, 0.2, -0.3, ...] # 384 dimensions
client.add_vector("Document", "embedding", node_id=node_id, vector=embedding)
Rust:
client.add_vector("Document", "embedding", node_id, &embedding_vec).await?;
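Because every vector in an index must have the same dimensionality, it can help to check an embedding on the client side before inserting it. The following is a minimal sketch; validate_embedding is a hypothetical helper, not part of the Graphmind SDK.

```python
import math

def validate_embedding(vector, expected_dimensions):
    """Return the vector as a list of floats, or raise ValueError if it is unusable."""
    values = [float(v) for v in vector]
    if len(values) != expected_dimensions:
        raise ValueError(
            f"expected {expected_dimensions} dimensions, got {len(values)}"
        )
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values
```

With the index created earlier, the checked vector could then be passed on as client.add_vector("Document", "embedding", node_id=node_id, vector=validate_embedding(embedding, 384)).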
SEARCH Clause (Recommended)
The SEARCH clause is used within MATCH and OPTIONAL MATCH to constrain patterns using approximate nearest neighbor (ANN) vector search. This is the recommended way to do vector search in Graphmind.
Syntax
[OPTIONAL] MATCH pattern
SEARCH binding_variable IN (
VECTOR INDEX index_name
FOR query_vector
[WHERE filter_predicate]
LIMIT top_k
) [SCORE AS score_alias]
- binding_variable -- must match a node variable from the MATCH pattern
- index_name -- name of an existing vector index (or the label it indexes)
- query_vector -- a vector literal like [1, 2, 3], a parameter $embedding, or a property reference
- WHERE (optional) -- in-index filter applied during the search (only property predicates with AND)
- LIMIT -- number of approximate nearest neighbors to return
- SCORE AS (optional) -- returns the similarity score (0.0 to 1.0, higher = more similar) as a named column
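The syntax above can be assembled programmatically. The sketch below builds a SEARCH query string that reads its vector from the $embedding parameter. build_search_query is a hypothetical helper, and how parameters are passed to the server depends on the client you use, which this page does not specify.

```python
import re

# Only plain identifiers are accepted, so user input cannot alter the query shape.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_search_query(label, index_name, top_k, variable="n", score_alias=None):
    """Assemble a MATCH ... SEARCH query that takes its vector from $embedding."""
    for name in (label, index_name, variable, score_alias):
        if name is not None and not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    lines = [
        f"MATCH ({variable}:{label})",
        f"SEARCH {variable} IN (",
        f"  VECTOR INDEX {index_name}",
        "  FOR $embedding",
        f"  LIMIT {top_k}",
        ")" + (f" SCORE AS {score_alias}" if score_alias else ""),
    ]
    returned = [variable] + ([score_alias] if score_alias else [])
    lines.append("RETURN " + ", ".join(returned))
    return "\n".join(lines)

print(build_search_query("Movie", "moviePlots", 4,
                         variable="movie", score_alias="similarityScore"))
```

Keeping the vector in a parameter rather than inlining it as a literal keeps the query text short and identical across searches.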
Basic Vector Search
-- Find the 4 most similar movies to the query vector
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 4
)
RETURN movie.title AS title
With Similarity Score
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 4
) SCORE AS similarityScore
RETURN movie.title AS title, similarityScore
Similarity scores are FLOAT values between 0.0 and 1.0. A score of 1.0 means the vectors are identical.
With In-Index Filtering
Filter during the vector search itself (more efficient than post-filtering):
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
WHERE movie.rating > 7.5
LIMIT 4
)
RETURN movie.title, movie.rating
With Post-Filtering (WHERE outside SEARCH)
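Apply an outer WHERE to filter results after the vector search. Because the filter runs on the rows that SEARCH has already returned, the query can return fewer rows than LIMIT: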
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 10
)
WHERE movie.rating > 8.0
RETURN movie.title, movie.rating
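The difference between the two filtering styles can be illustrated with a small simulation. This sketch uses exact brute-force search as a stand-in for the HNSW index, and the movie data is made up for illustration; it shows the general behavior of filtering before versus after taking the top k, not Graphmind's internal algorithm.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(items, query, k, predicate=None):
    """Exact top-k by cosine similarity; predicate plays the role of the in-index WHERE."""
    candidates = [m for m in items if predicate is None or predicate(m)]
    ranked = sorted(candidates,
                    key=lambda m: cosine_similarity(m["embedding"], query),
                    reverse=True)
    return ranked[:k]

def is_highly_rated(movie):
    return movie["rating"] > 8.0

# Hypothetical data: A and C are close to the query but poorly rated.
movies = [
    {"title": "A", "rating": 6.0, "embedding": [1.0, 0.0]},
    {"title": "B", "rating": 9.0, "embedding": [0.9, 0.1]},
    {"title": "C", "rating": 5.5, "embedding": [0.8, 0.2]},
    {"title": "D", "rating": 8.5, "embedding": [0.1, 0.9]},
]
query = [1.0, 0.0]

# In-index style: filter first, then take the 2 nearest among the survivors.
in_index = top_k(movies, query, k=2, predicate=is_highly_rated)
# Post-filter style: take the 2 nearest overall, then drop those that fail the filter.
post = [m for m in top_k(movies, query, k=2) if is_highly_rated(m)]

print([m["title"] for m in in_index])  # ['B', 'D']
print([m["title"] for m in post])      # ['B']
```

With post-filtering, a larger LIMIT inside SEARCH compensates for rows that the outer WHERE will discard, which is why the example above uses LIMIT 10.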
Using a Property as Query Vector
MATCH (snowWhite:Movie {title: 'Snow White'})
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR snowWhite.embedding
LIMIT 4
)
RETURN movie.title AS title
OPTIONAL MATCH with SEARCH
If the query vector evaluates to null (e.g., it refers to a nonexistent property), OPTIONAL MATCH returns null rows:
MATCH (m:Movie {title: 'Snow White'})
OPTIONAL MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR m.nonExistentProp
LIMIT 4
)
RETURN movie.title
CALL Procedure (Legacy)
The CALL db.index.vector.queryNodes procedure is still supported, but the SEARCH clause is preferred:
CALL db.index.vector.queryNodes('Document', 'embedding', 10, [0.15, 0.25, ...])
YIELD node, score
RETURN node.title, score
ORDER BY score ASC
Via SDK
Python:
query_vector = [0.15, 0.25, ...] # same dimensionality as the index
results = client.vector_search("Document", "embedding", query_vector=query_vector, k=10)
for node_id, distance in results:
    print(f"Node {node_id}: distance {distance:.4f}")
Rust:
let results = client.vector_search("Document", "embedding", &query_vec, 10).await?;
for (node_id, distance) in results {
    println!("Node {:?} at distance {:.4}", node_id, distance);
}
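Since the SDK examples above yield (node_id, distance) pairs, a distance cutoff can be applied on the client side, for example when looking for near-duplicates. The sketch below is a hypothetical helper operating on plain tuples; the sample pairs and the 0.15 threshold are made-up values, and a suitable cutoff depends on the metric and the embedding model.

```python
def within_distance(results, max_distance):
    """Keep (node_id, distance) pairs at or below max_distance, closest first."""
    kept = [(node_id, d) for node_id, d in results if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])

# Made-up pairs in the same shape as the SDK results shown above.
sample = [(1, 0.30), (2, 0.05), (3, 0.12)]
print(within_distance(sample, 0.15))  # [(2, 0.05), (3, 0.12)]
```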
Hybrid Queries
Combine vector search with Cypher graph traversal:
-- Find similar documents, then traverse to their authors
MATCH (doc:Document)
SEARCH doc IN (
VECTOR INDEX doc_embeddings
FOR [0.15, 0.25, 0.35]
LIMIT 5
) SCORE AS similarity
MATCH (doc)<-[:WROTE]-(author:Person)
RETURN doc.title, similarity, author.name
ORDER BY similarity DESC
-- Find documents similar to a query, then find related documents by shared tags
MATCH (doc:Document)
SEARCH doc IN (
VECTOR INDEX doc_embeddings
FOR [0.15, 0.25, 0.35]
LIMIT 3
)
MATCH (doc)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(related:Document)
WHERE related <> doc
RETURN DISTINCT related.title, collect(t.name) AS shared_tags
Use Cases
| Use Case | How |
|---|---|
| Semantic search | Embed document text, search by meaning |
| Recommendation | Find similar users/products by embedding features |
| RAG (Retrieval-Augmented Generation) | Store knowledge base embeddings, retrieve context for LLMs |
| Deduplication | Find near-duplicate records by vector distance |
| Image similarity | Store image embeddings, find visually similar items |
Performance
The HNSW index provides sub-millisecond search latency on datasets up to 1M vectors. Index construction time scales linearly with the number of vectors.
Limitations
- Vectors must all have the same dimensionality within an index.
- The index is held in memory. Memory usage is approximately dimensions * 4 bytes * num_vectors plus HNSW graph overhead.
- The SEARCH clause's in-index WHERE filter only supports property predicates joined with AND. OR, NOT, IN, and string operators are not supported in the in-index filter (use post-filtering with an outer WHERE instead).
- The SEARCH clause pattern must have exactly one bound variable (the binding variable).
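As a worked example of the memory formula in the list above, the sketch below computes the raw vector storage for a given index size. It covers only the dimensions * 4 bytes * num_vectors term; the HNSW graph overhead is not specified on this page and is left out, so treat the result as a lower bound. The function name is illustrative.

```python
BYTES_PER_COMPONENT = 4  # each dimension is counted as 4 bytes in the formula

def raw_vector_bytes(dimensions, num_vectors):
    """Lower bound on index memory: vector storage only, excluding HNSW graph overhead."""
    if dimensions < 1 or num_vectors < 0:
        raise ValueError("dimensions must be >= 1 and num_vectors must be >= 0")
    return dimensions * BYTES_PER_COMPONENT * num_vectors

size = raw_vector_bytes(384, 1_000_000)
print(f"{size} bytes = {size / 2**30:.2f} GiB")  # 1536000000 bytes = 1.43 GiB
```

One million 384-dimensional vectors therefore need about 1.43 GiB before graph overhead is added.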