Vector Search

Graphmind includes a built-in HNSW (Hierarchical Navigable Small World) vector index for approximate nearest neighbor search. This lets you store embeddings alongside graph data and combine vector similarity with graph traversal.

Creating a Vector Index

Create an index on a specific label and property, specifying the vector dimensionality and similarity metric:

CREATE VECTOR INDEX myIdx FOR (n:Document) ON (n.embedding) OPTIONS {dimensions: 384, similarity: 'cosine'}

  • index name -- a unique identifier for the index (e.g., myIdx)
  • FOR (variable:Label) -- the node label to index
  • ON (variable.property) -- the property containing vector embeddings
  • OPTIONS -- dimensions (integer) and similarity ('cosine' or 'l2')

Supported similarity metrics:

  • cosine -- cosine similarity (most common for text embeddings)
  • l2 -- Euclidean distance
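
For intuition about how the two metrics differ, here is a minimal Python sketch that computes both on small vectors. It is independent of Graphmind: the helper names cosine_similarity and l2_distance are illustrative and are not part of the SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean (straight-line) distance; 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length

print(cosine_similarity(a, b))  # close to 1.0: magnitude is ignored
print(l2_distance(a, b))        # clearly non-zero: magnitude matters
```

Cosine similarity ignores vector length, which is one reason it is a common choice for text embeddings. Euclidean distance is sensitive to length, so the two metrics can rank the same neighbors differently unless the vectors are normalized.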

Listing Vector Indexes

-- Show only vector indexes (with dimensions, similarity, and vector count)
SHOW VECTOR INDEXES

-- Show all indexes (property + vector)
SHOW INDEXES

SHOW VECTOR INDEXES returns: name, label, property, dimensions, similarity, vectors, type.

Via SDK (Embedded Mode)

Python:

client.create_vector_index("Document", "embedding", dimensions=384, metric="cosine")

Rust:

use graphmind_sdk::{EmbeddedClient, VectorClient, DistanceMetric};

let client = EmbeddedClient::new();
client.create_vector_index("Document", "embedding", 384, DistanceMetric::Cosine).await?;

Inserting Vectors

After creating nodes, add vectors to them:

Via SDK

Python:

# Create a node first
client.query('CREATE (d:Document {title: "Graph Databases"})')

# Get the node ID
result = client.query_readonly('MATCH (d:Document {title: "Graph Databases"}) RETURN id(d)')
node_id = result.records[0][0]

# Add the vector
embedding = [0.1, 0.2, -0.3, ...] # 384 dimensions
client.add_vector("Document", "embedding", node_id=node_id, vector=embedding)

Rust:

client.add_vector("Document", "embedding", node_id, &embedding_vec).await?;
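
Because every vector in an index must have the same dimensionality, it can help to validate embeddings before inserting them. The sketch below is one possible approach, not part of the SDK: add_vectors_checked is a hypothetical helper, and it assumes only that the client exposes add_vector with the keyword arguments shown in the Python example above.

```python
def add_vectors_checked(client, label, prop, items, dimensions):
    """Add (node_id, vector) pairs, rejecting vectors of the wrong length.

    Hypothetical helper. `client` is assumed to expose
    add_vector(label, prop, node_id=..., vector=...) as shown earlier.
    """
    items = list(items)  # allow generators; we iterate twice

    # Validate everything first so a bad vector does not cause a partial insert.
    for node_id, vector in items:
        if len(vector) != dimensions:
            raise ValueError(
                f"node {node_id}: expected {dimensions} dimensions, "
                f"got {len(vector)}"
            )

    for node_id, vector in items:
        client.add_vector(label, prop, node_id=node_id, vector=vector)
    return len(items)
```

A call might look like add_vectors_checked(client, "Document", "embedding", pairs, 384), where pairs is a list of (node_id, embedding) tuples collected from earlier queries.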

SEARCH Clause

The SEARCH clause is used within MATCH and OPTIONAL MATCH to constrain patterns using approximate nearest neighbor (ANN) vector search. This is the recommended way to do vector search in Graphmind.

Syntax

[OPTIONAL] MATCH pattern
SEARCH binding_variable IN (
VECTOR INDEX index_name
FOR query_vector
[WHERE filter_predicate]
LIMIT top_k
) [SCORE AS score_alias]

  • binding_variable -- must match a node variable from the MATCH pattern
  • index_name -- name of an existing vector index (or the label it indexes)
  • query_vector -- a vector literal like [1, 2, 3], a parameter $embedding, or a property reference
  • WHERE (optional) -- in-index filter applied during the search (only property predicates with AND)
  • LIMIT -- number of approximate nearest neighbors to return
  • SCORE AS (optional) -- returns the similarity score (0.0 to 1.0, higher = more similar) as a named column

-- Find the 4 most similar movies to the query vector
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 4
)
RETURN movie.title AS title

With Similarity Score

MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 4
) SCORE AS similarityScore
RETURN movie.title AS title, similarityScore

Similarity scores are FLOAT values between 0.0 and 1.0. A score of 1.0 means the vectors are identical.

With In-Index Filtering

Filter during the vector search itself (more efficient than post-filtering):

MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
WHERE movie.rating > 7.5
LIMIT 4
)
RETURN movie.title, movie.rating

With Post-Filtering (WHERE outside SEARCH)

Post-filter results after the vector search:

MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR [1, 2, 3]
LIMIT 10
)
WHERE movie.rating > 8.0
RETURN movie.title, movie.rating
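
The two filtering styles can return different numbers of rows, because with post-filtering the LIMIT is applied before the outer WHERE. The Python sketch below illustrates that difference with an exact brute-force search over made-up data. It is not how Graphmind's HNSW index works internally; the function names and the toy movies are assumptions for illustration only.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy data: (title, rating, 2-d embedding). Values are made up.
movies = [
    ("A", 9.0, [0.9, 0.1]),
    ("B", 6.0, [1.0, 0.0]),
    ("C", 5.5, [0.95, 0.05]),
    ("D", 8.5, [0.2, 0.8]),
    ("E", 8.2, [0.1, 0.9]),
]
query = [1.0, 0.0]

def in_index_filter(rows, query, min_rating, k):
    """Apply the predicate first, then keep the k nearest matching rows."""
    matching = [r for r in rows if r[1] > min_rating]
    return sorted(matching, key=lambda r: l2(r[2], query))[:k]

def post_filter(rows, query, min_rating, k):
    """Keep the k nearest rows overall, then drop those failing the predicate."""
    nearest = sorted(rows, key=lambda r: l2(r[2], query))[:k]
    return [r for r in nearest if r[1] > min_rating]

print([r[0] for r in in_index_filter(movies, query, 8.0, 3)])  # ['A', 'D', 'E']
print([r[0] for r in post_filter(movies, query, 8.0, 3)])      # ['A']
```

In this toy case the post-filtered query returns a single row because two of the three nearest neighbors have low ratings. When using post-filtering, a larger LIMIT inside SEARCH leaves more candidates for the outer WHERE to work with.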

Using a Property as Query Vector

MATCH (snowWhite:Movie {title: 'Snow White'})
MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR snowWhite.embedding
LIMIT 4
)
RETURN movie.title AS title

If the query vector evaluates to null (for example, because the referenced property does not exist), OPTIONAL MATCH returns null rows:

MATCH (m:Movie {title: 'Snow White'})
OPTIONAL MATCH (movie:Movie)
SEARCH movie IN (
VECTOR INDEX moviePlots
FOR m.nonExistentProp
LIMIT 4
)
RETURN movie.title

CALL Procedure (Legacy)

The CALL db.index.vector.queryNodes procedure is still supported, but the SEARCH clause is preferred:

CALL db.index.vector.queryNodes('Document', 'embedding', 10, [0.15, 0.25, ...])
YIELD node, score
RETURN node.title, score
ORDER BY score ASC

Via SDK

Python:

query_vector = [0.15, 0.25, ...]  # same dimensionality as the index
results = client.vector_search("Document", "embedding", query_vector=query_vector, k=10)
for node_id, distance in results:
    print(f"Node {node_id}: distance {distance:.4f}")

Rust:

let results = client.vector_search("Document", "embedding", &query_vec, 10).await?;
for (node_id, distance) in results {
    println!("Node {:?} at distance {:.4}", node_id, distance);
}

Hybrid Queries

Combine vector search with Cypher graph traversal:

-- Find similar documents, then traverse to their authors
MATCH (doc:Document)
SEARCH doc IN (
VECTOR INDEX doc_embeddings
FOR [0.15, 0.25, 0.35]
LIMIT 5
) SCORE AS similarity
MATCH (doc)<-[:WROTE]-(author:Person)
RETURN doc.title, similarity, author.name
ORDER BY similarity DESC

-- Find documents similar to a query, then find related documents by shared tags
MATCH (doc:Document)
SEARCH doc IN (
VECTOR INDEX doc_embeddings
FOR [0.15, 0.25, 0.35]
LIMIT 3
)
MATCH (doc)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(related:Document)
WHERE related <> doc
RETURN DISTINCT related.title, collect(t.name) AS shared_tags

Use Cases

Use Case                             | How
Semantic search                      | Embed document text, search by meaning
Recommendation                       | Find similar users/products by embedding features
RAG (Retrieval-Augmented Generation) | Store knowledge base embeddings, retrieve context for LLMs
Deduplication                        | Find near-duplicate records by vector distance
Image similarity                     | Store image embeddings, find visually similar items

Performance

The HNSW index provides sub-millisecond search latency on datasets up to 1M vectors. Index construction time scales linearly with the number of vectors.

Limitations

  • Vectors must all have the same dimensionality within an index.
  • The index is held in memory. Memory usage is approximately dimensions * 4 bytes * num_vectors plus HNSW graph overhead.
  • The SEARCH clause's in-index WHERE filter only supports property predicates joined with AND. OR, NOT, IN, and string operators are not supported in the in-index filter (use post-filtering with an outer WHERE instead).
  • The SEARCH clause pattern must have exactly one bound variable (the binding variable).
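
As a worked example of the memory estimate above, the following Python sketch computes the raw vector storage for a 384-dimensional index holding one million vectors. The function name raw_vector_bytes is illustrative, and the result excludes the HNSW graph overhead, which the formula does not quantify.

```python
def raw_vector_bytes(dimensions, num_vectors, bytes_per_value=4):
    """Raw storage for the vectors alone: dimensions * 4 bytes * num_vectors."""
    return dimensions * bytes_per_value * num_vectors

size = raw_vector_bytes(384, 1_000_000)
print(size)                    # 1536000000 bytes
print(round(size / 2**30, 2))  # 1.43 (GiB), before HNSW graph overhead
```

Doubling either the dimensionality or the vector count doubles this figure, so it is worth running the estimate before choosing an embedding model for a large dataset.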