20 Gremlin Query Examples for Azure Cosmos DB
A practical cheat sheet of Gremlin traversal queries for Azure Cosmos DB — from basic CRUD and upserts to shortest paths, fraud detection, and RU optimization.
Why a Gremlin Cheat Sheet?
The Gremlin query language is powerful but has a steep learning curve — especially on Azure Cosmos DB, where partition keys, RU costs, and unsupported steps add complexity that standard TinkerPop tutorials don’t cover. This post gives you 20 copy-paste-ready queries organized by category, with Cosmos DB-specific tips for each one.
All examples use an e-commerce graph with product, person, and category vertices, but the patterns apply to any domain.
Creating Data
1. Add a Vertex with Partition Key
g.addV('product').property('id', 'prod-001').property('category', 'electronics').property('name', 'Wireless Headphones').property('price', 79.99).property('inStock', true)
Every vertex needs an id (unique within the partition) and the partition key property. If your container’s partition key is /category, you must include category as a property on every vertex. Cosmos DB rejects a vertex that is missing the partition key property, and lookups that omit it turn into expensive cross-partition fan-outs.
2. Add an Edge
g.V(['electronics', 'prod-001']).addE('purchased_by').to(g.V(['customers', 'user-042'])).property('date', '2025-01-15').property('quantity', 2)
The tuple syntax ['partitionKeyValue', 'id'] is a Cosmos DB feature for efficient point reads. Edges are stored with their source vertex’s partition, so out() traversals are fast (single partition) while in() traversals are always cross-partition.
3. Upsert a Vertex (Create or Update)
g.V('prod-001').has('category', 'electronics').fold().coalesce(unfold(), addV('product').property('id', 'prod-001').property('category', 'electronics')).property('name', 'Wireless Headphones Pro').property('price', 89.99)
The fold() / coalesce() / unfold() pattern is the standard Gremlin upsert. If the vertex exists, unfold() returns it. If not, addV() creates it. Properties are then set either way. This pattern is critical for idempotent data pipelines — Microsoft’s own docs don’t cover it well.
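When the upsert is issued from application code, it can help to assemble the query in one place. The following is a minimal Python sketch, assuming a container partitioned on /category; build_upsert and gremlin_literal are hypothetical helper names, not part of any driver. For untrusted input, parameter bindings are the safer choice over string building.

```python
def gremlin_literal(value):
    """Render a Python value as a Gremlin literal (strings, bools, numbers only)."""
    if isinstance(value, bool):  # check bool first: True is also an int in Python
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes and single quotes so the quoted string stays well-formed.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_upsert(label, vertex_id, pk_name, pk_value, props):
    """Build a fold()/coalesce()/unfold() upsert query string."""
    lookup = (
        f"g.V({gremlin_literal(vertex_id)})"
        f".has({gremlin_literal(pk_name)}, {gremlin_literal(pk_value)})"
    )
    create = (
        f"addV({gremlin_literal(label)})"
        f".property('id', {gremlin_literal(vertex_id)})"
        f".property({gremlin_literal(pk_name)}, {gremlin_literal(pk_value)})"
    )
    # These properties are applied whether the vertex was found or just created.
    updates = "".join(
        f".property({gremlin_literal(k)}, {gremlin_literal(v)})"
        for k, v in props.items()
    )
    return f"{lookup}.fold().coalesce(unfold(), {create}){updates}"


query = build_upsert(
    "product", "prod-001", "category", "electronics",
    {"name": "Wireless Headphones Pro", "price": 89.99},
)
print(query)
```

Running the same call twice produces the same query, which is what makes the pattern safe to retry in a pipeline.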
Reading and Filtering
4. Filter by Property Value
g.V().hasLabel('product').has('category', 'electronics').has('price', gt(50)).valueMap('name', 'price')
Always include the partition key in your filter (here category) to avoid cross-partition queries. Use valueMap() instead of values() when you need key-value pairs, not just values.
5. String Filtering with TextP
g.V().hasLabel('product').has('category', 'electronics').has('name', TextP.containing('Wireless')).valueMap('name', 'price')
TextP.containing() does substring matching. Also available: startingWith(), endingWith(), notContaining(). These are supported on Cosmos DB but not well-documented.
6. Range Queries and Sorting
g.V().hasLabel('product').has('category', 'electronics').has('price', between(50, 150)).order().by('price', asc).valueMap('name', 'price')
The between() predicate is shorthand for gte(50).and(lt(150)). Other predicates: gt, gte, lt, lte, eq, neq. Always combine with limit() or range() to control RU cost.
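The half-open bounds are easy to get wrong, so here is a small Python model of the same semantics. It is only an illustration of the inclusive-lower, exclusive-upper rule; the between function below is a local stand-in, not a driver API.

```python
def between(lo, hi):
    """Mirror Gremlin's between(): lower bound inclusive, upper bound exclusive."""
    return lambda x: lo <= x < hi


prices = [49.99, 50, 79.99, 149.99, 150]
in_range = [p for p in prices if between(50, 150)(p)]
print(in_range)  # [50, 79.99, 149.99]
```

A product priced at exactly 150 is excluded; to include it, the Gremlin equivalent would combine gte(50) with lte(150).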
7. Pagination with Range
g.V().hasLabel('product').has('category', 'electronics').order().by('name', asc).range(0, 10)
range(start, end) provides offset-based pagination. range(0, 10) returns the first 10 results, range(10, 20) returns the next 10. More RU-efficient than fetching everything and slicing client-side. Note: limit(n) is equivalent to range(0, n).
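A client usually generates these page queries in a loop. The sketch below assumes the total number of results is already known (for example from a separate count query); page_queries is a hypothetical helper name.

```python
def page_queries(base_query, page_size, total):
    """Yield one range(start, end) query per page; end is exclusive."""
    for start in range(0, total, page_size):
        end = min(start + page_size, total)
        yield f"{base_query}.range({start}, {end})"


base = "g.V().hasLabel('product').has('category', 'electronics').order().by('name', asc)"
pages = list(page_queries(base, page_size=10, total=25))
for q in pages:
    print(q)
```

Keeping the order().by() clause identical across pages matters: without a stable sort, consecutive pages can overlap or skip items.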
Traversing Relationships
8. Find Direct Neighbors
g.V('alice').out('knows').values('name')
out() follows outgoing edges, in() follows incoming, both() follows both. On Cosmos DB, prefer out() over in() — outgoing traversals stay within the source vertex’s partition, while incoming traversals fan out across all partitions.
9. Two-Hop Traversal with Dedup
g.V('alice').out('knows').out('knows').dedup().values('name')
Chain out() for multi-hop traversals. Without dedup(), the same vertex can appear multiple times through different paths — and the result set grows exponentially with each hop.
10. Get Edges with Properties
g.V('alice').outE('knows').as('e').inV().as('v').select('e', 'v').by('since').by('name')
Use outE() to access edge objects directly. The as() / select() / by() pattern projects properties from both edges and vertices into a clean result.
11. Find Common Connections
g.V('alice').out('knows').where(__.in('knows').has('id', 'bob')).values('name')
Finds mutual friends — people both Alice and Bob know. The where() step applies a nested traversal as a filter. The __ (double underscore) starts an anonymous traversal.
Updating and Deleting
12. Update a Property
g.V('prod-001').has('category', 'electronics').property('price', 69.99)
Setting a property that already exists overwrites it. Always include the partition key in the lookup to avoid a cross-partition read before the write.
13. Drop a Vertex
g.V('prod-001').has('category', 'electronics').drop()
drop() removes the vertex and all connected edges. Never run g.V().drop() in production — it scans the entire graph and can consume thousands of RUs. Instead, batch deletes with g.V().hasLabel('product').limit(100).drop() in a loop.
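The batch loop can be sketched in Python as follows. The submit argument is a hypothetical callable that runs a Gremlin query string and returns its results as a list; FakeGraph is an in-memory stand-in so the loop can be exercised without a database.

```python
def drop_in_batches(submit, label, batch_size=100, max_rounds=1000):
    """Drop vertices with the given label in batches until none remain."""
    rounds = 0
    while rounds < max_rounds:
        # Cheap existence check: count at most one matching vertex.
        remaining = submit(f"g.V().hasLabel('{label}').limit(1).count()")[0]
        if remaining == 0:
            break
        submit(f"g.V().hasLabel('{label}').limit({batch_size}).drop()")
        rounds += 1
    return rounds


class FakeGraph:
    """In-memory stand-in that only tracks how many vertices are left."""

    def __init__(self, n):
        self.n = n

    def submit(self, query):
        if query.endswith(".count()"):
            return [min(self.n, 1)]
        # A drop query: remove up to the batch size named in limit(...).
        batch = int(query.split(".limit(")[1].split(")")[0])
        self.n -= min(batch, self.n)
        return []


graph = FakeGraph(250)
rounds = drop_in_batches(graph.submit, "product", batch_size=100)
print(rounds, graph.n)  # 3 0
```

The max_rounds cap is a safety net so a misbehaving query cannot loop forever; against a real account you would likely also pause between rounds to stay under the provisioned throughput.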
Advanced Patterns
14. Shortest Path
g.V('alice').repeat(both('knows').simplePath()).until(has('id', 'eve')).path().limit(1)
repeat() / until() does a breadth-first search (Cosmos DB always uses breadth-first, unlike standard TinkerPop which defaults to depth-first). simplePath() prevents cycles. Because the traversal is breadth-first, the first path that limit(1) returns is the shortest one by hop count; edge weights are not considered.
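To see why breadth-first order makes the first hit the shortest, here is a small Python model over an in-memory adjacency map. The names and graph data are made up for illustration, and the visited set prunes more aggressively than simplePath(), which only forbids repeats within a single path.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; the first path that reaches goal has the fewest hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is unreachable from start


knows = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "eve"],
    "dave": ["bob", "eve"],
    "eve": ["carol", "dave"],
}
print(shortest_path(knows, "alice", "eve"))  # ['alice', 'carol', 'eve']
```

All paths of length k are explored before any path of length k + 1, so the three-hop route through bob and dave is never returned ahead of the two-hop route through carol.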
15. Cycle Detection (Fraud Pattern)
g.V('acct-001').as('start').repeat(out('transferred_to')).times(5).where(eq('start')).path().by('accountId').dedup().limit(10)
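Detects circular money transfers, a classic fraud detection query. Note that times(5) runs exactly five hops, so as written this finds only cycles of length 5; to also catch shorter cycles, add emit() after times(5). The where(eq('start')) check confirms the traversal returned to the origin, proving a cycle exists.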
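The difference between "exactly n hops" and "up to n hops" is easy to miss, so the following Python sketch models both on a tiny in-memory transfer graph. The account data and the function name are invented for illustration.

```python
def cycles_back_to_start(transfers, start, max_hops, exact=False):
    """Return walks that leave start and arrive back at it.

    exact=True models repeat(...).times(n): only walks of exactly n hops count.
    exact=False also reports shorter cycles, as an emit() modulator would.
    """
    found = []
    frontier = [[start]]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for path in frontier:
            for target in transfers.get(path[-1], []):
                walk = path + [target]
                if target == start and (hop == max_hops or not exact):
                    found.append(walk)
                next_frontier.append(walk)
        frontier = next_frontier
    return found


# A three-account loop: 001 -> 002 -> 003 -> 001
transfers = {
    "acct-001": ["acct-002"],
    "acct-002": ["acct-003"],
    "acct-003": ["acct-001"],
}
print(cycles_back_to_start(transfers, "acct-001", 5, exact=True))   # []
print(cycles_back_to_start(transfers, "acct-001", 5, exact=False))
```

With exactly five hops the walk ends on acct-003 and the three-hop loop goes undetected; the emitting variant reports it.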
16. Recommendation Engine
g.V('prod-001').has('category', 'electronics').as('a').in('purchased').out('purchased').where(neq('a')).groupCount().by('name').order(local).by(values, desc).limit(local, 5)
Collaborative filtering: start from a product, find customers who bought it, find what else they bought, rank by frequency. The as('a') label must be set on the starting product before where(neq('a')) can use it to exclude that product from its own recommendations. The top 5 results are your “Customers who bought this also bought” recommendations.
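The same ranking logic can be modeled in a few lines of Python, which is handy for checking what the traversal should return on a small dataset. The purchase data and the also_bought name are made up for this sketch.

```python
from collections import Counter


def also_bought(purchases, product, top_n=5):
    """Rank products co-purchased with `product` by number of shared buyers."""
    buyers = [person for person, items in purchases.items() if product in items]
    counts = Counter(
        item
        for person in buyers
        for item in purchases[person]
        if item != product  # exclude the starting product itself
    )
    return counts.most_common(top_n)


purchases = {
    "user-001": ["prod-001", "prod-002", "prod-003"],
    "user-002": ["prod-001", "prod-002"],
    "user-003": ["prod-004"],
}
print(also_bought(purchases, "prod-001"))  # [('prod-002', 2), ('prod-003', 1)]
```

The Counter plays the role of groupCount(), and most_common(top_n) corresponds to ordering the map by value and keeping the first five entries.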
17. Variable-Depth Hierarchy
g.V('dept-engineering').repeat(out('manages')).until(outE('manages').count().is(0)).emit().path().by('name')
Recursively traverses a management tree until leaf nodes. emit() outputs vertices at every level, not just the leaves. The path().by('name') shows the full chain from root to each descendant.
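As a rough model of what emit() plus path() produces, the Python sketch below walks a small in-memory hierarchy and yields the chain from the root to every descendant. It assumes the hierarchy is a tree with no cycles; the department names are invented.

```python
def descendant_paths(manages, root):
    """Yield the path from root to every descendant, one path per vertex reached."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        for report in manages.get(path[-1], []):
            new_path = path + [report]
            yield new_path           # emitted at every level, not just at leaves
            stack.append(new_path)   # keep descending until a leaf has no reports


manages = {
    "Engineering": ["Platform", "Apps"],
    "Platform": ["Infra"],
}
paths = sorted(descendant_paths(manages, "Engineering"))
for p in paths:
    print(" > ".join(p))
```

Intermediate nodes such as Platform appear as their own path and again as a prefix of deeper paths, which matches the effect of emitting at every level.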
18. Group and Count by Label
g.V().groupCount().by(label)
Returns a map of vertex label to count — a quick schema summary. This is one of the first queries to run on any new graph. Note: this is always a cross-partition query.
19. Find Most Connected Vertices
g.V().hasLabel('person').project('name', 'connections').by('name').by(both().count()).order().by(select('connections'), desc).limit(10)
Identifies hub nodes — the top 10 most-connected people. Useful for finding influencers in social networks or critical dependencies in infrastructure graphs.
20. Diagnose Query Cost with executionProfile
g.V().hasLabel('product').has('category', 'electronics').has('price', gt(50)).out('reviewed_by').valueMap('name', 'rating').executionProfile()
Append executionProfile() to any query to see detailed metrics: time per step, result counts, and how many partitions each store operation fanned out to. This is the single most important tool for optimizing Cosmos DB Gremlin queries. Use it to find which step dominates the cost, and read the RU charge itself from the response's x-ms-total-request-charge attribute.
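Once the profile comes back, picking out the dominant step is a short script. The sketch below assumes the result is a list whose first element carries a metrics array with name, time, and storeOps fields, which is the shape the Cosmos DB documentation describes; the sample values are invented for illustration.

```python
import json

# Illustrative sample only: the field names follow the documented shape of an
# executionProfile() response, but every number here is made up.
sample = json.loads("""
[{
  "gremlin": "g.V().hasLabel('product').out('reviewed_by').executionProfile()",
  "totalTime": 40,
  "metrics": [
    {"name": "GetVertices", "time": 30, "counts": {"resultCount": 12},
     "storeOps": [{"fanoutFactor": 5, "count": 12, "size": 4096, "time": 28.5}]},
    {"name": "GetEdges", "time": 8, "counts": {"resultCount": 20},
     "storeOps": [{"fanoutFactor": 1, "count": 20, "size": 2048, "time": 6.1}]},
    {"name": "ProjectOperator", "time": 2, "counts": {"resultCount": 20}}
  ]
}]
""")


def slowest_step(profile):
    """Return (step name, time, max fan-out) for the step with the largest time."""
    step = max(profile[0]["metrics"], key=lambda m: m["time"])
    # Steps that never touch the store have no storeOps; report a fan-out of 0.
    fanout = max((op["fanoutFactor"] for op in step.get("storeOps", [])), default=0)
    return step["name"], step["time"], fanout


print(slowest_step(sample))  # ('GetVertices', 30, 5)
```

A high fan-out on the slowest step is usually the sign that the lookup is missing its partition key.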
Cosmos DB Gotchas
- Partition key in every lookup: Always include has('partitionKey', 'value') or use tuple syntax g.V(['pkValue', 'id']). Without it, every lookup is a cross-partition fan-out.
- in() is expensive: Edges are stored with the source vertex. out() stays in one partition; in() fans out across all partitions. If you need fast in() traversals, add reverse edges.
- Mid-traversal .V() skips the index: Only the first .V() in a query uses the index. Subsequent .V() calls do full scans. Restructure with .map() or .union().
- No lambdas: map { it.get() } and .by { it.value('x') } are not supported. Use standard steps like .map(__.values('x')).
- No match() step: Declarative pattern matching is not available on Cosmos DB.
- Properties must be primitives: No nested objects. Flatten complex data into key-value pairs.
- GraphSON v2 only: Use GraphSON2MessageSerializer, not v3. Cosmos DB returns non-standard untyped GraphSON (known issue TINKERPOP-2581).
- Batch large deletes: g.V().drop() on a large graph can time out or consume massive RUs. Delete in batches of 100-500 with limit().
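Several of these gotchas can be caught before a query ever reaches the server. The following is a rough Python sketch of a query linter based on simple pattern matching; the patterns and the lint name are assumptions made for illustration, and a regex check like this will miss cases and can flag false positives.

```python
import re

# Each entry pairs a pattern with the warning it should raise.
CHECKS = [
    (re.compile(r"\.match\("), "match() step is not available"),
    (re.compile(r"\{[^}]*\bit\b[^}]*\}"), "lambda closures are not supported"),
    (re.compile(r"^g\.V\(\)\.drop\(\)$"), "unbounded drop; delete in batches"),
    (re.compile(r"\)\.V\("), "mid-traversal V() may do a full scan"),
]


def lint(query):
    """Return a list of warnings for a Gremlin query string."""
    q = query.strip()
    return [msg for pattern, msg in CHECKS if pattern.search(q)]


print(lint("g.V().drop()"))
print(lint("g.V('a').out().map { it.get() }"))
print(lint("g.V(['electronics', 'prod-001']).out('reviewed_by')"))
```

A check like this fits naturally into a test suite or a pre-commit hook for a repository of stored queries.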
Next Steps
- Download GremlinStudio to run these queries with graph visualization, autocomplete, and RU cost display
- Read the Query Editor docs for keyboard shortcuts and formatting
- Learn how to debug traversals step by step with the built-in Gremlin debugger
- Set up a local Cosmos DB Emulator to practice without cloud costs
Happy querying!