5 Gremlin Anti-Patterns Costing You on Cosmos DB
Common Gremlin query mistakes that silently inflate your Azure Cosmos DB bill. Learn the anti-patterns, understand why they're expensive, and see concrete before/after fixes.
The Hidden Cost of Flexible Queries
Gremlin on Azure Cosmos DB is powerful. The query language lets you express complex graph traversals in a few lines, and Cosmos DB handles the scaling. But that flexibility comes with a trap: it is remarkably easy to write queries that return the correct results while consuming 10x more Request Units (RUs) than necessary.
RU cost is directly tied to your Azure bill. Every query charges RUs based on the amount of data read, the number of index lookups, and the volume of results returned. A poorly written query against a production graph with hundreds of thousands of vertices can burn through your provisioned throughput in seconds.
Here are five anti-patterns we see repeatedly in real Cosmos DB Gremlin workloads, along with concrete fixes for each one.
Anti-Pattern #1: The Unfiltered Full Scan
The mistake:
g.V()
Or its slightly more specific cousin:
g.V().hasLabel('person')
Why it costs you: Both of these scan every matching vertex in the graph. Without a .limit() or a .has() filter on an indexed property, Cosmos DB reads through every document that matches. In a development graph with 200 vertices, this feels instant. In production with 100K+ vertices, a single g.V() can consume thousands of RUs and potentially time out.
The fix: Always constrain your queries. For exploration, add .limit(). For production code, filter on indexed properties.
// Exploration: always limit
g.V().hasLabel('person').limit(25)
// Production: filter on indexed properties
g.V().has('person', 'department', 'engineering')
This one change alone can reduce the RU cost of exploratory queries by orders of magnitude.
Anti-Pattern #2: Late Filtering
The mistake:
g.V().out('knows').out('likes').has('category', 'tech')
Why it costs you: This traversal starts from every vertex in the graph, follows all knows edges, then follows all likes edges, and only then filters by category. If your graph has 1,000 starting vertices and each has an average of 50 outgoing knows edges, the second step is already processing 50,000 traversers. The third step fans out again before the filter discards most of the results.
Every traverser at every step costs RUs. The later you filter, the more intermediate work Cosmos DB performs.
The fix: Push filters as early as possible. Each step that reduces the traverser count saves RUs on every subsequent step.
g.V().has('person', 'interest', 'tech').out('knows').out('likes')
By starting with only the vertices you actually care about, you might reduce the initial set from 1,000 to 50 — and every downstream step processes 20x fewer paths.
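The fan-out arithmetic above can be sketched in a few lines. This is a rough back-of-envelope model, not measured data; the edge counts are the illustrative assumptions from the example (1,000 starting vertices, average out-degree of 50 on each hop):

```python
# Rough traverser-count model for the two traversal shapes above.
# All numbers are illustrative assumptions, not measured values.

AVG_KNOWS_EDGES = 50  # assumed average out-degree for 'knows'
AVG_LIKES_EDGES = 50  # assumed average out-degree for 'likes'

def traversers_after_two_hops(start_vertices: int) -> int:
    """Traversers alive after .out('knows').out('likes'), before any filter."""
    return start_vertices * AVG_KNOWS_EDGES * AVG_LIKES_EDGES

late = traversers_after_two_hops(1_000)  # filter applied only at the end
early = traversers_after_two_hops(50)    # filter applied at the start

print(late)           # 2500000
print(early)          # 125000
print(late // early)  # 20
```

Every one of those intermediate traversers costs RUs to produce, which is why the 20x reduction in the starting set compounds through the whole traversal.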
Anti-Pattern #3: Fetching All Properties When You Need Two
The mistake:
g.V().hasLabel('person').valueMap()
Why it costs you: .valueMap() returns every property on every matched vertex. If your person vertices carry 20 properties (name, email, address, preferences, metadata) but you only need name and email, you are transferring and deserializing 10x more data than necessary. On Cosmos DB, the response payload size directly contributes to RU cost.
There is also a Cosmos DB-specific quirk: .valueMap() wraps every value in an array ({name: ["Alice"]} instead of {name: "Alice"}), which complicates client-side processing.
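When you do have to consume .valueMap() output, a small client-side helper can unwrap the single-element arrays. A minimal Python sketch (the sample dict mimics the shape of a Cosmos DB valueMap result; the helper name is ours):

```python
def unwrap_value_map(vm: dict) -> dict:
    """Unwrap a Cosmos DB valueMap result: single-element lists become
    scalars, while genuinely multi-valued properties stay as lists."""
    return {
        key: vals[0] if isinstance(vals, list) and len(vals) == 1 else vals
        for key, vals in vm.items()
    }

# Illustrative shape of a .valueMap() result for one vertex
raw = {"name": ["Alice"], "email": ["alice@example.com"], "tags": ["a", "b"]}
print(unwrap_value_map(raw))
# {'name': 'Alice', 'email': 'alice@example.com', 'tags': ['a', 'b']}
```

Note that this only cleans up the payload shape on the client; it does nothing for the RU cost of fetching properties you never use.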
The fix: Use .project() to select only the properties you need.
g.V().hasLabel('person')
.project('name', 'email')
.by('name')
.by('email')
For a single property, .values() is even simpler:
g.V().hasLabel('person').values('name')
Both approaches return clean scalar values instead of wrapped arrays, and the RU cost drops proportionally to the reduction in data read.
Anti-Pattern #4: Ignoring Partition Keys
The mistake: Storing vertices with random or poorly chosen partition key values, then running queries that do not include the partition key property.
g.V().has('person', 'name', 'Alice')
Why it costs you: Cosmos DB partitions data across physical partitions. When your query does not include the partition key, the engine fans the request out to every physical partition in the database. A graph distributed across 10 physical partitions means your single-vertex lookup costs roughly 10x the RUs it should, because Cosmos DB must check each partition.
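The fan-out penalty can be modeled with simple arithmetic. This is an illustrative cost model, not Cosmos DB's actual billing formula; the per-partition RU figure is a made-up example value:

```python
def lookup_cost(per_partition_ru: float, physical_partitions: int,
                includes_partition_key: bool) -> float:
    """Rough RU model for a point lookup: without the partition key the
    query fans out to every physical partition; with it, only one
    partition is consulted."""
    if includes_partition_key:
        return per_partition_ru
    return per_partition_ru * physical_partitions

print(lookup_cost(3.0, 10, includes_partition_key=False))  # 30.0
print(lookup_cost(3.0, 10, includes_partition_key=True))   # 3.0
```

The multiplier grows as your data does: Cosmos DB adds physical partitions as the container scales, so the same unscoped query gets progressively more expensive over time.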
The fix: Design your partition key strategy around your most common access patterns. Common approaches include using entity type (label) as the partition key, or using a domain/tenant identifier. Then always include the partition key property in your queries.
// Include partition key property in the filter
g.V().has('pk', 'person').has('name', 'Alice')
Remember that Cosmos DB requires the partition key property on all mutations as well:
g.addV('person').property('pk', 'person').property('name', 'Alice')
A well-designed partition strategy is the single biggest factor in Cosmos DB Gremlin performance. It deserves the same upfront design effort you would give to a relational schema.
Anti-Pattern #5: Running Complex Queries Blind
The mistake: Writing long multi-step Gremlin traversals, executing them, and then being surprised when they take 30 seconds and consume 8,000 RUs.
g.V().hasLabel('order')
.has('status', 'completed')
.outE('contains').inV()
.outE('manufactured_by').inV()
.outE('located_in').inV()
.has('country', 'US')
.path()
Why it costs you: You cannot optimize what you cannot measure. This six-step traversal might look reasonable, but step 3 could be producing 50,000 intermediate results because each order contains dozens of line items. The fan-out is invisible unless you inspect intermediate state.
The fix: Break the query down and examine the traverser count at each step. GremlinStudio’s step-by-step debugger lets you execute each step individually and see the intermediate results, so you can pinpoint exactly where traverser counts explode or collapse to zero.
You can also use Cosmos DB’s .executionProfile() step to see the RU cost and execution metrics for each stage of the traversal:
g.V().hasLabel('order').has('status', 'completed')
.outE('contains').inV()
.executionProfile()
In the example above, you might discover that step 3 produces 48,000 traversers. The fix could be as simple as adding a .limit() or restructuring the traversal to filter line items before following edges to manufacturers.
Measuring the Impact
Before and after any optimization, measure. Cosmos DB returns the RU charge in the response headers for every query. GremlinStudio displays this cost automatically in the status bar after each execution, so you can see the RU impact of every change in real time.
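Cosmos DB reports the Gremlin query charge in the `x-ms-total-request-charge` status attribute. A minimal sketch of a helper that tallies charges from already-captured status-attribute dicts, so an optimization pass can be compared before and after (the sample values below are made up; a real client would read these dicts from the Gremlin driver's response):

```python
RU_CHARGE_KEY = "x-ms-total-request-charge"  # Cosmos DB Gremlin status attribute

def ru_charge(status_attributes: dict) -> float:
    """Extract the RU charge from one query's status attributes.
    Returns 0.0 when the attribute is absent (e.g. non-Cosmos servers)."""
    return float(status_attributes.get(RU_CHARGE_KEY, 0.0))

def total_ru(responses: list) -> float:
    """Sum RU charges across a batch of query responses."""
    return sum(ru_charge(attrs) for attrs in responses)

# Hypothetical before/after measurements for an optimization pass
before = [{RU_CHARGE_KEY: 812.4}, {RU_CHARGE_KEY: 640.0}]
after = [{RU_CHARGE_KEY: 98.1}, {RU_CHARGE_KEY: 71.3}]

print(total_ru(before), total_ru(after))
```

Tracking the total across a representative batch of queries, rather than a single execution, gives a much more honest picture of what an optimization is worth.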
A typical optimization pass across these five anti-patterns can reduce total RU consumption by 5-10x for read-heavy workloads. That translates directly to lower provisioned throughput requirements and a smaller Azure bill.
Catch These Before They Hit Your Bill
These anti-patterns are easy to introduce and hard to spot by reading code alone. The right tooling makes the difference: syntax-aware editing to write better queries, a visual debugger to understand traversal behavior, and automatic RU tracking to measure every change.
Download GremlinStudio and start optimizing your Gremlin queries today.