Performance and RocksDB Tuning
PyGraphDB includes a benchmark script for ingestion and typed sampling, plus a RocksDB tuning campaign script for the optional PyRex backend.
Install Optional Dependencies
python -m pip install -e ".[leveldb,rocksdb,fast-ingest]"
Run Backend Benchmarks
Compare the standard backends on the same append-only workload:
python benchmarks.py --backend leveldb --nodes 20000 --edges 100000 --batch-size 10000 --append-only
python benchmarks.py --backend rocksdb --nodes 20000 --edges 100000 --batch-size 10000 --append-only
Tune RocksDB
Run a small RocksDB tuning matrix against the LevelDB baseline:
python scripts/tune_rocksdb.py --nodes 20000 --edges 100000 --batch-size 10000
Benchmarks
Graph Ingestion, BFS, and Sampling Matrix
Use scripts/benchmark_matrix.py to compare LevelDB and RocksDB across graph
sizes, core counts, and ingestion paths:
uv run python scripts/benchmark_matrix.py \
--sizes 10000 100000 1000000 \
--cores 1 2 4 \
--backends leveldb rocksdb \
--ingestion-modes object arrow polars \
--rocksdb-configs parallel-buffer64mb-bloom10 \
--chunk-size 100000 \
--samples 1000 \
--sample-size 5 \
--bfs-limit 100000 \
--output-dir benchmark_results/matrix_YYYYMMDD
The matrix writes matrix_results.csv and matrix_results.jsonl. It closes
and reopens the database before traversal workloads so BFS and sampling do not
measure only Python-side object state. BFS uses typed adjacency records so the
same traversal can run after object, Arrow, and Polars ingestion.
The ingestion modes do not have identical semantics:
object
    Uses put_nodes and put_edges_bulk. It writes edge records, typed adjacency, relationship indexes, and legacy adjacency blobs.
arrow and polars
    Use append-only columnar ingestion. They write edge records, typed adjacency, and relationship indexes, but intentionally skip legacy adjacency blob rewrites. LevelDB uses the generic append-only fallback; only RocksDB can use PyRex’s native write_columnar_batch fast path when available.
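The difference between the modes can be written down as a small lookup table. The sketch below only restates the list above; `WRITES` and `skipped_by` are illustrative names and are not part of PyGraphDB.

```python
# Record kinds written by each ingestion mode, restated from the list above.
# WRITES and skipped_by are illustrative names, not PyGraphDB APIs.
WRITES = {
    "object": {"edge records", "typed adjacency", "relationship indexes",
               "legacy adjacency blobs"},
    "arrow": {"edge records", "typed adjacency", "relationship indexes"},
    "polars": {"edge records", "typed adjacency", "relationship indexes"},
}


def skipped_by(mode: str, baseline: str = "object") -> set[str]:
    """Return the record kinds the baseline mode writes that `mode` does not."""
    return WRITES[baseline] - WRITES[mode]


for mode in ("arrow", "polars"):
    print(mode, "skips:", sorted(skipped_by(mode)))
```

When comparing ingestion rates across modes, keep in mind that the object path does strictly more write work per edge than the columnar paths.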
Compaction-Pressure Benchmark
Use scripts/benchmark_rocksdb_compaction.py for a workload designed to show
where RocksDB’s background compaction parallelism helps. The benchmark repeatedly
overwrites the same key set using a permuted key order. This creates overlapping
SST ranges and sustained LSM compaction pressure.
uv run python scripts/benchmark_rocksdb_compaction.py \
--configs leveldb rocksdb-p1-bg1-smallbuf rocksdb-p4-bg4-smallbuf rocksdb-p8-bg8-smallbuf rocksdb-p4-bg4-largebuf \
--keys 250000 \
--passes 6 \
--batch-size 5000 \
--value-size 1024 \
--write-buffer-size 2097152 \
--output-dir benchmark_results/compaction_pressure_YYYYMMDD
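The shape of this workload can be sketched without any storage engine. The example below is an illustration under stated assumptions, not the script's code: `permuted_overwrite_batches` is a hypothetical helper, the key format is invented, and a plain `dict` stands in for the store.

```python
import random
from typing import Iterator


def permuted_overwrite_batches(
    num_keys: int, passes: int, batch_size: int, value_size: int, seed: int = 0
) -> Iterator[list[tuple[bytes, bytes]]]:
    """Yield write batches that rewrite the same key set once per pass.

    Each pass visits every key exactly once, in a freshly shuffled order, so
    consecutive flushes cover overlapping key ranges instead of disjoint ones.
    """
    rng = random.Random(seed)
    keys = [f"key:{i:012d}".encode() for i in range(num_keys)]
    for pass_index in range(passes):
        order = keys[:]       # same key set every pass
        rng.shuffle(order)    # new permutation per pass
        value = bytes([pass_index % 256]) * value_size
        for start in range(0, num_keys, batch_size):
            yield [(key, value) for key in order[start:start + batch_size]]


# A dict stands in for the key-value store in this sketch.
store: dict[bytes, bytes] = {}
writes = 0
for batch in permuted_overwrite_batches(
    num_keys=1000, passes=3, batch_size=100, value_size=16
):
    store.update(batch)
    writes += len(batch)

print(len(store), "distinct keys after", writes, "writes")
```

The number of distinct keys stays fixed while the write count grows with each pass, which is what forces an LSM store to keep merging overlapping runs.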
The script writes compaction_pressure_results.csv and
compaction_pressure_results.jsonl. PyRex does not currently expose RocksDB
properties such as compaction-pending, level sizes, or statistics, so the script
records SST/log file counts and sizes as indirect evidence of flush and
compaction behavior.
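Counting table and log files in a database directory needs only the standard library. The following is a minimal sketch of that idea, not the script's implementation: `summarize_db_files` is a hypothetical helper, and which suffixes the script actually counts is an assumption.

```python
import tempfile
from pathlib import Path


def summarize_db_files(db_dir: str) -> dict[str, dict[str, int]]:
    """Count files and total bytes per category in a database directory."""
    # RocksDB names table files *.sst and WAL files *.log; recent LevelDB
    # releases use *.ldb for table files, so both suffixes count as tables.
    categories = {".sst": "sst", ".ldb": "sst", ".log": "log"}
    summary = {"sst": {"count": 0, "bytes": 0}, "log": {"count": 0, "bytes": 0}}
    for path in Path(db_dir).iterdir():
        kind = categories.get(path.suffix)
        if kind is not None and path.is_file():
            summary[kind]["count"] += 1
            summary[kind]["bytes"] += path.stat().st_size
    return summary


# Demonstrate on a throwaway directory populated with fake files.
with tempfile.TemporaryDirectory() as tmp:
    fake_files = [("000010.sst", 4096), ("000011.sst", 2048),
                  ("000012.log", 512), ("CURRENT", 16)]
    for name, size in fake_files:
        (Path(tmp) / name).write_bytes(b"\0" * size)
    demo = summarize_db_files(tmp)

print(demo)
```

File counts are only indirect evidence: a low final SST count can mean that compaction kept up, or that larger buffers produced fewer files in the first place.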
A local run on 2026-06-25 produced:
| Configuration | Backend | Initial write rate | Overwrite avg rate | Final SSTs |
|---|---|---|---|---|
| LevelDB | LevelDB | 329,433 writes/s | 114,663 writes/s | 30 |
| RocksDB p1/bg1 small buffer | RocksDB | 694,105 writes/s | 262,405 writes/s | 14 |
| RocksDB p4/bg4 small buffer | RocksDB | 1,008,871 writes/s | 749,948 writes/s | 47 |
| RocksDB p8/bg8 small buffer | RocksDB | 987,248 writes/s | 772,436 writes/s | 17 |
| RocksDB p4/bg4 large buffer | RocksDB | 1,088,475 writes/s | 1,132,815 writes/s | 7 |
This is the benchmark case where the RocksDB-backed database performs clearly better. The workload creates many overlapping sorted runs; RocksDB can flush and compact them with multiple background jobs, while LevelDB performs compaction on a single background thread. Larger write buffers also help RocksDB by reducing flush frequency and creating fewer, larger compaction inputs.
This result should not be generalized to every graph workload. Append-only graph ingestion with Python object construction, serialization, key construction, and index maintenance can be dominated by Python-side overhead. In those cases, RocksDB’s compaction parallelism is not necessarily the bottleneck, so LevelDB can appear competitive or faster.
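To put numbers on "clearly better", the average overwrite rates from the table above can be divided by the LevelDB baseline. This is plain arithmetic on the reported figures; the variable names are illustrative.

```python
# Average overwrite rates (writes/s) copied from the table above.
overwrite_rates = {
    "LevelDB": 114_663,
    "RocksDB p1/bg1 small buffer": 262_405,
    "RocksDB p4/bg4 small buffer": 749_948,
    "RocksDB p8/bg8 small buffer": 772_436,
    "RocksDB p4/bg4 large buffer": 1_132_815,
}

baseline = overwrite_rates["LevelDB"]
speedups = {name: rate / baseline for name, rate in overwrite_rates.items()}

for name, ratio in speedups.items():
    print(f"{name:30s} {ratio:5.2f}x")
```

For this run the ratios range from about 2.3x for the single-job configuration to about 9.9x for the large-buffer configuration. Going from four to eight background jobs with the small buffer changes the ratio only slightly.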
ArcadeDB Comparison Benchmark
Use scripts/benchmark_arcadedb_vs_pygraphdb.py to compare the
RocksDB-backed pygraphdb path with ArcadeDB’s native property graph engine. The
ArcadeDB side uses the arcadedb-embedded Python package, which runs ArcadeDB
in-process through JPype/JVM bindings and does not require a separate server.
The suite is intentionally mixed: some workloads favor RocksDB/PyRex write
paths, while others favor ArcadeDB’s index-free adjacency traversal.
Run a pygraphdb-only smoke test:
uv run python scripts/benchmark_arcadedb_vs_pygraphdb.py \
--engines pygraphdb \
--nodes 10000 \
--edges 50000 \
--iterations 25 \
--output-dir benchmark_results/arcadedb_vs_pygraphdb_YYYYMMDD
To include embedded ArcadeDB without adding it to the project dependencies, use
uv run --with:
uv run --with arcadedb-embedded python scripts/benchmark_arcadedb_vs_pygraphdb.py \
--engines pygraphdb arcadedb \
--workloads columnar_ingest star_traversal bfs_depth typed_path rocksdb_compaction \
--nodes 100000 \
--edges 500000 \
--batch-size 100000 \
--iterations 100 \
--repetitions 10 \
--arcadedb-heap-size 4g \
--arcadedb-parallel 4 \
--output-dir benchmark_results/arcadedb_vs_pygraphdb_YYYYMMDD
The script writes arcadedb_vs_pygraphdb_results.csv and
arcadedb_vs_pygraphdb_results.jsonl with raw per-repetition rows, plus
arcadedb_vs_pygraphdb_summary.csv and
arcadedb_vs_pygraphdb_summary.jsonl with mean and sample standard deviation
by engine and workload. If arcadedb-embedded is not installed, ArcadeDB rows
are emitted as status=skipped and pygraphdb rows still run.
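The summary statistics can be reproduced from raw rows with the standard library. The sketch below assumes a hypothetical row schema: `status=skipped` comes from the text above, while the `engine`, `workload`, and `seconds` field names, the `ok` status value, and the sample numbers are made up for illustration.

```python
import json
import statistics
from collections import defaultdict

# Hypothetical raw rows; every field name except "status" is an assumption,
# and the measurements are invented sample values.
raw_jsonl = """
{"engine": "pygraphdb", "workload": "bfs_depth", "status": "ok", "seconds": 0.50}
{"engine": "pygraphdb", "workload": "bfs_depth", "status": "ok", "seconds": 0.70}
{"engine": "pygraphdb", "workload": "bfs_depth", "status": "ok", "seconds": 0.60}
{"engine": "arcadedb", "workload": "bfs_depth", "status": "skipped", "seconds": null}
"""

groups = defaultdict(list)
for line in raw_jsonl.strip().splitlines():
    row = json.loads(line)
    if row["status"] != "ok":   # skipped rows carry no measurement
        continue
    groups[(row["engine"], row["workload"])].append(row["seconds"])

summary = {}
for key, values in groups.items():
    summary[key] = {
        "n": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator) needs two or more values.
        "stdev": statistics.stdev(values) if len(values) > 1 else float("nan"),
    }

print(summary)
```

Filtering skipped rows before aggregating keeps a missing engine from appearing as a zero-cost result in the summary.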
The workloads are:
columnar_ingest
    Compares pygraphdb’s serialized Arrow ingestion and RocksDB native write_columnar_batch fast path, when available, with ArcadeDB’s embedded GraphBatch importer. Both load attributed vertices and typed graph edges.
star_traversal
    Builds one high-degree hub and repeatedly expands outgoing RelA neighbors. This is a case where ArcadeDB’s vertex-local edge segments and index-free adjacency can be competitive or faster than prefix scans over a KV store.
bfs_depth
    Runs bounded-depth traversal from n0. pygraphdb uses typed adjacency records; ArcadeDB uses SQL MATCH over outgoing RelA edges.
typed_path
    Repeatedly follows a typed RelA then RelB path. This exercises property-graph typed edge expansion rather than raw key/value writes.
rocksdb_compaction
    Runs a permuted repeated-overwrite workload directly against the pygraphdb RocksDB store. ArcadeDB rows are marked not applicable because this workload targets raw LSM compaction behavior rather than a property-graph operation.
Interpret results by workload instead of expecting one global winner. RocksDB is expected to show its strength on compaction-sensitive overwrites and append-only columnar ingestion. ArcadeDB is expected to be strongest when the query can start from an indexed vertex and then stay on native adjacency chains for hub, BFS, or typed-path expansion.
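To make "prefix scans over a KV store" concrete, here is a minimal sketch of bounded-depth BFS over typed adjacency keys. It is not pygraphdb's storage layout: the `adj:<src>:<rel>:<dst>` key format and all function names are assumptions, and a sorted Python list stands in for the ordered key space of an LSM store.

```python
import bisect
from collections import deque

# Hypothetical key layout: b"adj:<src>:<rel>:<dst>". Keys are kept sorted,
# as in an ordered KV store, so one prefix maps to one contiguous range.
edges = [("n0", "RelA", "n1"), ("n0", "RelA", "n2"), ("n0", "RelB", "n3"),
         ("n1", "RelA", "n4"), ("n2", "RelA", "n4"), ("n4", "RelA", "n5")]
sorted_keys = sorted(f"adj:{s}:{r}:{d}".encode() for s, r, d in edges)


def prefix_scan(prefix: bytes):
    """Yield every key that starts with `prefix`, in sorted order."""
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1


def neighbors(node: str, rel: str) -> list[str]:
    """Return the destinations of `node`'s outgoing edges of type `rel`."""
    prefix = f"adj:{node}:{rel}:".encode()
    return [key[len(prefix):].decode() for key in prefix_scan(prefix)]


def bfs_depth(start: str, rel: str, max_depth: int) -> dict[str, int]:
    """Return each reachable node with its hop count, up to max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue   # do not expand past the depth bound
        for nxt in neighbors(node, rel):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


print(bfs_depth("n0", "RelA", max_depth=2))
```

Every expansion step costs one seek plus a short range read, whereas a native graph engine follows stored adjacency pointers. That difference is why the traversal workloads and the write workloads can favor different engines.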
Columnar Ingestion Benchmark
PyGraphDB 0.2.0a0 adds serialized Arrow/Polars columnar ingestion for
attributed nodes and typed edges. With PyRexStore and
pyrex-rocksdb>=0.3.0a0, the store uses PyRex’s native
write_columnar_batch API when available.
The new APIs require caller-provided serialized node_value and
edge_value payload columns. This keeps get_node and get_edge
compatible with the configured serializer and avoids a storage-format migration.
Edge columnar ingestion is append-only in this first release and writes edge
records plus typed adjacency records; it intentionally skips legacy adjacency
blob rewrites.
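Preparing caller-serialized payload columns might look like the sketch below. Only the `node_value` and `edge_value` column names come from the text above; every other column name, the JSON stand-in serializer, and the sample rows are assumptions, and the PyGraphDB ingestion call itself is deliberately not shown.

```python
import json

# Hypothetical input rows. Only node_value and edge_value are column names
# taken from the documentation; the other column names are assumptions.
nodes = [{"id": "n0", "kind": "Person", "name": "Ada"},
         {"id": "n1", "kind": "Person", "name": "Lin"}]
edges = [{"src": "n0", "dst": "n1", "rel": "RelA", "weight": 0.5}]


def serialize(record: dict) -> bytes:
    """Stand-in serializer; use the one the database is configured with."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


# Column-oriented layout: one list per column, all lists the same length.
node_columns = {
    "node_id": [n["id"] for n in nodes],
    "node_value": [serialize(n) for n in nodes],
}
edge_columns = {
    "src": [e["src"] for e in edges],
    "dst": [e["dst"] for e in edges],
    "rel": [e["rel"] for e in edges],
    "edge_value": [serialize(e) for e in edges],
}

# Each payload decodes back to the original record.
assert json.loads(node_columns["node_value"][0]) == nodes[0]
print(len(node_columns["node_value"]), "node payloads,",
      len(edge_columns["edge_value"]), "edge payloads")
```

A dict of equal-length lists like this can be converted with `pyarrow.table(...)` or `polars.DataFrame(...)` if those packages are installed. Serializing up front is what lets later reads decode the stored bytes with the same serializer.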
See notebooks/05_columnar_ingestion_benchmark.ipynb for a runnable
comparison. A local run on 10,000 nodes and 50,000 edges with batch size
10,000 produced:
| Mode | Node rate | Edge insert rate |
|---|---|---|
| LevelDB object batch | 1,110,296 nodes/s | 167,463 edges/s |
| RocksDB Arrow columnar native | 1,035,690 nodes/s | 265,050 edges/s |
| RocksDB Polars columnar native | 929,044 nodes/s | 250,517 edges/s |
The edge-ingestion speedup was roughly 1.5x to 1.6x over the LevelDB
object-batch baseline for this small benchmark. Larger append-only workloads
with already-serialized columnar payloads are expected to benefit more because
the native path avoids per-row backend batch calls.
Initial Local Findings
An initial campaign on 10,000 nodes and 50,000 edges found that the
current RocksDB backend is functional but does not yet beat LevelDB on append-only
edge ingestion, although every RocksDB configuration sampled neighbors faster:
| Configuration | Edge insert rate | Neighbor sample rate |
|---|---|---|
| LevelDB | 168,217 edges/s | 159,546 seeds/s |
| RocksDB default | 121,789 edges/s | 178,704 seeds/s |
| RocksDB Bloom filter | 128,184 edges/s | 183,307 seeds/s |
| RocksDB parallelism + buffer | 125,531 edges/s | 187,759 seeds/s |
| RocksDB no WAL | 127,135 edges/s | 183,348 seeds/s |
The first profile of the best RocksDB run showed substantial Python-side time in serialization, key construction, string encoding, and adjacency blob merging. That suggests the next major RocksDB improvement is not just more RocksDB tuning; it is reducing Python overhead and moving toward prefix-key adjacency records instead of rewriting serialized adjacency blobs.
The tuning script, scripts/tune_rocksdb.py, writes:
benchmark_results/rocksdb_tuning_results.json
    Full command output and parsed metrics.
benchmark_results/rocksdb_tuning_results.csv
    A compact table for comparing rates across configurations.
To collect a cProfile file for the best RocksDB configuration:
python scripts/tune_rocksdb.py --profile
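A cProfile file can be inspected with the standard `pstats` module. The sketch below profiles a toy function so that it runs on its own; `build_keys` and the `example.prof` file name are hypothetical, and the path the tuning script actually writes to is not assumed here.

```python
import cProfile
import io
import os
import pstats
import tempfile


def build_keys(n: int) -> list[bytes]:
    """Toy stand-in for Python-side key construction and string encoding."""
    return [f"edge:{i:010d}".encode("utf-8") for i in range(n)]


# Write a profile file in the standard cProfile on-disk format.
profile_path = os.path.join(tempfile.mkdtemp(), "example.prof")  # hypothetical name
profiler = cProfile.Profile()
profiler.enable()
build_keys(50_000)
profiler.disable()
profiler.dump_stats(profile_path)

# Load the file and list the ten most expensive calls by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profile_path, stream=report)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Sorting by cumulative time surfaces high-level functions such as serialization or key construction. Sorting by `"tottime"` instead highlights the leaf calls where the time is actually spent.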
RocksDB Knobs Exposed by PyRexStore
parallelism
    Calls RocksDB’s increase_parallelism to raise background thread counts.
max_background_jobs
    Sets the maximum number of RocksDB background jobs.
write_buffer_size
    Sets write buffer sizes for the database and column-family options.
bloom_bits_per_key
    Enables a block-based Bloom filter with the requested bits per key.
disable_wal
    Disables the RocksDB write-ahead log. This can improve ingestion throughput in some workloads, but it weakens durability and should be treated as a bulk loading mode, not a safe default.
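One way to keep these knobs together and guard the unsafe one is a small configuration object. This is a sketch under stated assumptions: `RocksDBTuning` is a hypothetical class, its defaults are illustrative rather than RocksDB or PyRexStore defaults, and how the resulting values are passed to PyRexStore is not shown because that signature is not documented here.

```python
import warnings
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RocksDBTuning:
    """Hypothetical container for the five knobs listed above."""
    parallelism: int = 1
    max_background_jobs: int = 2
    write_buffer_size: int = 64 * 1024 * 1024   # bytes; illustrative default
    bloom_bits_per_key: int = 0                 # 0 means "no Bloom filter" here
    disable_wal: bool = False

    def as_kwargs(self) -> dict:
        """Return the knobs as a dict, warning when durability is weakened."""
        if self.parallelism < 1 or self.max_background_jobs < 1:
            raise ValueError("thread and job counts must be at least 1")
        if self.write_buffer_size <= 0:
            raise ValueError("write_buffer_size must be positive")
        if self.disable_wal:
            warnings.warn("disable_wal weakens durability; use it for bulk loads only")
        return asdict(self)


bulk_load = RocksDBTuning(parallelism=4, max_background_jobs=4,
                          bloom_bits_per_key=10, disable_wal=True)
print(bulk_load.as_kwargs())
```

Keeping `disable_wal` off by default and warning when it is set matches the guidance above to treat it as a bulk loading mode.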