Performance and Benchmarks

PyGraphDB includes benchmark scripts for ingestion, traversal, sampling, RocksDB tuning, and an optional ArcadeDB comparison. Treat the included local results as directional examples, not universal claims.

Install Benchmark Dependencies

python -m pip install -e ".[leveldb,rocksdb,fast-ingest]"

Backend Benchmarks

Use benchmarks.py for a quick backend comparison on the same append-only workload.

python benchmarks.py --backend leveldb --nodes 20000 --edges 100000 --batch-size 10000 --append-only
python benchmarks.py --backend rocksdb --nodes 20000 --edges 100000 --batch-size 10000 --append-only

Use scripts/benchmark_matrix.py for larger matrix runs across graph sizes, backends, core counts, and ingestion modes.

uv run python scripts/benchmark_matrix.py \
   --sizes 10000 100000 1000000 \
   --cores 1 2 4 \
   --backends leveldb rocksdb \
   --ingestion-modes object arrow polars \
   --chunk-size 100000 \
   --samples 1000 \
   --sample-size 5 \
   --output-dir benchmark_results/matrix_YYYYMMDD

The matrix writes CSV and JSONL outputs. It reopens the database before traversal workloads so that traversal timings include reads from the storage backend rather than only Python-side object state.
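Before starting a long matrix run, it can help to count how many configurations the flags imply. The sketch below assumes the script runs the full cross product of the four list-valued flags in the example command; that behaviour is an assumption about scripts/benchmark_matrix.py, not something confirmed here.

```python
from itertools import product

# Values copied from the example command. Treating them as a full
# cross product is an assumption about how the script schedules runs.
sizes = [10_000, 100_000, 1_000_000]
cores = [1, 2, 4]
backends = ["leveldb", "rocksdb"]
ingestion_modes = ["object", "arrow", "polars"]

# One tuple per (size, cores, backend, ingestion mode) combination.
cells = list(product(sizes, cores, backends, ingestion_modes))
largest = [cell for cell in cells if cell[0] == max(sizes)]

print(f"{len(cells)} configurations in total")
print(f"{len(largest)} of them use the largest graph size")
```

If the count is larger than expected, trimming --sizes or --cores is usually the cheapest way to shorten a run, since the largest graph sizes tend to dominate wall-clock time.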

Columnar Ingestion Benchmarks

Columnar ingestion accepts already-serialized node and edge payloads. With PyRexStore and pyrex-rocksdb>=0.3.0a0, RocksDB can use PyRex’s native write_columnar_batch path.

See notebooks/05_columnar_ingestion_benchmark.ipynb for a runnable example. A local run on 10,000 nodes and 50,000 edges with batch size 10,000 produced:

Mode	Node insert rate	Edge insert rate
LevelDB object batch	1,110,296 nodes/s	167,463 edges/s
RocksDB Arrow columnar native	1,035,690 nodes/s	265,050 edges/s
RocksDB Polars columnar native	929,044 nodes/s	250,517 edges/s

Larger append-only workloads with pre-serialized columnar payloads are expected to benefit more than small runs dominated by Python object construction.
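To see why a run of this size is sensitive to fixed overheads, the rates in the table above can be converted back into elapsed time with elapsed = count / rate. The following is a minimal sketch of that arithmetic; the helper name elapsed_ms is illustrative and not part of PyGraphDB.

```python
# Counts and rates copied from the local run reported in the table above.
NODES, EDGES = 10_000, 50_000
rates = {
    "LevelDB object batch": (1_110_296, 167_463),
    "RocksDB Arrow columnar native": (1_035_690, 265_050),
    "RocksDB Polars columnar native": (929_044, 250_517),
}


def elapsed_ms(count, rate_per_s):
    """Wall-clock time implied by a throughput figure, in milliseconds."""
    return 1000.0 * count / rate_per_s


for mode, (node_rate, edge_rate) in rates.items():
    print(
        f"{mode}: nodes {elapsed_ms(NODES, node_rate):.1f} ms, "
        f"edges {elapsed_ms(EDGES, edge_rate):.1f} ms"
    )
```

Each node phase works out to roughly 9 to 11 ms and each edge phase to under a third of a second, so interpreter overhead and timer noise can account for a visible share of the measured difference between modes.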

RocksDB Tuning and Compaction Benchmarks

Use scripts/tune_rocksdb.py for a small RocksDB tuning matrix against a LevelDB baseline.

python scripts/tune_rocksdb.py --nodes 20000 --edges 100000 --batch-size 10000

Use scripts/benchmark_rocksdb_compaction.py for a repeated-overwrite workload that creates compaction pressure.

uv run python scripts/benchmark_rocksdb_compaction.py \
   --configs leveldb rocksdb-p1-bg1-smallbuf rocksdb-p4-bg4-smallbuf rocksdb-p8-bg8-smallbuf rocksdb-p4-bg4-largebuf \
   --keys 250000 \
   --passes 6 \
   --batch-size 5000 \
   --value-size 1024 \
   --output-dir benchmark_results/compaction_pressure_YYYYMMDD

A local run on 2026-06-25 produced:

Configuration	Backend	Initial write rate	Overwrite avg rate	Final SSTs
LevelDB	LevelDB	329,433 writes/s	114,663 writes/s	30
RocksDB p1/bg1 small buffer	RocksDB	694,105 writes/s	262,405 writes/s	14
RocksDB p4/bg4 small buffer	RocksDB	1,008,871 writes/s	749,948 writes/s	47
RocksDB p8/bg8 small buffer	RocksDB	987,248 writes/s	772,436 writes/s	17
RocksDB p4/bg4 large buffer	RocksDB	1,088,475 writes/s	1,132,815 writes/s	7

This workload favors RocksDB because it creates overlapping sorted runs that can benefit from background compaction parallelism. It should not be generalized to all graph workloads.
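One way to read the table above is as the fraction of initial write throughput that each configuration retains during the overwrite passes. This ratio is a derived metric computed here for illustration; it is not claimed to be something the script reports.

```python
# (initial writes/s, average overwrite writes/s), copied from the table above.
results = {
    "LevelDB": (329_433, 114_663),
    "RocksDB p1/bg1 small buffer": (694_105, 262_405),
    "RocksDB p4/bg4 small buffer": (1_008_871, 749_948),
    "RocksDB p8/bg8 small buffer": (987_248, 772_436),
    "RocksDB p4/bg4 large buffer": (1_088_475, 1_132_815),
}


def retention(initial_rate, overwrite_rate):
    """Overwrite throughput as a fraction of initial write throughput."""
    return overwrite_rate / initial_rate


for name, (initial, overwrite) in results.items():
    print(f"{name}: retains {retention(initial, overwrite):.0%} of initial rate")
```

In this run the single-threaded configurations retain a bit over a third of their initial rate, while the large-buffer RocksDB configuration holds its rate through the overwrite passes, which is consistent with the compaction-parallelism explanation above.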

ArcadeDB Comparison Benchmarks

Use scripts/benchmark_arcadedb_vs_pygraphdb.py to compare PyGraphDB with the optional embedded ArcadeDB package. ArcadeDB is not required for normal PyGraphDB use.

Run a PyGraphDB-only smoke test:

uv run python scripts/benchmark_arcadedb_vs_pygraphdb.py \
   --engines pygraphdb \
   --nodes 10000 \
   --edges 50000 \
   --iterations 25 \
   --output-dir benchmark_results/arcadedb_vs_pygraphdb_YYYYMMDD

Include embedded ArcadeDB by passing --with to uv run:

uv run --with arcadedb-embedded python scripts/benchmark_arcadedb_vs_pygraphdb.py \
   --engines pygraphdb arcadedb \
   --workloads columnar_ingest star_traversal bfs_depth typed_path rocksdb_compaction \
   --nodes 100000 \
   --edges 500000 \
   --batch-size 100000 \
   --iterations 100 \
   --repetitions 10 \
   --output-dir benchmark_results/arcadedb_vs_pygraphdb_YYYYMMDD

The script writes raw rows and summary files grouped by engine and workload. If arcadedb-embedded is not installed, ArcadeDB rows are marked skipped and PyGraphDB rows still run.
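When working from the raw rows rather than the summary files, a per-group median is a reasonable first reduction. The sketch below is hypothetical: the column names engine, workload, status, and seconds are assumptions, and the sample rows are synthetic placeholders rather than measured results. Check the actual output header before adapting it.

```python
import csv
import io
import statistics
from collections import defaultdict

# Synthetic rows for illustration only. The real column names, status
# values, and file layout written by the script may differ.
SAMPLE_CSV = """engine,workload,status,seconds
pygraphdb,bfs_depth,ok,0.031
pygraphdb,bfs_depth,ok,0.029
pygraphdb,bfs_depth,ok,0.030
arcadedb,bfs_depth,skipped,
"""


def summarize(csv_text):
    """Median seconds per (engine, workload), ignoring rows that did not run."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] != "ok":
            continue  # skipped rows carry no timing
        groups[(row["engine"], row["workload"])].append(float(row["seconds"]))
    return {key: statistics.median(values) for key, values in groups.items()}


summary = summarize(SAMPLE_CSV)
for (engine, workload), median_s in sorted(summary.items()):
    print(f"{engine} {workload}: median {median_s:.4f} s")
```

The median is used here instead of the mean so that a single slow repetition, for example a cold cache on the first pass, does not dominate the summary.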

Representative small local results from 2026-06-25:

Workload	PyGraphDB/RocksDB	ArcadeDB embedded	Relative result
columnar_ingest	0.0358 s	0.0506 s	PyGraphDB 1.41x faster
star_traversal	0.0383 s	0.0333 s	ArcadeDB 1.15x faster
bfs_depth	0.0303 s	0.0366 s	PyGraphDB 1.21x faster
typed_path	0.0293 s	0.0404 s	PyGraphDB 1.38x faster
rocksdb_compaction	0.0022 s	Not applicable	PyGraphDB only

Interpret these results per workload rather than as an overall ranking. PyGraphDB on RocksDB tends to be strongest on append-only columnar ingestion and compaction-sensitive raw writes. ArcadeDB tends to be strongest when queries start from an indexed vertex and stay on native adjacency chains.
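The "Relative result" column in the table above is the slower time divided by the faster time. The sketch below reproduces that column from the two timing columns; the function name relative_result is illustrative and not part of the benchmark script.

```python
# (PyGraphDB/RocksDB seconds, ArcadeDB embedded seconds), copied from the table above.
timings = {
    "columnar_ingest": (0.0358, 0.0506),
    "star_traversal": (0.0383, 0.0333),
    "bfs_depth": (0.0303, 0.0366),
    "typed_path": (0.0293, 0.0404),
}


def relative_result(pygraphdb_s, arcadedb_s):
    """Name the faster engine and report slower time / faster time."""
    if pygraphdb_s <= arcadedb_s:
        return f"PyGraphDB {arcadedb_s / pygraphdb_s:.2f}x faster"
    return f"ArcadeDB {pygraphdb_s / arcadedb_s:.2f}x faster"


for workload, (pygraphdb_s, arcadedb_s) in timings.items():
    print(f"{workload}: {relative_result(pygraphdb_s, arcadedb_s)}")
```

All four timings are in the tens of milliseconds, so differences of a few milliseconds translate into ratios of this size; repeated runs are needed before treating any single ratio as stable.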

Benchmark Caveats

Local benchmark results depend on graph shape, storage device, CPU settings, Python version, backend versions, and warm-up behavior.
Small graphs can be dominated by Python object construction, serialization, and key construction rather than backend I/O.
Prefer the raw CSV/JSONL outputs over summary tables when comparing runs, and publish the benchmark parameters alongside any results.
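One way to keep parameters and environment details with the results is to write a small JSON sidecar into the output directory. The following is a minimal sketch under stated assumptions: the helper names and the run_metadata.json filename are hypothetical, the benchmark scripts are not known to write such a file themselves, and pyrex-rocksdb is used only as an example of a backend package whose version is worth recording.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from importlib import metadata as importlib_metadata
from pathlib import Path


def package_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return importlib_metadata.version(name)
    except importlib_metadata.PackageNotFoundError:
        return None


def write_run_metadata(output_dir, params, packages=("pyrex-rocksdb",)):
    """Write a JSON sidecar (hypothetical filename) describing one benchmark run."""
    run_metadata = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "package_versions": {name: package_version(name) for name in packages},
        "params": params,
    }
    path = Path(output_dir) / "run_metadata.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_metadata, indent=2))
    return path


# Usage example: a temporary directory stands in for the real --output-dir.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_run_metadata(
        tmp, {"nodes": 20000, "edges": 100000, "batch_size": 10000}
    )
    loaded = json.loads(sidecar.read_text())
    print(loaded["python_version"], loaded["params"])
```

Storage device and CPU governor settings are not captured by this sketch and would need to be recorded by hand or with platform-specific tools.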