Quickstart

Create a Graph

from pygraphdb.graphdb import Edge, GraphDB, Node
from pygraphdb.kvstores import LMDBStore
from pygraphdb.serializers import PickleSerializer

graph_db = GraphDB(LMDBStore(path="quickstart_lmdb"), PickleSerializer())

alice = Node(node_id="alice", labels=["Person"], properties={"name": "Alice", "age": 30})
bob = Node(node_id="bob", labels=["Person"], properties={"name": "Bob", "age": 25})

graph_db.put_node(alice)
graph_db.put_node(bob)

edge = Edge(
    edge_id="alice-bob",
    source=alice.get_id,
    target=bob.get_id,
    properties={"type": "friend", "weight": 0.9},
)
graph_db.put_edge(edge)

# Read both records back by ID. Note that the lookups here pass the ID as bytes.
print(graph_db.get_node(b"alice").to_dict())
print(graph_db.get_edge(b"alice-bob").to_dict())

# Close the store when you are finished. The later snippets assume graph_db is still open.
graph_db.close()

Bulk Insert Edges

put_edges_bulk stores many edge records and updates adjacency indexes in one operation.

nodes = [Node(node_id=f"user-{idx}") for idx in range(4)]
for node in nodes:
    graph_db.put_node(node)

edges = [
    Edge(edge_id="u0-u1", source="user-0", target="user-1", properties={"type": "follows"}),
    Edge(edge_id="u0-u2", source="user-0", target="user-2", properties={"type": "follows"}),
    Edge(edge_id="u2-u3", source="user-2", target="user-3", properties={"type": "follows"}),
]
graph_db.put_edges_bulk(edges)

For append-only ingestion where every edge ID is known to be new, pass check_existing=False to skip the replacement check. This saves one existing-edge read per edge:

graph_db.put_edges_bulk(edges, check_existing=False)
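
To make the cost difference concrete, here is a toy model of the trade-off. It is not pygraphdb's implementation: CountingStore and put_edges are hypothetical names, and the store is a plain dict that counts reads. With the check enabled, every edge costs one read before its write.

```python
class CountingStore:
    """Toy key-value store that counts reads (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


def put_edges(store, edges, check_existing=True):
    """Write (edge_id, value) pairs, optionally reading each ID first."""
    replaced = 0
    for edge_id, value in edges:
        # The read only happens when check_existing is True.
        if check_existing and store.get(edge_id) is not None:
            replaced += 1  # a real store might need to clean up old index entries here
        store.put(edge_id, value)
    return replaced


edges = [(f"e{i}", {"type": "follows"}) for i in range(1000)]

checked = CountingStore()
put_edges(checked, edges, check_existing=True)

unchecked = CountingStore()
put_edges(unchecked, edges, check_existing=False)

print(checked.reads, unchecked.reads)  # prints: 1000 0
```

Both stores end up with the same contents. The shortcut is only safe when the IDs really are new, because an existing edge would be overwritten without any cleanup.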

Fetch Nodes in Bulk

# Fetch several nodes in one call. An ID with no stored node comes back as None.
fetched = graph_db.get_nodes([b"user-0", b"user-1", b"missing"])
for node in fetched:
    print(None if node is None else node.get_id)

Labels and Exact-Match Indexes

Labels are stored natively on Node objects and maintained in a sorted label index. Label lookups avoid full node scans.

graph_db.put_node(Node(node_id="drug-1", labels=["Drug"], properties={"name": "Aspirin", "kind": "drug"}))
graph_db.put_node(Node(node_id="protein-1", labels=["Protein"], properties={"name": "PTGS1", "kind": "protein"}))

drugs = graph_db.nodes_by_label("Drug")
print([node.get_id for node in drugs])
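
As a rough picture of how a sorted label index can avoid a full node scan, the sketch below keeps (label, node_id) keys in sorted order and reads one contiguous range per label. The LabelIndex class and its key layout are assumptions made for illustration, not pygraphdb's on-disk format.

```python
import bisect


class LabelIndex:
    """Toy sorted index of (label, node_id) keys (hypothetical layout)."""

    def __init__(self):
        self._keys = []  # kept sorted at all times

    def add(self, label, node_id):
        key = (label, node_id)
        pos = bisect.bisect_left(self._keys, key)
        # Insert only if the key is not already present.
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def node_ids(self, label):
        # All keys for one label are contiguous, so seek to the first
        # key >= (label, "") and stop at the first key with another label.
        pos = bisect.bisect_left(self._keys, (label, ""))
        ids = []
        while pos < len(self._keys) and self._keys[pos][0] == label:
            ids.append(self._keys[pos][1])
            pos += 1
        return ids


index = LabelIndex()
index.add("Drug", "drug-1")
index.add("Protein", "protein-1")
index.add("Drug", "drug-2")

print(index.node_ids("Drug"))  # prints: ['drug-1', 'drug-2']
```

The lookup touches only the keys for the requested label, which is the property an ordered key-value store makes cheap.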

Property indexes are explicit, so ingestion does not pay for indexes you do not need. Register an exact-match node or edge property index before relying on it for performance-sensitive lookups.

graph_db.create_node_property_index("name")
aspirin = graph_db.nodes_by_property("name", "Aspirin")

graph_db.create_edge_property_index("weight")
strong_edges = graph_db.edges_by_property("weight", 0.9)
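
The idea behind an explicit exact-match index can be sketched in a few lines. TinyNodeStore below is a hypothetical stand-in, not the library's code: only registered properties are maintained on write, and unregistered ones fall back to a scan. Whether pygraphdb backfills existing nodes when an index is created, or scans when no index exists, is an assumption of this sketch.

```python
from collections import defaultdict


class TinyNodeStore:
    """Toy node store with opt-in exact-match property indexes (hypothetical)."""

    def __init__(self):
        self.nodes = {}    # node_id -> properties dict
        self.indexes = {}  # property name -> {value -> set of node_ids}

    def create_property_index(self, name):
        index = defaultdict(set)
        # Backfill from nodes that are already stored (assumed behavior).
        for node_id, props in self.nodes.items():
            if name in props:
                index[props[name]].add(node_id)
        self.indexes[name] = index

    def put_node(self, node_id, props):
        old = self.nodes.get(node_id)
        # Only registered properties cost extra work on write.
        for name, index in self.indexes.items():
            if old is not None and name in old:
                index[old[name]].discard(node_id)
            if name in props:
                index[props[name]].add(node_id)
        self.nodes[node_id] = props

    def nodes_by_property(self, name, value):
        if name in self.indexes:
            return sorted(self.indexes[name].get(value, ()))
        # No index registered: scan every node.
        return sorted(nid for nid, p in self.nodes.items() if p.get(name) == value)


store = TinyNodeStore()
store.put_node("drug-1", {"name": "Aspirin", "kind": "drug"})
store.create_property_index("name")
store.put_node("protein-1", {"name": "PTGS1", "kind": "protein"})

print(store.nodes_by_property("name", "Aspirin"))  # indexed: ['drug-1']
print(store.nodes_by_property("kind", "protein"))  # scanned: ['protein-1']
```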

Relationship types are indexed through the existing edge.properties["type"] convention.

friend_edges = graph_db.edges_by_type("friend")

Columnar Ingestion

ingest_nodes_arrow and ingest_edges_arrow accept Arrow-like columns or plain Python sequences. The initial implementation requires the caller to supply already-serialized node_value and edge_value payloads, so existing serializer behavior remains unchanged.

nodes = [
    Node(node_id="drug-1", properties={"kind": "drug"}),
    Node(node_id="protein-1", properties={"kind": "protein"}),
]
graph_db.ingest_nodes_arrow(
    [node.get_id for node in nodes],
    [graph_db.serialize_node_value(node) for node in nodes],
)

edge = Edge(
    edge_id="d1-p1",
    source="drug-1",
    target="protein-1",
    properties={"type": "drug-to-protein", "score": 0.9},
)
graph_db.ingest_edges_arrow(
    [edge.get_id],
    [edge.source],
    [edge.target],
    [edge.get_type],
    [graph_db.serialize_edge_value(edge)],
    append_only=True,
)
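
If your data starts out as a list of records, it has to be pivoted into parallel columns first. The sketch below shows one way to do that with plain Python and pickle. The helper names, the edge_type column name, and the choice to pickle only the properties dict are illustrative assumptions. In real use the payload should come from serialize_edge_value, as in the example above.

```python
import pickle


def edges_to_columns(edges):
    """Pivot a list of edge dicts into parallel columns, one list per field."""
    columns = {"edge_id": [], "source": [], "target": [], "edge_type": [], "edge_value": []}
    for edge in edges:
        columns["edge_id"].append(edge["edge_id"])
        columns["source"].append(edge["source"])
        columns["target"].append(edge["target"])
        columns["edge_type"].append(edge["properties"].get("type"))
        # Stand-in payload: the properties dict serialized to bytes.
        columns["edge_value"].append(pickle.dumps(edge["properties"]))
    return columns


def iter_rows(columns):
    """Walk the columns row by row, rejecting columns of unequal length."""
    lengths = {len(column) for column in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return zip(*columns.values())


records = [
    {"edge_id": "d1-p1", "source": "drug-1", "target": "protein-1",
     "properties": {"type": "drug-to-protein", "score": 0.9}},
]
columns = edges_to_columns(records)
rows = list(iter_rows(columns))

print(rows[0][:4])  # prints: ('d1-p1', 'drug-1', 'protein-1', 'drug-to-protein')
```

Position i in every column describes the same edge, which is why a length mismatch is treated as an error rather than silently truncated.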

Polars users can use ingest_nodes_polars and ingest_edges_polars with node_value and edge_value binary columns. With PyRexStore and pyrex-rocksdb>=0.3.0a0, these methods use native RocksDB columnar batch writes when available. Other stores use the Python bulk fallback.

Traverse With BFS

visited = graph_db.bfs(b"user-0", direction="any")
print(visited)
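
For readers who want to see what a direction-aware breadth-first search does, here is a standalone sketch over a plain edge list. It does not use pygraphdb, and the "out" and "in" direction names and the list return value are assumptions. Only direction="any" appears in the example above.

```python
from collections import deque


def bfs(edges, start, direction="any"):
    """Return node IDs in breadth-first visit order from start."""
    if direction not in ("out", "in", "any"):
        raise ValueError(f"unknown direction: {direction!r}")

    out_adj, in_adj = {}, {}
    for source, target in edges:
        out_adj.setdefault(source, []).append(target)
        in_adj.setdefault(target, []).append(source)

    def neighbors(node):
        # "any" follows edges both forwards and backwards.
        if direction in ("out", "any"):
            yield from out_adj.get(node, [])
        if direction in ("in", "any"):
            yield from in_adj.get(node, [])

    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
    return visited


edges = [("user-0", "user-1"), ("user-0", "user-2"), ("user-2", "user-3")]

print(bfs(edges, "user-0"))         # ['user-0', 'user-1', 'user-2', 'user-3']
print(bfs(edges, "user-3", "out"))  # ['user-3']
print(bfs(edges, "user-3", "any"))  # ['user-3', 'user-2', 'user-0', 'user-1']
```

Starting from user-3, an outgoing-only search finds nothing, while "any" walks the follows edges backwards and reaches the whole component.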

Query With Cypher

GraphDB.query supports an initial read-only Cypher subset for indexed label scans, anchored typed traversal, and typed path sampling.

result = graph_db.query(
    'MATCH (drug {id: "drug-1"})-[:drug-to-protein]->(protein) RETURN drug, protein'
)

for record in result:
    print(record["drug"].get_id, record["protein"].get_id)

Indexed labels are available through Cypher as well:

result = graph_db.query('MATCH (drug:Drug {name: "Aspirin"}) RETURN drug')

Multi-hop typed traversal is supported when each hop is outgoing and has an edge type:

result = graph_db.query(
    'MATCH (drug {id: "drug-1"})-[:drug-to-protein]->(protein)-[:protein-to-disease]->(disease) RETURN drug, protein, disease'
)
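
Conceptually, an anchored multi-hop typed traversal expands one hop at a time, following only outgoing edges of the requested type. The standalone sketch below illustrates that shape. It is not the query engine's implementation, the typed_paths helper is hypothetical, and the disease IDs are made-up sample data.

```python
def typed_paths(edges, start, hop_types):
    """Follow outgoing edges of the given types, one type per hop, from start."""
    adjacency = {}
    for source, edge_type, target in edges:
        adjacency.setdefault((source, edge_type), []).append(target)

    paths = [[start]]
    for edge_type in hop_types:
        # Extend every partial path by each matching outgoing edge.
        paths = [
            path + [target]
            for path in paths
            for target in adjacency.get((path[-1], edge_type), [])
        ]
    return paths


edges = [
    ("drug-1", "drug-to-protein", "protein-1"),
    ("protein-1", "protein-to-disease", "disease-1"),
    ("protein-1", "protein-to-disease", "disease-2"),
    ("drug-1", "drug-to-disease", "disease-3"),
]

for path in typed_paths(edges, "drug-1", ["drug-to-protein", "protein-to-disease"]):
    print(path)
# ['drug-1', 'protein-1', 'disease-1']
# ['drug-1', 'protein-1', 'disease-2']
```

Requiring a type on every hop keeps each expansion to a single keyed lookup, which is one plausible reason for the restriction stated above.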

See Cypher Queries for the full supported subset and current limitations.

Use Stable IDs

Stable IDs make notebooks, tests, and serialized records easier to inspect.

drug = Node(node_id="drug-1", properties={"kind": "drug", "name": "Aspirin"})
protein = Node(node_id="protein-1", properties={"kind": "protein"})
edge = Edge(
    edge_id="drug-1-protein-1",
    source=drug.get_id,
    target=protein.get_id,
    properties={"type": "drug-to-protein"},
)
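
One way to get stable edge IDs is to derive them from the edge's own fields instead of generating them at random. The stable_edge_id helper below is a hypothetical illustration and not part of pygraphdb. It uses "|" as the separator because the node IDs in this guide already contain hyphens.

```python
import uuid


def stable_edge_id(source, edge_type, target):
    """Build a deterministic, readable edge ID from the endpoints and type."""
    return f"{source}|{edge_type}|{target}"


first_run = stable_edge_id("drug-1", "drug-to-protein", "protein-1")
second_run = stable_edge_id("drug-1", "drug-to-protein", "protein-1")

# Random IDs differ on every run, so records cannot be matched across runs.
random_a, random_b = str(uuid.uuid4()), str(uuid.uuid4())

print(first_run)  # prints: drug-1|drug-to-protein|protein-1
print(first_run == second_run, random_a == random_b)
```

A derived ID means re-running an ingestion script addresses the same records again, and the ID itself tells you what the edge connects.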