Typed Traversal and Sampling

Typed traversal reads the edge type from edge.properties["type"]. When an edge has a type, PyGraphDB stores typed adjacency records for it, so a scan over one edge type in one direction does not have to visit every edge of the node.
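
To make the idea of a typed adjacency record concrete, here is a minimal in-memory sketch. It assumes a hypothetical key layout of (node_id, direction, edge_type, neighbor_id, edge_id) tuples kept in sorted order, which is one way an ordered key-value store could answer a directional scan with a single range read. The layout and the helper names typed_keys and scan are illustrative assumptions, not PyGraphDB's actual storage format.

```python
import bisect

# Hypothetical edge list: (edge_id, source, target, edge_type).
EDGES = [
    ("d1-p1", "drug-1", "protein-1", "drug-to-protein"),
    ("d1-p2", "drug-1", "protein-2", "drug-to-protein"),
    ("p1-dis1", "protein-1", "disease-1", "protein-to-disease"),
]


def typed_keys(edges):
    """Build sorted (node, direction, edge_type, neighbor, edge_id) keys."""
    keys = []
    for edge_id, source, target, edge_type in edges:
        keys.append((source, "out", edge_type, target, edge_id))
        keys.append((target, "in", edge_type, source, edge_id))
    keys.sort()
    return keys


def scan(keys, node_id, direction, edge_type):
    """Return (neighbor, edge_id) pairs via one contiguous range scan."""
    prefix = (node_id, direction, edge_type)
    # A shorter tuple sorts before any longer tuple that starts with it,
    # so bisect_left lands on the first key carrying this prefix.
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if key[:3] != prefix:
            break  # left the prefix range; nothing further can match
        matches.append((key[3], key[4]))
    return matches


KEYS = typed_keys(EDGES)
print(scan(KEYS, "drug-1", "out", "drug-to-protein"))
print(scan(KEYS, "disease-1", "in", "protein-to-disease"))
```

Because all keys for one (node, direction, type) combination are contiguous, the scan touches only matching records and stops at the first key outside the prefix.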

Create a Typed Graph

from pygraphdb.graphdb import Edge, GraphDB, Node
from pygraphdb.kvstores import LMDBStore
from pygraphdb.serializers import PickleSerializer

graph_db = GraphDB(LMDBStore(path="typed_graph_lmdb"), PickleSerializer())

for node_id, kind in [
    ("drug-1", "drug"),
    ("protein-1", "protein"),
    ("protein-2", "protein"),
    ("disease-1", "disease"),
]:
    graph_db.put_node(Node(node_id=node_id, properties={"kind": kind}))

graph_db.put_edges_bulk([
    Edge(edge_id="d1-p1", source="drug-1", target="protein-1", properties={"type": "drug-to-protein"}),
    Edge(edge_id="d1-p2", source="drug-1", target="protein-2", properties={"type": "drug-to-protein"}),
    Edge(edge_id="p1-dis1", source="protein-1", target="disease-1", properties={"type": "protein-to-disease"}),
])

Query Typed Neighbors

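# Neighbors of drug-1 along outgoing drug-to-protein edges
# (protein-1 and protein-2 in the graph built above).
proteins = graph_db.neighbors_by_edge_type(
    "drug-1",
    "drug-to-protein",
    direction="out",
)
print(proteins)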

Query Typed Edges

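# Outgoing drug-to-protein edges of drug-1
# (d1-p1 and d1-p2 in the graph built above).
edge_ids = graph_db.edges_by_edge_type(
    "drug-1",
    "drug-to-protein",
    direction="out",
)
print(edge_ids)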

Sample Neighbors

sample_neighbors uses reservoir sampling, so memory is bounded by sample_size instead of by node degree.

import random

sample = graph_db.sample_neighbors(
    "drug-1",
    "drug-to-protein",
    direction="out",
    sample_size=1,
    rng=random.Random(7),
)
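
For readers unfamiliar with reservoir sampling, the following is a minimal standalone sketch of the classic Algorithm R. It is not PyGraphDB's implementation; the function name reservoir_sample and the generated neighbor IDs are made up for illustration. It shows why memory stays bounded by sample_size: the input is consumed as a stream and only the reservoir list is ever held.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Uniformly sample up to sample_size items from a stream (Algorithm R)."""
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < sample_size:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Keep the new item with probability sample_size / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < sample_size:
                reservoir[slot] = item
    return reservoir


# A generator stands in for a high-degree node's neighbor scan.
neighbors = (f"protein-{i}" for i in range(1, 10_001))
picked = reservoir_sample(neighbors, sample_size=3, rng=random.Random(7))
print(picked)
```

Passing a seeded random.Random, as in the sample_neighbors call above, makes the selection reproducible across runs.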

Object-Based Sampling Patterns

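Use SamplingHop and SamplingPattern to describe a multi-hop sampling configuration as validated, documented objects. Each SamplingHop names one edge type, a direction, and a sample size; a SamplingPattern is an ordered list of hops.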

import random

from pygraphdb.sampling import SamplingHop, SamplingPattern

pattern = SamplingPattern([
    SamplingHop("drug-to-protein", direction="out", sample_size=2),
    SamplingHop("protein-to-disease", direction="out", sample_size=1),
])

paths = graph_db.sample_typed_paths(
    seed_ids=["drug-1"],
    pattern=pattern,
    rng=random.Random(3),
)
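
How sample_typed_paths expands a pattern internally is not specified on this page. The sketch below shows one plausible reading, under stated assumptions: at each hop, every partial path is extended by up to sample_size typed neighbors of its last node, and paths that cannot be extended are dropped. The ADJACENCY dictionary and the sample_paths function are hypothetical stand-ins that mirror the example graph.

```python
import random

# Hypothetical typed adjacency: (node_id, edge_type, direction) -> neighbor ids.
ADJACENCY = {
    ("drug-1", "drug-to-protein", "out"): ["protein-1", "protein-2"],
    ("protein-1", "protein-to-disease", "out"): ["disease-1"],
}


def sample_paths(seed_ids, hops, rng):
    """Extend each partial path by up to sample_size neighbors per hop."""
    paths = [[seed] for seed in seed_ids]
    for edge_type, direction, sample_size in hops:
        extended = []
        for path in paths:
            candidates = ADJACENCY.get((path[-1], edge_type, direction), [])
            count = min(sample_size, len(candidates))
            for neighbor in rng.sample(candidates, count):
                extended.append(path + [neighbor])
        # Assumption: paths with no matching typed neighbor are dropped.
        paths = extended
    return paths


hops = [("drug-to-protein", "out", 2), ("protein-to-disease", "out", 1)]
print(sample_paths(["drug-1"], hops, random.Random(3)))
```

In this toy graph only protein-1 has an outgoing protein-to-disease edge, so the sketch yields the single path drug-1, protein-1, disease-1 regardless of the seed. Whether the library keeps or discards incomplete paths may differ.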

Dictionary-Based Sampling Patterns

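Existing dictionary configurations are still supported. A pattern is a list with one dictionary per hop, each carrying edge_type, direction, and sample_size keys.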

pattern = [
    {"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2},
    {"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1},
]

paths = graph_db.sample_typed_paths(["drug-1"], pattern)
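
To illustrate what validating a hop configuration can buy over raw dictionaries, here is a small sketch that converts hop dictionaries into checked objects. HopSpec, hops_from_dicts, and the allowed direction set are hypothetical names and assumptions made for this example; they are not the SamplingHop or SamplingPattern classes, whose exact validation rules are not described here.

```python
from dataclasses import dataclass

# Assumed set of directions; the library may accept others.
ALLOWED_DIRECTIONS = {"in", "out"}


@dataclass(frozen=True)
class HopSpec:
    """Hypothetical stand-in for a validated sampling hop."""

    edge_type: str
    direction: str = "out"
    sample_size: int = 1

    def __post_init__(self):
        if not self.edge_type:
            raise ValueError("edge_type must be a non-empty string")
        if self.direction not in ALLOWED_DIRECTIONS:
            raise ValueError(f"unsupported direction: {self.direction!r}")
        if self.sample_size < 1:
            raise ValueError("sample_size must be at least 1")


def hops_from_dicts(hop_dicts):
    """Convert a list of hop dictionaries into validated HopSpec objects."""
    return [HopSpec(**hop) for hop in hop_dicts]


dict_pattern = [
    {"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2},
    {"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1},
]
print(hops_from_dicts(dict_pattern))
```

A misspelled key or an out-of-range value fails at construction time here, instead of surfacing later in the middle of a traversal.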

Cypher Sampling Procedure

The Cypher API exposes multi-hop typed path sampling through a PyGraphDB-specific procedure call, pg.sample_typed_paths. The procedure delegates to GraphDB.sample_typed_paths and yields one path value per sampled path.

result = graph_db.query(
    'CALL pg.sample_typed_paths(["drug-1"], '
    '[{"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2}, '
    '{"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1}]) '
    'YIELD path RETURN path'
)

for record in result:
    print(record["path"])

Sample a Materialized Subgraph

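# `pattern` is a sampling pattern as defined in the sections above.
subgraph = graph_db.sample_typed_subgraph(
    seed_ids=["drug-1"],
    pattern=pattern,
)

# The result is indexed by "nodes", "edges", and "paths".
print(subgraph["nodes"].keys())
print(subgraph["edges"].keys())
print(subgraph["paths"])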

Rebuild Typed Adjacency

If edge records already exist but typed adjacency indexes are missing, rebuild them from stored edges.

rebuilt = graph_db.rebuild_typed_adjacency()
print(f"rebuilt {rebuilt} typed adjacency records")
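
Conceptually, a rebuild is one pass over the stored edges that re-derives the index entries. The sketch below illustrates that pass on plain dictionaries. The record shape, the rebuild_index helper, and the choice to count one record per direction are assumptions for illustration; how rebuild_typed_adjacency counts records may differ.

```python
from collections import defaultdict

# Hypothetical stored edge records; the last one has no type.
STORED_EDGES = [
    {"edge_id": "d1-p1", "source": "drug-1", "target": "protein-1",
     "properties": {"type": "drug-to-protein"}},
    {"edge_id": "p1-dis1", "source": "protein-1", "target": "disease-1",
     "properties": {"type": "protein-to-disease"}},
    {"edge_id": "untyped", "source": "drug-1", "target": "disease-1",
     "properties": {}},
]


def rebuild_index(edges):
    """Rebuild a typed adjacency index and count the records written."""
    index = defaultdict(list)
    written = 0
    for edge in edges:
        edge_type = edge["properties"].get("type")
        if edge_type is None:
            continue  # untyped edges get no typed adjacency record
        index[(edge["source"], edge_type, "out")].append(edge["target"])
        index[(edge["target"], edge_type, "in")].append(edge["source"])
        written += 2  # assumption: one record per direction
    return dict(index), written


index, written = rebuild_index(STORED_EDGES)
print(f"rebuilt {written} typed adjacency records")
```

Because the index is derived entirely from the edge records, running the pass again over the same edges produces the same index.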