Typed Traversal and Sampling
Typed traversal reads the edge type from edge.properties["type"]. When an edge
has a type, PyGraphDB stores typed adjacency records so that directional scans
restricted to one edge type are efficient.
Create a Typed Graph
from pygraphdb.graphdb import Edge, GraphDB, Node
from pygraphdb.kvstores import LMDBStore
from pygraphdb.serializers import PickleSerializer

graph_db = GraphDB(LMDBStore(path="typed_graph_lmdb"), PickleSerializer())

# Store one node per entity, tagging each with its kind.
for node_id, kind in [
    ("drug-1", "drug"),
    ("protein-1", "protein"),
    ("protein-2", "protein"),
    ("disease-1", "disease"),
]:
    graph_db.put_node(Node(node_id=node_id, properties={"kind": kind}))

# Each edge carries its type in properties["type"].
graph_db.put_edges_bulk([
    Edge(edge_id="d1-p1", source="drug-1", target="protein-1", properties={"type": "drug-to-protein"}),
    Edge(edge_id="d1-p2", source="drug-1", target="protein-2", properties={"type": "drug-to-protein"}),
    Edge(edge_id="p1-dis1", source="protein-1", target="disease-1", properties={"type": "protein-to-disease"}),
])
Query Typed Neighbors
proteins = graph_db.neighbors_by_edge_type(
    "drug-1",
    "drug-to-protein",
    direction="out",
)
print(proteins)
Query Typed Edges
edge_ids = graph_db.edges_by_edge_type(
    "drug-1",
    "drug-to-protein",
    direction="out",
)
Sample Neighbors
sample_neighbors uses reservoir sampling, so its memory use is bounded by
sample_size rather than by the degree of the node.
import random

sample = graph_db.sample_neighbors(
    "drug-1",
    "drug-to-protein",
    direction="out",
    sample_size=1,
    rng=random.Random(7),  # seeded generator for reproducible samples
)
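To illustrate why reservoir sampling keeps memory bounded by sample_size, here is a minimal standalone sketch of the classic technique (Algorithm R). It is not PyGraphDB's internal code; the names reservoir_sample and neighbor_stream are hypothetical and exist only for this example.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], sample_size: int, rng: random.Random) -> List[T]:
    """Return up to sample_size items chosen uniformly from a stream (Algorithm R)."""
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < sample_size:
            # Fill the reservoir with the first sample_size items.
            reservoir.append(item)
        else:
            # Keep the new item with probability sample_size / (index + 1),
            # overwriting a uniformly chosen slot.
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = item
    return reservoir


def neighbor_stream(degree: int) -> Iterator[str]:
    """Hypothetical stand-in for a lazy scan over a high-degree node's neighbors."""
    for i in range(degree):
        yield f"protein-{i}"


# Only three ids are held in memory at any time, whatever the degree.
picked = reservoir_sample(neighbor_stream(100_000), sample_size=3, rng=random.Random(7))
print(picked)
```

Because the stream is consumed one item at a time, the neighbor list never has to be materialized, which is the property the section above describes.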
Object-Based Sampling Patterns
SamplingHop and SamplingPattern are validated, documented configuration
objects for describing a multi-hop sampling pattern.
import random

from pygraphdb.sampling import SamplingHop, SamplingPattern

# One SamplingHop per hop, applied in order starting from each seed.
pattern = SamplingPattern([
    SamplingHop("drug-to-protein", direction="out", sample_size=2),
    SamplingHop("protein-to-disease", direction="out", sample_size=1),
])
paths = graph_db.sample_typed_paths(
    seed_ids=["drug-1"],
    pattern=pattern,
    rng=random.Random(3),
)
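The hop-by-hop expansion that such a pattern describes can be sketched in plain Python. This is an illustration under assumptions, not the behavior of GraphDB.sample_typed_paths: the in-memory adjacency dictionary and the sample_paths function are hypothetical, only outgoing hops are modeled, random.Random.sample stands in for reservoir sampling, and paths that cannot be extended are dropped, which the real method may handle differently.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory typed adjacency: (node_id, edge_type) -> outgoing neighbor ids.
TypedAdjacency = Dict[Tuple[str, str], List[str]]

adjacency: TypedAdjacency = {
    ("drug-1", "drug-to-protein"): ["protein-1", "protein-2"],
    ("protein-1", "protein-to-disease"): ["disease-1"],
}


def sample_paths(
    adjacency: TypedAdjacency,
    seed_ids: Sequence[str],
    hops: Sequence[Tuple[str, int]],
    rng: random.Random,
) -> List[List[str]]:
    """Extend every path by one sampled hop at a time; each hop is (edge_type, sample_size)."""
    paths: List[List[str]] = [[seed] for seed in seed_ids]
    for edge_type, sample_size in hops:
        extended: List[List[str]] = []
        for path in paths:
            candidates = adjacency.get((path[-1], edge_type), [])
            # Sample without replacement, capped at the number of candidates.
            k = min(sample_size, len(candidates))
            for neighbor in rng.sample(candidates, k):
                extended.append(path + [neighbor])
        # Paths with no matching edge at this hop are dropped (a choice of this sketch).
        paths = extended
    return paths


hops = [("drug-to-protein", 2), ("protein-to-disease", 1)]
sampled = sample_paths(adjacency, ["drug-1"], hops, random.Random(3))
print(sampled)  # [['drug-1', 'protein-1', 'disease-1']]
```

In this toy graph only protein-1 has a protein-to-disease edge, so a single complete path survives regardless of the seed given to the random generator.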
Dictionary-Based Sampling Patterns
Patterns written as lists of dictionaries are still supported.
pattern = [
    {"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2},
    {"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1},
]
paths = graph_db.sample_typed_paths(["drug-1"], pattern)
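To show what validating such dictionaries up front might look like, here is a minimal sketch that converts them into frozen dataclass instances. HopSpec, hops_from_dicts, and ASSUMED_DIRECTIONS are hypothetical names for this example; they are not PyGraphDB's SamplingHop, and both the set of accepted directions and the defaults are assumptions, since only "out" appears in this guide.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

# Assumption: only "out" appears in this guide; "in" is added for illustration.
ASSUMED_DIRECTIONS = {"out", "in"}


@dataclass(frozen=True)
class HopSpec:
    """Hypothetical validated hop configuration (not PyGraphDB's SamplingHop)."""

    edge_type: str
    direction: str = "out"
    sample_size: int = 1

    def __post_init__(self) -> None:
        # Reject bad values at construction time rather than during sampling.
        if not self.edge_type:
            raise ValueError("edge_type must be a non-empty string")
        if self.direction not in ASSUMED_DIRECTIONS:
            raise ValueError(f"unsupported direction: {self.direction!r}")
        if self.sample_size < 1:
            raise ValueError("sample_size must be at least 1")


def hops_from_dicts(raw_hops: Iterable[Dict[str, Any]]) -> List[HopSpec]:
    """Convert dictionary hops into validated HopSpec objects."""
    return [HopSpec(**raw) for raw in raw_hops]


dict_pattern = [
    {"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2},
    {"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1},
]
validated = hops_from_dicts(dict_pattern)
print(validated[0])
```

Validating at construction time means a misspelled direction or a zero sample_size fails immediately, before any graph scan starts.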
Cypher Sampling Procedure
The Cypher API exposes multi-hop typed path sampling through a PyGraphDB-specific
procedure call. The procedure delegates to GraphDB.sample_typed_paths and yields
one path value per sampled path.
result = graph_db.query(
    'CALL pg.sample_typed_paths(["drug-1"], '
    '[{"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2}, '
    '{"edge_type": "protein-to-disease", "direction": "out", "sample_size": 1}]) '
    'YIELD path RETURN path'
)
for record in result:
    print(record["path"])
Sample a Materialized Subgraph
# `pattern` is a sampling pattern as defined in the sections above.
subgraph = graph_db.sample_typed_subgraph(
    seed_ids=["drug-1"],
    pattern=pattern,
)
print(subgraph["nodes"].keys())
print(subgraph["edges"].keys())
print(subgraph["paths"])
Rebuild Typed Adjacency
If edge records already exist but their typed adjacency indexes are missing,
rebuild the indexes from the stored edges.
rebuilt = graph_db.rebuild_typed_adjacency()
print(f"rebuilt {rebuilt} typed adjacency records")
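Conceptually, a rebuild scans every stored edge and re-derives one index entry per direction for each typed edge. The sketch below shows that idea with plain dictionaries. It is an assumption-laden illustration, not PyGraphDB's storage layout: the EdgeRow tuple shape, the key structure, the rebuild_index name, and the choice to count two records per typed edge are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical edge row: (edge_id, source, target, edge_type or None).
EdgeRow = Tuple[str, str, str, Optional[str]]
# Hypothetical index: (node_id, direction, edge_type) -> [(neighbor_id, edge_id), ...].
TypedIndex = Dict[Tuple[str, str, str], List[Tuple[str, str]]]


def rebuild_index(edges: Iterable[EdgeRow]) -> Tuple[TypedIndex, int]:
    """Derive out/in typed adjacency entries from edge rows; return the index and record count."""
    index: Dict[Tuple[str, str, str], List[Tuple[str, str]]] = defaultdict(list)
    written = 0
    for edge_id, source, target, edge_type in edges:
        if edge_type is None:
            # Untyped edges get no typed adjacency record.
            continue
        index[(source, "out", edge_type)].append((target, edge_id))
        index[(target, "in", edge_type)].append((source, edge_id))
        written += 2  # assumption: one record per direction
    return dict(index), written


edges: List[EdgeRow] = [
    ("d1-p1", "drug-1", "protein-1", "drug-to-protein"),
    ("d1-p2", "drug-1", "protein-2", "drug-to-protein"),
    ("p1-dis1", "protein-1", "disease-1", "protein-to-disease"),
    ("untyped-1", "drug-1", "disease-1", None),
]
index, written = rebuild_index(edges)
print(written)  # 6
print(index[("drug-1", "out", "drug-to-protein")])
# [('protein-1', 'd1-p1'), ('protein-2', 'd1-p2')]
```

Keying entries by node, direction, and edge type is what lets a typed neighbor query read a single bucket instead of filtering every edge of the node.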