API Reference

Graph Models and Database

class pygraphdb.graphdb.Edge(edge_id=None, source=None, target=None, properties=None)[source]¶

Bases: object

Directed graph edge with source, target, and properties.

Parameters:

edge_id – Optional stable edge identifier. A UUID is generated when omitted.
source – Source node ID or Node instance.
target – Target node ID or Node instance.
properties – Optional edge attributes. Typed traversal reads properties["type"].

Examples

>>> Edge(edge_id="d1-p1", source="drug-1", target="protein-1").source
'drug-1'

__init__(edge_id=None, source=None, target=None, properties=None)[source]¶

Initialize an edge, generating a UUID when edge_id is omitted.

classmethod from_dict(data)[source]¶

Create an edge from serialized dictionary data.

Parameters:: data (dict)

property get_id¶

Unique identifier for this edge.

property get_id_bytes¶

Return the edge ID encoded as UTF-8 bytes.

Examples

>>> Edge(edge_id="d1-p1").get_id_bytes
b'd1-p1'

property get_type¶

Return the typed traversal edge type.

Examples

>>> Edge(properties={"type": "drug-to-protein"}).get_type
'drug-to-protein'

to_dict()[source]¶

Convert to a dictionary for serialization.

class pygraphdb.graphdb.GraphDB(store, serializer, indexed_node_properties=None, indexed_edge_properties=None)[source]¶

Bases: object

High-level interface for storing, retrieving, and indexing Node and Edge objects.

Parameters:

store (KVStore)
serializer (Serializer)
indexed_node_properties (Optional[list[str]])
indexed_edge_properties (Optional[list[str]])

__init__(store, serializer, indexed_node_properties=None, indexed_edge_properties=None)[source]¶

Initialize a graph database wrapper.

Parameters:

store (KVStore) – KVStore instance such as LMDBStore, LevelDBStore, or PyRexStore.
serializer (Serializer) – Serializer for node, edge, and adjacency payloads.
indexed_node_properties (Optional[list[str]]) – Optional exact-match node property indexes to maintain for future writes.
indexed_edge_properties (Optional[list[str]]) – Optional exact-match edge property indexes to maintain for future writes.

Examples

>>> from pygraphdb.kvstores import LMDBStore
>>> from pygraphdb.serializers import PickleSerializer
>>> graph = GraphDB(LMDBStore(path="/tmp/example"), PickleSerializer(), indexed_node_properties=["name"])

bfs(start_node_id, direction='any', edge_key_serializer=<function GraphDB.<lambda>>, node_key_serializer=<function GraphDB.<lambda>>)[source]¶

Return a list of node IDs in breadth-first order, starting from start_node_id. The traversal walks the stored adjacency lists in the requested direction.

Parameters:: start_node_id (bytes)
Return type:: list[str]
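
The breadth-first order can be pictured with a small standalone sketch. It does not call PyGraphDB: the bfs_order function and the list of (source, target) pairs are hypothetical stand-ins for GraphDB.bfs and the stored adjacency lists, and the direction names "forward", "backward", and "any" are borrowed from get_adjacency_list as an assumption.

```python
from collections import deque


def bfs_order(edges, start, direction="any"):
    """Return node IDs in breadth-first order from start.

    edges is a list of (source, target) pairs standing in for stored edges.
    """
    # Build the neighbor lists that the chosen direction allows.
    neighbors = {}
    for source, target in edges:
        if direction in ("forward", "any"):
            neighbors.setdefault(source, []).append(target)
        if direction in ("backward", "any"):
            neighbors.setdefault(target, []).append(source)

    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in neighbors.get(node, []):
            if neighbor not in visited:
                # Mark on enqueue so each node is queued at most once.
                visited.add(neighbor)
                queue.append(neighbor)
    return order


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("e", "a")]
print(bfs_order(edges, "a", direction="forward"))
print(bfs_order(edges, "a", direction="any"))
```

Marking a node as visited when it is enqueued, rather than when it is dequeued, keeps the queue free of duplicates on graphs with cycles.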

close()[source]¶

Close the underlying key-value store.

Examples

>>> graph_db.close()

count_edges_by_property(property_name, value)[source]¶

Return the number of edges currently indexed for an exact property value.

Parameters:: property_name (str)
Return type:: int

count_edges_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return the number of edges indexed in a scalar property range.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

Return type:

int
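
The include_start and include_end flags can be read as inclusive or exclusive bounds on a sorted index. The sketch below is an illustration only: count_in_range is a hypothetical helper over a sorted Python list, not the PyGraphDB index, and treating None as an open bound is an assumption based on the default arguments.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_values, start_value=None, end_value=None,
                   include_start=True, include_end=True):
    """Count entries of a sorted list that fall inside the requested range."""
    if start_value is None:
        lo = 0  # assumed: None means no lower bound
    elif include_start:
        lo = bisect_left(sorted_values, start_value)
    else:
        lo = bisect_right(sorted_values, start_value)

    if end_value is None:
        hi = len(sorted_values)  # assumed: None means no upper bound
    elif include_end:
        hi = bisect_right(sorted_values, end_value)
    else:
        hi = bisect_left(sorted_values, end_value)

    # An inverted range (start above end) counts as empty.
    return max(0, hi - lo)


scores = [1, 2, 2, 3, 5, 8]
print(count_in_range(scores, 2, 5))
print(count_in_range(scores, 2, 5, include_start=False))
```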

count_edges_by_type(edge_type)[source]¶

Return the number of edges currently indexed for a relationship type.

Parameters:: edge_type (str)
Return type:: int

count_edges_by_type_property(edge_type, property_name, value)[source]¶

Return the number of edges indexed for a type and exact property value.

Parameters:

edge_type (str)
property_name (str)

Return type:

int

count_edges_by_type_property_range(edge_type, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return the number of edges indexed for a type/property range.

Parameters:

edge_type (str)
property_name (str)
include_start (bool)
include_end (bool)

Return type:

int

count_nodes_by_label(label)[source]¶

Return the number of nodes currently indexed for a label.

Parameters:: label (str)
Return type:: int

count_nodes_by_label_property(label, property_name, value)[source]¶

Return the number of nodes indexed for a label and exact property value.

Parameters:

label (str)
property_name (str)

Return type:

int

count_nodes_by_label_property_range(label, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return the number of nodes indexed for a label/property range.

Parameters:

label (str)
property_name (str)
include_start (bool)
include_end (bool)

Return type:

int

count_nodes_by_property(property_name, value)[source]¶

Return the number of nodes currently indexed for an exact property value.

Parameters:: property_name (str)
Return type:: int

count_nodes_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return the number of nodes indexed in a scalar property range.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

Return type:

int

create_edge_property_index(property_name)[source]¶

Parameters:: property_name (str) – Edge property to index for exact-match lookup.
Returns:: Number of existing edges added to the index.

Examples

>>> graph_db.create_edge_property_index("score")
7

create_node_property_index(property_name)[source]¶

Parameters:: property_name (str) – Node property to index for exact-match lookup.
Returns:: Number of existing nodes added to the index.

Examples

>>> graph_db.create_node_property_index("kind")
10

delete_edge(edge_id, edge_key_serializer=<function GraphDB.<lambda>>)[source]¶

Remove the edge from the edge store and from the adjacency lists of its source and target nodes. If either node does not exist, that adjacency update is skipped.

Parameters:: edge_id (str)

delete_node(node_id)[source]¶

Delete a node by byte key.

Parameters:: node_id – Node ID bytes.

Examples

>>> graph_db.delete_node(b"drug-1")

edge_key_to_bytes(edge_key)[source]¶

Normalize an edge key to bytes.

Parameters:: edge_key – String or bytes edge key.
Returns:: UTF-8 encoded bytes.

Examples

>>> GraphDB.edge_key_to_bytes(None, "d1-p1")
b'd1-p1'

edge_type(edge)[source]¶

Return the type used by typed traversal for an edge.

Parameters:: edge (Edge) – Edge to inspect.
Returns:: Edge type string, or None.

Examples

>>> GraphDB.edge_type(None, Edge(properties={"type": "drug-to-protein"}))
'drug-to-protein'

edges_by_edge_type(node_id, edge_type, direction='out')[source]¶

Return edge IDs connected by a specific edge type.

Parameters:

node_id – Node ID as string or bytes.
edge_type (str) – Edge type to traverse.
direction (str) – "out", "in", or "any".

Returns:

List of edge ID bytes.

Examples

>>> graph_db.edges_by_edge_type("drug-1", "drug-to-protein")

edges_by_property(property_name, value)[source]¶

Return edges using an exact-match property index.

Parameters:

property_name (str) – Indexed edge property name.
value – Exact property value to match.

Returns:

List of decoded Edge objects.

Examples

>>> graph_db.edges_by_property("score", 1)

edges_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return edges using a scalar property range index.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

edges_by_type(edge_type)[source]¶

Return edges using the relationship type catalog.

Parameters:: edge_type (str) – Relationship type stored in edge.properties["type"].
Returns:: List of decoded Edge objects.

Examples

>>> graph_db.edges_by_type("drug-to-protein")

edges_by_type_property(edge_type, property_name, value)[source]¶

Return edges using the composite type/property exact-match index.

Parameters:

edge_type (str)
property_name (str)

edges_by_type_property_range(edge_type, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return edges using a composite type/property range index.

Parameters:

edge_type (str)
property_name (str)
include_start (bool)
include_end (bool)

get_adjacency_list(node_id, direction='forward', return_raw=False)[source]¶

Return the list of edge IDs connected to node_id, or an empty list when none are found.

Parameters:

node_id (bytes) – Node ID.
direction – "forward", "backward", or "any". Controls whether the source, target, or undirected adjacency of the node is returned, respectively.
return_raw – When true, return the data as stored (for example, a dictionary of "source" and "target" lists).

Return type:

list[str]

get_edge(edge_id)[source]¶

Return an edge by byte key.

Parameters:: edge_id – Edge ID bytes as stored in the backend.
Returns:: The decoded edge, or None when absent.
Return type:: Edge

Examples

>>> graph_db.get_edge(b"d1-p1")

get_node(node_id)[source]¶

Return a node by byte key.

Parameters:: node_id – Node ID bytes as stored in the backend.
Returns:: The decoded node, or None when absent.
Return type:: Node

Examples

>>> graph_db.get_node(b"drug-1")

get_node_keys_generator(num_nodes=None, key_offset=None)[source]¶

Yield node keys from the backing store.

Parameters:

num_nodes – Optional maximum number of keys to yield.
key_offset – Optional starting key.

Returns:

Generator of node key bytes.

Examples

>>> list(graph_db.get_node_keys_generator(num_nodes=10))

get_nodes(node_ids)[source]¶

Fetch nodes with store.get_nodes_bulk(…) and deserialize each one. Returns a list of Node objects, either in the same order as node_ids or possibly containing only the nodes that were found.

Parameters:: node_ids (list[str])
Return type:: list[Node]

get_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Return typed adjacency records with clean direction semantics.

out means source -> target, in means target -> source, and any returns the union of both directions.

Parameters:

edge_type (str)
direction (str)
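
The direction semantics can be illustrated without a database. In this sketch, typed_adjacency and the (edge_id, source, target, edge_type) tuples are hypothetical; the record fields returned by the real method may differ.

```python
def typed_adjacency(edges, node_id, edge_type, direction="out"):
    """Return (edge_id, neighbor, concrete_direction) records for one node."""
    if direction not in ("out", "in", "any"):
        raise ValueError(f"unsupported direction: {direction!r}")
    records = []
    for edge_id, source, target, etype in edges:
        if etype != edge_type:
            continue
        # "out": the node is the source, so the neighbor is the target.
        if direction in ("out", "any") and source == node_id:
            records.append((edge_id, target, "out"))
        # "in": the node is the target, so the neighbor is the source.
        if direction in ("in", "any") and target == node_id:
            records.append((edge_id, source, "in"))
    return records


edges = [
    ("d1-p1", "drug-1", "protein-1", "drug-to-protein"),
    ("d2-p1", "drug-2", "protein-1", "drug-to-protein"),
    ("p1-x1", "protein-1", "disease-1", "protein-to-disease"),
]
print(typed_adjacency(edges, "drug-1", "drug-to-protein"))
print(typed_adjacency(edges, "protein-1", "drug-to-protein", direction="in"))
```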

index_statistics()[source]¶

Return persisted index definitions visible to the query planner.

Return type:: dict[str, object]

ingest_edges_arrow(edge_ids, sources, targets, edge_types, edge_values, *, append_only=True, native=True, chunk_size=100000)[source]¶

Ingest typed edges from Arrow-like columns.

edge_values is required and must contain serialized edge payloads compatible with the current GraphDB serializer. This ingestion path writes edge records and typed adjacency records only; it intentionally skips legacy adjacency blobs for append-friendly bulk loading.

Parameters:

edge_ids – Arrow-like or Python column of edge IDs.
sources – Arrow-like or Python column of source node IDs.
targets – Arrow-like or Python column of target node IDs.
edge_types – Arrow-like or Python column of typed traversal labels.
edge_values – Arrow-like or Python column of serialized edge bytes.
append_only (bool) – Columnar ingestion currently requires True.
native (bool) – Use native backend columnar ingestion when available.
chunk_size (int) – Maximum rows per backend write.

Returns:

Number of ingested edges.

ingest_edges_polars(df, *, edge_id='edge_id', source='source', target='target', edge_type='edge_type', edge_value='edge_value', append_only=True, native=True, chunk_size=100000)[source]¶

Ingest typed edges from a Polars DataFrame.

The edge_value column is required and must contain serialized edge payload bytes compatible with the current GraphDB serializer.

Parameters:

edge_id (str)
source (str)
target (str)
edge_type (str)
edge_value (str)
append_only (bool)
native (bool)
chunk_size (int)

ingest_nodes_arrow(node_ids, node_values, *, native=True, chunk_size=100000)[source]¶

Ingest attributed nodes from Arrow-like columns.

node_values is required and must contain serialized node payloads compatible with the current GraphDB serializer.

Parameters:

node_ids – Arrow-like or Python column of node IDs.
node_values – Arrow-like or Python column of serialized node bytes.
native (bool) – Use native backend columnar ingestion when available.
chunk_size (int) – Maximum rows per backend write.

Returns:

Number of ingested nodes.

ingest_nodes_polars(df, *, node_id='node_id', node_value='node_value', native=True, chunk_size=100000)[source]¶

Ingest attributed nodes from a Polars DataFrame.

The node_value column is required and must contain serialized node payload bytes compatible with the current GraphDB serializer.

Parameters:

node_id (str)
node_value (str)
native (bool)
chunk_size (int)

iter_edge_ids_by_property(property_name, value)[source]¶

Yield edge IDs from an exact-match edge property index.

Parameters:

property_name (str) – Indexed edge property name.
value – Exact property value to match.

Yields:

Edge ID bytes matching the property value.

Examples

>>> list(graph_db.iter_edge_ids_by_property("score", 1))
[b'e1']

iter_edge_ids_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield edge IDs from a scalar property range index.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

iter_edge_ids_by_type(edge_type)[source]¶

Yield edge IDs from the relationship type catalog.

Parameters:: edge_type (str) – Relationship type stored in edge.properties["type"].
Yields:: Edge ID bytes with the requested relationship type.

Examples

>>> list(graph_db.iter_edge_ids_by_type("drug-to-protein"))
[b'd1-p1']

iter_edge_ids_by_type_property(edge_type, property_name, value)[source]¶

Yield edge IDs from the composite type/property exact-match index.

Parameters:

edge_type (str)
property_name (str)

iter_edge_ids_by_type_property_range(edge_type, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield edge IDs from a composite type/property range index.

Parameters:

edge_type (str)
property_name (str)
include_start (bool)
include_end (bool)

iter_node_ids_by_label(label)[source]¶

Yield node IDs from the label index.

Parameters:: label (str) – Node label to scan.
Yields:: Node ID bytes with the requested label.

Examples

>>> list(graph_db.iter_node_ids_by_label("Drug"))
[b'drug-1']

iter_node_ids_by_label_property(label, property_name, value)[source]¶

Yield node IDs from the composite label/property exact-match index.

Parameters:

label (str)
property_name (str)

iter_node_ids_by_label_property_range(label, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield node IDs from a composite label/property range index.

Parameters:

label (str)
property_name (str)
include_start (bool)
include_end (bool)

iter_node_ids_by_property(property_name, value)[source]¶

Yield node IDs from an exact-match property index.

Parameters:

property_name (str) – Indexed node property name.
value – Exact property value to match.

Yields:

Node ID bytes matching the property value.

Examples

>>> list(graph_db.iter_node_ids_by_property("kind", "drug"))
[b'drug-1']

iter_node_ids_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield node IDs from a scalar property range index.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

iter_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Yield typed adjacency records with clean direction semantics.

Parameters:

node_id – Node ID as string or bytes.
edge_type (str) – Edge type to traverse.
direction (str) – "out", "in", or "any".

Yields:

Typed adjacency records containing edge, neighbor, source, target, edge type, and concrete direction fields.

Examples

>>> graph_db.iter_typed_adjacency("drug-1", "drug-to-protein")

key_to_string(key)[source]¶

Normalize a key to a string.

Parameters:: key – String or UTF-8 bytes key.
Returns:: String key.

Examples

>>> GraphDB.key_to_string(None, b"drug-1")
'drug-1'
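
Key normalization of this kind is usually a type check followed by a UTF-8 encode or decode. A minimal sketch follows, with hypothetical helper names to_key_bytes and to_key_string; raising TypeError for other input types is an assumption, not documented behavior.

```python
def to_key_bytes(key):
    """Return a key as bytes, encoding str input as UTF-8."""
    if isinstance(key, bytes):
        return key
    if isinstance(key, str):
        return key.encode("utf-8")
    raise TypeError(f"expected str or bytes, got {type(key).__name__}")


def to_key_string(key):
    """Return a key as str, decoding bytes input as UTF-8."""
    if isinstance(key, str):
        return key
    if isinstance(key, bytes):
        return key.decode("utf-8")
    raise TypeError(f"expected str or bytes, got {type(key).__name__}")


print(to_key_bytes("drug-1"))
print(to_key_string(b"drug-1"))
```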

neighbors_by_edge_type(node_id, edge_type, direction='out')[source]¶

Return neighbor IDs connected by a specific edge type.

Parameters:

node_id – Node ID as string or bytes.
edge_type (str) – Edge type to traverse.
direction (str) – "out", "in", or "any".

Returns:

List of neighbor ID bytes.

Examples

>>> graph_db.neighbors_by_edge_type("drug-1", "drug-to-protein")

node_key_to_bytes(node_key)[source]¶

Normalize a node key to bytes.

Parameters:: node_key – String or bytes node key.
Returns:: UTF-8 encoded bytes.

Examples

>>> GraphDB.node_key_to_bytes(None, "drug-1")
b'drug-1'

nodes_by_label(label)[source]¶

Return nodes with a label using the label index.

Parameters:: label (str) – Node label to scan.
Returns:: List of decoded Node objects.

Examples

>>> graph_db.nodes_by_label("Drug")

nodes_by_label_property(label, property_name, value)[source]¶

Return nodes using the composite label/property exact-match index.

Parameters:

label (str)
property_name (str)

nodes_by_label_property_range(label, property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return nodes using a composite label/property range index.

Parameters:

label (str)
property_name (str)
include_start (bool)
include_end (bool)

nodes_by_property(property_name, value)[source]¶

Return nodes using an exact-match property index.

Parameters:

property_name (str) – Indexed node property name.
value – Exact property value to match.

Returns:

List of decoded Node objects.

Examples

>>> graph_db.nodes_by_property("kind", "drug")

nodes_by_property_range(property_name, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Return nodes using a scalar property range index.

Parameters:

property_name (str)
include_start (bool)
include_end (bool)

put_adjacency_list(node_id, edges_list)[source]¶

Stores the adjacency list for node_id.

Parameters:

node_id (str)
edges_list (list[str])

put_edge(edge, update_adjacency=True)[source]¶

Store an edge and update adjacency indexes.

Parameters:

edge (Edge) – Edge to serialize and write.
update_adjacency – Whether to update the legacy untyped adjacency list.

Examples

>>> graph_db.put_edge(Edge(source="drug-1", target="protein-1"))

put_edges_bulk(edges, check_existing=True)[source]¶

Store multiple edges and update adjacency indexes in bulk.

Parameters:

edges (List[Edge]) – Edges to write.
check_existing (bool) – When true, read existing edges first and remove stale typed adjacency and sorted index records before replacement. Set to false for append-only ingestion with new edge IDs.

Examples

>>> graph_db.put_edges_bulk([Edge(source="drug-1", target="protein-1")], check_existing=False)

put_node(node)[source]¶

Store a node.

Parameters:: node (Node) – Node to serialize and write.

Examples

>>> graph_db.put_node(Node(node_id="drug-1"))

put_nodes(nodes)[source]¶

Store multiple nodes and maintain label/property indexes.

Parameters:: nodes (list[Node]) – Nodes to serialize and write.

Examples

>>> graph_db.put_nodes([Node(node_id="drug-1", labels=["Drug"])])

query(cypher, parameters=None)[source]¶

Execute a supported read-only Cypher query.

Parameters:

cypher (str) – Query text in the supported PyGraphDB Cypher subset.
parameters (dict[str, object] | None) – Optional Cypher parameter values keyed without the leading $.

Returns:

pygraphdb.cypher.QueryResult containing projected records.

Examples

>>> graph_db.query('MATCH (n:Drug) RETURN n')
>>> graph_db.query('MATCH (a {id: "drug-1"})-[:drug-to-protein]->(b) RETURN a, b')

range_query_nodes(property_name, start_val, end_val)[source]¶

Stub for range queries over node properties. An implementation might rely on the underlying store to handle node indexing.

Parameters:: property_name (str)

rebuild_edge_property_index(property_name)[source]¶

Rebuild an exact-match edge property index from stored edges.

Parameters:: property_name (str) – Edge property to index.
Returns:: Number of indexed edge records.

Examples

>>> graph_db.rebuild_edge_property_index("score")
7

rebuild_label_index()[source]¶

Rebuild the node label index from stored nodes.

Returns:: Number of label index entries written.

Examples

>>> graph_db.rebuild_label_index()
12

rebuild_node_property_index(property_name)[source]¶

Rebuild an exact-match node property index from stored nodes.

Parameters:: property_name (str) – Node property to index.
Returns:: Number of indexed node records.

Examples

>>> graph_db.rebuild_node_property_index("name")
3

rebuild_relationship_type_index()[source]¶

Rebuild the relationship type catalog from stored edges.

Returns:: Number of typed edge records indexed.

Examples

>>> graph_db.rebuild_relationship_type_index()
20

rebuild_typed_adjacency()[source]¶

Rebuild typed adjacency indexes from stored edge records.

Returns:: Number of typed edges indexed.

Examples

>>> graph_db.rebuild_typed_adjacency()

sample_neighbors(node_id, edge_type, direction='out', sample_size=10, rng=None)[source]¶

Sample typed neighbors using reservoir sampling.

Parameters:

node_id – Node ID as string or bytes.
edge_type (str) – Edge type to traverse.
direction (str) – "out", "in", or "any".
sample_size (int) – Maximum number of records to return.
rng – Optional random number generator with randrange.

Returns:

List of typed adjacency records.

Examples

>>> graph_db.sample_neighbors("drug-1", "drug-to-protein", sample_size=2)
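
Reservoir sampling keeps a fixed-size sample while reading records once, which suits adjacency that is streamed rather than loaded whole. The sketch below shows the classic single-pass variant (Algorithm R); reservoir_sample is a hypothetical helper, and only the rng.randrange requirement is taken from the parameter description above.

```python
import random


def reservoir_sample(records, sample_size, rng=None):
    """Return up to sample_size items chosen uniformly from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for index, record in enumerate(records):
        if index < sample_size:
            # Fill the reservoir with the first sample_size records.
            reservoir.append(record)
        else:
            # Keep the new record with probability sample_size / (index + 1).
            slot = rng.randrange(index + 1)
            if slot < sample_size:
                reservoir[slot] = record
    return reservoir


sample = reservoir_sample(range(100), 5, rng=random.Random(0))
print(len(sample))
```

Passing a seeded generator makes the sample reproducible, which is useful in tests.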

sample_typed_paths(seed_ids, pattern, rng=None)[source]¶

Sample paths that follow an ordered typed edge pattern.

Parameters:

seed_ids – Starting node IDs as strings or bytes.
pattern (SamplingPattern | list[dict]) – SamplingPattern or list of dictionaries such as {"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2}.
rng – Optional random number generator with randrange.

Returns:

List of dictionaries with seed and sampled path records.

Examples

>>> from pygraphdb.sampling import SamplingHop, SamplingPattern
>>> pattern = SamplingPattern([SamplingHop("drug-to-protein", sample_size=2)])
>>> graph_db.sample_typed_paths(["drug-1"], pattern)

sample_typed_subgraph(seed_ids, pattern, rng=None)[source]¶

Sample and materialize a typed subgraph around seed nodes.

Parameters:

seed_ids – Starting node IDs as strings or bytes.
pattern (SamplingPattern | list[dict]) – SamplingPattern or list of dictionary hop configs.
rng – Optional random number generator with randrange.

Returns:

Dictionary with nodes, edges, and paths entries.

Examples

>>> pattern = [{"edge_type": "drug-to-protein", "direction": "out", "sample_size": 2}]
>>> graph_db.sample_typed_subgraph(["drug-1"], pattern)

serialize_edge_value(edge)[source]¶

Serialize an edge for use with columnar edge ingestion.

Parameters:: edge (Edge)
Return type:: bytes

serialize_node_value(node)[source]¶

Serialize a node for use with columnar node ingestion.

Parameters:: node (Node)
Return type:: bytes

update_edge(edge_id, new_data, merge_func)[source]¶

Update an edge using the same approach as update_node. new_data may include new properties, and could also change source or target where that makes sense.

Parameters:

edge_id (str)
new_data (dict)

Return type:

Edge

update_node(node_id, new_data, merge_func)[source]¶

Fetch the existing node, if any, and merge new_data into it. If no node is found, the update is treated as a new node or otherwise handled gracefully. merge_func(old_node_dict, new_data_dict) returns the merged properties as a dict.

Parameters:

node_id (str)
new_data (dict)

Return type:

Node
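
A merge function of the documented shape might look like the sketch below. The name merge_properties is hypothetical, and the assumption that both dictionaries carry a "properties" key in the layout produced by Node.to_dict is not confirmed by this reference.

```python
def merge_properties(old_node_dict, new_data_dict):
    """Return merged properties; values from new_data_dict win on conflict."""
    # Copy first so the stored dictionary is never mutated in place.
    merged = dict(old_node_dict.get("properties") or {})
    merged.update(new_data_dict.get("properties") or {})
    return merged


old = {"id": "drug-1", "properties": {"kind": "drug", "name": "aspirin"}, "labels": ["Drug"]}
new = {"properties": {"name": "Aspirin", "approved": True}}
merged = merge_properties(old, new)
print(merged)
```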

class pygraphdb.graphdb.GraphEntityDictSerializer(serializer)[source]¶

Bases: object

Serialize graph entities through a dictionary-compatible serializer.

Parameters:: serializer (Serializer) – Serializer used for the final bytes conversion.

Examples

>>> from pygraphdb.serializers import JSONSerializer
>>> s = GraphEntityDictSerializer(JSONSerializer())
>>> s.deserialize(s.serialize(Node(node_id="n1"), "Node"), "Node").get_id
'n1'

__init__(serializer)[source]¶

Initialize the entity serializer wrapper.

Parameters:: serializer (Serializer) – Serializer used to encode dictionaries as bytes.

deserialize(val, entity_type)[source]¶

Deserialize bytes into a graph entity by entity type.

Parameters:

val – Bytes containing the serialized data.
entity_type (str) – One of "Node", "Edge", or "AdjacencyList".

serialize(entity, entity_type)[source]¶

Serialize a graph entity by entity type.

Parameters:

entity – Node, Edge, or adjacency-list object.
entity_type (str) – One of "Node", "Edge", or "AdjacencyList".

Returns:

Serialized bytes.

Examples

>>> from pygraphdb.serializers import PickleSerializer
>>> GraphEntityDictSerializer(PickleSerializer()).serialize(Node("n1"), "Node")[:1]
b'\x80'
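
The pattern behind this class is entity to dictionary to bytes, with the entity type selecting the class used on the way back. The sketch below illustrates that round trip with JSON; TinyNode, ENTITY_CLASSES, serialize, and deserialize are hypothetical stand-ins, not the PyGraphDB implementation.

```python
import json


class TinyNode:
    """Hypothetical stand-in for a graph node with to_dict/from_dict."""

    def __init__(self, node_id, properties=None, labels=()):
        self.node_id = node_id
        self.properties = dict(properties or {})
        # Drop duplicate labels while keeping first-seen order.
        self.labels = tuple(dict.fromkeys(labels))

    def to_dict(self):
        return {"id": self.node_id, "properties": self.properties,
                "labels": list(self.labels)}

    @classmethod
    def from_dict(cls, data):
        return cls(data["id"], data.get("properties"), data.get("labels", ()))


# Registry mapping entity_type strings to the classes that rebuild them.
ENTITY_CLASSES = {"Node": TinyNode}


def serialize(entity, entity_type):
    """Encode an entity's dictionary form as UTF-8 JSON bytes."""
    if entity_type not in ENTITY_CLASSES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return json.dumps(entity.to_dict(), sort_keys=True).encode("utf-8")


def deserialize(val, entity_type):
    """Decode bytes and rebuild the entity named by entity_type."""
    if entity_type not in ENTITY_CLASSES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return ENTITY_CLASSES[entity_type].from_dict(json.loads(val.decode("utf-8")))


payload = serialize(TinyNode("n1", {"kind": "drug"}, ["Drug", "Drug"]), "Node")
restored = deserialize(payload, "Node")
print(restored.node_id, restored.labels)
```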

class pygraphdb.graphdb.Node(node_id=None, properties=None, labels=None)[source]¶

Bases: object

Graph node with an ID, native labels, and arbitrary properties.

Parameters:

node_id – Optional stable node identifier. A UUID is generated when omitted.
properties – Optional dictionary of node attributes.
labels – Optional iterable of node labels. Labels are stored natively and maintained in the label index by GraphDB.

Examples

>>> Node(node_id="drug-1", labels=["Drug"], properties={"kind": "drug"}).get_id
'drug-1'
>>> Node(node_id="drug-1", labels=["Drug", "Drug"]).labels
('Drug',)

__init__(node_id=None, properties=None, labels=None)[source]¶

Initialize a node, generating a UUID when node_id is omitted.

classmethod from_dict(data)[source]¶

Create a node from serialized dictionary data.

Parameters:: data (dict) – Dictionary produced by to_dict. Older dictionaries without labels deserialize with an empty label tuple.
Returns:: Node instance.

Examples

>>> Node.from_dict({"id": "n1", "properties": {}}).labels
()

property get_id¶

Unique identifier for this node.

property get_id_bytes¶

Return the node ID encoded as UTF-8 bytes.

Examples

>>> Node(node_id="drug-1").get_id_bytes
b'drug-1'

to_dict()[source]¶

Convert to a dictionary form for serialization.

Returns:: Dictionary containing id, properties, and labels.

Examples

>>> Node(node_id="n1", labels=["Drug"]).to_dict()["labels"]
['Drug']

class pygraphdb.graphdb.TimeIndexedEdge(timestamp_dat, *args, **kwargs)[source]¶

Bases: Edge

Edge whose byte key is prefixed by a timestamp.

Parameters:

timestamp_dat – Datetime used as the sortable key prefix.
*args – Positional arguments passed to Edge.
**kwargs – Keyword arguments passed to Edge.

Examples

>>> import datetime
>>> edge = TimeIndexedEdge(datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc), edge_id="e1")
>>> edge.get_id_bytes.endswith(b':e1')
True

__init__(timestamp_dat, *args, **kwargs)[source]¶

Initialize a timestamp-prefixed edge.

Parameters:

timestamp_dat – Datetime used as the sortable key prefix.
*args – Positional arguments passed to Edge.
**kwargs – Keyword arguments passed to Edge.

classmethod from_dict(data)[source]¶

Create a timestamp-indexed edge from serialized dictionary data.

Parameters:: data (dict)

property get_id_bytes¶

Return timestamp-prefixed edge ID bytes.

Examples

>>> import datetime
>>> edge = TimeIndexedEdge(datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc), edge_id="e1")
>>> edge.get_id_bytes.endswith(b':e1')
True

to_dict()[source]¶

Convert to a dictionary for serialization.

pygraphdb.graphdb.bytes_to_datetime(b, tzinfo=datetime.timezone.utc)[source]¶

Convert bytes produced by datetime_to_bytes back to a datetime.

Parameters:

b (bytes) – Eight-byte timestamp generated by datetime_to_bytes.
tzinfo – Time zone used for the epoch reference.

Returns:

Decoded datetime.

Return type:

datetime

Examples

>>> bytes_to_datetime(b'\x00' * 8)
datetime.datetime(1970, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

pygraphdb.graphdb.datetime_to_bytes(dt, tzinfo=datetime.timezone.utc)[source]¶

Convert a datetime to big-endian microseconds since the Unix epoch.

Parameters:

dt (datetime) – Datetime at or after 1970-01-01.
tzinfo – Time zone used for the epoch reference.

Returns:

Eight bytes containing the timestamp as an unsigned integer.

Return type:

bytes

Examples

>>> import datetime
>>> datetime_to_bytes(datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc))
b'\x00\x00\x00\x00\x00\x00\x00\x00'
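
The encoding described here, eight big-endian bytes of microseconds since the Unix epoch, can be sketched with the standard library alone. The names encode_micros and decode_micros are hypothetical and the sketch fixes the time zone to UTC; it illustrates the documented format rather than reproducing the library's code.

```python
import datetime

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def encode_micros(dt):
    """Encode an aware datetime as 8 big-endian bytes of microseconds."""
    # Integer division of timedeltas is exact, so no float rounding occurs.
    micros = (dt - EPOCH) // ONE_MICROSECOND
    # Unsigned encoding raises OverflowError for datetimes before the epoch.
    return micros.to_bytes(8, byteorder="big", signed=False)


def decode_micros(data):
    """Decode 8 big-endian bytes back to an aware UTC datetime."""
    micros = int.from_bytes(data, byteorder="big", signed=False)
    return EPOCH + micros * ONE_MICROSECOND


moment = datetime.datetime(2024, 5, 17, 12, 30, 15, 123456, tzinfo=UTC)
later = moment + datetime.timedelta(days=1)
print(decode_micros(encode_micros(moment)) == moment)
```

Because the encoding is fixed-width and big-endian, comparing the byte strings gives the same order as comparing the datetimes, which is what makes such a value usable as a sortable key prefix.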

Sampling Configuration

Typed sampling configuration objects for PyGraphDB.

The graph sampling APIs accept these objects as a structured alternative to plain dictionaries while preserving dict compatibility.

class pygraphdb.sampling.SamplingHop(edge_type, direction='out', sample_size=10)[source]¶

Bases: object

Configuration for one typed sampling hop.

Parameters:

edge_type (str) – Edge type to traverse, read from edge.properties["type"].
direction (str) – Traversal direction. Use "out" for source to target, "in" for target to source, or "any" for both directions.
sample_size (int) – Maximum number of neighbors to sample at this hop for each node in the current frontier.

Examples

>>> hop = SamplingHop("drug-to-protein", direction="out", sample_size=2)
>>> hop.to_dict()
{'edge_type': 'drug-to-protein', 'direction': 'out', 'sample_size': 2}

direction: str = 'out'¶

edge_type: str¶

classmethod from_dict(data)[source]¶

Create a hop from a dictionary-style sampling configuration.

Parameters:: data (Mapping[str, object]) – Mapping with edge_type and optional direction and sample_size keys.
Returns:: A validated SamplingHop instance.
Return type:: SamplingHop

Examples

>>> SamplingHop.from_dict({'edge_type': 'drug-to-protein', 'sample_size': 2})
SamplingHop(edge_type='drug-to-protein', direction='out', sample_size=2)

sample_size: int = 10¶

to_dict()[source]¶

Return a dictionary compatible with the original sampling API.

Returns:: A dictionary containing edge_type, direction, and sample_size.
Return type:: dict[str, object]

Examples

>>> SamplingHop("drug-to-protein", sample_size=2).to_dict()["sample_size"]
2
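
A hop configuration with dictionary round-tripping can be sketched as a small dataclass. HopConfig and as_hop below are hypothetical, and the validation rules (allowed directions, non-negative sample size) are assumptions about what "validated" might mean, not the library's documented checks.

```python
from dataclasses import asdict, dataclass

VALID_DIRECTIONS = ("out", "in", "any")


@dataclass(frozen=True)
class HopConfig:
    """Hypothetical stand-in for one typed sampling hop."""

    edge_type: str
    direction: str = "out"
    sample_size: int = 10

    def __post_init__(self):
        # Assumed validation rules; the real class may check other things.
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"unsupported direction: {self.direction!r}")
        if self.sample_size < 0:
            raise ValueError("sample_size must be non-negative")

    @classmethod
    def from_dict(cls, data):
        return cls(
            edge_type=data["edge_type"],
            direction=data.get("direction", "out"),
            sample_size=data.get("sample_size", 10),
        )

    def to_dict(self):
        return asdict(self)


def as_hop(hop):
    """Accept either a HopConfig or a mapping and return a HopConfig."""
    return hop if isinstance(hop, HopConfig) else HopConfig.from_dict(hop)


hop = as_hop({"edge_type": "drug-to-protein", "sample_size": 2})
print(hop.to_dict())
```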

class pygraphdb.sampling.SamplingPattern(hops)[source]¶

Bases: object

Ordered typed sampling pattern.

Parameters:: hops (Sequence[SamplingHop | Mapping[str, object]]) – Sequence of SamplingHop objects or dictionary-style hop configurations.

Examples

>>> pattern = SamplingPattern([
...     SamplingHop("drug-to-protein", sample_size=2),
...     {"edge_type": "protein-to-disease", "direction": "out"},
... ])
>>> len(pattern)
2

classmethod from_dicts(hops)[source]¶

Create a pattern from dictionary-style hop configurations.

Parameters:: hops (Iterable[Mapping[str, object]]) – Iterable of mappings accepted by SamplingHop.from_dict.
Returns:: A normalized sampling pattern.
Return type:: SamplingPattern

Examples

>>> SamplingPattern.from_dicts([{'edge_type': 'drug-to-protein'}]).to_dicts()[0]['edge_type']
'drug-to-protein'

hops: Sequence[SamplingHop | Mapping[str, object]]¶

to_dicts()[source]¶

Return dictionary configurations for all hops.

Returns:: List of dictionary-style hop configurations.
Return type:: list[dict[str, object]]

Examples

>>> SamplingPattern([SamplingHop('a-to-b')]).to_dicts()[0]['direction']
'out'

pygraphdb.sampling.as_sampling_hop(hop)[source]¶

Normalize a hop configuration to SamplingHop.

Parameters:: hop (SamplingHop | Mapping[str, object]) – Either a SamplingHop or dictionary-style hop configuration.
Returns:: A SamplingHop instance.
Return type:: SamplingHop

Examples

>>> as_sampling_hop({'edge_type': 'drug-to-protein'}).edge_type
'drug-to-protein'

pygraphdb.sampling.as_sampling_pattern(pattern)[source]¶

Normalize a sampling pattern to SamplingPattern.

Parameters:: pattern (SamplingPattern | Iterable[SamplingHop | Mapping[str, object]]) – A SamplingPattern or iterable of hop configurations.
Returns:: A SamplingPattern instance.
Return type:: SamplingPattern

Examples

>>> as_sampling_pattern([{'edge_type': 'drug-to-protein'}]).hops[0].sample_size
10

Columnar Ingestion

Columnar ingestion containers for PyGraphDB.

class pygraphdb.ingestion.EdgeList(edge_ids, sources, targets, edge_types, edge_values)[source]¶

Bases: object

Columnar typed edges with caller-provided serialized edge values.

Parameters:

edge_ids (list[bytes])
sources (list[bytes])
targets (list[bytes])
edge_types (list[str])
edge_values (list[bytes])

chunks(chunk_size)[source]¶

Yield fixed-size EdgeList chunks.

Parameters:: chunk_size (int)
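
Chunking parallel columns amounts to slicing every column with the same bounds. The sketch below uses a hypothetical two-column ColumnBatch rather than EdgeList; letting the final chunk be shorter than chunk_size is an assumption about how leftover rows are handled.

```python
from dataclasses import dataclass


@dataclass
class ColumnBatch:
    """Hypothetical two-column container with aligned rows."""

    ids: list
    values: list

    def __post_init__(self):
        if len(self.ids) != len(self.values):
            raise ValueError("columns must have the same length")

    def chunks(self, chunk_size):
        """Yield batches of at most chunk_size rows, in order."""
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        for start in range(0, len(self.ids), chunk_size):
            stop = start + chunk_size
            # The same slice on every column keeps rows aligned.
            yield ColumnBatch(self.ids[start:stop], self.values[start:stop])


batch = ColumnBatch([b"n1", b"n2", b"n3"], [b"v1", b"v2", b"v3"])
sizes = [len(chunk.ids) for chunk in batch.chunks(2)]
print(sizes)
```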

edge_ids: list[bytes]¶

edge_types: list[str]¶

edge_values: list[bytes]¶

classmethod from_arrow(edge_ids, sources, targets, edge_types, edge_values)[source]¶

Create an edge list from Arrow-like or Python columns.

classmethod from_polars(df, *, edge_id='edge_id', source='source', target='target', edge_type='edge_type', edge_value='edge_value')[source]¶: Create an edge list from a Polars DataFrame.

sources: list[bytes]¶

targets: list[bytes]¶

class pygraphdb.ingestion.NodeList(node_ids, node_values)[source]¶

Bases: object

Columnar nodes with caller-provided serialized node values.

Parameters:

node_ids (list[bytes])
node_values (list[bytes])

chunks(chunk_size)[source]¶

Yield fixed-size NodeList chunks.

Parameters:: chunk_size (int)

classmethod from_arrow(node_ids, node_values)[source]¶: Create a node list from Arrow-like or Python columns.

classmethod from_polars(df, *, node_id='node_id', node_value='node_value')[source]¶: Create a node list from a Polars DataFrame.

node_ids: list[bytes]¶

node_values: list[bytes]¶

Cypher Queries¶

Minimal read-only Cypher support for PyGraphDB.

The supported subset maps directly to existing typed adjacency and sampling APIs:

MATCH (a {id: "node-id"})-[:TYPE1]->(b)<-[:TYPE2]-(c) RETURN a.name, b LIMIT 10

CALL pg.sample_typed_paths(["node-id"], [{"edge_type": "TYPE", "sample_size": 2}]) YIELD path RETURN path

class pygraphdb.cypher.QueryResult(columns, records)[source]¶

Bases: object

Tabular query result returned by GraphDB.query.

columns contains projected column names in return order. records is a list of dictionaries keyed by column name.

Examples

>>> result = QueryResult(columns=("n",), records=[{"n": "node"}])
>>> len(result)
1
>>> list(result)[0]["n"]
'node'

Parameters:

columns (tuple[str, ...])
records (list[dict[str, object]])

columns: tuple[str, ...]¶

records: list[dict[str, object]]¶
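
Because records are dictionaries keyed by column name, columns is what fixes the output order. The sketch below uses a hypothetical ResultSketch stand-in with a rows helper (neither is part of the documented API) to turn records into tuples in return order.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in with the same two fields as QueryResult."""
    columns: tuple
    records: list

    def rows(self):
        """Return one tuple per record, ordered by self.columns."""
        return [
            tuple(record[name] for name in self.columns)
            for record in self.records
        ]


result = ResultSketch(
    columns=("name", "target"),
    records=[
        {"target": "protein-1", "name": "aspirin"},
        {"target": "protein-2", "name": "ibuprofen"},
    ],
)
ordered = result.rows()
```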

pygraphdb.cypher.execute(graph, query, parameters=None)[source]¶

Execute a supported Cypher query against a GraphDB instance.

Parameters:

graph – GraphDB instance used for indexed lookups and traversal.
query (str) – Cypher query text.
parameters (dict[str, object] | None)

Returns:

QueryResult with projected records.

Return type:

QueryResult

Examples

>>> execute(graph_db, 'MATCH (n:Drug) RETURN n')

pygraphdb.cypher.parse(query)[source]¶

Parse the supported Cypher subset.

Parameters:: query (str) – Cypher query text.
Returns:: Parsed query object.
Raises:: ValueError – If the query is outside the supported subset.
Return type:: MatchQuery | SampleTypedPathsCall | NodeScanQuery | RelationshipScanQuery | MultiMatchQuery

Examples

>>> parse('MATCH (n:Drug) RETURN n').label
'Drug'

pygraphdb.cypher.plan(query)[source]¶

Return the logical plan for a supported Cypher query.

Parameters:: query (str)
Return type:: LogicalPlan

Key-Value Stores¶

class pygraphdb.kvstores.KVStore[source]¶

Bases: object

Abstract interface for a simple key-value store.

close()[source]¶: Close any resources owned by the store.

delete(key)[source]¶

Delete a raw key/value pair.

Parameters:: key (bytes)

delete_edge(edge_id)[source]¶

Delete an edge.

Parameters:: edge_id (str)

delete_index_entry(index_name, key_parts, value)[source]¶

Delete one sorted index entry.

Parameters:

index_name (str) – Logical index name.
key_parts (list[bytes]) – Ordered components used when the entry was written.
value (bytes) – Entity ID or payload used when the entry was written.

Examples

>>> store.delete_index_entry("node_label", [b"Drug"], b"drug-1")

delete_metadata(key)[source]¶

Delete a metadata key/value pair.

Parameters:: key (bytes)

delete_node(node_id)[source]¶

Delete a node.

Parameters:: node_id (str)

delete_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Delete one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

delete_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Delete typed adjacency records for an edge.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

get(key)[source]¶

Return a raw value by key.

Parameters:: key (bytes)
Return type:: bytes

get_edge(edge_id)[source]¶

Retrieve an edge.

Parameters:: edge_id (str)
Return type:: bytes

get_edges_bulk(edge_ids)[source]¶

Retrieve multiple serialized edges by key.

Parameters:: edge_ids (list[str])
Return type:: dict[str, bytes]

get_metadata(key)[source]¶

Return a metadata value by key.

Parameters:: key (bytes)
Return type:: bytes

get_node(node_id)[source]¶

Retrieve a node by ID.

Parameters:: node_id (str)
Return type:: bytes

get_nodes_bulk(node_ids)[source]¶

Retrieve multiple serialized nodes by key.

Parameters:: node_ids (list[str])
Return type:: dict[str, bytes]

ingest_edges_columnar(edge_list, *, append_only=True, native=True)[source]¶

Store columnar typed edges with caller-provided serialized values.

Parameters:

append_only (bool)
native (bool)

ingest_nodes_columnar(node_list, *, native=True)[source]¶

Store columnar nodes with caller-provided serialized values.

Parameters:: native (bool)

iter_index_prefix(index_name, key_parts)[source]¶

Yield values whose index key starts with key_parts.

Parameters:

index_name (str) – Logical index name.
key_parts (list[bytes]) – Ordered prefix components.

Yields:

Values associated with matching index entries.

Examples

>>> list(store.iter_index_prefix("node_label", [b"Drug"]))
[b'drug-1']
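
The put, delete, and prefix-scan semantics of sorted index entries can be sketched in memory. This is an illustration only: SortedIndexSketch and the NUL-separated key encoding are assumptions, not the encoding used by any PyGraphDB backend.

```python
import bisect

SEP = b"\x00"  # assumed separator; real backends may encode index keys differently


class SortedIndexSketch:
    """In-memory stand-in for the sorted index methods of a KVStore."""

    def __init__(self):
        self._entries = []  # sorted list of (encoded_key, value)

    @staticmethod
    def _prefix(index_name, key_parts):
        return SEP.join([index_name.encode("utf-8"), *key_parts]) + SEP

    def put_index_entry(self, index_name, key_parts, value):
        entry = (self._prefix(index_name, key_parts) + value, value)
        pos = bisect.bisect_left(self._entries, entry)
        if pos == len(self._entries) or self._entries[pos] != entry:
            self._entries.insert(pos, entry)  # keep the list sorted

    def delete_index_entry(self, index_name, key_parts, value):
        entry = (self._prefix(index_name, key_parts) + value, value)
        pos = bisect.bisect_left(self._entries, entry)
        if pos < len(self._entries) and self._entries[pos] == entry:
            del self._entries[pos]

    def iter_index_prefix(self, index_name, key_parts):
        prefix = self._prefix(index_name, key_parts)
        # Jump to the first key >= prefix, then walk while the prefix matches.
        pos = bisect.bisect_left(self._entries, (prefix, b""))
        while pos < len(self._entries) and self._entries[pos][0].startswith(prefix):
            yield self._entries[pos][1]
            pos += 1


index = SortedIndexSketch()
index.put_index_entry("node_label", [b"Drug"], b"drug-2")
index.put_index_entry("node_label", [b"Drug"], b"drug-1")
index.put_index_entry("node_label", [b"Protein"], b"protein-1")
drugs = list(index.iter_index_prefix("node_label", [b"Drug"]))
```

Including the value in the encoded key keeps entries unique and lets a delete target exactly the entry that was written, which is why delete_index_entry takes the same three arguments as put_index_entry.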

iter_range_index(index_name, key_parts, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield values whose range index key falls between start and end values.

Parameters:

index_name (str)
key_parts (list[bytes])
start_value (bytes | None)
end_value (bytes | None)
include_start (bool)
include_end (bool)
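
A sketch of how the start, end, and inclusivity flags might combine, using a hypothetical in-memory RangeIndexSketch. Range values are bytes, so they compare lexicographically; the example encodes integers as fixed-width big-endian bytes so that byte order matches numeric order. The index name "node_year" is made up for illustration.

```python
import bisect
import struct
from collections import defaultdict


class RangeIndexSketch:
    """In-memory stand-in for the range index methods of a KVStore."""

    def __init__(self):
        # (index_name, key_parts) -> sorted list of (range_value, value)
        self._buckets = defaultdict(list)

    def put_range_index_entry(self, index_name, key_parts, range_value, value):
        bucket = self._buckets[(index_name, tuple(key_parts))]
        entry = (range_value, value)
        pos = bisect.bisect_left(bucket, entry)
        if pos == len(bucket) or bucket[pos] != entry:
            bucket.insert(pos, entry)

    def iter_range_index(self, index_name, key_parts, start_value=None,
                         end_value=None, include_start=True, include_end=True):
        bucket = self._buckets.get((index_name, tuple(key_parts)), [])
        for range_value, value in bucket:
            if start_value is not None:
                if range_value < start_value:
                    continue
                if range_value == start_value and not include_start:
                    continue
            if end_value is not None:
                if range_value > end_value:
                    break  # bucket is sorted, nothing later can match
                if range_value == end_value and not include_end:
                    break
            yield value


def year(n):
    """Encode an integer so that byte order equals numeric order."""
    return struct.pack(">I", n)


idx = RangeIndexSketch()
idx.put_range_index_entry("node_year", [b"Drug"], year(1990), b"drug-a")
idx.put_range_index_entry("node_year", [b"Drug"], year(2005), b"drug-b")
idx.put_range_index_entry("node_year", [b"Drug"], year(2020), b"drug-c")

closed = list(idx.iter_range_index("node_year", [b"Drug"], year(1990), year(2005)))
half_open = list(idx.iter_range_index(
    "node_year", [b"Drug"], year(1990), year(2005), include_start=False))
```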

iter_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Yield typed adjacency records for a node and edge type.

Parameters:

node_id (bytes)
edge_type (str)
direction (str)
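
The concrete backends in this reference describe typed adjacency as forward and reverse records that yield (edge_id, neighbor_id) pairs. A minimal in-memory sketch of that idea follows; TypedAdjacencySketch is hypothetical, and "in" as the reverse direction value is an assumption, since only the default 'out' appears in the signatures.

```python
from collections import defaultdict


class TypedAdjacencySketch:
    """In-memory stand-in for the typed adjacency methods of a KVStore."""

    def __init__(self):
        # (direction, node_id, edge_type) -> list of (edge_id, neighbor_id)
        self._records = defaultdict(list)

    def put_typed_adjacency(self, source_id, target_id, edge_type, edge_id):
        # One forward record on the source and one reverse record on the target.
        self._records[("out", source_id, edge_type)].append((edge_id, target_id))
        self._records[("in", target_id, edge_type)].append((edge_id, source_id))

    def delete_typed_adjacency(self, source_id, target_id, edge_type, edge_id):
        for key, pair in (
            (("out", source_id, edge_type), (edge_id, target_id)),
            (("in", target_id, edge_type), (edge_id, source_id)),
        ):
            bucket = self._records.get(key, [])
            if pair in bucket:
                bucket.remove(pair)

    def iter_typed_adjacency(self, node_id, edge_type, direction="out"):
        if direction not in ("out", "in"):
            raise ValueError("direction must be 'out' or 'in'")
        yield from self._records.get((direction, node_id, edge_type), [])


adj = TypedAdjacencySketch()
adj.put_typed_adjacency(b"drug-1", b"protein-1", "drug-to-protein", b"d1-p1")
adj.put_typed_adjacency(b"drug-1", b"protein-2", "drug-to-protein", b"d1-p2")

outgoing = list(adj.iter_typed_adjacency(b"drug-1", "drug-to-protein"))
incoming = list(adj.iter_typed_adjacency(b"protein-1", "drug-to-protein", direction="in"))
```

Writing both records at insert time means a traversal in either direction is a single lookup keyed by node and edge type.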

put(key, value)[source]¶

Store a raw key/value pair.

Parameters:

key (bytes)
value (bytes)

put_edge(edge_id, value)[source]¶

Store an edge (serialized).

Parameters:

edge_id (str)
value (bytes)

put_edges_bulk(keys_and_values)[source]¶

Store multiple serialized edges.

Parameters:: keys_and_values (dict[str, bytes])

put_index_entries_bulk(entries)[source]¶

Store many sorted index entries.

Parameters:: entries (list[tuple[str, list[bytes], bytes]]) – Tuples of (index_name, key_parts, value).

Examples

>>> store.put_index_entries_bulk([("node_label", [b"Drug"], b"drug-1")])

put_index_entry(index_name, key_parts, value)[source]¶

Store one sorted index entry.

Parameters:

index_name (str) – Logical index name, such as "node_label".
key_parts (list[bytes]) – Ordered components used as the scan prefix.
value (bytes) – Entity ID or payload returned by prefix scans.

Examples

>>> store.put_index_entry("node_label", [b"Drug"], b"drug-1")

put_metadata(key, value)[source]¶

Store a metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_node(node_id, value)[source]¶

Store a node (serialized).

Parameters:

node_id (str)
value (bytes)

put_nodes_bulk(keys_and_values)[source]¶

Store multiple serialized nodes, using a single batch or transaction where the backend supports it.

Parameters:: keys_and_values (dict[str, bytes])

put_range_index_entries_bulk(entries)[source]¶

Store many sorted range index entries.

Parameters:: entries (list[tuple[str, list[bytes], bytes, bytes]])

put_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Store one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

put_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Store typed adjacency records for an edge.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

put_typed_adjacency_bulk(records)[source]¶

Store typed adjacency records for multiple edges.

Parameters:: records (list[tuple[bytes, bytes, str, bytes]])

range_iter(start_key, end_key)[source]¶

Iterate over keys from start_key to end_key (inclusive).

Parameters:

start_key (bytes)
end_key (bytes)
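
Inclusive range iteration over sorted byte keys can be sketched with the standard bisect module. inclusive_range is a hypothetical helper, and yielding (key, value) pairs is an assumption; the abstract method above does not specify what is yielded.

```python
import bisect


def inclusive_range(sorted_items, start_key, end_key):
    """Yield (key, value) pairs with start_key <= key <= end_key."""
    keys = [key for key, _ in sorted_items]
    lo = bisect.bisect_left(keys, start_key)   # first key >= start_key
    hi = bisect.bisect_right(keys, end_key)    # one past the last key <= end_key
    yield from sorted_items[lo:hi]


items = sorted({b"a": b"1", b"b": b"2", b"c": b"3", b"d": b"4"}.items())
selected = list(inclusive_range(items, b"b", b"c"))
```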

class pygraphdb.kvstores.LMDBStore(path='graph_lmdb', map_size=10485760, map_id=True, map_keys=False)[source]¶

Bases: KVStore

LMDB implementation of the PyGraphDB key-value store.

Examples

>>> store = LMDBStore(path="/tmp/example_graph_lmdb")

__init__(path='graph_lmdb', map_size=10485760, map_id=True, map_keys=False)[source]¶

Creates/opens an LMDB environment with three named sub-databases:

b'nodes' for node data
b'edges' for edge data
b'adj' for adjacency lists

close()[source]¶: Close the LMDB environment.

delete(key)[source]¶

Placeholder generic delete; graph code uses specialized methods.

Parameters:: key (bytes)

delete_edge(edge_id)[source]¶

Delete an edge by byte key.

Parameters:: edge_id (str)

delete_index_entry(index_name, key_parts, value)[source]¶

Delete one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

delete_metadata(key)[source]¶

Delete a metadata key/value pair.

Parameters:: key (bytes)

delete_node(node_id)[source]¶

Delete a node by byte key.

Parameters:: node_id (bytes)

delete_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Delete one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

delete_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Delete forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

get(key)[source]¶

Placeholder generic get; graph code uses specialized methods.

Parameters:: key (bytes)
Return type:: bytes

get_adjacency(node_id)[source]¶

Return a serialized adjacency list for a node.

Parameters:: node_id (bytes | str)
Return type:: bytes | None

get_adjacency_bulk(node_ids)[source]¶

Retrieve multiple adjacency lists in a single read transaction. Returns a dict { node_id: serialized adjacency } for all found items.

Parameters:: node_ids (List[bytes])
Return type:: Dict[bytes, bytes]

get_edge(edge_id)[source]¶

Return serialized edge bytes by key, or None.

Parameters:: edge_id (str)
Return type:: bytes

get_edge_keys_generator(num_edges=None, key_offset=None)[source]¶: Yield edge keys from the edge database.

get_edges_bulk(edge_ids)[source]¶

Return serialized edges for the requested keys.

Parameters:: edge_ids (list[bytes])
Return type:: dict[bytes, bytes]

get_metadata(key)[source]¶

Return a metadata value by key, or None.

Parameters:: key (bytes)
Return type:: bytes

get_node(node_id)[source]¶

Return serialized node bytes by key, or None.

Parameters:: node_id (bytes)
Return type:: bytes

get_node_keys_generator(num_nodes=None, key_offset=None)[source]¶: Yield node keys from the node database.

get_nodes_bulk(node_ids)[source]¶

Retrieve multiple nodes in one read transaction.

Parameters:: node_ids (list[bytes])
Return type:: dict[bytes, bytes]

iter_index_prefix(index_name, key_parts)[source]¶

Yield values whose index key starts with key_parts.

Parameters:

index_name (str)
key_parts (list[bytes])

iter_range_index(index_name, key_parts, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield values whose range index key falls between start and end values.

Parameters:

index_name (str)
key_parts (list[bytes])
start_value (bytes | None)
end_value (bytes | None)
include_start (bool)
include_end (bool)

iter_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Yield typed adjacency (edge_id, neighbor_id) pairs.

Parameters:

node_id (bytes)
edge_type (str)
direction (str)

put(key, value)[source]¶

Placeholder generic put; graph code uses specialized methods.

Parameters:

key (bytes)
value (bytes)

put_adjacency(node_id, value)[source]¶

Store a serialized adjacency list for a node.

Parameters:

node_id (bytes)
value (bytes)

Return type:

None

put_adjacency_bulk(adj_dict)[source]¶

Insert or update multiple adjacency lists in one transaction.

Parameters:: adj_dict (Dict[bytes, bytes]) – Mapping of node_id to serialized adjacency (list of edges).
Return type:: None

put_edge(edge_id, value)[source]¶

Store a serialized edge by byte key.

Parameters:

edge_id (bytes)
value (bytes)

put_edges_bulk(keys_and_values)[source]¶

Store many serialized edges in one transaction.

Parameters:: keys_and_values (dict[bytes, bytes])

put_index_entries_bulk(entries)[source]¶

Store many sorted index entries in one transaction.

Parameters:: entries (list[tuple[str, list[bytes], bytes]])

put_index_entry(index_name, key_parts, value)[source]¶

Store one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

put_metadata(key, value)[source]¶

Store a metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_node(node_id, value)[source]¶

Store a serialized node by byte key.

Parameters:

node_id (bytes)
value (bytes)

put_nodes_bulk(keys_and_values)[source]¶

Write a batch of nodes in a single transaction.

Parameters:: keys_and_values (dict[bytes, bytes])

put_range_index_entries_bulk(entries)[source]¶

Store many sorted range index entries in one transaction.

Parameters:: entries (list[tuple[str, list[bytes], bytes, bytes]])

put_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Store one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

put_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Store forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

put_typed_adjacency_bulk(records)[source]¶

Store many typed adjacency records in one transaction.

Parameters:: records (list[tuple[bytes, bytes, str, bytes]])

range_iter(start_key, end_key)[source]¶

Yield node records whose keys fall within an inclusive range.

Parameters:

start_key (bytes)
end_key (bytes)

class pygraphdb.kvstores.LevelDBStore(path='graph_leveldb')[source]¶

Bases: KVStore

LevelDB implementation backed by plyvel.

Parameters:: path – Directory that will contain the LevelDB sub-databases.

Examples

>>> store = LevelDBStore(path="/tmp/example_graph_leveldb")

__init__(path='graph_leveldb')[source]¶: Create or open a LevelDB store. Nodes and edges are stored by prefix.

close()[source]¶: Close all LevelDB sub-databases.

delete_edge(edge_id)[source]¶

Delete an edge by byte key.

Parameters:: edge_id (str)

delete_index_entry(index_name, key_parts, value)[source]¶

Delete one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

delete_metadata(key)[source]¶

Delete a metadata key/value pair.

Parameters:: key (bytes)

delete_node(node_id)[source]¶

Delete a node by byte key.

Parameters:: node_id (bytes)

delete_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Delete one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

delete_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Delete forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

get_adjacency(node_id)[source]¶

Return a serialized adjacency list for a node.

Parameters:: node_id (bytes)
Return type:: bytes | None

get_adjacency_bulk(node_ids)[source]¶

Return serialized adjacency lists for the requested nodes.

Parameters:: node_ids (List[bytes])
Return type:: Dict[bytes, bytes]

get_db_iterator(which_db='nodes')[source]¶: Yield all records from a named sub-database.

get_db_path(db_string='nodes')[source]¶

Return the relative path for a named LevelDB database.

Examples

>>> LevelDBStore.get_db_path.__name__
'get_db_path'

get_edge(edge_id)[source]¶

Return serialized edge bytes by key, or None.

Parameters:: edge_id (bytes)
Return type:: bytes

get_edge_keys_generator(num_edges=None, key_offset=None)[source]¶: Yield edge keys from the edge database.

get_edges_bulk(edge_ids)[source]¶

Return serialized edges for the requested keys.

Parameters:: edge_ids (list[bytes])
Return type:: dict[bytes, bytes]

get_metadata(key)[source]¶

Return a metadata value by key, or None.

Parameters:: key (bytes)
Return type:: bytes

get_node(node_id)[source]¶

Return serialized node bytes by key, or None.

Parameters:: node_id (bytes)
Return type:: bytes

get_node_keys_generator(num_nodes=None, key_offset=None)[source]¶: Yield node keys from the node database.

get_node_keys_iterator()[source]¶: Yield node database records.

get_nodes_bulk(node_ids)[source]¶

Return serialized nodes for the requested keys.

Parameters:: node_ids (list[bytes])
Return type:: dict[bytes, bytes]

iter_index_prefix(index_name, key_parts)[source]¶

Yield values whose index key starts with key_parts.

Parameters:

index_name (str)
key_parts (list[bytes])

iter_range_index(index_name, key_parts, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield values whose range index key falls between start and end values.

Parameters:

index_name (str)
key_parts (list[bytes])
start_value (bytes | None)
end_value (bytes | None)
include_start (bool)
include_end (bool)

iter_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Yield typed adjacency (edge_id, neighbor_id) pairs.

Parameters:

node_id (bytes)
edge_type (str)
direction (str)

put_adjacency(node_id, value)[source]¶

Store a serialized adjacency list for a node.

Parameters:

node_id (bytes)
value (bytes)

Return type:

None

put_adjacency_bulk(adj_dict)[source]¶

Insert or update multiple adjacency lists in one write batch.

Parameters:: adj_dict (Dict[str, bytes]) – Mapping of node_id to serialized adjacency.
Return type:: None

put_edge(edge_id, value)[source]¶

Store a serialized edge by byte key.

Parameters:

edge_id (bytes)
value (bytes)

put_edges_bulk(keys_and_values)[source]¶

Store many serialized edges in one write batch.

Parameters:: keys_and_values (dict[bytes, bytes])

put_index_entries_bulk(entries)[source]¶

Store many sorted index entries in one write batch.

Parameters:: entries (list[tuple[str, list[bytes], bytes]])

put_index_entry(index_name, key_parts, value)[source]¶

Store one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

put_metadata(key, value)[source]¶

Store a metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_node(node_id, value)[source]¶

Store a serialized node by byte key.

Parameters:

node_id (bytes)
value (bytes)

put_nodes_bulk(keys_and_values)[source]¶

Use a WriteBatch for atomic bulk updates.

Parameters:: keys_and_values (dict[bytes, bytes])

put_range_index_entries_bulk(entries)[source]¶

Store many sorted range index entries in one write batch.

Parameters:: entries (list[tuple[str, list[bytes], bytes, bytes]])

put_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Store one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

put_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Store forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

put_typed_adjacency_bulk(records)[source]¶

Store many typed adjacency records in one write batch.

Parameters:: records (list[tuple[bytes, bytes, str, bytes]])

range_iter(start_key, end_key)[source]¶

Yield records whose keys fall within a range.

Note

This generic iterator is not used by the main graph APIs.

Parameters:

start_key (bytes)
end_key (bytes)

class pygraphdb.kvstores.PyRexStore(path='graph_rocksdb', parallelism=None, max_background_jobs=None, write_buffer_size=None, bloom_bits_per_key=None, disable_wal=False)[source]¶

Bases: KVStore

RocksDB implementation backed by pyrex-rocksdb.

PyRexStore uses one physical RocksDB database with prefixed keys instead of separate databases. This lets node, edge, adjacency, and typed adjacency records share RocksDB’s write path and makes it possible to benchmark RocksDB tuning options against the existing LevelDB backend.

Parameters:

path – Directory for the RocksDB database.
parallelism – Optional number of RocksDB background threads.
max_background_jobs – Optional RocksDB background job limit.
write_buffer_size – Optional write buffer size in bytes.
bloom_bits_per_key – Optional block-based Bloom filter bits per key.
disable_wal – Disable RocksDB’s write-ahead log for faster but less durable ingestion benchmarks.
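
The single-keyspace layout described above can be illustrated with a plain dictionary. This is a sketch under assumptions: PrefixedStoreSketch and the b"n:" and b"e:" prefixes are hypothetical, and the actual prefixes used by PyRexStore are not documented here.

```python
NODE_PREFIX = b"n:"  # hypothetical prefix for node records
EDGE_PREFIX = b"e:"  # hypothetical prefix for edge records


class PrefixedStoreSketch:
    """One shared key/value mapping, partitioned by key prefix."""

    def __init__(self):
        self._db = {}

    def put_node(self, node_id, value):
        self._db[NODE_PREFIX + node_id] = value

    def get_node(self, node_id):
        return self._db.get(NODE_PREFIX + node_id)

    def put_edge(self, edge_id, value):
        self._db[EDGE_PREFIX + edge_id] = value

    def get_edge(self, edge_id):
        return self._db.get(EDGE_PREFIX + edge_id)

    def node_keys(self):
        """Yield node IDs by scanning only keys under the node prefix."""
        for key in sorted(self._db):
            if key.startswith(NODE_PREFIX):
                yield key[len(NODE_PREFIX):]


store = PrefixedStoreSketch()
store.put_node(b"x1", b"node-payload")
store.put_edge(b"x1", b"edge-payload")  # same ID, different prefix, no collision
node_ids = list(store.node_keys())
```

In a sorted store such as RocksDB, records that share a prefix are adjacent, so a per-type scan becomes a contiguous range read.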

Examples

>>> store = PyRexStore(path="/tmp/example_graph_rocksdb")

__init__(path='graph_rocksdb', parallelism=None, max_background_jobs=None, write_buffer_size=None, bloom_bits_per_key=None, disable_wal=False)[source]¶: Open a PyRex/RocksDB store with optional tuning settings.

close()[source]¶: Close the RocksDB database.

delete(key)[source]¶

Delete a raw key/value pair.

Parameters:: key (bytes)

delete_edge(edge_id)[source]¶

Delete an edge by byte key.

Parameters:: edge_id (bytes)

delete_index_entry(index_name, key_parts, value)[source]¶

Delete one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

delete_metadata(key)[source]¶

Delete a metadata key/value pair.

Parameters:: key (bytes)

delete_node(node_id)[source]¶

Delete a node by byte key.

Parameters:: node_id (bytes)

delete_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Delete one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

delete_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Delete forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

get(key)[source]¶

Return a raw value by key.

Parameters:: key (bytes)
Return type:: bytes

get_adjacency(node_id)[source]¶

Return a serialized adjacency list for a node.

Parameters:: node_id (bytes)
Return type:: bytes | None

get_adjacency_bulk(node_ids)[source]¶

Return serialized adjacency lists for the requested nodes.

Parameters:: node_ids (List[bytes])
Return type:: Dict[bytes, bytes]

get_edge(edge_id)[source]¶

Return serialized edge bytes by key, or None.

Parameters:: edge_id (bytes)
Return type:: bytes

get_edge_keys_generator(num_edges=None, key_offset=None)[source]¶: Yield edge keys from the shared RocksDB keyspace.

get_edges_bulk(edge_ids)[source]¶

Return serialized edges for the requested keys.

Parameters:: edge_ids (list[bytes])
Return type:: dict[bytes, bytes]

get_metadata(key)[source]¶

Return a metadata value by key, or None.

Parameters:: key (bytes)
Return type:: bytes

get_node(node_id)[source]¶

Return serialized node bytes by key, or None.

Parameters:: node_id (bytes)
Return type:: bytes

get_node_keys_generator(num_nodes=None, key_offset=None)[source]¶: Yield node keys from the shared RocksDB keyspace.

get_nodes_bulk(node_ids)[source]¶

Return serialized nodes for the requested keys.

Parameters:: node_ids (list[bytes])
Return type:: dict[bytes, bytes]

has_native_columnar_ingestion()[source]¶

Return whether this PyRex runtime exposes native columnar writes.

Return type:: bool

ingest_edges_columnar(edge_list, *, append_only=True, native=True)[source]¶

Store columnar typed edges, using native PyRex ingestion when available.

Parameters:

append_only (bool)
native (bool)

ingest_nodes_columnar(node_list, *, native=True)[source]¶

Store columnar nodes, using native PyRex ingestion when available.

Parameters:: native (bool)

iter_index_prefix(index_name, key_parts)[source]¶

Yield values whose index key starts with key_parts.

Parameters:

index_name (str)
key_parts (list[bytes])

iter_range_index(index_name, key_parts, start_value=None, end_value=None, include_start=True, include_end=True)[source]¶

Yield values whose range index key falls between start and end values.

Parameters:

index_name (str)
key_parts (list[bytes])
start_value (bytes | None)
end_value (bytes | None)
include_start (bool)
include_end (bool)

iter_typed_adjacency(node_id, edge_type, direction='out')[source]¶

Yield typed adjacency (edge_id, neighbor_id) pairs.

Parameters:

node_id (bytes)
edge_type (str)
direction (str)

put(key, value)[source]¶

Store a raw key/value pair in the shared RocksDB keyspace.

Parameters:

key (bytes)
value (bytes)

put_adjacency(node_id, value)[source]¶

Store a serialized adjacency list for a node.

Parameters:

node_id (bytes)
value (bytes)

Return type:

None

put_adjacency_bulk(adj_dict)[source]¶

Store many serialized adjacency lists in one RocksDB write batch.

Parameters:: adj_dict (Dict[bytes, bytes])
Return type:: None

put_edge(edge_id, value)[source]¶

Store a serialized edge by byte key.

Parameters:

edge_id (bytes)
value (bytes)

put_edges_bulk(keys_and_values)[source]¶

Store many serialized edges in one RocksDB write batch.

Parameters:: keys_and_values (dict[bytes, bytes])

put_index_entries_bulk(entries)[source]¶

Store many sorted index entries in one write batch.

Parameters:: entries (list[tuple[str, list[bytes], bytes]])

put_index_entry(index_name, key_parts, value)[source]¶

Store one sorted index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
value (bytes)

put_metadata(key, value)[source]¶

Store a metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_node(node_id, value)[source]¶

Store a serialized node by byte key.

Parameters:

node_id (bytes)
value (bytes)

put_nodes_bulk(keys_and_values)[source]¶

Store many serialized nodes in one RocksDB write batch.

Parameters:: keys_and_values (dict[bytes, bytes])

put_range_index_entries_bulk(entries)[source]¶

Store many sorted range index entries in one write batch.

Parameters:: entries (list[tuple[str, list[bytes], bytes, bytes]])

put_range_index_entry(index_name, key_parts, range_value, value)[source]¶

Store one sorted range index entry.

Parameters:

index_name (str)
key_parts (list[bytes])
range_value (bytes)
value (bytes)

put_typed_adjacency(source_id, target_id, edge_type, edge_id)[source]¶

Store forward and reverse typed adjacency records.

Parameters:

source_id (bytes)
target_id (bytes)
edge_type (str)
edge_id (bytes)

put_typed_adjacency_bulk(records)[source]¶

Store many typed adjacency records in one RocksDB write batch.

Parameters:: records (list[tuple[bytes, bytes, str, bytes]])

range_iter(start_key, end_key)[source]¶

Yield raw records whose keys fall within an inclusive range.

Parameters:

start_key (bytes)
end_key (bytes)

class pygraphdb.kvstores.SimpleIndexCounterKVStore(dbenv=None, db_path=b'nodes')[source]¶

Bases: object

Helper that lowers storage requirements for node and edge keys by mapping them to long integers.

It uses struct.pack and struct.unpack together with a simple counter (also stored in the metadata) that tracks how many keys, and hence which index values, have already been assigned.
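
The counter scheme described above can be sketched without LMDB. CounterKeySketch is hypothetical, and the ">Q" format (8-byte big-endian unsigned integer) is an assumption; the reference does not state which struct format the class uses.

```python
import struct


class CounterKeySketch:
    """Map arbitrary user keys to compact, fixed-width integer keys."""

    def __init__(self):
        self._mapping = {}   # user key -> packed integer key
        self._num_keys = 0   # next index to assign

    def encode_db_key(self, key):
        """Return the existing packed key, or assign the next counter value."""
        existing = self._mapping.get(key)
        if existing is not None:
            return existing
        packed = struct.pack(">Q", self._num_keys)
        self._mapping[key] = packed
        self._num_keys += 1
        return packed

    def get_num_keys(self):
        return self._num_keys


counter = CounterKeySketch()
first = counter.encode_db_key(b"a-very-long-node-identifier-0001")
again = counter.encode_db_key(b"a-very-long-node-identifier-0001")
second = counter.encode_db_key(b"a-very-long-node-identifier-0002")
index_of_second = struct.unpack(">Q", second)[0]
```

Every encoded key is 8 bytes regardless of how long the user key is, which is where the storage saving comes from when the same key is referenced many times.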

__init__(dbenv=None, db_path=b'nodes')[source]¶

Initialize an index counter helper.

Parameters:

dbenv – LMDB environment.
db_path – Named LMDB database for the counter mapping.

decode_db_key(key)[source]¶: Return the stored encoded key bytes for a user key.

encode_db_key(key)[source]¶: Return the existing encoded key if the key is already present. Otherwise, add the key to the KV store under the next counter value and return the new encoded key.

get(key)[source]¶: Read a counter metadata value by key.

get_num_keys()[source]¶: Return the number of keys already assigned.

put(key, value)[source]¶

Store a counter metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_num_keys(num_keys)[source]¶: Persist the number of keys already assigned.

class pygraphdb.kvstores.SimpleKV(db_path)[source]¶

Bases: object

Small LMDB-backed helper for metadata key/value access.

Parameters:: db_path – LMDB database handle or name used by transactions.

__init__(db_path)[source]¶: Initialize the helper with an LMDB database path or handle.

decode_db_key(key)[source]¶: Return the encoded database key for a user key.

encode_db_key(key)[source]¶: Return the existing encoded key if the key is already present. Otherwise, add the key to the KV store under the next counter value and return the new encoded key.

get(key)[source]¶: Read a metadata value by key.

get_num_keys()[source]¶: Return the stored key counter.

put(key, value)[source]¶

Write a metadata key/value pair.

Parameters:

key (bytes)
value (bytes)

put_num_keys(num_keys)[source]¶: Store the key counter.

Serializers¶

class pygraphdb.serializers.JSONSerializer[source]¶

Bases: Serializer

Uses JSON for serialization.

deserialize(data)[source]¶

Deserialize JSON bytes.

Examples

>>> JSONSerializer().deserialize(b'{"a": 1}')
{'a': 1}

Parameters:: data (bytes)
Return type:: dict

serialize(obj)[source]¶

Serialize a JSON-compatible object.

Examples

>>> JSONSerializer().serialize({"a": 1})
b'{"a": 1}'

Parameters:: obj (dict)
Return type:: bytes

class pygraphdb.serializers.MessagePackSerializer[source]¶

Bases: Serializer

Uses MessagePack for serialization.

deserialize(data)[source]¶

Deserialize MessagePack bytes.

Raises:: ImportError – If the optional msgpack package is missing.
Parameters:: data (bytes)
Return type:: dict

Examples

>>> MessagePackSerializer().deserialize(MessagePackSerializer().serialize({"a": 1}))
{'a': 1}

serialize(obj)[source]¶

Serialize an object with MessagePack.

Raises:: ImportError – If the optional msgpack package is missing.
Parameters:: obj (dict)
Return type:: bytes

Examples

>>> MessagePackSerializer().deserialize(MessagePackSerializer().serialize({"a": 1}))
{'a': 1}

class pygraphdb.serializers.PickleSerializer[source]¶

Bases: Serializer

Uses Python’s pickle for serialization.

deserialize(data)[source]¶

Deserialize pickle bytes.

Examples

>>> PickleSerializer().deserialize(PickleSerializer().serialize({"a": 1}))
{'a': 1}

Parameters:: data (bytes)
Return type:: dict

serialize(obj)[source]¶

Serialize an object with pickle.

Examples

>>> PickleSerializer().deserialize(PickleSerializer().serialize({"a": 1}))
{'a': 1}

Parameters:: obj (dict)
Return type:: bytes

class pygraphdb.serializers.ProtobufSerializer[source]¶

Bases: Serializer

Uses google.protobuf Struct for JSON-like dictionaries.

Struct does not have native integer or bytes types. This serializer tags those values before encoding so Python dictionaries round-trip without losing them.
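
Struct stores every number as a double and has no bytes field, so integers and bytes need a marker to survive a round trip. One possible tagging scheme is sketched below on plain dictionaries, without protobuf. The tag names "__int__" and "__bytes__" and the base64 encoding are assumptions, not the format ProtobufSerializer actually uses.

```python
import base64

INT_TAG = "__int__"      # hypothetical tag names, chosen for illustration
BYTES_TAG = "__bytes__"


def tag(value):
    """Wrap ints and bytes so they can be stored in a JSON-like structure."""
    if isinstance(value, bool):   # bool is a subclass of int, so check it first
        return value
    if isinstance(value, int):
        return {INT_TAG: str(value)}
    if isinstance(value, bytes):
        return {BYTES_TAG: base64.b64encode(value).decode("ascii")}
    if isinstance(value, dict):
        return {key: tag(item) for key, item in value.items()}
    if isinstance(value, list):
        return [tag(item) for item in value]
    return value


def untag(value):
    """Reverse tag(), restoring the original Python types."""
    if isinstance(value, dict):
        if set(value) == {INT_TAG}:
            return int(value[INT_TAG])
        if set(value) == {BYTES_TAG}:
            return base64.b64decode(value[BYTES_TAG])
        return {key: untag(item) for key, item in value.items()}
    if isinstance(value, list):
        return [untag(item) for item in value]
    return value


original = {
    "count": 3,
    "blob": b"\x00\x01",
    "score": 0.5,
    "flag": True,
    "nested": {"ids": [1, 2]},
}
restored = untag(tag(original))
```

Storing the integer as a string avoids precision loss for values too large to be represented exactly as a double.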

deserialize(data)[source]¶

Deserialize protobuf Struct bytes.

Parameters:: data (bytes) – Protobuf binary payload.
Returns:: Decoded dictionary.
Raises:: ImportError – If the optional protobuf package is missing.
Return type:: dict

serialize(obj)[source]¶

Serialize a JSON-like dictionary with protobuf Struct.

Parameters:: obj (dict) – Dictionary containing JSON-like values plus tagged ints/bytes.
Returns:: Protobuf binary payload.
Raises:: ImportError – If the optional protobuf package is missing.
Return type:: bytes

class pygraphdb.serializers.Serializer[source]¶

Bases: object

Abstract base for serialization/deserialization.

deserialize(data)[source]¶

Deserialize bytes into a dictionary-like object.

Parameters:: data (bytes) – Serialized bytes.
Returns:: Decoded object.
Return type:: dict

serialize(obj)[source]¶

Serialize a dictionary-like object to bytes.

Parameters:: obj (dict) – Object to serialize.
Returns:: Serialized bytes.
Return type:: bytes
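
A custom serializer only needs the two methods above. The sketch below shows a hypothetical zlib-compressed JSON variant; it deliberately does not subclass the real Serializer so that it runs without pygraphdb installed, and whether GraphDB accepts such an object without subclassing is not confirmed here.

```python
import json
import zlib


class CompressedJSONSerializerSketch:
    """Hypothetical serializer: JSON text compressed with zlib."""

    def serialize(self, obj):
        """Encode a JSON-compatible dictionary to compressed bytes."""
        text = json.dumps(obj, sort_keys=True)
        return zlib.compress(text.encode("utf-8"))

    def deserialize(self, data):
        """Decode bytes produced by serialize() back to a dictionary."""
        text = zlib.decompress(data).decode("utf-8")
        return json.loads(text)


serializer = CompressedJSONSerializerSketch()
payload = serializer.serialize({"name": "aspirin", "type": "Drug"})
decoded = serializer.deserialize(payload)
```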