
Graph Module

The Graph Module provides tools for augmenting molecular graph data with PyTorch Geometric. It implements four augmentation techniques for molecular graphs: edge dropping, node dropping, feature masking, and edge perturbation.

Module Overview

augchem.modules.graph

Key Components

Graphs Modules

The core functionality for graph augmentation techniques:

augchem.modules.graph.graphs_modules

augment_dataset(graphs: List[Data], augmentation_methods: List[str], edge_drop_rate: float = 0.1, node_drop_rate: float = 0.1, feature_mask_rate: float = 0.1, edge_add_rate: float = 0.05, edge_remove_rate: float = 0.05, augment_percentage: float = 0.2, seed: int = 42) -> List[Data]

Apply data augmentation techniques to a list of graphs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| graphs | List[Data] | List of torch_geometric Data objects representing the graphs | required |
| augmentation_methods | List[str] | List of methods ['edge_drop', 'node_drop', 'feature_mask', 'edge_perturb'] | required |
| edge_drop_rate | float | Rate of edge removal (0.0 to 1.0) | 0.1 |
| node_drop_rate | float | Rate of node removal (0.0 to 1.0) | 0.1 |
| feature_mask_rate | float | Rate of feature masking (0.0 to 1.0) | 0.1 |
| edge_add_rate | float | Rate of edge addition for perturbation | 0.05 |
| edge_remove_rate | float | Rate of edge removal for perturbation | 0.05 |
| augment_percentage | float | Size of the augmented dataset as a fraction of the original | 0.2 |
| seed | int | Seed for reproducibility | 42 |

Returns:

| Type | Description |
| --- | --- |
| List[Data] | List of augmented graphs (original + augmented) |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If unknown augmentation methods are specified |
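As a rough illustration of the bookkeeping this signature implies, the sketch below validates method names and works out how many graphs a given `augment_percentage` would add. The helper names `check_methods` and `augmented_count` are hypothetical, and rounding down is an assumption; neither is taken from augchem's source.

```python
from typing import List

# Method names as listed in the augmentation_methods parameter description.
KNOWN_METHODS = {"edge_drop", "node_drop", "feature_mask", "edge_perturb"}


def check_methods(methods: List[str]) -> None:
    """Raise ValueError if any requested method name is not recognised."""
    unknown = sorted(set(methods) - KNOWN_METHODS)
    if unknown:
        raise ValueError(f"Unknown augmentation methods: {unknown}")


def augmented_count(n_graphs: int, augment_percentage: float) -> int:
    """Number of extra graphs to generate (assumption: rounded down)."""
    return int(n_graphs * augment_percentage)


check_methods(["edge_drop", "feature_mask"])  # valid names pass silently
extra = augmented_count(100, 0.2)             # extra graphs for 100 originals
total = 100 + extra                           # originals are kept in the result
```

Under these assumptions, 100 input graphs with the default `augment_percentage` of 0.2 would yield 20 augmented graphs and a returned list of 120.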

edge_dropping(data: Data, drop_rate: float = 0.1) -> Data

Remove complete bidirectional edges from the graph (edge dropping)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Data | torch_geometric graph | required |
| drop_rate | float | Bidirectional edge removal rate (0.0 to 1.0) | 0.1 |

Returns:

| Type | Description |
| --- | --- |
| Data | Graph with edges removed |
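To make "complete bidirectional edges" concrete: an undirected bond is typically stored as two directed entries, (i, j) and (j, i), and both must go together. The following is a minimal sketch of that idea, with a plain list of pairs standing in for the `edge_index` tensor. The function name `drop_undirected_edges` and the rounding rule are assumptions, not augchem's implementation.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def drop_undirected_edges(edges: List[Edge], drop_rate: float, seed: int = 42) -> List[Edge]:
    """Drop a fraction of undirected edges, removing both directions together."""
    rng = random.Random(seed)
    # Collapse (i, j) and (j, i) into one canonical undirected edge.
    undirected = sorted({(min(i, j), max(i, j)) for i, j in edges})
    n_drop = int(len(undirected) * drop_rate)  # assumption: round down
    dropped = set(rng.sample(undirected, n_drop))
    # Keep a directed edge only if its undirected counterpart survived.
    return [(i, j) for i, j in edges if (min(i, j), max(i, j)) not in dropped]


# A 4-atom chain 0-1-2-3 stored as directed pairs in both directions.
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
kept = drop_undirected_edges(chain, drop_rate=0.34)
```

Here one of the three bonds is selected, so both of its directed entries disappear and the result stays symmetric.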

edge_perturbation(data: Data, add_rate: float = 0.05, remove_rate: float = 0.05) -> Data

Perturb the graph by adding and removing complete bidirectional edges (edge perturbation)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Data | torch_geometric graph | required |
| add_rate | float | Bidirectional connection addition rate | 0.05 |
| remove_rate | float | Bidirectional connection removal rate | 0.05 |

Returns:

| Type | Description |
| --- | --- |
| Data | Perturbed graph |
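A possible reading of edge perturbation is sketched below: remove some existing undirected edges, then add the same kind of edge between node pairs that were not previously connected. The sketch assumes both rates are measured against the current number of undirected edges and ignores edge attributes. The function name `perturb_edges` is hypothetical and the library may define the rates differently.

```python
import random
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]


def perturb_edges(edges: List[Edge], num_nodes: int, add_rate: float = 0.05,
                  remove_rate: float = 0.05, seed: int = 42) -> List[Edge]:
    """Remove and add undirected edges, returning both directions of each."""
    rng = random.Random(seed)
    existing = sorted({(min(i, j), max(i, j)) for i, j in edges if i != j})
    existing_set = set(existing)
    # Assumption: both rates are fractions of the current undirected edge count.
    n_remove = int(len(existing) * remove_rate)
    removed = set(rng.sample(existing, n_remove))
    # Candidate additions: unconnected node pairs, which excludes self-loops.
    candidates = [p for p in combinations(range(num_nodes), 2) if p not in existing_set]
    n_add = min(int(len(existing) * add_rate), len(candidates))
    added = rng.sample(candidates, n_add)
    undirected = [e for e in existing if e not in removed] + added
    directed: List[Edge] = []
    for i, j in undirected:
        directed.extend([(i, j), (j, i)])  # store each edge in both directions
    return directed


# A 6-membered ring stored as directed pairs in both directions.
ring: List[Edge] = []
for a in range(6):
    b = (a + 1) % 6
    ring.extend([(a, b), (b, a)])
perturbed = perturb_edges(ring, num_nodes=6, add_rate=0.5, remove_rate=0.5)
```

Drawing additions only from previously unconnected pairs means a removed edge is never immediately re-added in the same call.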

feature_masking(data: Data, mask_rate: float = 0.1) -> Data

Mask node features randomly (feature masking)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Data | torch_geometric graph | required |
| mask_rate | float | Feature masking rate (0.0 to 1.0) | 0.1 |

Returns:

| Type | Description |
| --- | --- |
| Data | Graph with masked features |
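The sketch below shows one way random feature masking could work, using a nested list in place of the node feature matrix `x` and `-inf` as the mask value, as stated in the techniques table. It masks each entry independently with probability `mask_rate`. Whether augchem masks individual entries, whole feature columns, or whole nodes is not stated here, so treat the granularity as an assumption. `mask_features` is a hypothetical name.

```python
import copy
import math
import random
from typing import List


def mask_features(x: List[List[float]], mask_rate: float = 0.1, seed: int = 42) -> List[List[float]]:
    """Return a copy of x with randomly chosen entries replaced by -inf."""
    rng = random.Random(seed)
    masked = copy.deepcopy(x)  # leave the caller's matrix untouched
    for row in masked:
        for k in range(len(row)):
            # Assumption: each entry is masked independently.
            if rng.random() < mask_rate:
                row[k] = -math.inf
    return masked


# Three nodes with two features each.
features = [[6.0, 1.0], [8.0, 0.0], [7.0, 1.0]]
masked = mask_features(features, mask_rate=0.5)
```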

node_dropping(data: Data, drop_rate: float = 0.1) -> Data

Remove nodes randomly from the graph (node dropping)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Data | torch_geometric graph | required |
| drop_rate | float | Node removal rate (0.0 to 1.0) | 0.1 |

Returns:

| Type | Description |
| --- | --- |
| Data | Graph with nodes removed |
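Dropping a node also invalidates every edge that touches it, and the surviving nodes need new contiguous indices. A minimal sketch of those two steps, with plain Python containers in place of tensors, might look like this. `drop_nodes` is a hypothetical name and rounding down the number of dropped nodes is an assumption.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def drop_nodes(num_nodes: int, edges: List[Edge], drop_rate: float = 0.1,
               seed: int = 42) -> Tuple[List[int], List[Edge]]:
    """Drop random nodes; return surviving original node ids and relabelled edges."""
    rng = random.Random(seed)
    n_drop = int(num_nodes * drop_rate)  # assumption: round down
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept = [n for n in range(num_nodes) if n not in dropped]
    # Map old node ids to new contiguous ids 0..len(kept)-1.
    remap = {old: new for new, old in enumerate(kept)}
    # An edge survives only if both of its endpoints survive.
    new_edges = [(remap[i], remap[j]) for i, j in edges if i in remap and j in remap]
    return kept, new_edges


# A 5-atom chain 0-1-2-3-4 stored in both directions.
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)]
kept_nodes, kept_edges = drop_nodes(5, chain, drop_rate=0.4)
```

The `kept` list doubles as the row selector for node features. In PyTorch Geometric, `torch_geometric.utils.subgraph` with `relabel_nodes=True` offers comparable filtering and relabelling; whether augchem uses it is not stated here.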

Available Augmentation Techniques

| Technique | Description | Use Case |
| --- | --- | --- |
| Edge Dropping | Removes complete bidirectional edges | Structural perturbation, robustness testing |
| Node Dropping | Removes nodes and associated edges | Graph topology variation, missing data simulation |
| Feature Masking | Masks node features with -inf values | Feature robustness, attention mechanism training |
| Edge Perturbation | Adds and removes edges simultaneously | Chemical space exploration, bond variation |

Integration Features

PyTorch Geometric Compatibility

  • Native support for torch_geometric.data.Data objects
  • Seamless integration with PyTorch Geometric DataLoaders
  • Optimized for graph neural network training pipelines

Batch Processing

  • Efficient processing of multiple graphs
  • GPU acceleration support
  • Memory-optimized operations

Quality Assurance

  • Graph integrity validation
  • Self-loop detection and removal
  • Consistent edge attribute handling
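The kinds of checks listed above can be sketched on a plain edge list as follows. This is an illustrative guess at what integrity validation could cover (index range, self-loops, missing reverse edges). The helper names `validate_edges` and `remove_self_loops` are hypothetical and are not augchem's API.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def validate_edges(num_nodes: int, edges: List[Edge]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    edge_set = set(edges)
    for i, j in edges:
        if not (0 <= i < num_nodes and 0 <= j < num_nodes):
            problems.append(f"edge ({i}, {j}) references a missing node")
        if i == j:
            problems.append(f"edge ({i}, {j}) is a self-loop")
        elif (j, i) not in edge_set:
            problems.append(f"edge ({i}, {j}) has no reverse edge")
    return problems


def remove_self_loops(edges: List[Edge]) -> List[Edge]:
    """Drop every edge whose source and target are the same node."""
    return [(i, j) for i, j in edges if i != j]


edges = [(0, 1), (1, 0), (2, 2)]
issues = validate_edges(3, edges)                    # reports the self-loop
clean = remove_self_loops(edges)
issues_after = validate_edges(3, clean)              # no problems remain
```

For tensors, PyTorch Geometric ships `torch_geometric.utils.contains_self_loops` and `torch_geometric.utils.remove_self_loops`, which cover the self-loop part of these checks.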

Example Workflow

from torch_geometric.loader import DataLoader

from augchem.modules.graph.graphs_modules import augment_dataset

# Load your molecular graphs as a list of torch_geometric Data objects.
# load_your_molecular_graphs() is a placeholder for your own loading code.
molecular_graphs = load_your_molecular_graphs()

# Apply three of the four available augmentation methods
augmented_dataset = augment_dataset(
    graphs=molecular_graphs,
    augmentation_methods=['edge_drop', 'node_drop', 'feature_mask'],
    augment_percentage=0.25,
    seed=42
)

# Batch the returned graphs (original + augmented) for training
loader = DataLoader(augmented_dataset, batch_size=32, shuffle=True)

Performance Considerations

  • Memory Usage: All operations create graph clones to preserve originals
  • GPU Support: Full tensor operation compatibility with CUDA
  • Scalability: Optimized for large molecular datasets
  • Reproducibility: Deterministic results with seed control
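The cloning and seeding points above can be illustrated with a small sketch: copy the graph before modifying it, and draw all random choices from a generator seeded per call. A dict with a `bonds` list stands in for a graph object, and `augment_copy` is a hypothetical helper, not part of augchem.

```python
import copy
import random
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[int, int]]]


def augment_copy(graph: Graph, drop_rate: float, seed: int) -> Graph:
    """Return a modified deep copy of graph; the input is never mutated."""
    rng = random.Random(seed)      # local generator: global random state untouched
    clone = copy.deepcopy(graph)   # costs extra memory, preserves the original
    bonds = clone["bonds"]
    n_drop = int(len(bonds) * drop_rate)
    for bond in rng.sample(bonds, n_drop):
        bonds.remove(bond)
    return clone


original = {"bonds": [(0, 1), (1, 2), (2, 3), (3, 4)]}
first = augment_copy(original, drop_rate=0.5, seed=42)
second = augment_copy(original, drop_rate=0.5, seed=42)
```

With the same seed, `first` and `second` are identical and `original` still holds all four bonds. For PyTorch Geometric objects, `Data.clone()` plays the role of `copy.deepcopy` here.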

For detailed function documentation and examples, see the Graph Methods section.