Graph Module
The Graph Module provides tools for augmenting molecular graph data with PyTorch Geometric. It implements four augmentation techniques for chemical molecular graphs: edge dropping, node dropping, feature masking, and edge perturbation.
Module Overview
augchem.modules.graph
Key Components
Graphs Modules
Core functions for graph augmentation:
augchem.modules.graph.graphs_modules
augment_dataset(graphs: List[Data], augmentation_methods: List[str], edge_drop_rate: float = 0.1, node_drop_rate: float = 0.1, feature_mask_rate: float = 0.1, edge_add_rate: float = 0.05, edge_remove_rate: float = 0.05, augment_percentage: float = 0.2, seed: int = 42) -> List[Data]
Apply data augmentation techniques to a list of graphs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graphs | List[Data] | List of torch_geometric Data objects representing the graphs | required |
augmentation_methods | List[str] | List of methods ['edge_drop', 'node_drop', 'feature_mask', 'edge_perturb'] | required |
edge_drop_rate | float | Rate of edge removal (0.0 to 1.0) | 0.1 |
node_drop_rate | float | Rate of node removal (0.0 to 1.0) | 0.1 |
feature_mask_rate | float | Rate of feature masking (0.0 to 1.0) | 0.1 |
edge_add_rate | float | Rate of edge addition for perturbation | 0.05 |
edge_remove_rate | float | Rate of edge removal for perturbation | 0.05 |
augment_percentage | float | Size of the augmented dataset as a fraction of the original | 0.2 |
seed | int | Seed for reproducibility | 42 |
Returns:
Type | Description |
---|---|
List[Data] | List of augmented graphs (original + augmented) |
Raises:
Type | Description |
---|---|
ValueError | If unknown augmentation methods are specified |
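The parameter and error descriptions above can be illustrated with a small, dependency-free sketch. It is not augchem's implementation: the helper name plan_augmentation, the VALID_METHODS set, the rounding of the augmented count, and the random choice of source graph and method are all assumptions made for illustration. Only the four method names, the ValueError on unknown methods, and the seed default come from the documentation above.

```python
import random

# Method names taken from the augmentation_methods description above.
VALID_METHODS = {"edge_drop", "node_drop", "feature_mask", "edge_perturb"}


def plan_augmentation(num_graphs, augmentation_methods, augment_percentage=0.2, seed=42):
    """Hypothetical helper: decide which graph gets which augmentation."""
    unknown = sorted(set(augmentation_methods) - VALID_METHODS)
    if unknown:
        raise ValueError(f"Unknown augmentation methods: {unknown}")

    rng = random.Random(seed)  # local RNG, so the plan is repeatable per seed
    # Assumption: the number of new graphs is the truncated fraction of the original size.
    num_new = int(num_graphs * augment_percentage)
    return [
        (rng.randrange(num_graphs), rng.choice(augmentation_methods))
        for _ in range(num_new)
    ]


plan = plan_augmentation(100, ["edge_drop", "node_drop"], augment_percentage=0.25, seed=42)
```

Under these assumptions, 100 input graphs with augment_percentage=0.25 yield 25 augmented graphs, so the returned dataset would hold 125 graphs in total.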
edge_dropping(data: Data, drop_rate: float = 0.1) -> Data
Remove complete bidirectional edges from the graph (edge dropping).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | torch_geometric graph | required |
drop_rate | float | Bidirectional edge removal rate (0.0 to 1.0) | 0.1 |
Returns:
Type | Description |
---|---|
Data | Graph with edges removed |
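A minimal sketch of what dropping "complete bidirectional edges" can look like, assuming the graph is stored in PyTorch Geometric's edge_index layout (a 2 x E tensor in which each undirected bond appears once per direction). The function name drop_undirected_edges and its seed argument are hypothetical, and the sketch works on a bare tensor rather than a Data object; it is not the library's own code.

```python
import torch


def drop_undirected_edges(edge_index: torch.Tensor, drop_rate: float, seed: int = 0) -> torch.Tensor:
    """Drop a fraction of undirected edges, removing both directions together."""
    gen = torch.Generator().manual_seed(seed)
    src, dst = edge_index
    # Canonical (low, high) key so that (u, v) and (v, u) map to the same undirected edge.
    pairs = torch.stack([torch.minimum(src, dst), torch.maximum(src, dst)], dim=1)
    unique_pairs, inverse = torch.unique(pairs, dim=0, return_inverse=True)

    num_undirected = unique_pairs.size(0)
    num_drop = int(drop_rate * num_undirected)
    perm = torch.randperm(num_undirected, generator=gen)

    keep_pair = torch.ones(num_undirected, dtype=torch.bool)
    keep_pair[perm[:num_drop]] = False
    # A directed edge survives only if its undirected edge survives.
    return edge_index[:, keep_pair[inverse]]


# 4-node ring stored as 8 directed edges.
ring = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                     [1, 0, 2, 1, 3, 2, 0, 3]])
dropped = drop_undirected_edges(ring, drop_rate=0.5)
```

If the graph carries edge attributes, the same boolean mask would have to be applied to them so that attributes stay aligned with the remaining edges.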
edge_perturbation(data: Data, add_rate: float = 0.05, remove_rate: float = 0.05) -> Data
Perturb the graph by adding and removing complete bidirectional edges (edge perturbation).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | torch_geometric graph | required |
add_rate | float | Bidirectional connection addition rate | 0.05 |
remove_rate | float | Bidirectional connection removal rate | 0.05 |
Returns:
Type | Description |
---|---|
Data | Perturbed graph |
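The following sketch shows one way to combine removal and addition of undirected edges. Several points are assumptions, not documented behavior: both rates are interpreted as fractions of the current number of undirected edges, new edges are drawn uniformly from unconnected node pairs, and self-loops are never created. The name perturb_undirected_edges is hypothetical, and edge attributes are ignored.

```python
import torch


def perturb_undirected_edges(edge_index, num_nodes, add_rate=0.05, remove_rate=0.05, seed=0):
    """Remove some undirected edges, then add new ones between unconnected node pairs."""
    gen = torch.Generator().manual_seed(seed)
    # Unique undirected edges as (low, high) tuples, ignoring any self-loops.
    existing = {(min(s, d), max(s, d)) for s, d in edge_index.t().tolist() if s != d}
    edges = sorted(existing)

    n_remove = int(remove_rate * len(edges))
    n_add = int(add_rate * len(edges))

    perm = torch.randperm(len(edges), generator=gen).tolist()
    kept = [edges[i] for i in perm[n_remove:]]

    # Candidate new edges: node pairs that were not connected in the input graph.
    # This enumeration is quadratic in num_nodes, which is fine for small molecules.
    candidates = [(i, j) for i in range(num_nodes) for j in range(i + 1, num_nodes)
                  if (i, j) not in existing]
    pick = torch.randperm(len(candidates), generator=gen)[:n_add].tolist()
    undirected = kept + [candidates[i] for i in pick]

    if not undirected:
        return torch.empty((2, 0), dtype=torch.long)
    pairs = torch.tensor(undirected, dtype=torch.long).t()
    # Emit both directions of every undirected edge.
    return torch.cat([pairs, pairs.flip(0)], dim=1)


# 6-node ring: 6 undirected edges stored as 12 directed edges.
half = torch.tensor([[0, 1, 2, 3, 4, 5],
                     [1, 2, 3, 4, 5, 0]])
hexagon = torch.cat([half, half.flip(0)], dim=1)
perturbed = perturb_undirected_edges(hexagon, num_nodes=6, add_rate=0.5, remove_rate=0.5)
```

For molecular graphs, a newly added edge has no chemical bond behind it, so any edge attributes for it would need a placeholder value chosen by the caller.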
feature_masking(data: Data, mask_rate: float = 0.1) -> Data
Mask node features randomly (feature masking).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | torch_geometric graph | required |
mask_rate | float | Feature masking rate (0.0 to 1.0) | 0.1 |
Returns:
Type | Description |
---|---|
Data | Graph with masked features |
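As a rough illustration of feature masking, the sketch below replaces randomly chosen entries of a floating-point node feature matrix with -inf, the mask value named in the techniques table of this page. Whether the library masks individual entries or whole rows is not stated here, so the element-wise choice is an assumption, as are the helper name mask_features and its seed argument.

```python
import torch


def mask_features(x: torch.Tensor, mask_rate: float, seed: int = 0):
    """Return a masked copy of x and the boolean mask that was applied."""
    gen = torch.Generator().manual_seed(seed)
    # Each entry is masked independently with probability mask_rate.
    mask = torch.rand(x.shape, generator=gen) < mask_rate
    masked = x.clone()  # work on a copy so the original features stay untouched
    masked[mask] = float("-inf")
    return masked, mask


x = torch.arange(12, dtype=torch.float).reshape(4, 3)  # 4 nodes, 3 features each
masked, mask = mask_features(x, mask_rate=0.5)
```

Because -inf entries can turn into NaN once they pass through arithmetic such as multiplication by zero, a model consuming masked graphs may need to replace them first, for example with torch.nan_to_num.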
node_dropping(data: Data, drop_rate: float = 0.1) -> Data
Remove nodes randomly from the graph (node dropping).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | torch_geometric graph | required |
drop_rate | float | Node removal rate (0.0 to 1.0) | 0.1 |
Returns:
Type | Description |
---|---|
Data | Graph with nodes removed |
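Node dropping has to remove the edges attached to dropped nodes and renumber the survivors so that edge indices stay valid. The sketch below shows that bookkeeping on plain tensors. The helper name drop_nodes, the seed argument, and the truncation used to compute the number of dropped nodes are assumptions for illustration, not the library's implementation.

```python
import torch


def drop_nodes(x: torch.Tensor, edge_index: torch.Tensor, drop_rate: float, seed: int = 0):
    """Drop a fraction of nodes, remove their edges, and relabel the remaining nodes."""
    gen = torch.Generator().manual_seed(seed)
    num_nodes = x.size(0)
    num_drop = int(drop_rate * num_nodes)

    perm = torch.randperm(num_nodes, generator=gen)
    keep = torch.ones(num_nodes, dtype=torch.bool)
    keep[perm[:num_drop]] = False

    # Map old node ids to new consecutive ids; dropped nodes get -1.
    new_index = torch.full((num_nodes,), -1, dtype=torch.long)
    new_index[keep] = torch.arange(int(keep.sum()))

    # Keep an edge only if both of its endpoints survive.
    edge_keep = keep[edge_index[0]] & keep[edge_index[1]]
    return x[keep], new_index[edge_index[:, edge_keep]]


# 5-node path graph with one-hot node features.
path = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4],
                     [1, 0, 2, 1, 3, 2, 4, 3]])
features = torch.eye(5)
new_x, new_edge_index = drop_nodes(features, path, drop_rate=0.4)
```

PyTorch Geometric's torch_geometric.utils.subgraph with relabel_nodes=True performs a similar filter-and-relabel step and can also carry edge attributes along.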
Available Augmentation Techniques
Technique | Description | Use Case |
---|---|---|
Edge Dropping | Removes complete bidirectional edges | Structural perturbation, robustness testing |
Node Dropping | Removes nodes and associated edges | Graph topology variation, missing data simulation |
Feature Masking | Masks node features with -inf values | Feature robustness, attention mechanism training |
Edge Perturbation | Adds and removes edges simultaneously | Chemical space exploration, bond variation |
Integration Features
PyTorch Geometric Compatibility
- Native support for torch_geometric.data.Data objects
- Seamless integration with PyTorch Geometric DataLoaders
- Optimized for graph neural network training pipelines
Batch Processing
- Efficient processing of multiple graphs
- GPU acceleration support
- Memory-optimized operations
Quality Assurance
- Graph integrity validation
- Self-loop detection and removal
- Consistent edge attribute handling
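The checks listed above can be approximated on a raw edge_index tensor as follows. This is a sketch under the assumption that "graph integrity" includes symmetry of the directed edge list; the helper names strip_self_loops and is_symmetric are hypothetical and are not part of augchem.

```python
import torch


def strip_self_loops(edge_index: torch.Tensor) -> torch.Tensor:
    """Remove edges whose source and target are the same node."""
    return edge_index[:, edge_index[0] != edge_index[1]]


def is_symmetric(edge_index: torch.Tensor) -> bool:
    """Check that every directed edge (u, v) has a matching reverse edge (v, u)."""
    directed = set(map(tuple, edge_index.t().tolist()))
    return all((d, s) in directed for s, d in directed)


# Edges: 0->1, 1->0, 1->1 (self-loop), 2->2 (self-loop), 2->0 (no reverse edge).
raw = torch.tensor([[0, 1, 1, 2, 2],
                    [1, 0, 1, 2, 0]])
clean = strip_self_loops(raw)
symmetric = is_symmetric(clean)
```

PyTorch Geometric ships comparable utilities in torch_geometric.utils, such as contains_self_loops, remove_self_loops, and is_undirected, which also handle edge attributes.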
Example Workflow
from torch_geometric.loader import DataLoader

from augchem.modules.graph.graphs_modules import augment_dataset

# Load your molecular graphs as a list of torch_geometric.data.Data objects.
# load_your_molecular_graphs() is a placeholder for your own loading code.
molecular_graphs = load_your_molecular_graphs()

# Apply several augmentation methods; the result holds the originals plus the augmented graphs
augmented_dataset = augment_dataset(
    graphs=molecular_graphs,
    augmentation_methods=['edge_drop', 'node_drop', 'feature_mask'],
    augment_percentage=0.25,
    seed=42,
)

# Use the augmented dataset in your machine learning pipeline
loader = DataLoader(augmented_dataset, batch_size=32, shuffle=True)
Performance Considerations
- Memory Usage: All operations create graph clones to preserve originals
- GPU Support: Full tensor operation compatibility with CUDA
- Scalability: Optimized for large molecular datasets
- Reproducibility: Deterministic results with seed control
For detailed function documentation and examples, see the Graph Methods section.