Tutorial¶
In this energnn tutorial, we will review:
How to install energnn.
The interaction with typical implementations of energnn.problem.Problem, energnn.problem.ProblemBatch and energnn.problem.ProblemLoader.
The creation of a GNN model using energnn.model.ready_to_use.
The training of the model with energnn.trainer.Trainer.
Installation¶
To install the latest stable release of energnn on CPU,
pip install energnn
For the GPU version,
pip install "energnn[gpu]"
Problem Class¶
Let’s consider the following use case: DC Power Flow in an electrical network.
We want to determine the phase angles \(\theta\) at the different buses (nodes) of the network, given the active power injections \(P\) and the physical characteristics of the lines (defined in the susceptance \(B\) matrix).
The problem is modeled by the following linear system:
\[B \theta = P\]
Where:
\(B\) is the susceptance matrix (similar to a Laplacian matrix).
\(P\) is the vector of active power injections at each bus.
\(\theta\) is the vector of phase angles we wish to predict.
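To make the linear system concrete, here is a minimal pure-Python sketch that assembles a Laplacian-like \(B\) from a list of lines and solves \(B \theta = P\) on a 3-bus network. It does not use energnn: the helper names, the network data, and the choice of pinning bus 0 as a zero-angle reference (which removes the singularity of a pure Laplacian) are all assumptions made for illustration, and the generator used in this tutorial may build \(B\) differently.

```python
def build_susceptance_matrix(n_bus, lines):
    """Assemble a Laplacian-like B from (from_bus, to_bus, susceptance) triples."""
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, b in lines:
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b
    return B


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def dc_power_flow(n_bus, lines, P, slack=0):
    """Return (B, theta) with theta[slack] fixed to 0 (assumed reference bus)."""
    B = build_susceptance_matrix(n_bus, lines)
    keep = [i for i in range(n_bus) if i != slack]
    B_reduced = [[B[i][j] for j in keep] for i in keep]
    P_reduced = [P[i] for i in keep]
    theta = [0.0] * n_bus
    for i, t in zip(keep, solve_linear_system(B_reduced, P_reduced)):
        theta[i] = t
    return B, theta


# Hypothetical 3-bus network; the injections sum to zero.
lines = [(0, 1, 1.0), (0, 2, 0.5), (1, 2, 2.0)]
P = [-0.2, 0.3, -0.1]
B, theta = dc_power_flow(3, lines, P)

# Residual of B @ theta - P, which should vanish up to rounding.
residual = [sum(B[i][j] * theta[j] for j in range(3)) - P[i] for i in range(3)]
print(theta)
print(residual)
```

Fixing one reference angle is a common convention for this kind of system, since phase angles are only defined up to a constant when \(B\) is a pure Laplacian.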
Let’s generate a random problem instance and explore its interface.
[1]:
from energnn.problem.example import LinearSystemProblemGenerator
pb_generator = LinearSystemProblemGenerator(seed=7, n_max=4)
problem = pb_generator.generate_problem()
Context Graph¶
The input of our GNN model is referred to as the context, instantiated as an energnn.graph.Graph object.
In our case, the context contains the known data of the problem: the matrix \(B\) and the vector \(P\). This data is structured as a Hyper Heterogeneous Multi Graph (H2MG):
The matrix \(B\) (lines): The off-diagonal elements of \(B\) represent the connections between nodes. We encode them in a hyper-edge set called line. Each line has a feature called susceptance.
The vector \(P\) (buses): Power injections are associated with the buses of the network. We encode them in a hyper-edge set called bus. Each bus has a feature called active_power_injection.
[2]:
# Let us explore the context structure
print(problem.context_structure)
Ports Features
Name
line [from, to] [susceptance]
bus [id] [active_power_injection]
Each hyper-edge set has ports and/or features:
Ports define the connectivity (e.g., a line connects a from bus to a to bus). They associate a hyper-edge with an integer address (the node index).
Features are the associated numerical values (e.g., the susceptance value).
[3]:
# Print the context graph associated to the problem instance
context, _ = problem.get_context()
print(context)
bus
ports features
id active_power_injection
object_id
0 0.0 -0.361243
1 1.0 0.397479
line
ports features
from to susceptance
object_id
0 0.0 1.0 0.818972
The context of this specific problem instance has:
One line object (representing a physical link and its susceptance),
Two bus objects (representing nodes and their power injections).
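One way to picture the H2MG layout is as a dictionary of hyper-edge sets, each holding port columns (integer addresses) and feature columns (numerical values). The sketch below uses plain Python dictionaries filled with the values printed above. It is only a mental model, not the actual energnn.graph.Graph API, and the helper incident_objects is hypothetical.

```python
# Plain-dict picture of the printed context (not the energnn.graph.Graph class).
context = {
    "bus": {
        "ports": {"id": [0, 1]},
        "features": {"active_power_injection": [-0.361243, 0.397479]},
    },
    "line": {
        "ports": {"from": [0], "to": [1]},
        "features": {"susceptance": [0.818972]},
    },
}


def incident_objects(graph, address):
    """List (set name, object_id, port name) triples whose port points at `address`."""
    hits = []
    for set_name, edge_set in graph.items():
        for port_name, addresses in edge_set["ports"].items():
            for object_id, a in enumerate(addresses):
                if a == address:
                    hits.append((set_name, object_id, port_name))
    return hits


# Objects attached to address 0, then to address 1.
print(incident_objects(context, 0))
print(incident_objects(context, 1))
```

Addresses are what tie hyper-edges of different sets together: bus 0 and the from side of line 0 share address 0, so they sit on the same node.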
Decision Graph¶
The output of our GNN model is referred to as the decision.
In our case, it is the vector of phase angles \(\theta\).
We also model it as a graph, where each bus carries a phase_angle feature.
This specific problem class has a helper method called get_zero_decision that returns a decision of the right structure, filled with zeros. It’s a great starting point for optimization.
[4]:
# Let us explore the decision structure
print(problem.decision_structure)
Ports Features
Name
bus None [phase_angle]
Notice that decisions concern only a subset of the classes available in the context (here the bus), and that they have no ports, just features.
[5]:
# Let us create an initial zero decision
decision, _ = problem.get_zero_decision()
print(decision)
bus
features
phase_angle
object_id
0 -0.0
1 0.0
Objective Function¶
The score of a decision measures the quality of our prediction.
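In this case, we use a squared error with respect to the exact solution \(\theta^\star\): \(\frac{1}{2} \Vert \theta - \theta^\star \Vert^2\). Despite the MSE label used in the printed output below, this is half the sum of squared errors over the buses, not a mean.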
Let’s evaluate this score for our zero decision.
[6]:
score, _ = problem.get_score(decision=decision)
print(f"Initial score (MSE): {score}")
Initial score (MSE): 0.08047349750995636
Gradient Graph¶
To learn, the model needs to know the direction in which to modify its predictions: this is the gradient of the objective function.
In this simple case, the gradient is just the vector \(\theta - \theta^\star\).
Let’s calculate it for our initial decision.
[7]:
gradient, _ = problem.get_gradient(decision=decision)
print(gradient)
bus
features
phase_angle
object_id
0 0.037100
1 -0.399463
Notice that this gradient is of the exact same type as the decision.
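Because the gradient equals \(\theta - \theta^\star\), the score can be recomputed from it as half the sum of its squared entries. The quick check below uses the rounded values printed above (plain Python, no energnn call), so it only matches the reported score up to rounding.

```python
# Gradient entries printed above (rounded to 6 decimals).
gradient = [0.037100, -0.399463]

# score = 1/2 * ||theta - theta_star||^2 = 1/2 * sum(g_i^2)
score_from_gradient = 0.5 * sum(g * g for g in gradient)

reported_score = 0.08047349750995636  # value printed by get_score above
print(score_from_gradient, reported_score)
```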
Just as a quick sanity check, we can perform a simple manual gradient descent to see if the objective decreases.
[8]:
alpha = 0.5 # Learning rate
objective, _ = problem.get_score(decision=decision)
print(f"Step 0, objective = {objective}")
for i in range(10):
    gradient, _ = problem.get_gradient(decision=decision)
    # Update decision (theta = theta - alpha * gradient)
    decision.feature_flat_array -= alpha * gradient.feature_flat_array
    objective, _ = problem.get_score(decision=decision)
    print(f"Step {i + 1}, objective = {objective}")
Step 0, objective = 0.08047349750995636
Step 1, objective = 0.02011837437748909
Step 2, objective = 0.005029592663049698
Step 3, objective = 0.0012573988642543554
Step 4, objective = 0.0003143500944133848
Step 5, objective = 7.858734170440584e-05
Step 6, objective = 1.964674265764188e-05
Step 7, objective = 4.911732503387611e-06
Step 8, objective = 1.2279560905881226e-06
Step 9, objective = 3.069773981678736e-07
Step 10, objective = 7.67386012512361e-08
The objective function successfully decreases! This confirms the gradient definition is consistent with the score we wish to minimize. Now, let’s explore how multiple problems can be batched together.
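The printed objectives shrink by a factor of four at every step. This follows from the update rule: since the gradient is \(\theta - \theta^\star\), each step gives \(\theta_{k+1} - \theta^\star = (1 - \alpha)(\theta_k - \theta^\star)\), so the objective is multiplied by \((1 - \alpha)^2 = 0.25\) for \(\alpha = 0.5\). The sketch below replays the loop in plain Python, with \(\theta^\star\) recovered from the rounded gradient printed earlier (an approximation made for illustration).

```python
alpha = 0.5
# At theta = 0 the gradient is -theta_star, so theta_star is minus the printed gradient.
theta_star = [-0.037100, 0.399463]
theta = [0.0, 0.0]


def objective(theta):
    """Half the squared distance to theta_star."""
    return 0.5 * sum((t - s) ** 2 for t, s in zip(theta, theta_star))


history = [objective(theta)]
for _ in range(10):
    gradient = [t - s for t, s in zip(theta, theta_star)]
    theta = [t - alpha * g for t, g in zip(theta, gradient)]
    history.append(objective(theta))

# Ratio between consecutive objectives: (1 - alpha) ** 2, up to floating-point error.
ratios = [after / before for before, after in zip(history, history[1:])]
print(ratios)
```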
Problem Batch¶
Interacting with a single problem instance is useful at inference time, or for debugging purposes. But to train a whole Graph Neural Network model, it is necessary to process batches of problem instances together.
[21]:
from energnn.problem.example import LinearSystemProblemGenerator
pb_generator = LinearSystemProblemGenerator(seed=9, n_max=3)
problem_batch = pb_generator.generate_problem_batch(batch_size=3)
# Let us explore the context and decision structures
print("Context Structure:\n", problem_batch.context_structure, "\n")
print("Decision Structure:\n", problem_batch.decision_structure)
Context Structure:
Ports Features
Name
line [from, to] [susceptance]
bus [id] [active_power_injection]
Decision Structure:
Ports Features
Name
bus None [phase_angle]
Here, contexts are still graphs, but this time with an extra dimension for the batch:
[10]:
context, _ = problem_batch.get_context()
print(context)
bus
ports features
id active_power_injection
batch_id object_id
0 0 0.0 -1.216858
1 1.0 1.103963
2 0.0 0.000000
1 0 0.0 -2.028237
1 1.0 -0.351886
2 2.0 2.323155
2 0 0.0 1.087987
1 1.0 -1.157142
2 0.0 0.000000
line
ports features
from to susceptance
batch_id object_id
0 0 0.0 1.0 1.001875
1 0.0 0.0 0.000000
2 0.0 0.0 0.000000
1 0 0.0 1.0 0.918508
1 0.0 2.0 0.748101
2 1.0 2.0 0.584060
2 0 0.0 1.0 1.047838
1 0.0 0.0 0.000000
2 0.0 0.0 0.000000
Notice that the different networks in the batch do not necessarily have the same connectivity, or the same number of line or bus objects. To group these graphs of varying sizes together, energnn uses padding (filling with zeros).
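As a rough illustration of padding, the sketch below pads per-graph feature lists to a common length with zeros and keeps a mask of which entries are real. It is plain Python with a hypothetical helper pad_to_common_size, and it assumes that the all-zero rows in the printed batch are padding. How energnn stores padded graphs internally is not shown in this tutorial.

```python
def pad_to_common_size(features_per_graph, fill=0.0):
    """Pad variable-length feature lists to the same length; also return a validity mask."""
    n_max = max(len(f) for f in features_per_graph)
    padded = [f + [fill] * (n_max - len(f)) for f in features_per_graph]
    mask = [[True] * len(f) + [False] * (n_max - len(f)) for f in features_per_graph]
    return padded, mask


# active_power_injection of the three graphs in the batch printed above,
# with the presumed padding rows removed.
injections = [
    [-1.216858, 1.103963],
    [-2.028237, -0.351886, 2.323155],
    [1.087987, -1.157142],
]
padded, mask = pad_to_common_size(injections)
for row, valid in zip(padded, mask):
    print(row, valid)
```

Keeping a mask alongside the padded values is one common way to tell a real zero from a padding zero.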
Still, a ProblemBatch can be handled in a very similar way to a single Problem.
[11]:
alpha = 0.5
decision, _ = problem_batch.get_zero_decision()
objective, _ = problem_batch.get_score(decision=decision)
print(f"Step 0, average objective = {objective}")
for i in range(10):
    gradient, _ = problem_batch.get_gradient(decision=decision)
    # Update decision on the whole batch
    decision.feature_flat_array -= alpha * gradient.feature_flat_array
    objective, _ = problem_batch.get_score(decision=decision)
    print(f"Step {i + 1}, average objective = {objective}")
Step 0, average objective = [0.41525667905807495, 0.6843823790550232, 0.25396728515625]
Step 1, average objective = [0.10381416976451874, 0.1710955947637558, 0.0634918212890625]
Step 2, average objective = [0.025953546166419983, 0.04277390241622925, 0.015872955322265625]
Step 3, average objective = [0.006488386541604996, 0.010693473741412163, 0.00396823650225997]
Step 4, average objective = [0.0016220954712480307, 0.0026733684353530407, 0.0009920591255649924]
Step 5, average objective = [0.00040552386781200767, 0.0006683421670459211, 0.00024801530526019633]
Step 6, average objective = [0.00010138096695300192, 0.0001670852507231757, 6.200382631504908e-05]
Step 7, average objective = [2.534558552724775e-05, 4.1771614633034915e-05, 1.550095657876227e-05]
Step 8, average objective = [6.3362235778186005e-06, 1.0442954589962028e-05, 3.875235961459111e-06]
Step 9, average objective = [1.5839691513974685e-06, 2.6107645680895075e-06, 9.688423006082303e-07]
Step 10, average objective = [3.9603560253453907e-07, 6.5266902993244e-07, 2.4221057515205757e-07]
Notice that there is one score per element of the batch.
Problem Loader¶
Being able to process problem instances per batch is nice, but not enough. To train a Graph Neural Network, we’ll need to iterate over multiple minibatches of problem instances. That’s where the ProblemLoader class comes in.
[22]:
from energnn.problem.example import LinearSystemProblemLoader
problem_loader = LinearSystemProblemLoader(batch_size=4, seed=7, dataset_size=16, n_max=4)
# Let us explore the context and decision structures
print("Context Structure:\n", problem_loader.context_structure, "\n")
print("Decision Structure:\n", problem_loader.decision_structure)
Context Structure:
Ports Features
Name
line [from, to] [susceptance]
bus [id] [active_power_injection]
Decision Structure:
Ports Features
Name
bus None [phase_angle]
It lets us iterate over batches of problems.
[13]:
for problem_batch in problem_loader:
    context, _ = problem_batch.get_context()
    decision, _ = problem_batch.get_zero_decision()
    objective, _ = problem_batch.get_score(decision=decision)
    print("Objective:", objective)
Objective: [0.04023674875497818, 1.0828040838241577, 0.5069718956947327, 0.6397796869277954]
Objective: [0.15303818881511688, 0.17635096609592438, 0.9519702792167664, 1.7408232688903809]
Objective: [0.028324833139777184, 0.35810908675193787, 0.463958740234375, 1.2205294370651245]
Objective: [0.047384973615407944, 0.9067906141281128, 0.3180186450481415, 0.6611507534980774]
Each iteration yields a new batch of problems.
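The iteration protocol can be pictured with a tiny stand-in loader. The class ToyLoader below is hypothetical and only mimics the batching arithmetic (a dataset of 16 items split into batches of 4 yields 4 batches, consistent with the four lines printed above). It says nothing about how LinearSystemProblemLoader is implemented.

```python
import random


class ToyLoader:
    """Hypothetical stand-in: yields lists of `batch_size` items from a seeded dataset."""

    def __init__(self, dataset_size, batch_size, seed=0):
        rng = random.Random(seed)
        self.dataset = [rng.gauss(0.0, 1.0) for _ in range(dataset_size)]
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per pass (ceiling division keeps a final partial batch).
        return -(-len(self.dataset) // self.batch_size)

    def __iter__(self):
        for start in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[start:start + self.batch_size]


loader = ToyLoader(dataset_size=16, batch_size=4, seed=7)
batches = list(loader)
print(len(loader), [len(b) for b in batches])
```

The same arithmetic explains the progress bars later in this tutorial: 64 training problems in batches of 4 give 16 batches per epoch, and 8 validation problems give 2.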
Graph Neural Network Model¶
Let us instantiate a small Graph Neural Network model that fits the context and decision structures of our problem class.
[14]:
from energnn.model.ready_to_use import TinyRecurrentEquivariantGNN
model = TinyRecurrentEquivariantGNN(
    in_structure=problem_loader.context_structure,
    out_structure=problem_loader.decision_structure
)
Make sure that your model is in evaluation mode first!
[15]:
model.eval() # Set the model in evaluation mode.
# model.train() # To set the model in train mode.
The model takes a context as input and returns a decision.
[16]:
problem = pb_generator.generate_problem()
context, _ = problem.get_context()
decision, _ = model(context)
print(decision)
bus
features
phase_angle
object_id
0 -1.241976
1 0.972012
2 0.320894
It can also process batches of contexts and return batches of decisions.
[17]:
problem_batch = pb_generator.generate_problem_batch(batch_size=4)
context, _ = problem_batch.get_context()
decision, _ = model.forward_batch(graph=context)
print(decision)
bus
features
phase_angle
batch_id object_id
0 0 3.879246
1 -4.336865
2 0.713661
1 0 -1.430068
1 0.131218
2 1.134047
2 0 -0.827532
1 0.785140
2 -0.000000
3 0 -0.009739
1 0.016669
2 -0.786488
Trainer¶
Let us train our Graph Neural Network model over a problem loader. The core training loop is defined by the following pseudocode.
for problem_batch in problem_loader:
    context, _ = problem_batch.get_context()
    decision, _ = model.forward_batch(context)          # forward pass
    gradient, _ = problem_batch.get_gradient(decision)  # gradient w.r.t. the decision
    model.backprop(gradient)                            # pseudocode: propagate to the parameters
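The key design point in this loop is that the problem supplies the gradient of the objective with respect to the decision, and the model propagates it back to its own parameters. The toy below shows that chain rule with a one-parameter model \(\theta = w P\) in plain Python. Everything in it (the model, the target \(\theta^\star = 2P\), the data, the learning rate) is made up for illustration and is unrelated to the actual energnn classes.

```python
# Toy data: each batch is a list of power injections P (hypothetical values).
batches = [[0.5, -1.0], [1.5, 0.2], [-0.7, 0.9]]
TRUE_COEFF = 2.0  # the toy "exact solution" is theta_star = 2 * P


def problem_gradient(P, decision):
    """Gradient of 1/2 * ||theta - theta_star||^2 with respect to the decision."""
    return [theta - TRUE_COEFF * p for theta, p in zip(decision, P)]


w = 0.0  # single model parameter
learning_rate = 0.1

for epoch in range(20):
    for P in batches:
        decision = [w * p for p in P]             # forward pass: theta = w * P
        gradient = problem_gradient(P, decision)  # dL/dtheta, provided by the problem
        # Backpropagation: dL/dw = sum_i dL/dtheta_i * dtheta_i/dw, with dtheta_i/dw = P_i
        grad_w = sum(g * p for g, p in zip(gradient, P))
        w -= learning_rate * grad_w               # plain gradient step on the parameter

print(w)  # approaches TRUE_COEFF
```

In an automatic-differentiation framework, the line computing grad_w corresponds to a vector-Jacobian product through the model.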
In practice, we use energnn.trainer to implement the training logic. It relies on:
optax for the optimizer,
orbax for checkpointing and saving/loading models.
[18]:
from energnn.trainer import Trainer
import optax
trainer = Trainer(model=model, gradient_transformation=optax.adam(learning_rate=3e-4))
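For intuition about the optimizer, the sketch below implements the standard Adam update rule in plain Python for a single scalar parameter. It is not optax code: the function name adam_step is hypothetical, and the hyperparameters b1=0.9, b2=0.999, eps=1e-8 are the usual Adam defaults, assumed here. It shows that the first Adam step has a magnitude close to the learning rate whatever the gradient scale, which may help explain why the validation score moves gradually with learning_rate=3e-4.

```python
import math


def adam_step(param, grad, state, learning_rate=3e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` is (step, first moment, second moment)."""
    step, m, v = state
    step += 1
    m = b1 * m + (1.0 - b1) * grad          # exponential average of gradients
    v = b2 * v + (1.0 - b2) * grad * grad   # exponential average of squared gradients
    m_hat = m / (1.0 - b1 ** step)          # bias correction of the first moment
    v_hat = v / (1.0 - b2 ** step)          # bias correction of the second moment
    param -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, (step, m, v)


# Two very different gradient magnitudes, same starting point.
for grad in (0.5, 50.0):
    new_param, _ = adam_step(param=1.0, grad=grad, state=(0, 0.0, 0.0))
    print(grad, 1.0 - new_param)  # the step size is about 3e-4 in both cases
```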
The training is performed by iterating over a train loader, and the validation score is periodically computed on a validation loader.
[19]:
train_loader = LinearSystemProblemLoader(seed=7, dataset_size=64, batch_size=4, n_max=3)
val_loader = LinearSystemProblemLoader(seed=8, dataset_size=8, batch_size=4, n_max=3)
[20]:
_ = trainer.train(
    train_loader=train_loader,
    val_loader=val_loader,
    eval_before_training=True,
    n_epochs=10,
)
Validation: 100%|██████████| 2/2 [00:02<00:00, 1.13s/batch, score=1.0137e+01]
Epoch 1/10: 100%|██████████| 16/16 [00:08<00:00, 1.90batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.12batch/s, score=2.6406e+00]
Epoch 2/10: 100%|██████████| 16/16 [00:02<00:00, 5.37batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.16batch/s, score=2.5781e+00]
Epoch 3/10: 100%|██████████| 16/16 [00:03<00:00, 5.04batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.24batch/s, score=2.5190e+00]
Epoch 4/10: 100%|██████████| 16/16 [00:03<00:00, 5.31batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.84batch/s, score=2.4622e+00]
Epoch 5/10: 100%|██████████| 16/16 [00:03<00:00, 5.33batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.87batch/s, score=2.4078e+00]
Epoch 6/10: 100%|██████████| 16/16 [00:03<00:00, 4.97batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.17batch/s, score=2.3555e+00]
Epoch 7/10: 100%|██████████| 16/16 [00:03<00:00, 5.15batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.00batch/s, score=2.3042e+00]
Epoch 8/10: 100%|██████████| 16/16 [00:03<00:00, 4.81batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.22batch/s, score=2.2553e+00]
Epoch 9/10: 100%|██████████| 16/16 [00:03<00:00, 4.89batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 8.78batch/s, score=2.2075e+00]
Epoch 10/10: 100%|██████████| 16/16 [00:03<00:00, 4.56batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.20batch/s, score=2.1607e+00]
Et voilà! The validation score drops from about 10.1 before training to about 2.16 after 10 epochs: the GNN has started to learn how to solve this basic sparse linear system derived from the DC Power Flow problem.
Notice that the approach is not limited to linear systems and can be used for a much broader set of applications!