Tutorial¶

In this energnn tutorial, we will review :

How to install energnn.
The interaction with typical implementations of energnn.problem.Problem, energnn.problem.ProblemBatch and energnn.problem.ProblemLoader.
The creation of a GNN model using energnn.model.ready_to_use,
The training of the model with energnn.trainer.Trainer.

Installation¶

To install the latest stable release of energnn on CPU,

pip install energnn

For the GPU version,

pip install energnn --extra gpu

Problem Class¶

Let’s consider the following use case: DC Power Flow in an electrical network.

We want to determine the phase angles \(\theta\) at the different buses (nodes) of the network, given the active power injections \(P\) and the physical characteristics of the lines (defined in the susceptance \(B\) matrix).

The problem is modeled by the following linear system:

\[B \theta = P\]

Where:

\(B\) is the susceptance matrix (similar to a Laplacian matrix).
\(P\) is the vector of active power injections at each bus.
\(\theta\) is the vector of phase angles we wish to predict.

Let’s generate a random problem instance and explore its interface.

[1]:

from energnn.problem.example import LinearSystemProblemGenerator

pb_generator = LinearSystemProblemGenerator(seed=7, n_max=4)
problem = pb_generator.generate_problem()

Context Graph¶

The input of our GNN model is referred to as the context, instantiated as an energnn.graph.Graph object.

In our case, the context contains the known data of the problem: the matrix \(B\) and the vector \(P\). This data is structured as a Hyper Heterogeneous Multi Graph (H2MG):

The matrix :math:`B` (lines): The off-diagonal elements of \(B\) represent the connections between nodes. We encode them in a hyper-edge set called line. Each line has a feature called susceptance.
The vector :math:`P` (buses): Power injections are associated with the buses of the network. We encode them in a hyper-edge set called bus. Each bus has a feature called active_power_injection.

[2]:

# Let us explore the context structure
print(problem.context_structure)

           Ports                  Features
Name
line  [from, to]             [susceptance]
bus         [id]  [active_power_injection]

Each hyper-edge set has ports and/or features:

Ports define the connectivity (e.g., a line connects a from bus to a to bus). They associate a hyper-edge with an integer address (the node index).
Features are the associated numerical values (e.g., the susceptance value).

[3]:

# Print the context graph associated to the problem instance
context, _ = problem.get_context()
print(context)

bus
          ports               features
             id active_power_injection
object_id
0           0.0              -0.361243
1           1.0               0.397479
line
          ports         features
           from   to susceptance
object_id
0           0.0  1.0    0.818972

The context of this specific problem instance has:

Several line objects (representing physical links and their susceptance),
Several bus objects (representing nodes and their power injection).

Decision Graph¶

The output of our GNN model is referred to as the decision.

In our case, it is the vector of phase angles \(\theta\).

We also model it as a graph, where each bus carries a phase_angle feature.

This specific problem class has a helper method called get_zero_decision that returns a decision of the right structure, filled with zeros. It’s a great starting point for optimization.

[4]:

# Let us explore the decision structure
print(problem.decision_structure)

     Ports       Features
Name
bus   None  [phase_angle]

Notice that decisions concern only a subset of the classes available in the context (here the bus), and that they have no ports, just features.

[5]:

# Let us create an initial zero decision
decision, _ = problem.get_zero_decision()
print(decision)

bus
             features
          phase_angle
object_id
0                -0.0
1                 0.0

Objective Function¶

The score of a decision measures the quality of our prediction.

In this case, we use the Mean Squared Error (MSE) compared to the exact solution \(\theta^\star\): \(\frac{1}{2} \Vert \theta - \theta^\star \Vert^2\).

Let’s evaluate this score for our zero decision.

[6]:

score, _ = problem.get_score(decision=decision)
print(f"Initial score (MSE): {score}")

Initial score (MSE): 0.08047349750995636

Gradient Graph¶

To learn, the model needs to know the direction in which to modify its predictions: this is the gradient of the objective function.

In this simple case, the gradient is just the vector \(\theta - \theta^\star\).

Let’s calculate it for our initial decision.

[7]:

gradient, _ = problem.get_gradient(decision=decision)
print(gradient)

bus
             features
          phase_angle
object_id
0            0.037100
1           -0.399463

Notice that this gradient is of the exact same type as the decision.

Just as a quick sanity check, we can perform a simple manual gradient descent to see if the objective decreases.

[8]:

alpha = 0.5  # Learning rate

objective, _ = problem.get_score(decision=decision)
print(f"Step 0, objective = {objective}")

for i in range(10):
    gradient, _ = problem.get_gradient(decision=decision)

    # Update decision (theta = theta - alpha * gradient)
    decision.feature_flat_array -= alpha * gradient.feature_flat_array

    objective, _ = problem.get_score(decision=decision)
    print(f"Step {i + 1}, objective = {objective}")

Step 0, objective = 0.08047349750995636
Step 1, objective = 0.02011837437748909
Step 2, objective = 0.005029592663049698
Step 3, objective = 0.0012573988642543554
Step 4, objective = 0.0003143500944133848
Step 5, objective = 7.858734170440584e-05
Step 6, objective = 1.964674265764188e-05
Step 7, objective = 4.911732503387611e-06
Step 8, objective = 1.2279560905881226e-06
Step 9, objective = 3.069773981678736e-07
Step 10, objective = 7.67386012512361e-08

The objective function successfully decreases! This confirms the gradient definition is consistent with the score we wish to minimize. Now, let’s explore how multiple problems can be batched together.

Problem Batch¶

Interacting with a single problem instance is useful at inference time, or for debugging purposes. But to train a whole Graph Neural Network model, it is necessary to process batches of problem instances altogether.

[21]:

from energnn.problem.example import LinearSystemProblemGenerator

pb_generator = LinearSystemProblemGenerator(seed=9, n_max=3)
problem_batch = pb_generator.generate_problem_batch(batch_size=3)

# Let us explore the context and decision structures
print("Context Structure:\n", problem_batch.context_structure, "\n")
print("Decision Structure:\n", problem_batch.decision_structure)

Context Structure:
            Ports                  Features
Name
line  [from, to]             [susceptance]
bus         [id]  [active_power_injection]

Decision Structure:
      Ports       Features
Name
bus   None  [phase_angle]

Here, contexts are still graphs, but this time with an extra dimension for the batch:

[10]:

context, _ = problem_batch.get_context()
print(context)

bus
                   ports               features
                      id active_power_injection
batch_id object_id
0        0           0.0              -1.216858
         1           1.0               1.103963
         2           0.0               0.000000
1        0           0.0              -2.028237
         1           1.0              -0.351886
         2           2.0               2.323155
2        0           0.0               1.087987
         1           1.0              -1.157142
         2           0.0               0.000000
line
                   ports         features
                    from   to susceptance
batch_id object_id
0        0           0.0  1.0    1.001875
         1           0.0  0.0    0.000000
         2           0.0  0.0    0.000000
1        0           0.0  1.0    0.918508
         1           0.0  2.0    0.748101
         2           1.0  2.0    0.584060
2        0           0.0  1.0    1.047838
         1           0.0  0.0    0.000000
         2           0.0  0.0    0.000000

Notice that the different networks in the batch do not necessarily have the same connectivity, or the same number of line or bus objects. To group these graphs of varying sizes together, energnn uses padding (filling with zeros).

Still, a ProblemBatch can be handled in a very similar way to a single Problem.

[11]:

alpha = 0.5

decision, _ = problem_batch.get_zero_decision()
objective, _ = problem_batch.get_score(decision=decision)
print(f"Step 0, average objective = {objective}")

for i in range(10):
    gradient, _ = problem_batch.get_gradient(decision=decision)

    # Update decision on the whole batch
    decision.feature_flat_array -= alpha * gradient.feature_flat_array

    objective, _ = problem_batch.get_score(decision=decision)
    print(f"Step {i + 1}, average objective = {objective}")

Step 0, average objective = [0.41525667905807495, 0.6843823790550232, 0.25396728515625]
Step 1, average objective = [0.10381416976451874, 0.1710955947637558, 0.0634918212890625]
Step 2, average objective = [0.025953546166419983, 0.04277390241622925, 0.015872955322265625]
Step 3, average objective = [0.006488386541604996, 0.010693473741412163, 0.00396823650225997]
Step 4, average objective = [0.0016220954712480307, 0.0026733684353530407, 0.0009920591255649924]
Step 5, average objective = [0.00040552386781200767, 0.0006683421670459211, 0.00024801530526019633]
Step 6, average objective = [0.00010138096695300192, 0.0001670852507231757, 6.200382631504908e-05]
Step 7, average objective = [2.534558552724775e-05, 4.1771614633034915e-05, 1.550095657876227e-05]
Step 8, average objective = [6.3362235778186005e-06, 1.0442954589962028e-05, 3.875235961459111e-06]
Step 9, average objective = [1.5839691513974685e-06, 2.6107645680895075e-06, 9.688423006082303e-07]
Step 10, average objective = [3.9603560253453907e-07, 6.5266902993244e-07, 2.4221057515205757e-07]

Notice that there is one score per element of the batch.

Problem Loader¶

Being able to process problem instances per batch is nice, but not enough. To train a Graph Neural Network, we’ll need to iterate over multiple minibatches of problem instances. That’s where the ProblemLoader class comes in.

[22]:

from energnn.problem.example import LinearSystemProblemLoader

problem_loader = LinearSystemProblemLoader(batch_size=4, seed=7, dataset_size=16, n_max=4)

# Let us explore the context and decision structures
print("Context Structure:\n", problem_loader.context_structure, "\n")
print("Decision Structure:\n", problem_loader.decision_structure)

Context Structure:
            Ports                  Features
Name
line  [from, to]             [susceptance]
bus         [id]  [active_power_injection]

Decision Structure:
      Ports       Features
Name
bus   None  [phase_angle]

It allows to iterate over batches of problems.

[13]:

for problem_batch in problem_loader:
    context, _ = problem_batch.get_context()
    decision, _ = problem_batch.get_zero_decision()
    objective, _ = problem_batch.get_score(decision=decision)
    print("Objective:", objective)

Objective: [0.04023674875497818, 1.0828040838241577, 0.5069718956947327, 0.6397796869277954]
Objective: [0.15303818881511688, 0.17635096609592438, 0.9519702792167664, 1.7408232688903809]
Objective: [0.028324833139777184, 0.35810908675193787, 0.463958740234375, 1.2205294370651245]
Objective: [0.047384973615407944, 0.9067906141281128, 0.3180186450481415, 0.6611507534980774]

Each iteration yields a new batch of problems.

Graph Neural Network Model¶

Let us instantiate a small Graph Neural Network model, that fits the context and decision structure of our problem class.

[14]:

from energnn.model.ready_to_use import TinyRecurrentEquivariantGNN

model = TinyRecurrentEquivariantGNN(
    in_structure=problem_loader.context_structure,
    out_structure=problem_loader.decision_structure
)

Make sure that your model is in evaluation mode first!

[15]:

model.eval()  # Set the model in evaluation mode.
# model.train()  # To set the model in train mode.

It is able to take as input a context and return a decision.

[16]:

problem = pb_generator.generate_problem()
context, _ = problem.get_context()
decision, _ = model(context)
print(decision)

bus
             features
          phase_angle
object_id
0           -1.241976
1            0.972012
2            0.320894

It can also process batches of contexts and return batches of decisions.

[17]:

problem_batch = pb_generator.generate_problem_batch(batch_size=4)
context, _ = problem_batch.get_context()
decision, _ = model.forward_batch(graph=context)
print(decision)

bus
                      features
                   phase_angle
batch_id object_id
0        0            3.879246
         1           -4.336865
         2            0.713661
1        0           -1.430068
         1            0.131218
         2            1.134047
2        0           -0.827532
         1            0.785140
         2           -0.000000
3        0           -0.009739
         1            0.016669
         2           -0.786488

Trainer¶

Let us train our Graph Neural Network model over a problem loader. The core training loop is defined by the following pseudocode.

for problem_batch in problem_loader:
    context, _ = problem_batch.get_context()
    decision, _ = model.forward_batch(context)
    gradient, _ = problem_batch.get_gradient(decision)
    model.backprop(gradient)

In practice, we use energnn.trainer to implement the training logic, and allow to use :

optax for the optimizer,
orbax for checkpointing and saving/loading models.

[18]:

from energnn.trainer import Trainer
import optax

trainer = Trainer(model=model, gradient_transformation=optax.adam(learning_rate=3e-4))

The training is performed by iterating over a train loader, and the validation score is periodically computed on a validation loader.

[19]:

train_loader = LinearSystemProblemLoader(seed=7, dataset_size=64, batch_size=4, n_max=3)
val_loader = LinearSystemProblemLoader(seed=8, dataset_size=8, batch_size=4, n_max=3)

[20]:

_ = trainer.train(
    train_loader=train_loader,
    val_loader=val_loader,
    eval_before_training=True,
    n_epochs=10,
)

Validation: 100%|██████████| 2/2 [00:02<00:00,  1.13s/batch, score=1.0137e+01]
Epoch 1/10: 100%|██████████| 16/16 [00:08<00:00,  1.90batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.12batch/s, score=2.6406e+00]
Epoch 2/10: 100%|██████████| 16/16 [00:02<00:00,  5.37batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.16batch/s, score=2.5781e+00]
Epoch 3/10: 100%|██████████| 16/16 [00:03<00:00,  5.04batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.24batch/s, score=2.5190e+00]
Epoch 4/10: 100%|██████████| 16/16 [00:03<00:00,  5.31batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.84batch/s, score=2.4622e+00]
Epoch 5/10: 100%|██████████| 16/16 [00:03<00:00,  5.33batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.87batch/s, score=2.4078e+00]
Epoch 6/10: 100%|██████████| 16/16 [00:03<00:00,  4.97batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.17batch/s, score=2.3555e+00]
Epoch 7/10: 100%|██████████| 16/16 [00:03<00:00,  5.15batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.00batch/s, score=2.3042e+00]
Epoch 8/10: 100%|██████████| 16/16 [00:03<00:00,  4.81batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 10.22batch/s, score=2.2553e+00]
Epoch 9/10: 100%|██████████| 16/16 [00:03<00:00,  4.89batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00,  8.78batch/s, score=2.2075e+00]
Epoch 10/10: 100%|██████████| 16/16 [00:03<00:00,  4.56batch/s]
Validation: 100%|██████████| 2/2 [00:00<00:00, 11.20batch/s, score=2.1607e+00]

Et voilà! The GNN has started to learn how to solve this basic linear system derived from the DC Power Flow problem, which is a sparse linear system.

Notice that the approach is not limited to linear system and can be used for a much broader set of applications!