Chasing Symmetries: A New Wave of Physics-Inspired Deep Learning#

In the last decade, deep learning has proven its astonishing capacity to extract complex patterns from huge amounts of data. Computers have learned to recognize images, translate text in real time, discover new drugs, and even generate entire worlds with generative adversarial networks (GANs). However, a critical challenge has emerged: designing neural networks that more transparently respect and exploit the fundamental structure of the real world.

This is where physics-inspired deep learning steps in. By incorporating concepts like symmetries, invariances, and physical laws into neural architectures, researchers have started to build models that not only show improved accuracy but also demonstrate better interpretability, efficiency, stability, and robustness to real-world variations. This blog post explores the basics of symmetries in physics, examines deep learning models that exploit these symmetries, and shows you how to build your own physics-inspired neural networks—whether you are new to the field or a veteran seeking advanced, cutting-edge expansions.


Table of Contents#

  1. What Do We Mean by Symmetry?
  2. Why Are Symmetries Important in Physics and ML?
  3. Models Inspired by Symmetries
  4. Breaking Down Equivariance and Invariance
  5. Introduction to Group Theory for Machine Learning
  6. Physics-Informed Neural Networks (PINNs)
  7. Equivariant Neural Networks from the Ground Up
  8. Getting Started with Implementations
  9. Advanced Techniques for Professionals
  10. Further Directions and Conclusion

What Do We Mean by Symmetry?#

When we talk about a system having symmetry, we refer to the system’s invariance under certain transformations. For a physical object, symmetry might mean something remains unchanged when we rotate it, translate it, or reflect it. For example, a perfect sphere remains the same under any rotation about its center. Similarly, for equations governing physical phenomena, certain transformations leave those equations looking the same.

Symmetry in mathematics is captured using the language of group theory. A group is a set equipped with an operation that combines any two of its elements to form another element, in a way that satisfies certain axioms (closure, associativity, identity, inverses). Examples of groups include the set of rotations in 3D space (SO(3)) or the group of symmetries of a square (the dihedral group D4). Each of these transformations can be viewed as an operation in the group.
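These axioms are easy to verify directly for a small group. The sketch below is a toy illustration (the helper names are our own): it represents C4, the rotation subgroup of the square's symmetries, as 2×2 integer matrices and checks closure, identity, and inverses.

```python
# A minimal check of the group axioms for C4, the group of 90-degree
# rotations of the plane, represented here as 2x2 integer matrices.

def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I  = [[1, 0], [0, 1]]    # identity: rotate by 0 degrees
R  = [[0, -1], [1, 0]]   # rotate by 90 degrees counter-clockwise
R2 = matmul2(R, R)       # 180 degrees
R3 = matmul2(R2, R)      # 270 degrees
C4 = [I, R, R2, R3]

# Closure: composing any two rotations stays inside C4
assert all(matmul2(a, b) in C4 for a in C4 for b in C4)
# Identity and inverses: every element composes with some element to give I
assert all(any(matmul2(g, h) == I for h in C4) for g in C4)
print("C4 satisfies closure, identity, and inverses")
```

Extending the list with the four reflection matrices would give the full dihedral group D4 mentioned above.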


Why Are Symmetries Important in Physics and ML?#

In physics, fundamental principles like Noether’s theorem link symmetries to conservation laws (e.g., translational symmetry → conservation of momentum, rotational symmetry → conservation of angular momentum). Thus, it’s no accident that many of our most powerful physical theories, from classical mechanics to quantum field theory, revolve around symmetry.

In machine learning—especially deep learning—exploiting symmetry helps in two major ways:

  1. Data Efficiency: If a model is designed to be invariant or equivariant under certain transformations, it can learn more efficiently from fewer samples. For instance, convolutional neural networks (CNNs) exploit translational invariance by using shared filters across an image, leading to significantly fewer learnable parameters compared to naive fully connected networks.

  2. Regularization and Generalization: Symmetry can serve as a form of structural regularization. By design, the model obeys constraints that reflect the underlying organization of its input or the physical reality it aims to replicate. This leads to better performance when extrapolating to unseen data or more complex system states.
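The parameter-count argument behind point 1 is easy to make concrete. Below is a back-of-the-envelope comparison (the layer sizes are our own illustrative choices): mapping a 32×32 grayscale image to 16 feature maps of the same spatial size costs millions of weights for a dense layer, but only 144 for a weight-shared 3×3 convolution.

```python
# Parameter counts for one layer on a 32x32 grayscale image:
# a fully connected layer vs. a weight-shared 3x3 convolution,
# both producing 16 output feature maps of the same spatial size.
H = W = 32
fc_params = (H * W) * (16 * H * W)  # every input pixel connects to every output
conv_params = 16 * (3 * 3 * 1)      # 16 shared 3x3 kernels over 1 input channel
print(fc_params, conv_params)       # 16777216 vs 144
```

The five-orders-of-magnitude gap is entirely due to exploiting translational symmetry through weight sharing.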

The new wave of physics-inspired deep learning aims to generalize these concepts to more sophisticated symmetries (rotations, reflections, permutations, continuous gauge transformations, etc.), as well as incorporate actual physical equations (like partial differential equations) into a neural network’s architecture or training objective.


Models Inspired by Symmetries#

Classical CNNs and Translational Symmetry#

The first big success in leveraging symmetry in deep learning came from convolutional neural networks (CNNs). By sliding a “convolutional kernel” across an image, the network outputs features that remain meaningful regardless of where an object is located in the image. This explicitly uses the group of 2D translations. CNNs have been the workhorse of computer vision for years, demonstrating that capturing even a single type of symmetry (translational) can be extraordinarily fruitful.

Group Equivariant CNNs#

Group Equivariant CNNs (G-CNNs) generalize translational convolution to other symmetry groups (e.g., rotations, flips). Instead of applying the same kernel to just different translations of the data, these networks apply specialized kernels to rotations, reflections, or even more general transformations. By doing so, they capture invariances in a structured way.
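As a minimal sketch of this idea (not a real G-CNN implementation; all names are our own), consider the smallest nontrivial group: the two-element reflection group acting on a 1D signal. A “lifted” convolution applies one transformed copy of the kernel per group element; pooling over the group then yields a reflection-invariant feature.

```python
# A miniature "group convolution" for the reflection group {identity, flip}
# on a 1D signal: convolve with the kernel and with its mirror image,
# then pool over the group and space.

def circ_conv(x, k):
    """Circular 1D correlation of signal x with kernel k."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def lifted_conv(x, k):
    # one response per group element: original kernel and flipped kernel
    return [circ_conv(x, k), circ_conv(x, k[::-1])]

def invariant_feature(x, k):
    # pool over both the group axis and the spatial axis
    return max(max(resp) for resp in lifted_conv(x, k))

x = [0.0, 1.0, 3.0, -2.0, 0.5, 1.5]
k = [1.0, -0.5, 0.25]

# Reflecting the input does not change the pooled feature
assert invariant_feature(x[::-1], k) == invariant_feature(x, k)
```

Real G-CNNs follow the same recipe with larger groups (rotations, reflections) and learnable kernels, keeping the per-group responses around as structured feature maps instead of pooling them away immediately.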

Physics-Informed Neural Networks (PINNs)#

G-CNNs focus on symmetry in the architecture. Another line of thought, Physics-Informed Neural Networks (PINNs), enforces physical constraints (e.g., the Navier–Stokes equations for fluid flow, or Maxwell’s equations for electromagnetism) on the training process. PINNs can discover solutions to differential equations or train with significantly less real-world data, leveraging known forms of physical laws as constraints.

Lie-Parameterized Neural Networks (Gauge Symmetries)#

For more advanced tasks, some network designs explicitly incorporate Lie group constraints, relevant for gauge theories in physics (as in electromagnetism or more advanced quantum field theories). These advanced networks enforce local group symmetries, or gauge symmetries, in which the transformation is allowed to vary from point to point in space-time rather than being applied globally.


Breaking Down Equivariance and Invariance#

Much of the magic in symmetry-based deep learning can be captured by two phenomena:

  1. Invariance: A function ( f ) is invariant under a group transformation ( g ) if ( f(g \cdot x) = f(x) ). For example, if you rotate an image of a cat, you still want the classifier to recognize it as a cat.

  2. Equivariance: A function ( f ) is equivariant under a group transformation ( g ) if ( f(g \cdot x) = g’ \cdot f(x) ) for some possibly related transformation ( g’ ). For instance, in a rotation-equivariant layer, rotating an input might result in rotating the feature maps in a meaningful way (rather than losing or changing the interpretation).

CNNs are translationally equivariant layers that naturally yield translation-invariant global features (e.g., after pooling). Through carefully designed convolution-like operations, G-CNNs can become rotation- or reflection-equivariant.
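Both definitions can be checked numerically for the translation (cyclic shift) group on a 1D signal, using a plain-Python circular convolution (a self-contained sketch, not a library API):

```python
# Numerically checking equivariance and invariance for cyclic shifts.

def circ_conv(x, k):
    """Circular 1D correlation of signal x with kernel k."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def shift(x, s):
    """Cyclic shift of x by s positions."""
    return x[-s:] + x[:-s]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
k = [0.5, -1.0, 0.25]

# Equivariance: convolving the shifted input equals shifting the output
assert circ_conv(shift(x, 2), k) == shift(circ_conv(x, k), 2)
# Invariance: sum pooling on top of the convolution ignores the shift
assert abs(sum(circ_conv(shift(x, 2), k)) - sum(circ_conv(x, k))) < 1e-9
```

The first assertion is the equivariance of the convolution itself; the second shows that pooling an equivariant map produces an invariant feature, mirroring how CNNs behave after global pooling.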


Introduction to Group Theory for Machine Learning#

Before diving further, it helps to briefly recap some group theory concepts. We’ll keep it simple here, though you can find entire textbooks on the subject.

  • Group Elements (( g )): These are the “actions” you can perform (like rotate by 90°, reflect across an axis, or more generally a transformation in a Lie group).
  • Identity Element (( e )): The “do nothing” transformation.
  • Inverse (( g^{-1} )): The operation that “undoes” ( g ).
  • Closure: If ( a ) and ( b ) are elements in the group, then ( a \circ b ) is also in the group.
  • Associativity: ( (a \circ b) \circ c = a \circ (b \circ c) ).

For continuous symmetries (like rotations in 3D), you typically deal with Lie groups and their generating Lie algebras. In constructing equivariant neural networks, we identify the transformations that data might undergo—like rotations in images, or permutations in sets—and build them into the design so that the network’s output respects or utilizes these transformations appropriately.
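To connect Lie groups and their algebras concretely: exponentiating an element of the Lie algebra yields a group element. The sketch below (helper names our own) builds a 2D rotation by exponentiating ( \theta ) times the so(2) generator via the matrix power series and compares it to the familiar cosine/sine rotation matrix.

```python
import math

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mscale(a, s):
    return [[a[i][j] * s for j in range(2)] for i in range(2)]

def expm2(A, terms=25):
    """Matrix exponential of a 2x2 matrix via its truncated power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mscale(matmul2(term, A), 1.0 / n)  # A^n / n!
        result = madd(result, term)
    return result

theta = 0.7
G = [[0.0, -theta], [theta, 0.0]]  # theta times the so(2) generator
Rot = expm2(G)
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
assert all(abs(Rot[i][j] - expected[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```

The same exponential-map construction underlies continuous-group layers such as LieConv, just in higher dimensions and with learnable parameters.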

Below is a brief table illustrating some common groups and their typical use-cases in machine learning:

| Group Name | Symbol | Type | Example of Application |
| --- | --- | --- | --- |
| Translation | (\mathbb{R}^n) | Continuous | CNNs (2D translations for images) |
| Rotation | SO(n) | Continuous, Lie | 3D object recognition, molecule analysis |
| Reflection | O(n) | Discrete | Mirror symmetry for images/models |
| Permutation | S(n) | Discrete | Graph neural networks on sets |
| Dihedral | Dn | Discrete | 2D symmetrical shapes (e.g., polygons) |
| Gauge Groups | SU(N) | Continuous, Lie | Quantum field theories, advanced ML |

Physics-Informed Neural Networks (PINNs)#

Physics-Informed Neural Networks aim to incorporate known physical laws or equations, like partial differential equations (PDEs), directly into the training objective. For instance, if a PDE (like the Navier–Stokes equation) describes fluid flows, a PINN will embed that PDE in the loss function. The network is then penalized not just for deviating from training data, but also for deviations from that PDE. This approach can:

  1. Reduce the need for large labeled datasets, especially when data is scarce or expensive to collect.
  2. Encourage physically consistent solutions without needing to manually enforce boundary or continuity conditions across domain boundaries.

Example: Solving the Poisson Equation with a PINN#

Consider the Poisson equation in 2D:

( -\nabla^2 u(x, y) = f(x, y) )

with boundary condition ( u = 0 ) on the domain boundary. A PINN for this scenario might involve these steps:

  1. Architecture: A feedforward neural network taking ( (x, y) ) as input and outputting ( u ).
  2. Loss Function:
    • Physics Loss: The PDE residual (\left| -\nabla^2 u - f(x, y) \right|^2).
    • Boundary Loss: The mismatch at boundary points ( \left| u_{\text{predicted}}(x_{\text{b}}, y_{\text{b}}) - 0 \right|^2).
    • (Optional) Data Loss: If some ground truth (u) is known at certain interior points, incorporate it too.

The total loss is a weighted sum: (\mathrm{Loss} = \alpha \cdot \mathrm{PhysicsLoss} + \beta \cdot \mathrm{BoundaryLoss} + \gamma \cdot \mathrm{DataLoss}).

Below is a simplified code snippet in Python (using a pseudo deep learning interface) that demonstrates the general structure of a PINN:

import torch
import torch.nn as nn

# Define our network
class PoissonPINN(nn.Module):
    def __init__(self, hidden_units=32):
        super().__init__()
        # Tanh rather than ReLU: a ReLU network is piecewise linear, so its
        # Laplacian is zero almost everywhere and the physics loss is useless
        self.layers = nn.Sequential(
            nn.Linear(2, hidden_units),
            nn.Tanh(),
            nn.Linear(hidden_units, hidden_units),
            nn.Tanh(),
            nn.Linear(hidden_units, 1),  # output is u(x, y)
        )

    def forward(self, x):
        return self.layers(x)

# PDE residual function
def poisson_residual(model, coords, f_func):
    # coords has shape [N, 2] with columns x, y
    # We approximate the Laplacian with autodiff
    x = coords[:, 0:1].requires_grad_(True)
    y = coords[:, 1:2].requires_grad_(True)
    u = model(torch.cat([x, y], dim=1))  # shape [N, 1]
    # First derivatives of u w.r.t. x and y
    grad_u_x = torch.autograd.grad(u, x, torch.ones_like(u),
                                   create_graph=True)[0]
    grad_u_y = torch.autograd.grad(u, y, torch.ones_like(u),
                                   create_graph=True)[0]
    # Second derivatives
    grad_u_xx = torch.autograd.grad(grad_u_x, x, torch.ones_like(grad_u_x),
                                    create_graph=True)[0]
    grad_u_yy = torch.autograd.grad(grad_u_y, y, torch.ones_like(grad_u_y),
                                    create_graph=True)[0]
    laplacian = grad_u_xx + grad_u_yy
    # PDE: -laplacian(u) - f(x, y) = 0
    return (-laplacian - f_func(x, y)) ** 2

# Example f(x, y)
def f_func(x, y):
    return torch.ones_like(x)  # a constant source term for simplicity

# Instantiate the PINN
pinn = PoissonPINN(hidden_units=32)
optimizer = torch.optim.Adam(pinn.parameters(), lr=1e-3)

# Collocation points: interior samples plus points on the unit-square boundary
num_points = 1000
xy_inside = torch.rand(num_points, 2)
t = torch.rand(num_points // 4, 1)
zeros, ones = torch.zeros_like(t), torch.ones_like(t)
xy_boundary = torch.cat([
    torch.cat([t, zeros], dim=1), torch.cat([t, ones], dim=1),
    torch.cat([zeros, t], dim=1), torch.cat([ones, t], dim=1),
], dim=0)

for epoch in range(1000):
    optimizer.zero_grad()
    # Physics loss
    physics_loss = poisson_residual(pinn, xy_inside, f_func).mean()
    # Boundary loss (u should be 0 on the boundary)
    u_pred_boundary = pinn(xy_boundary)
    boundary_loss = torch.mean(u_pred_boundary ** 2)
    total_loss = physics_loss + boundary_loss
    total_loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss.item():.4e}")

This snippet is a toy example, but it demonstrates the concept of enforcing physical laws via the PDE residual in the loss function. As you expand your approach to real-world PDEs or more complicated domains, the same principles will apply.


Equivariant Neural Networks from the Ground Up#

Equivariant neural networks (E-Nets) focus on ensuring that certain transformations of the input lead to predictable transformations of the output. Key subsets of these networks include:

Group Equivariant Convolutional Networks (G-CNN)#

  • Definition: For an input image, transformations from a group ( G ) (e.g., rotations, reflections) are applied, and the CNN’s convolution kernels are designed to handle those transformations in a structured manner.
  • Advantages: They require fewer parameters for the same expressive power and are more robust in tasks where orientation or viewpoint changes are common.

Permutation Equivariant Networks#

  • Applications: Commonly used in set-based or graph-based data. These networks are built so that permuting the elements of the input set results in a corresponding permutation of output indices.
  • Key Idea: Using sum or mean pooling can achieve permutation invariance, while specialized layers can ensure permutation equivariance.
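A minimal Deep Sets-style sketch (toy functions, names our own) makes the distinction concrete: an elementwise map followed by sum pooling is permutation invariant, while an elementwise map that also consumes a pooled summary is permutation equivariant.

```python
# Permutation invariance vs. equivariance on a toy set of scalars.

def phi(v):
    return v * v + 1.0          # toy per-element feature map

def set_encoder(xs):
    """Permutation INVARIANT: elementwise map, then sum pooling."""
    return sum(phi(v) for v in xs)

def equivariant_layer(xs):
    """Permutation EQUIVARIANT: per-element output plus a pooled summary."""
    s = sum(xs)
    return [phi(v) + 0.1 * s for v in xs]

xs = [3.0, -1.0, 2.0, 0.5]
perm = [2, 0, 3, 1]
xs_p = [xs[i] for i in perm]

# Invariance: same pooled output under any reordering
assert abs(set_encoder(xs_p) - set_encoder(xs)) < 1e-9
# Equivariance: the outputs are permuted in exactly the same way
out, out_p = equivariant_layer(xs), equivariant_layer(xs_p)
assert all(abs(a - b) < 1e-9 for a, b in zip(out_p, [out[i] for i in perm]))
```

Graph neural networks stack layers of the second kind and finish with a pooling of the first kind when a graph-level prediction is needed.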

O(3)-Equivariant Networks for 3D#

  • Use-Case: Common in computational chemistry (molecular property prediction) and 3D shape analysis. For molecular data, we want the network’s output to be invariant (e.g., total energy) or equivariant (e.g., forces) under rotations and translations in 3D space.
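One classical way to obtain such invariance is to compute outputs from pairwise distances, which rotations and translations leave unchanged. The toy “energy” below (our own illustrative function, not a real force field) is exactly invariant under a rotation about the z-axis, up to floating-point error.

```python
import math

def pairwise_energy(points):
    """Toy rotation-invariant energy: sum of inverse pairwise distances."""
    e = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            e += 1.0 / math.dist(points[i], points[j])
    return e

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta."""
    x, y, z = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta), z)

pts = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.3), (-0.5, 1.1, 0.7)]
rotated = [rotate_z(p, 1.234) for p in pts]

# The energy does not change when the whole molecule is rotated
assert abs(pairwise_energy(rotated) - pairwise_energy(pts)) < 1e-9
```

Equivariant architectures generalize this trick: instead of restricting themselves to invariant scalars like distances, they also carry vector- and tensor-valued features that rotate along with the input.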

Getting Started with Implementations#

Building equivariant layers from scratch can be rather technical, but various libraries and frameworks have emerged to simplify it. Below are a few:

  • E2CNN (PyTorch): For 2D rotational/reflectional equivariant convolutions.
  • Escnn (PyTorch): Another library for group equivariant neural networks in PyTorch.
  • LieConv: For continuous group convolutions (handling 2D, 3D rotations).
  • PyTorch Geometric: Not purely “equivariant,” but offers advanced GNN frameworks that you can adapt with group or permutation equivariances.

Here’s an example snippet using a hypothetical e2cnn-style interface for building a rotation-equivariant convolutional layer:

import e2cnn
from e2cnn import gspaces
from e2cnn import nn as e2nn
# Define the group of rotations by 4 distinct angles: 0°, 90°, 180°, 270°
r2_act = gspaces.Rot2dOnR2(N=4)
# Input feature type: 1-channel scalar field
in_type = e2nn.FieldType(r2_act, [r2_act.trivial_repr])
# Output feature type: 16 channels, each being a regular representation
out_type = e2nn.FieldType(r2_act, 16 * [r2_act.regular_repr])
# Build an equivariant convolution
conv = e2nn.R2Conv(in_type, out_type, kernel_size=3, padding=1, bias=False)
# Example forward usage
import torch
x = torch.randn(1, 1, 32, 32) # batch=1, channels=1, 32x32 image
x_g = e2nn.GeometricTensor(x, in_type)
y_g = conv(x_g)
y = y_g.tensor # shape: [1, 16 * 4, 32, 32] (each regular field has |C4| = 4 channels)

This example sets up a rotationally equivariant convolution for an image. Under the hood, it manipulates filters so that rotating the input image yields a corresponding rotation in the output feature maps.


Advanced Techniques for Professionals#

Once you have a solid grasp of the fundamentals, you can level up with the following techniques:

1. Combining PINNs with Equivariance#

Imagine you are modeling a physical system governed by PDEs, like fluid flows around an object with inherent rotational symmetry. You could combine the PDE constraints from PINNs with rotational-equivariant architectures to drastically reduce the complexity of your problem. For instance, reflecting or rotating boundary conditions might reduce the domain you need to model.

2. Gauge Equivariant Neural Networks#

Gauge symmetries appear in electromagnetism, quantum chromodynamics, and more. A gauge transformation might vary from point to point in your domain, but the physical observables remain invariant. Building neural networks that incorporate these gauge symmetries allows for stable generalization and physically meaningful representations—particularly relevant in advanced scientific computing and partial differential equation solving.

3. Lie Transformer and Continuous Symmetries#

Transformers have gained considerable traction in recent years. Researchers are exploring how to build Lie Transformers that incorporate transformations from continuous groups. They might replace standard self-attention with a layer that is equivariant under rotations or translations. This approach stands to benefit scenarios where attention-based models (like in natural language processing or 3D geometry tasks) meet continuous group symmetries.

4. Neural Operators (DeepONet, FNO)#

A growing field is focusing on Neural Operators, which map function spaces to function spaces. These handle infinite-dimensional analogs of data, capturing entire fields (like temperature or velocity profiles) rather than discrete points. Frameworks like Deep Operator Networks (DeepONet) or Fourier Neural Operators (FNO) have soared in popularity for solving PDEs quickly. They often embed or can be combined with symmetry considerations, ensuring that the operator respects the underlying symmetries of the physical system.

5. Uncertainty Quantification and Bayesian Approaches#

Real physical systems are seldom deterministic. Combining physics-based priors, group symmetries, and Bayesian inference can yield robust, interpretable models that provide uncertainty estimates—crucial for high-stakes domains like aerospace engineering, seismology, or medical imaging.


Example Code: Building a Simple 2D Equivariant Network#

Below is a more expanded code example demonstrating how you might build a rotation-equivariant network for a simple classification task on 2D images using the e2cnn Python library. This code is conceptual and might need adaptation to run directly.

import torch
import torch.nn as nn
import torch.optim as optim
from e2cnn import gspaces
from e2cnn import nn as e2nn

class EquivariantNet(nn.Module):
    def __init__(self, N=4, in_channels=1, out_classes=10):
        super().__init__()
        # Rotations by 0°, 90°, 180°, 270°
        r2_act = gspaces.Rot2dOnR2(N=N)
        # Define input, intermediate, and output field types
        self.in_type = e2nn.FieldType(r2_act, in_channels * [r2_act.trivial_repr])
        mid_type = e2nn.FieldType(r2_act, 16 * [r2_act.regular_repr])
        self.out_type = e2nn.FieldType(r2_act, 32 * [r2_act.regular_repr])
        # Layers
        self.block1 = e2nn.SequentialModule(
            e2nn.R2Conv(self.in_type, mid_type, kernel_size=3, padding=1, bias=False),
            e2nn.ReLU(mid_type, inplace=True),
            e2nn.PointwiseAvgPool(mid_type, kernel_size=2, stride=2),
        )
        self.block2 = e2nn.SequentialModule(
            e2nn.R2Conv(mid_type, self.out_type, kernel_size=3, padding=1, bias=False),
            e2nn.ReLU(self.out_type, inplace=True),
            e2nn.PointwiseAvgPool(self.out_type, kernel_size=2, stride=2),
        )
        # Group pooling collapses each regular field to a trivial (invariant)
        # representation, leaving a 32-dimensional invariant descriptor
        self.gpool = e2nn.GroupPooling(self.out_type)
        self.linear = nn.Linear(32, out_classes)

    def forward(self, x):
        x_g = e2nn.GeometricTensor(x, self.in_type)
        x_g = self.block1(x_g)
        x_g = self.block2(x_g)
        # Global average pooling over the spatial dimensions -> [batch, channels]
        x_pooled = x_g.tensor.mean(dim=[2, 3])
        # Re-wrap and pool over the group dimension -> shape [batch, 32]
        x_pooled_g = e2nn.GeometricTensor(x_pooled.unsqueeze(-1).unsqueeze(-1), self.out_type)
        x_gpooled = self.gpool(x_pooled_g).tensor.squeeze(-1).squeeze(-1)
        # Classification
        return self.linear(x_gpooled)

# Usage Example:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = EquivariantNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Suppose train_loader yields (images, labels), where images shape: [B, 1, 28, 28]
for epoch in range(10):
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Key Takeaways of This Example#

  • We used the Rot2dOnR2(N=4) group from e2cnn to build a network that is equivariant to 4 discrete rotations.
  • Each convolution changes the field type (number and type of channels).
  • GroupPooling leads to a globally invariant descriptor, suitable for classification.

Further Directions and Conclusion#

Learning Symmetries from Data#

Although we have described how to hard-code known symmetries into a network, an ongoing area of research is learning symmetries directly from data. For example, if you do not know the true transformation group that preserves your data distribution, you might try to discover it automatically. These approaches typically involve:

  • Generative modeling frameworks that decode latent variables in a way that unrolls transformations observed in the dataset.
  • Self-supervised learning tasks that guess transformation invariants, encouraging the model to learn or reveal the underlying group structure.

Extensions to Large-Scale Systems#

In big engineering problems—like climate modeling or nuclear fusion simulations—the synergy between data-driven learning and physics-based constraints holds enormous promise. The ability to handle multiscale phenomena while respecting physical invariances can substantially cut computational costs and provide breakthroughs in high-complexity scenarios.

Final Thoughts#

The pursuit of symmetries in neural networks, reminiscent of the search for symmetries in fundamental physics, seeks universality, efficiency, and truth. Models that intrinsically recognize physical laws or symmetrical structures can better generalize, require less data, and produce more trustworthy outputs in real-world applications.

In summary, the new wave of physics-inspired deep learning is about:

  1. Incorporating known transformations (translations, rotations, reflections, permutations, gauge transformations) into network architectures.
  2. Enforcing physical constraints (e.g., PDEs) in the training process to produce models that comply with real-world laws out of the box.
  3. Building robust, generalized solutions that scale to complex, potentially high-dimensional systems.

By mastering these techniques, you’ll be poised to contribute to cutting-edge research, tackle industrial-scale scientific problems, and perhaps discover new symmetries or principles along the way. The synergy of deep learning and physics is still in its infancy, and we can only imagine what revolutions await as these fields unite even further.

Chasing Symmetries: A New Wave of Physics-Inspired Deep Learning
https://science-ai-hub.vercel.app/posts/d2d33420-6ae5-4ebd-ada5-21085e0e03e9/8/
Author
Science AI Hub
Published at
2025-05-18
License
CC BY-NC-SA 4.0