Topological features for PyTorch classifiers#

This tutorial shows how to turn point cloud data into fixed-size topological feature vectors using masspcf, and feed them into a PyTorch neural network for classification.

The pipeline looks like this:

Point clouds  -->  Persistent homology  -->  Functional summaries (PCFs)
                                                     |
                                                Evaluate on grid
                                                     |
                                                NumPy feature vectors
                                                     |
                                                torch.from_numpy()
                                                     |
                                                PyTorch model

The key insight is that functional summaries (stable ranks, Betti curves, APFs) are piecewise constant functions, and evaluating them on a fixed grid of time points produces ordinary numeric vectors that any neural network can consume.

Step 1: Generate data#

We create two classes of point clouds with different topological structure:

  • Class 0: points sampled uniformly from a filled disk (no persistent loop)

  • Class 1: points sampled from an annulus (one persistent H1 loop)

[1]:
import numpy as np
import masspcf as mpcf
from masspcf import persistence as mpers

rng = np.random.default_rng(42)
n_samples = 200
n_points = 60

def sample_disk(n, rng):
    """Uniform samples from a filled unit disk."""
    r = np.sqrt(rng.uniform(0, 1, n))
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)]).astype(np.float32)

def sample_annulus(n, rng):
    """Uniform samples from an annulus (r in [0.8, 1.2])."""
    r = np.sqrt(rng.uniform(0.8**2, 1.2**2, n))
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)]).astype(np.float32)

pclouds = mpcf.zeros((n_samples,), dtype=mpcf.pcloud32)
labels = np.zeros(n_samples, dtype=np.int64)

for i in range(n_samples):
    if i < n_samples // 2:
        pclouds[i] = sample_disk(n_points, rng)
        labels[i] = 0
    else:
        pclouds[i] = sample_annulus(n_points, rng)
        labels[i] = 1

print(f"Point clouds: {pclouds.shape}, Labels: {labels.shape}")
Point clouds: Shape(200), Labels: (200,)

Let’s visualize one example from each class:

[2]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
for ax, idx, title in zip(axes, [0, n_samples // 2], ["Disk (class 0)", "Annulus (class 1)"]):
    pts = pclouds[idx].to_numpy() if hasattr(pclouds[idx], 'to_numpy') else np.array(pclouds[idx])
    ax.scatter(pts[:, 0], pts[:, 1], s=10, alpha=0.7)
    ax.set_aspect("equal")
    ax.set_title(title)
plt.tight_layout()
plt.show()
../_images/tutorial_notebooks_pytorch_tda_classifier_4_0.png

Step 2: Compute topological features#

We compute persistent homology up to dimension 1, convert the barcodes to stable rank PCFs, and then evaluate them on a fixed grid to get feature vectors.

[3]:
# Persistent homology (H0 and H1)
barcodes = mpers.compute_persistent_homology(pclouds, max_dim=1)
print(f"Barcodes shape: {barcodes.shape}")  # (200, 2)

# Convert to stable rank PCFs
sranks = mpers.barcode_to_stable_rank(barcodes)
print(f"Stable ranks shape: {sranks.shape}")  # (200, 2)
Barcodes shape: Shape(200, 2)
Stable ranks shape: Shape(200, 2)
[4]:
# Evaluate H1 stable ranks on a grid to get fixed-size feature vectors
grid = np.linspace(0, 2.0, 50, dtype=np.float32)
features = sranks[:, 1](grid)   # shape (200, 50)
print(f"Feature matrix shape: {features.shape}")
Feature matrix shape: (200, 50)

Let’s visualize the H1 stable ranks for each class. The annulus class should show more persistent H1 features (loops).

[5]:
from masspcf.plotting import plot as plotpcf

fig, axes = plt.subplots(1, 2, figsize=(10, 3.5), sharey=True)

half = n_samples // 2
plotpcf(sranks[:half, 1], ax=axes[0], alpha=0.15, color="C0")
axes[0].set_title("H1 stable ranks: Disk (class 0)")
axes[0].set_xlabel("Threshold")

plotpcf(sranks[half:, 1], ax=axes[1], alpha=0.15, color="C1")
axes[1].set_title("H1 stable ranks: Annulus (class 1)")
axes[1].set_xlabel("Threshold")

plt.tight_layout()
plt.show()
../_images/tutorial_notebooks_pytorch_tda_classifier_9_0.png

Step 3: Train a PyTorch classifier#

Now we convert the features to PyTorch tensors and train a simple feedforward network.

[6]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Convert to PyTorch tensors
X = torch.from_numpy(features)
y = torch.from_numpy(labels)

# Train/test split
n_train = 160
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

train_loader = DataLoader(
    TensorDataset(X_train, y_train),
    batch_size=32, shuffle=True
)
[7]:
# A small feedforward network
model = nn.Sequential(
    nn.Linear(50, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop
for epoch in range(50):
    for X_batch, y_batch in train_loader:
        loss = loss_fn(model(X_batch), y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 10 == 0:
        with torch.no_grad():
            train_acc = (model(X_train).argmax(1) == y_train).float().mean()
            print(f"Epoch {epoch+1:3d}  loss={loss.item():.4f}  train_acc={train_acc:.0%}")
Epoch  10  loss=0.1324  train_acc=100%
Epoch  20  loss=0.0457  train_acc=100%
Epoch  30  loss=0.0155  train_acc=100%
Epoch  40  loss=0.0119  train_acc=100%
Epoch  50  loss=0.0102  train_acc=100%
[8]:
# Evaluate on test set
with torch.no_grad():
    preds = model(X_test).argmax(dim=1)
    acc = (preds == y_test).float().mean().item()
    print(f"Test accuracy: {acc:.0%}")
Test accuracy: 100%

Because the annulus has a prominent H1 feature (a persistent loop) that the filled disk lacks, the stable rank vectors for the two classes are quite distinct, and even a small network can learn to separate them.

Variations#

Different summaries#

Replace barcode_to_stable_rank with barcode_to_betti_curve or barcode_to_accumulated_persistence for alternative representations. You can also concatenate features from multiple summaries or homology dimensions.

Kernel methods#

Instead of vectorizing, use mpcf.l2_kernel to compute a kernel matrix directly from the PCFs, then convert to PyTorch. This is useful with kernel-based models or as a precomputed similarity matrix for contrastive learning.

[9]:
K = mpcf.l2_kernel(sranks[:, 1]).to_dense()
K_torch = torch.from_numpy(K)
print(f"Kernel matrix shape: {K_torch.shape}")
Kernel matrix shape: torch.Size([200, 200])

Distance matrices#

Similarly, mpcf.pdist gives pairwise distances that can be used for metric learning or as input to distance-based classifiers.

[10]:
D = mpcf.pdist(sranks[:, 1]).to_dense()
D_torch = torch.from_numpy(D)
print(f"Distance matrix shape: {D_torch.shape}")
Distance matrix shape: torch.Size([200, 200])