Core Concepts#
This page introduces the foundational ideas behind masspcf: what piecewise constant functions are, how they are stored in tensors, and how the type system works.
Piecewise constant functions#
A piecewise constant function (PCF) is a function that takes a constant value on each of a finite number of intervals. For example, the function
\[
f(t) = \begin{cases} 1 & 0 \le t < 2 \\ 3 & 2 \le t < 5 \\ 0 & t \ge 5 \end{cases}
\]
is a PCF with three pieces.
In masspcf, a PCF is represented as an \(n \times 2\) array of (time, value) pairs, where each row gives a breakpoint time and the value the function takes starting at that time. The function above would be represented as:
[[0, 1],
 [2, 3],
 [5, 0]]
The value at each breakpoint is the value on the interval starting at that time and continuing until the next breakpoint (or until the end of the function’s domain).
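To make the (time, value) convention concrete, here is a minimal pure-Python sketch that expands the rows into explicit half-open intervals. The helper `pieces` is hypothetical and not part of masspcf; it also assumes that the last piece extends to infinity.

```python
import math

def pieces(rows):
    """Turn (time, value) rows into (left, right, value) pieces.

    Each piece is the half-open interval [left, right); the last piece
    is assumed to extend to infinity.
    """
    out = []
    for i, (t, v) in enumerate(rows):
        right = rows[i + 1][0] if i + 1 < len(rows) else math.inf
        out.append((t, right, v))
    return out

rows = [[0, 1], [2, 3], [5, 0]]
for left, right, value in pieces(rows):
    print(f"[{left}, {right}): {value}")
# [0, 2): 1
# [2, 5): 3
# [5, inf): 0
```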
Why PCFs?#
Many invariants in Topological Data Analysis (TDA) are naturally piecewise constant. Examples include:
Stable rank functions
Betti curves – the Betti number as a function of the filtration parameter
Euler characteristic curves – the Euler characteristic as a function of the filtration parameter
By representing these invariants as PCFs, masspcf enables efficient statistical analysis: computing means, distances, and norms over large collections of such functions, potentially leveraging GPU acceleration.
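As an illustration of why such invariants are piecewise constant, the sketch below builds a Betti curve in (time, value) form from a short list of bars. It is plain Python rather than masspcf API: the helper `betti_curve` is hypothetical, and treating each bar as alive on [birth, death) is an assumption made for the example.

```python
from collections import defaultdict

def betti_curve(bars):
    """Return (time, value) rows counting the bars alive at each time.

    A bar (birth, death) is treated as alive on [birth, death).
    """
    delta = defaultdict(int)
    for birth, death in bars:
        delta[birth] += 1
        delta[death] -= 1
    rows, alive = [], 0
    for t in sorted(delta):
        if delta[t] == 0:
            continue  # births and deaths cancel, so the value does not change
        alive += delta[t]
        rows.append([t, alive])
    return rows

bars = [(0.0, 2.0), (1.0, 3.0), (1.0, 4.0)]
curve = betti_curve(bars)
print(curve)
# [[0.0, 1], [1.0, 3], [2.0, 2], [3.0, 1], [4.0, 0]]
```

The count only changes when a bar is born or dies, so the result is constant between consecutive breakpoints.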
The Pcf class#
An individual PCF is represented by Pcf. You create one from a NumPy array or a list:
import numpy as np
import masspcf as mpcf
# From a NumPy array
data = np.array([[0.0, 1.0],
                 [2.0, 3.0],
                 [5.0, 0.0]], dtype=np.float32)
f = mpcf.Pcf(data)
# From a list (defaults to float32)
g = mpcf.Pcf([[0, 1], [2, 3], [5, 0]])
You can convert a Pcf back to a NumPy array with to_numpy():
arr = f.to_numpy() # shape (3, 2), dtype float32
Individual PCFs support arithmetic (+, -, *, /, **) with
other PCFs and with scalars:
f = mpcf.Pcf([[0, 4.0], [1, 9.0]])
g = f ** 0.5 # square root: values become [2.0, 3.0]
h = f * 2.0 # scale: values become [8.0, 18.0]
See Arithmetic & Comparisons for the full arithmetic reference, including broadcasting.
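Arithmetic between two PCFs with different breakpoints can be pictured as a pointwise operation over the union of their breakpoints. The following pure-Python sketch shows that idea on (time, value) rows. The helper `combine` is hypothetical, it assumes both PCFs start at the same time, and it is not a description of how masspcf implements arithmetic internally.

```python
import operator

def combine(f_rows, g_rows, op):
    """Apply op pointwise, using the union of both sets of breakpoints.

    Assumes f_rows and g_rows are sorted by time and share a start time.
    """
    times = sorted({t for t, _ in f_rows} | {t for t, _ in g_rows})
    out, i, j = [], 0, 0
    for t in times:
        # Advance each cursor to the last breakpoint at or before t.
        while i + 1 < len(f_rows) and f_rows[i + 1][0] <= t:
            i += 1
        while j + 1 < len(g_rows) and g_rows[j + 1][0] <= t:
            j += 1
        out.append([t, op(f_rows[i][1], g_rows[j][1])])
    return out

f_rows = [[0, 1.0], [2, 3.0], [5, 0.0]]
g_rows = [[0, 2.0], [3, 1.0], [5, 0.0]]
total = combine(f_rows, g_rows, operator.add)
print(total)
# [[0, 3.0], [2, 5.0], [3, 4.0], [5, 0.0]]
```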
Iterating over rectangles#
Given two PCFs \(f\) and \(g\), iterate_rectangles() produces the list of intervals on which both functions are constant. Each interval is returned as a Rectangle with four properties: left and right (the time boundaries) and f_value and g_value (the values of each function on that interval).
This corresponds to the rectangle decomposition described in [1] and is useful for inspecting how two PCFs interact or for implementing custom integration-like operations:
import masspcf as mpcf
f = mpcf.Pcf([[0, 1.0], [2, 3.0], [5, 0.0]])
g = mpcf.Pcf([[0, 2.0], [3, 1.0], [5, 0.0]])
rects = mpcf.iterate_rectangles(f, g)
for r in rects:
    print(f"[{r.left}, {r.right}): f={r.f_value}, g={r.g_value}")
# [0.0, 2.0): f=1.0, g=2.0
# [2.0, 3.0): f=3.0, g=2.0
# [3.0, 5.0): f=3.0, g=1.0
# [5.0, inf): f=0.0, g=0.0
Optional a and b parameters restrict the iteration to a sub-interval:
rects = mpcf.iterate_rectangles(f, g, a=1.0, b=4.0)
Note
iterate_rectangles is intended for exploration and prototyping. For
performance-critical workloads such as computing distances or norms over
large collections, use the dedicated functions (pdist(),
lp_norm(), etc.) which are implemented in optimized
C++/CUDA.
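For intuition about what a rectangle decomposition contains, and how an integration-like operation can be built on top of it, here is a pure-Python sketch operating on (time, value) rows. Both helpers (`rectangles` and `l1_distance`) are hypothetical illustrations that assume the two PCFs start at the same time; they are not the library's implementation.

```python
import math

def rectangles(f_rows, g_rows):
    """Yield (left, right, f_value, g_value) on each common interval."""
    times = sorted({t for t, _ in f_rows} | {t for t, _ in g_rows})
    i = j = 0
    for k, left in enumerate(times):
        while i + 1 < len(f_rows) and f_rows[i + 1][0] <= left:
            i += 1
        while j + 1 < len(g_rows) and g_rows[j + 1][0] <= left:
            j += 1
        right = times[k + 1] if k + 1 < len(times) else math.inf
        yield left, right, f_rows[i][1], g_rows[j][1]

def l1_distance(f_rows, g_rows):
    """Integrate |f - g| by summing width * height over the rectangles."""
    total = 0.0
    for left, right, fv, gv in rectangles(f_rows, g_rows):
        diff = abs(fv - gv)
        if diff:  # skip zero-height rectangles (avoids inf * 0)
            total += (right - left) * diff
    return total

f_rows = [[0, 1.0], [2, 3.0], [5, 0.0]]
g_rows = [[0, 2.0], [3, 1.0], [5, 0.0]]
print(l1_distance(f_rows, g_rows))
# 7.0
```

The three finite rectangles contribute 2·1 + 1·1 + 2·2 = 7, and the final unbounded rectangle contributes nothing because both functions are zero there.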
Tensors#
While you can work with individual Pcf objects, masspcf is designed for working with collections of PCFs. These collections are stored in tensors – multidimensional arrays, similar to NumPy’s ndarray.
A tensor can have any number of dimensions. For example:
A 1-D tensor of shape (100,) holds 100 PCFs.
A 2-D tensor of shape (10, 50) holds 500 PCFs arranged in a 10-by-50 grid.
Creating tensors#
The primary way to create a tensor is with zeros():
import masspcf as mpcf
# A 1-D tensor of 100 "zero" PCFs (32-bit, the default)
X = mpcf.zeros((100,))
# A 2-D tensor of 64-bit PCFs
Y = mpcf.zeros((10, 50), dtype=mpcf.pcf64)
# A tensor of scalar floats
Z = mpcf.zeros((5, 5), dtype=mpcf.float32)
What “zero” means depends on the dtype: for PCF types, it is a function that is identically zero; for numeric types, it is the number 0; for point cloud types, it is an empty point cloud.
You can also generate random PCF tensors for experimentation:
from masspcf.random import noisy_sin, noisy_cos
# 200 noisy sin(2*pi*t) functions, each sampled at 100 time points
sines = noisy_sin((200,), n_points=100)
# A 2-D array: 10 x 50 noisy cosine functions
cosines = noisy_cos((10, 50), n_points=30)
Indexing and slicing#
Tensors support NumPy-style indexing and slicing:
X = mpcf.zeros((10, 5, 4))
# Single element -- returns a Pcf
f = X[3, 2, 1]
# Slicing -- returns a tensor (view)
row = X[3, :, :] # shape (5, 4)
sub = X[2:8, 1:, 2] # shape (6, 4)
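The shapes in the comments above follow the usual NumPy rule for basic indexing: an integer index drops its axis, and a slice keeps the axis with the sliced length. The sketch below works that rule out in plain Python. The helper `result_shape` is hypothetical and only handles integers and slices.

```python
def result_shape(shape, index):
    """Shape produced by basic NumPy-style indexing with ints and slices."""
    out = []
    for size, idx in zip(shape, index):
        if isinstance(idx, slice):
            out.append(len(range(*idx.indices(size))))
        # An integer index selects one position and drops the axis.
    out.extend(shape[len(index):])  # unindexed trailing axes are kept
    return tuple(out)

shape = (10, 5, 4)
print(result_shape(shape, (3, slice(None), slice(None))))     # (5, 4)
print(result_shape(shape, (slice(2, 8), slice(1, None), 2)))  # (6, 4)
print(result_shape(shape, (3, 2, 1)))                         # () -> a single element
```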
You can also assign into tensors:
from masspcf.random import noisy_sin
A = mpcf.zeros((2, 10))
A[0, :] = noisy_sin((10,), n_points=100)
A[1, :] = noisy_sin((10,), n_points=50)
Boolean masks can select elements by condition:
import numpy as np
X = mpcf.FloatTensor(np.arange(12, dtype=np.float32).reshape(3, 4))
mask = mpcf.BoolTensor(np.array([True, False, True, False]))
X[:, mask] # shape (3, 2): select columns where mask is True
threshold = 5.0 # example cutoff
X[X > threshold] # flat 1-D: all elements matching the condition
See Indexing and Masking for full details on boolean masking.
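The two masking forms can be mimicked with nested lists to show which elements are selected. This is a plain-Python sketch of NumPy-style semantics, including the assumption that a full-shape condition flattens matches in row-major order; it does not call masspcf.

```python
# A 3 x 4 grid holding the values 0.0 .. 11.0, as in the example above.
rows = [[0.0, 1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0, 7.0],
        [8.0, 9.0, 10.0, 11.0]]
mask = [True, False, True, False]

# Column mask: keep the columns where mask is True -> shape (3, 2).
columns = [[v for v, keep in zip(row, mask) if keep] for row in rows]
print(columns)
# [[0.0, 2.0], [4.0, 6.0], [8.0, 10.0]]

# Element-wise condition: matching elements are flattened in row-major order.
threshold = 5.0
flat = [v for row in rows for v in row if v > threshold]
print(flat)
# [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```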
Tensor types#
There are several concrete tensor types, each corresponding to a dtype:
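| Tensor class | dtype | Contents |
|---|---|---|
| PcfTensor | pcf32, pcf64 | Piecewise constant functions |
| | pcf32i, pcf64i | Integer-valued piecewise constant functions |
| FloatTensor | float32, float64 | Floating-point scalars |
| IntTensor | int32, int64, uint32, uint64 | Integer scalars |
| | pcloud32, pcloud64 | Point clouds |
| | barcode32, barcode64 | Persistence barcodes |
| | symmat32, symmat64 | Symmetric matrices |
| | distmat32, distmat64 | Distance matrices |
| BoolTensor | | Boolean values (returned by comparison operators) |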
You can construct tensor types directly from Python lists:
# Numeric tensors
X = mpcf.FloatTensor([1.0, 2.0, 3.0])
Y = mpcf.IntTensor([[1, 2], [3, 4]])
# Non-numeric tensors from lists of elements
f = mpcf.Pcf([[0, 1.0], [1, 2.0]])
g = mpcf.Pcf([[0, 3.0], [2, 4.0]])
T = mpcf.PcfTensor([f, g])
You can also use zeros() or functions like noisy_sin() that return the appropriate tensor type automatically. See Working with Tensors for all construction methods.
Evaluation#
Since PCFs represent functions, you evaluate them by calling them like ordinary Python functions. Pass a single time to get a single value, or an array of times to evaluate at many points at once:
f = mpcf.Pcf([[0, 1], [2, 3], [5, 0]])
f(1.0) # 1.0 (on the interval [0, 2))
f(3.5) # 3.0 (on the interval [2, 5))
PCF tensors are also callable – evaluating every element at the given time(s):
X = mpcf.zeros((3, 4), dtype=mpcf.pcf32)
# ... fill X with PCFs ...
X(1.5) # shape (3, 4) -- one value per PCF
X([0, 1, 5]) # shape (3, 4, 3) -- each PCF evaluated at 3 times
See Working with Tensors for full details on scalar, array, and tensor evaluation, including accepted input types and output shapes.
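Evaluation amounts to finding the last breakpoint at or before the requested time. The sketch below does this with a binary search on plain (time, value) rows and then applies it across a small grid to show where the extra output axis comes from. The helper `evaluate` is hypothetical, and raising an error for times before the first breakpoint is an assumption of this sketch, not documented masspcf behavior.

```python
from bisect import bisect_right

def evaluate(rows, t):
    """Value at time t of the PCF given by sorted (time, value) rows."""
    times = [time for time, _ in rows]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t lies before the first breakpoint")
    return rows[i][1]

f_rows = [[0, 1], [2, 3], [5, 0]]
print(evaluate(f_rows, 1.0))  # 1
print(evaluate(f_rows, 3.5))  # 3
print(evaluate(f_rows, 2.0))  # 3 (a breakpoint belongs to the piece it starts)

# A 2 x 2 grid of PCFs evaluated at 3 times gives a 2 x 2 x 3 result.
grid = [[f_rows, [[0, 7]]],
        [[[0, 2], [1, 4]], f_rows]]
times = [0, 1, 5]
values = [[[evaluate(p, t) for t in times] for p in row] for row in grid]
print(values)
# [[[1, 1, 0], [7, 7, 7]], [[2, 4, 4], [1, 1, 0]]]
```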
The dtype system#
The dtype parameter controls the element type of a tensor, analogous to NumPy’s dtype. masspcf defines the following dtypes in masspcf.typing (also re-exported from the top-level masspcf module):
| dtype | Precision | Description |
|---|---|---|
| pcf32, pcf64 | float | Piecewise constant functions |
| pcf32i, pcf64i | int | Integer-valued piecewise constant functions |
| float32, float64 | float | Scalar floating-point values |
| int32, int64, uint32, uint64 | int | Scalar integer values (signed and unsigned) |
| pcloud32, pcloud64 | float | Point clouds |
| barcode32, barcode64 | float | Persistence barcodes |
| symmat32, symmat64 | float | Symmetric matrices; n(n+1)/2 storage |
| distmat32, distmat64 | float | Distance matrices; n(n-1)/2 storage, zero diagonal, nonnegative |
PCF types#
pcf32 – 32-bit floating-point piecewise constant functions (the default dtype)
pcf64 – 64-bit floating-point piecewise constant functions
pcf32i – 32-bit integer piecewise constant functions
pcf64i – 64-bit integer piecewise constant functions
Use pcf32 for most work. Use pcf64 when you need higher numerical precision.
pcf32i and pcf64i provide integer-valued PCFs. They support construction,
evaluation, arithmetic, and serialization, but not norms or distances.
Numeric types#
float32 – 32-bit floating-point scalars
float64 – 64-bit floating-point scalars
These are used for tensors that hold scalar values, such as the results of norm or distance computations.
int32 – 32-bit signed integer scalars
int64 – 64-bit signed integer scalars
uint32 – 32-bit unsigned integer scalars
uint64 – 64-bit unsigned integer scalars
These are used for tensors that hold integer values.
Point cloud types#
pcloud32 – 32-bit point clouds
pcloud64 – 64-bit point clouds
Used when working with point cloud data, e.g., as input to persistent homology computations.
Barcode types#
barcode32 – 32-bit persistence barcodes
barcode64 – 64-bit persistence barcodes
Used to store persistence barcodes produced by homology computations.
Symmetric matrix types#
symmat32 – 32-bit symmetric matrices
symmat64 – 64-bit symmetric matrices
Compressed symmetric matrices using lower-triangular storage (n*(n+1)/2 elements for an n×n matrix).
Distance matrix types#
distmat32 – 32-bit distance matrices
distmat64 – 64-bit distance matrices
Compressed distance matrices with implicit zero diagonal and nonnegative entries (n*(n-1)/2 elements for an n×n matrix).
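The element counts quoted for the compressed formats follow from storing only one triangle of the matrix. The sketch below computes both counts and shows one possible flat-index mapping for a distance matrix. The row-wise strict lower-triangular ordering used in `distmat_index` is an assumption chosen for illustration; the actual memory layout in masspcf may differ, and all three helpers are hypothetical.

```python
def symmat_size(n):
    """Stored elements for an n x n symmetric matrix (diagonal included)."""
    return n * (n + 1) // 2

def distmat_size(n):
    """Stored elements for an n x n distance matrix (diagonal implicit)."""
    return n * (n - 1) // 2

def distmat_index(i, j):
    """Flat index of entry (i, j), i != j, in row-wise strict lower-triangular order."""
    if i == j:
        raise ValueError("diagonal entries are implicit zeros")
    if i < j:
        i, j = j, i  # symmetry: d(i, j) == d(j, i)
    return i * (i - 1) // 2 + j

n = 4
print(symmat_size(n), distmat_size(n))  # 10 6
print([distmat_index(i, j) for i in range(n) for j in range(i)])
# [0, 1, 2, 3, 4, 5]
```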
Precision: 32-bit vs. 64-bit#
Each dtype family comes in 32-bit and 64-bit variants. The 32-bit variants use less memory and are faster, especially on GPUs where single-precision throughput is typically much higher. Use 64-bit variants when numerical precision is important for your application.
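To get a feel for the trade-off, the following standard-library sketch rounds Python floats (which are 64-bit) to 32-bit precision and compares the results. It illustrates IEEE 754 single versus double precision in general and does not use masspcf; the helper `to_float32` is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Storage per value, in bytes: half the memory for 32-bit.
print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8

# 32-bit floats carry a 24-bit significand, so 2**24 + 1 is not representable.
print(to_float32(16777217.0))  # 16777216.0

# Decimal fractions pick up a larger rounding error in 32-bit.
print(to_float32(0.1) == 0.1)  # False
```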
CPU and GPU execution#
masspcf automatically detects available NVIDIA GPUs and uses them for computations when beneficial. The library decides at runtime whether to execute a given operation on the CPU or GPU based on problem size. See GPU Acceleration for details on GPU detection, controlling execution, and performance considerations.