masspcf

Core library for piecewise constant functions, tensors, and computations.

pcf

class masspcf.functional.pcf.Pcf(arr: ndarray | Pcf | list[list[float | int] | tuple[float | int, ...]], dtype=None)

Bases: object

A piecewise constant function (PCF).

A PCF is defined by a sequence of (time, value) pairs \((t_0, v_0), (t_1, v_1), \ldots, (t_{n-1}, v_{n-1})\) with \(t_0 = 0\) and \(t_0 < t_1 < \cdots < t_{n-1}\). The function takes the value \(v_i\) on the interval \([t_i, t_{i+1})\) for \(0 \leq i < n-1\), and \(v_{n-1}\) on \([t_{n-1}, \infty)\).

The first breakpoint must be at \(t_0 = 0\); an InvalidArgument error is raised otherwise.
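To make the definition concrete, the following pure-Python sketch evaluates a PCF stored as a list of (time, value) pairs, using binary search to find the interval \([t_i, t_{i+1})\) that contains \(t\). It does not use masspcf; the helper `evaluate_pcf` is hypothetical and only illustrates the definition above.

```python
from bisect import bisect_right


def evaluate_pcf(points, t):
    """Evaluate the PCF given by sorted (time, value) pairs at time t >= 0.

    Hypothetical helper for illustration; not part of masspcf.
    """
    times = [ti for ti, _ in points]
    if not times or times[0] != 0:
        raise ValueError("first breakpoint must be at t = 0")
    if t < 0:
        raise ValueError("PCFs are defined for t >= 0")
    # Index of the last breakpoint t_i with t_i <= t, so t lies in [t_i, t_{i+1}).
    i = bisect_right(times, t) - 1
    return points[i][1]


# Same data as the Pcf example below: 1 on [0, 1), 2 on [1, 3), 0 on [3, inf).
f = [(0.0, 1.0), (1.0, 2.0), (3.0, 0.0)]
print([evaluate_pcf(f, t) for t in (0.0, 0.5, 1.0, 2.9, 3.0, 100.0)])
# [1.0, 1.0, 2.0, 2.0, 0.0, 0.0]
```

Note that a breakpoint belongs to the interval it starts, so the value at \(t = 1\) is already 2.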

Parameters:
  • arr (numpy.ndarray or Pcf or list) – Input data. An ndarray or list should have shape (n, 2), where each row is a (time, value) pair. An existing Pcf is copied.

  • dtype (type, optional) – Data type for the PCF (pcf32, pcf64, pcf32i, or pcf64i). If None, the dtype is inferred from the input array (e.g. a numpy.float32 array produces a 32-bit PCF, a numpy.int32 array produces a 32-bit integer PCF).

Examples

>>> import numpy as np
>>> import masspcf as mpcf
>>> f = mpcf.Pcf(np.array([[0.0, 1.0], [1.0, 2.0], [3.0, 0.0]], dtype=np.float32))
>>> f.size
3
astype(dtype)

Return a copy of the PCF cast to the given dtype (pcf32, pcf64, pcf32i, or pcf64i).

property size

Number of breakpoints (time-value pairs) in this PCF.

to_numpy()

Convert the PCF to a numpy array of shape (n, 2) with (time, value) rows.

class masspcf.functional.pcf.Rectangle(data)

Bases: object

A rectangle produced by iterate_rectangles(): a time interval from left to right, together with the value of each of the two PCFs on that interval.

property f_value

Value of the first PCF on this interval.

property g_value

Value of the second PCF on this interval.

property left

Left time boundary.

property right

Right time boundary.

masspcf.functional.pcf.iterate_rectangles(f: Pcf, g: Pcf, a=0.0, b=inf)

Iterate over the rectangles formed by two PCFs.

Parameters:
  • f (Pcf) – The two piecewise constant functions.

  • g (Pcf) – The two piecewise constant functions.

  • a (float, optional) – Left integration bound (default 0).

  • b (float, optional) – Right integration bound (default infinity).

Returns:

Rectangles in chronological order.

Return type:

list[Rectangle]
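The rectangles can be understood as the intervals obtained by cutting the time axis at the union of both functions' breakpoints, since both PCFs are constant between consecutive cuts. The pure-Python sketch below assumes that reading, including a final rectangle that extends to b (which may be infinite). `rectangles` and `value_at` are hypothetical helpers, not masspcf's implementation, which may differ in edge cases.

```python
import math
from bisect import bisect_right


def value_at(points, t):
    # Value of the PCF on the interval containing t (points sorted, points[0][0] == 0).
    times = [ti for ti, _ in points]
    return points[bisect_right(times, t) - 1][1]


def rectangles(f, g, a=0.0, b=math.inf):
    """Split [a, b) at the union of the breakpoints of f and g.

    Returns (left, right, f_value, g_value) tuples in time order.
    Hypothetical helper for illustration; not the masspcf implementation.
    """
    cuts = sorted({t for t, _ in f} | {t for t, _ in g} | {a})
    # Keep the cuts inside [a, b) and close the last interval at b.
    cuts = [t for t in cuts if a <= t < b] + [b]
    return [
        (left, right, value_at(f, left), value_at(g, left))
        for left, right in zip(cuts, cuts[1:])
    ]


f = [(0.0, 1.0), (1.0, 2.0), (3.0, 0.0)]
g = [(0.0, 5.0), (2.0, 7.0)]
for rect in rectangles(f, g):
    print(rect)
```

Each tuple mirrors the left, right, f_value and g_value properties of Rectangle.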

tensor

class masspcf.tensor.BoolTensor(data)

Bases: Tensor

Tensor of boolean values, typically produced by elementwise comparisons.

class masspcf.tensor.FloatTensor(data, dtype=None)

Bases: NumericTensor

class masspcf.tensor.IntPcfTensor(data)

Bases: _PcfTensorBase

class masspcf.tensor.IntTensor(data, dtype=None)

Bases: NumericTensor

class masspcf.tensor.NumericTensor

Bases: Tensor, ArithmeticTensorMixin

array_equal(other) → bool

Test whether two tensors have the same shape and all equal elements.

Parameters:

other (Tensor) – The tensor to compare with.

Returns:

True if the tensors are elementwise equal, False otherwise.

Return type:

bool

class masspcf.tensor.PcfTensor(data)

Bases: _PcfTensorBase

class masspcf.tensor.PointCloudTensor(data)

Bases: Tensor

tensor_create

masspcf.tensor_create.array_split(tensor, indices_or_sections, axis=0)

Split a tensor into sub-tensors, allowing uneven splits.

Like split, but when indices_or_sections is an integer and the axis size is not evenly divisible, the first sections are one element larger.

Parameters:
  • tensor (Tensor) – The tensor to split.

  • indices_or_sections (int or list of int) – If an int, the tensor is split into that many parts (uneven allowed). If a list, it gives the indices where splits occur (same as split).

  • axis (int) – The axis along which to split (default 0).

Returns:

A list of tensor views sharing data with the original.

Return type:

list of Tensor

See also

split

Split requiring equal divisions.
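The uneven-split rule comes down to a little integer arithmetic. A minimal pure-Python sketch, assuming the first n % sections parts are the ones that receive the extra element (the same convention as NumPy's array_split); both helper names are hypothetical and not part of masspcf:

```python
def array_split_sizes(n, sections):
    """Section lengths when n elements are split into `sections` parts.

    The first n % sections parts get one extra element.
    Hypothetical helper, not part of masspcf.
    """
    if sections <= 0:
        raise ValueError("number of sections must be positive")
    q, r = divmod(n, sections)
    return [q + 1] * r + [q] * (sections - r)


def split_points(n, sections):
    # Cumulative boundaries, i.e. the indices at which the axis is cut.
    points, total = [], 0
    for size in array_split_sizes(n, sections)[:-1]:
        total += size
        points.append(total)
    return points


print(array_split_sizes(10, 3))  # [4, 3, 3]
print(split_points(10, 3))       # [4, 7]
```

split, by contrast, corresponds to the case where n % sections == 0, so every part has the same length.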

masspcf.tensor_create.concatenate(tensors, axis=0)

Concatenate tensors along an existing axis (outer indexing).

masspcf.tensor_create.split(tensor, indices_or_sections, axis=0)

Split a tensor into sub-tensors along an axis.

Parameters:
  • tensor (Tensor) – The tensor to split.

  • indices_or_sections (int or list of int) – If an int, the tensor is split into that many equal parts. If a list, it gives the indices where splits occur.

  • axis (int) – The axis along which to split (default 0).

Returns:

A list of tensor views sharing data with the original.

Return type:

list of Tensor

See also

array_split

Split allowing uneven divisions.

masspcf.tensor_create.stack(tensors, axis=0)

Stack tensors along a new axis. All tensors must have the same shape.

masspcf.tensor_create.zeros(shape: ShapeLike, dtype: dtype = masspcf.pcf32)

Create a new tensor of the specified shape and dtype whose entries are “zero”. What “zero” means depends on the dtype:

  • dtype=pcf32/64: A PCF that takes the value 0 for all times.

  • dtype=pcf32i/64i: An integer PCF that takes the value 0 for all times.

  • dtype=float32/float64: The number 0.

  • dtype=pcloud32/64: An empty point cloud.

  • dtype=barcode32/64: An empty barcode.

  • dtype=symmat32/64: A 0×0 symmetric matrix.

  • dtype=distmat32/64: A 0×0 distance matrix.

Parameters:
  • shape (ShapeLike) – Shape of the returned tensor.

  • dtype – The data type of the elements.

Returns:

The newly created tensor

Return type:

Tensor

reductions

masspcf.reductions.max_time(fs: Tensor | list[Pcf] | Pcf, dim: int = 0)

Compute the maximum breakpoint time along the given dimension.

For each PCF \(f_i\) with breakpoints \((t_0^{(i)}, t_1^{(i)}, \ldots, t_{n_i-1}^{(i)})\), let \(T_i = t_{n_i-1}^{(i)}\) be the last breakpoint. For functions \(f_1, f_2, \ldots, f_n\) being reduced, this returns

\[\max(T_1, T_2, \ldots, T_n).\]

The result is numeric, not a PCF.

See How dim works for a detailed explanation of dimension reduction semantics.

Parameters:
  • fs (PcfContainerLike) – A PcfTensor with dtype pcf32 or pcf64.

  • dim (int, optional) – Dimension along which to reduce, by default 0.

Returns:

A numeric tensor with the reduced dimension removed.

Return type:

FloatTensor
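The formula above amounts to taking the last breakpoint \(T_i\) of each function and then the maximum over the functions being reduced. A minimal pure-Python sketch for a flat list of PCFs (no dim handling); `max_time_sketch` is a hypothetical helper:

```python
def max_time_sketch(pcfs):
    """Largest final breakpoint T_i over a collection of PCFs.

    Each PCF is a list of (time, value) pairs sorted by time.
    Hypothetical helper for illustration; not part of masspcf.
    """
    return max(points[-1][0] for points in pcfs)


fs = [
    [(0.0, 1.0), (1.0, 2.0), (3.0, 0.0)],  # T_1 = 3
    [(0.0, 4.0), (2.5, 0.0)],              # T_2 = 2.5
    [(0.0, 0.0)],                          # T_3 = 0
]
print(max_time_sketch(fs))  # 3.0
```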

masspcf.reductions.mean(fs: Tensor | list[Pcf] | Pcf, dim: int = 0)

Compute the pointwise mean of a PCF tensor along the given dimension.

The mean is computed pointwise in time: for functions \(f_1, f_2, \ldots, f_n\) being reduced, the resulting function \(\bar{f}\) satisfies

\[\bar{f}(t) = \frac{1}{n} \sum_{i=1}^{n} f_i(t)\]

for all \(t\).

See How dim works for a detailed explanation of dimension reduction semantics.

Parameters:
  • fs (PcfContainerLike) – A PcfTensor with dtype pcf32 or pcf64.

  • dim (int, optional) – Dimension along which to reduce, by default 0.

Returns:

A PcfTensor with the reduced dimension removed.

Return type:

PcfTensor
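Because every input is constant between consecutive points of the union of all breakpoints, the mean is again a PCF whose breakpoints lie in that union. A pure-Python sketch of this idea for a flat list of PCFs; the helper names are hypothetical, dim handling is omitted, and masspcf may organize the work differently:

```python
from bisect import bisect_right


def value_at(points, t):
    # Value of the PCF on the interval containing t (points sorted, points[0][0] == 0).
    times = [ti for ti, _ in points]
    return points[bisect_right(times, t) - 1][1]


def pointwise_mean(pcfs):
    """Pointwise mean of PCFs given as sorted (time, value) lists starting at t = 0.

    The mean is constant between consecutive breakpoints of the union of all
    inputs, so it suffices to average the values at each such breakpoint.
    Hypothetical helper for illustration; not the masspcf implementation.
    """
    times = sorted({t for points in pcfs for t, _ in points})
    n = len(pcfs)
    return [(t, sum(value_at(p, t) for p in pcfs) / n) for t in times]


f1 = [(0.0, 1.0), (2.0, 3.0)]
f2 = [(0.0, 3.0), (1.0, 1.0)]
print(pointwise_mean([f1, f2]))  # [(0.0, 2.0), (1.0, 1.0), (2.0, 2.0)]
```

The result can contain redundant breakpoints where consecutive values coincide; a production implementation may merge them.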

distance

masspcf.distance.cdist(X: Tensor | list[Pcf] | Pcf, Y: Tensor | list[Pcf] | Pcf, p=1, verbose=False) → FloatTensor

Compute the pairwise \(L_p\) distances between two tensors of PCFs.

For tensors \(X\) of shape \((m_1, \ldots, m_n)\) and \(Y\) of shape \((k_1, \ldots, k_l)\), returns a tensor of shape \((m_1, \ldots, m_n, k_1, \ldots, k_l)\) where

\[D_{i_1, \ldots, i_n, j_1, \ldots, j_l} = \Vert X_{i_1, \ldots, i_n} - Y_{j_1, \ldots, j_l} \Vert_p.\]

Parameters:
  • X (PcfContainerLike) – A tensor of PCFs (any shape).

  • Y (PcfContainerLike) – A tensor of PCFs (any shape, same dtype as X).

  • p (float, optional) – The \(p\) parameter in the \(L_p\) distance (must be \(\geq 1\)), by default 1.

  • verbose (bool, optional) – Show progress information during computation, by default False.

Returns:

A tensor of shape (*X.shape, *Y.shape) containing pairwise distances.

Return type:

FloatTensor

masspcf.distance.lp_distance(f: Pcf, g: Pcf, p=1) → float

Compute the \(L_p\) distance between two PCFs.

\[\Vert f - g \Vert_p = \left(\int_0^\infty |f(t) - g(t)|^p\, dt\right)^{1/p}\]

Parameters:
  • f (Pcf) – First piecewise constant function.

  • g (Pcf) – Second piecewise constant function.

  • p (float, optional) – The \(p\) parameter in the \(L_p\) distance (must be \(\geq 1\)), by default 1.

Returns:

The \(L_p\) distance between f and g.

Return type:

float

Raises:
  • ValueError – If p < 1.

  • TypeError – If f and g have different dtypes, or are integer PCFs.
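Since \(|f - g|\) is constant between consecutive breakpoints of \(f\) and \(g\), the integral reduces to a finite sum of interval lengths times \(|f(t) - g(t)|^p\), plus a tail term on the last, unbounded interval. The pure-Python sketch below makes that explicit. `lp_distance_sketch` is hypothetical, and returning infinity when the final values differ is this sketch's choice rather than documented masspcf behavior.

```python
import math
from bisect import bisect_right


def value_at(points, t):
    # Value of the PCF on the interval containing t (points sorted, points[0][0] == 0).
    times = [ti for ti, _ in points]
    return points[bisect_right(times, t) - 1][1]


def lp_distance_sketch(f, g, p=1):
    """L_p distance between two PCFs given as sorted (time, value) lists.

    Hypothetical helper for illustration; not the masspcf implementation.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    cuts = sorted({t for t, _ in f} | {t for t, _ in g})
    total = 0.0
    for left, right in zip(cuts, cuts[1:]):
        # |f - g|^p is constant on [left, right).
        total += abs(value_at(f, left) - value_at(g, left)) ** p * (right - left)
    # On the final, unbounded interval the integral is finite only if f = g there.
    if value_at(f, cuts[-1]) != value_at(g, cuts[-1]):
        return math.inf
    return total ** (1.0 / p)


f = [(0.0, 1.0), (1.0, 2.0), (3.0, 0.0)]
zero = [(0.0, 0.0)]
print(lp_distance_sketch(f, zero, p=1))  # 5.0
print(lp_distance_sketch(f, zero, p=2))  # 3.0
```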

masspcf.distance.pdist(fs: Tensor | list[Pcf] | Pcf, p=1, verbose=False) → DistanceMatrix

Compute the pairwise \(L_p\) distance matrix for a 1-D tensor of PCFs.

For a tensor \((f_0, f_1, \ldots, f_{n-1})\), returns an \(n \times n\) matrix \(D\) where

\[D_{ij} = \Vert f_i - f_j \Vert_p.\]

Parameters:
  • fs (PcfContainerLike) – A 1-D tensor of PCFs.

  • p (float, optional) – The \(p\) parameter in the \(L_p\) distance (must be \(\geq 1\)), by default 1.

  • verbose (bool, optional) – Show progress information during computation, by default False.

Returns:

A compressed symmetric distance matrix.

Return type:

DistanceMatrix

Raises:

ValueError – If fs is not 1-dimensional.
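Because \(D\) is symmetric with a zero diagonal, only the \(n(n-1)/2\) entries with \(j < i\) need to be computed, which also makes compressed storage natural. The pure-Python sketch below shows that access pattern with a dense result and a stand-in distance on plain numbers; `pairwise_distances` is a hypothetical helper, not masspcf's implementation.

```python
def pairwise_distances(items, dist):
    """Dense n x n matrix with D[i][j] = dist(items[i], items[j]).

    Only the lower triangle (j < i) is computed; the upper triangle is filled
    by symmetry and the diagonal stays 0, since ||f_i - f_i||_p = 0.
    Hypothetical helper for illustration; not the masspcf implementation.
    """
    n = len(items)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            D[i][j] = D[j][i] = dist(items[i], items[j])
    return D


# Stand-in distance on plain numbers, just to show the structure.
D = pairwise_distances([0.0, 1.0, 3.0], lambda a, b: abs(a - b))
print(D)  # [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
```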

symmetric_matrix

class masspcf.symmetric_matrix.SymmetricMatrix(n_or_data: int | SymmetricMatrix | CppSymmetricMatrix, dtype: float32 | float64 | None = None)

Bases: object

Compressed symmetric matrix using lower-triangular storage.

Stores only n*(n+1)/2 elements for an n×n symmetric matrix. Supports subscript access with matrix[i, j].

Parameters:
  • n_or_data (int | SymmetricMatrix | CppSymmetricMatrix) – If an int, creates a zero-initialized matrix of that size. If a SymmetricMatrix or C++ symmetric matrix, wraps it directly.

  • dtype (float32 | float64 | None, optional) – Element precision. float32 stores entries as 32-bit floats, float64 as 64-bit floats. Defaults to float64 when n_or_data is an int. Ignored otherwise.
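One common way to realize lower-triangular storage is to pack row \(i\) of the lower triangle (entries \(j \leq i\)) after rows \(0, \ldots, i-1\), so entry \((i, j)\) lands at index \(i(i+1)/2 + j\). The sketch below assumes this row-major layout purely for illustration; `PackedSymmetric` is hypothetical, and masspcf's internal ordering may differ.

```python
class PackedSymmetric:
    """n x n symmetric matrix stored as its lower triangle in row order.

    Entry (i, j) with i >= j lives at index i*(i+1)//2 + j, for a total of
    n*(n+1)//2 stored values. Hypothetical layout for illustration only.
    """

    def __init__(self, n):
        self.n = n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("index out of range")
        if i < j:  # (i, j) and (j, i) share one slot
            i, j = j, i
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        return self.data[self._index(*key)]

    def __setitem__(self, key, value):
        self.data[self._index(*key)] = value


m = PackedSymmetric(3)
m[0, 2] = 5.0
print(m[2, 0], len(m.data))  # 5.0 6
```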

property dtype

Element precision (float32 or float64).

classmethod from_dense(array)

Create a SymmetricMatrix from a dense n×n numpy array.

property size: int

property storage_count: int

to_dense() → ndarray

Return the full n×n symmetric matrix as a numpy array.

class masspcf.symmetric_matrix.SymmetricMatrixTensor(data)

Bases: Tensor

norms

masspcf.norms.lp_norm(fs: Tensor | list[Pcf] | Pcf, p=1, verbose=False) → FloatTensor

Compute the \(L_p\) norm of each PCF in fs. For example, if fs is an \(m \times n\) tensor with elements \(f_{ij}\), \(1 \leq i \leq m, 1 \leq j \leq n\), the result is

\[\begin{pmatrix} \Vert f_{11} \Vert_p & \Vert f_{12} \Vert_p & \cdots & \Vert f_{1n} \Vert_p \\ \Vert f_{21} \Vert_p & \Vert f_{22} \Vert_p & \cdots & \Vert f_{2n} \Vert_p \\ \vdots & \vdots & \ddots & \vdots \\ \Vert f_{m1} \Vert_p & \Vert f_{m2} \Vert_p & \cdots & \Vert f_{mn} \Vert_p \end{pmatrix},\]

where

\[\Vert f_{ij} \Vert_p = \left(\int_0^\infty |f_{ij}(t)|^p\, dt\right)^{1/p}.\]

Parameters:
  • fs (PcfContainerLike) – PCFs whose norms are to be computed.

  • p (int, optional) – \(p\) parameter in the \(L_p\) norm, by default 1

  • verbose (bool, optional) – Print additional information during the computation, by default False

Returns:

Tensor of the same shape as fs with \(L_p\) norms of the input functions.

Return type:

FloatTensor
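As a worked example, take the PCF from the Pcf example, which equals 1 on \([0, 1)\), 2 on \([1, 3)\) and 0 on \([3, \infty)\). Because the function is constant on each interval, the integral reduces to a finite sum of \(|v_i|^p\) times interval length:

\[\Vert f \Vert_1 = 1 \cdot 1 + 2 \cdot 2 = 5, \qquad \Vert f \Vert_2 = \left(1^2 \cdot 1 + 2^2 \cdot 2\right)^{1/2} = \sqrt{9} = 3.\]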

inner_product

masspcf.inner_product.l2_kernel(fs: Tensor | list[Pcf] | Pcf, verbose=False) → SymmetricMatrix

Compute the pairwise \(L_2\) kernel matrix for a 1-D tensor of PCFs.

For a tensor \((f_0, f_1, \ldots, f_{n-1})\), returns an \(n \times n\) matrix \(K\) where

\[K_{ij} = \langle f_i, f_j \rangle_{L_2} = \int_0^\infty f_i(t) \, f_j(t) \, dt.\]

Parameters:
  • fs (PcfContainerLike) – A 1-D tensor of PCFs.

  • verbose (bool, optional) – Show progress information during computation, by default False.

Returns:

A compressed symmetric kernel matrix.

Return type:

SymmetricMatrix

Raises:

ValueError – If fs is not 1-dimensional.
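Each kernel entry is again a finite sum over the intervals between the combined breakpoints of \(f_i\) and \(f_j\). A pure-Python sketch follows; the helpers are hypothetical rather than masspcf's implementation, and returning plus or minus infinity when the product is nonzero on the unbounded last interval is this sketch's choice.

```python
import math
from bisect import bisect_right


def value_at(points, t):
    # Value of the PCF on the interval containing t (points sorted, points[0][0] == 0).
    times = [ti for ti, _ in points]
    return points[bisect_right(times, t) - 1][1]


def l2_inner_sketch(f, g):
    """Integral over [0, inf) of f(t) * g(t) for PCFs given as (time, value) lists.

    Hypothetical helper for illustration; not the masspcf implementation.
    """
    cuts = sorted({t for t, _ in f} | {t for t, _ in g})
    total = 0.0
    for left, right in zip(cuts, cuts[1:]):
        # f * g is constant on [left, right).
        total += value_at(f, left) * value_at(g, left) * (right - left)
    # A nonzero product on the last, unbounded interval makes the integral diverge.
    tail = value_at(f, cuts[-1]) * value_at(g, cuts[-1])
    if tail != 0:
        return math.inf if tail > 0 else -math.inf
    return total


f = [(0.0, 1.0), (1.0, 2.0), (3.0, 0.0)]
g = [(0.0, 1.0), (2.0, 0.0)]
fs = [f, g]
K = [[l2_inner_sketch(a, b) for b in fs] for a in fs]
print(K)  # [[9.0, 3.0], [3.0, 2.0]]
```

The diagonal entries equal \(\Vert f_i \Vert_2^2\); for the first function, \(9 = 3^2\).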

comparison

masspcf.comparison.allclose(a, b, atol=1e-08, rtol=1e-05) → bool

Test whether two objects are element-wise equal within a tolerance.

Returns True when, for every pair of corresponding elements \(a_i\) and \(b_i\),

\[|a_i - b_i| \leq \texttt{atol} + \texttt{rtol} \cdot |b_i|.\]

Parameters:
  • a (FloatTensor | DistanceMatrix | SymmetricMatrix) – First object.

  • b (FloatTensor | DistanceMatrix | SymmetricMatrix) – Second object (must be the same type as a).

  • atol (float, optional) – Absolute tolerance, by default 1e-8.

  • rtol (float, optional) – Relative tolerance, by default 1e-5.

Return type:

bool

Raises:

TypeError – If the inputs are not a supported type or are not the same type.
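The tolerance test is the same formula used by numpy.allclose, and it is not symmetric in a and b because the relative term uses \(|b_i|\) only. A pure-Python sketch on flat sequences follows. `allclose_sketch` is hypothetical, and how masspcf handles mismatched shapes is not covered here.

```python
def allclose_sketch(a, b, atol=1e-08, rtol=1e-05):
    """Elementwise check |a_i - b_i| <= atol + rtol * |b_i| on flat sequences.

    Hypothetical helper for illustration; returns False on a length mismatch.
    """
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


print(allclose_sketch([1.0, 1000.0], [1.0, 1000.005]))  # True
print(allclose_sketch([1.0, 1000.0], [1.0, 1000.1]))    # False
```

With the defaults, a difference of 0.005 at magnitude 1000 passes because the allowed error there is about 0.01.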

io

masspcf.io.load(file)

Load a tensor or object from a file in masspcf’s binary format.

The returned item will have the same type and dtype as what was saved.

Parameters:

file (str or file-like) – A file path or an open file object in binary read mode.

Returns:

The loaded item.

Return type:

Tensor or Pcf or Barcode or DistanceMatrix or SymmetricMatrix

masspcf.io.save(item, file)

Save a tensor or object to a file in masspcf’s binary format.

All tensor types and standalone objects (Pcf, Barcode, DistanceMatrix, SymmetricMatrix) are supported.

Parameters:
  • item (Tensor or Pcf or Barcode or DistanceMatrix or SymmetricMatrix) – The item to save.

  • file (str or file-like) – A file path or an open file object in binary write mode.

serialize

masspcf.serialize.from_serial_content(content: ndarray, enumeration: ndarray, dtype=None) → PcfTensor

Creates a Tensor of PCFs from serial numpy data.

Parameters:
  • content (np.ndarray) – (m, 2) array of points, where m is the sum of lengths of the individual PCFs

  • enumeration (np.ndarray) – (n_1, n_2, ..., n_k, 2) array of (start, stop) pointers into the content array.

  • dtype (datatype) – Sets the dtype of the resulting PcfTensor. If None, uses the dtype of the supplied content array. By default, None.

Returns:

PcfTensor of shape (n_1, n_2, ..., n_k), where element (i_1, i_2, ..., i_k) is a Pcf with points content[start:stop], with start=enumeration[i_1,...,i_k, 0] and stop=enumeration[i_1,...,i_k, 1].

Return type:

PcfTensor
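The serial layout stores all points back to back in content, and enumeration records where each PCF's rows begin and end. A pure-Python sketch for a 1-D enumeration, assuming stop is exclusive as in Python slicing; `unpack_serial` is a hypothetical helper:

```python
def unpack_serial(content, enumeration):
    """Split serial (time, value) rows into one list per PCF.

    `content` is a list of (time, value) rows; `enumeration` is a list of
    (start, stop) pairs, and PCF k is content[start:stop]. Only a 1-D
    enumeration is handled. Hypothetical helper for illustration.
    """
    return [content[start:stop] for start, stop in enumeration]


content = [
    (0.0, 1.0), (1.0, 2.0), (3.0, 0.0),  # rows of the first PCF
    (0.0, 5.0), (2.0, 0.0),              # rows of the second PCF
]
enumeration = [(0, 3), (3, 5)]
for pcf in unpack_serial(content, enumeration):
    print(pcf)
```

Here m = 5 rows in total, and each PCF starts with a t = 0 row, as every PCF must.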

plotting

masspcf.plotting.plot(f: Tensor | list[Pcf] | Pcf, fmt='', ax=None, auto_label=False, max_time=None, **kwargs)

Plot one or more PCFs using matplotlib’s step function.

Parameters:
  • f (PcfContainerLike) – A single Pcf or a 1-D PcfTensor.

  • fmt (str, optional) – A matplotlib format string (e.g. 'r--'), by default ''.

  • ax (matplotlib axes, optional) – Axes to plot on. If None, uses matplotlib.pyplot directly.

  • auto_label (bool, optional) – If True and f is a tensor, label each PCF as f0, f1, etc. By default False.

  • max_time (float, optional) – Extend the plot so the final constant segment reaches this time. If None, single PCFs are not extended and tensors extend to the latest breakpoint across all elements.

  • **kwargs – Additional keyword arguments passed to matplotlib.pyplot.step (e.g. color, linewidth, alpha, label).

Raises:

ValueError – If f is a tensor with more than one dimension.

masspcf.plotting.plot_barcode(bc, ax=None, y_offset=0, **kwargs)

Plot a persistence barcode as horizontal line segments.

Each bar is drawn as a horizontal segment from birth to death. Bars with infinite death are drawn as arrows extending to the right edge of the plot.

Parameters:
  • bc (Barcode or BarcodeTensor) – A single Barcode or a 1-D BarcodeTensor. For a tensor, the barcodes are stacked vertically in order.

  • ax (matplotlib axes, optional) – Axes to plot on. If None, uses matplotlib.pyplot directly.

  • y_offset (int, optional) – Starting y position for the first bar. Useful when stacking multiple barcodes on the same axes.

  • **kwargs – Additional keyword arguments passed to matplotlib.collections.LineCollection (e.g. color, linewidth, alpha, label).

Returns:

The next available y position (for stacking).

Return type:

int

random

class masspcf.random.Generator(seed=None)

Bases: object

Seedable random number generator for masspcf.

Parameters:

seed (int, optional) – Seed for deterministic generation. If None, a non-deterministic seed is used.

seed(seed)

Re-seed the generator.

masspcf.random.noisy_cos(shape, n_points=20, dtype=masspcf.pcf32, generator=None)

Generate a tensor of noisy \(\cos(2\pi t)\) PCFs.

Each generated PCF has the form

\[f(t) = \cos(2\pi t) + \varepsilon(t)\]

where \(\varepsilon(t) \sim \mathcal{N}(0, 0.1)\) is sampled independently at each breakpoint. The breakpoints are drawn uniformly from \([0, 1]\) and sorted, with the first breakpoint fixed at \(t = 0\) and the last value set to \(0\).

Parameters:
  • shape (tuple of int) – Shape of the output tensor.

  • n_points (int, optional) – Number of breakpoints per PCF, by default 20.

  • dtype (type, optional) – pcf32 or pcf64, by default pcf32.

  • generator (Generator, optional) – Random number generator. If None, the global generator is used.

Returns:

Tensor of noisy cosine PCFs with the given shape.

Return type:

PcfTensor
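The generation procedure described above can be sketched with Python's random module for a single PCF. This is not masspcf's generator: `noisy_cos_sketch` is hypothetical, drawing n_points - 1 uniform times and prepending t = 0 is one way to fix the first breakpoint, and treating 0.1 as the standard deviation (rather than the variance) of the noise is an assumption.

```python
import math
import random


def noisy_cos_sketch(n_points=20, sigma=0.1, rng=None):
    """One noisy cos(2*pi*t) PCF as a list of (time, value) pairs.

    Breakpoints are uniform on [0, 1] and sorted, the first is fixed at t = 0,
    Gaussian noise is added at each breakpoint, and the last value is set to 0.
    Assumption: sigma = 0.1 is the standard deviation. Hypothetical helper.
    """
    if rng is None:
        rng = random.Random()
    times = sorted([0.0] + [rng.uniform(0.0, 1.0) for _ in range(n_points - 1)])
    values = [math.cos(2 * math.pi * t) + rng.gauss(0.0, sigma) for t in times]
    values[-1] = 0.0  # the last value is set to 0
    return list(zip(times, values))


pcf = noisy_cos_sketch(n_points=5, rng=random.Random(42))
print(pcf)
```

Passing a seeded random.Random plays the role of the generator argument: the same seed reproduces the same PCF.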

masspcf.random.noisy_sin(shape, n_points=20, dtype=masspcf.pcf32, generator=None)

Generate a tensor of noisy \(\sin(2\pi t)\) PCFs.

Each generated PCF has the form

\[f(t) = \sin(2\pi t) + \varepsilon(t)\]

where \(\varepsilon(t) \sim \mathcal{N}(0, 0.1)\) is sampled independently at each breakpoint. The breakpoints are drawn uniformly from \([0, 1]\) and sorted, with the first breakpoint fixed at \(t = 0\) and the last value set to \(0\).

Parameters:
  • shape (tuple of int) – Shape of the output tensor.

  • n_points (int, optional) – Number of breakpoints per PCF, by default 20.

  • dtype (type, optional) – pcf32 or pcf64, by default pcf32.

  • generator (Generator, optional) – Random number generator. If None, the global generator is used.

Returns:

Tensor of noisy sine PCFs with the given shape.

Return type:

PcfTensor

masspcf.random.seed(s)

Seed the global random number generator.

Parameters:

s (int) – Seed value.

system

The masspcf.system module provides access to global library settings. These settings apply only to the current session and must be set again in each new Python session (e.g., after restarting a kernel).

Most users will not need to change any of these settings; they are provided for advanced users. No core functionality of the package requires modifying them.

masspcf.system.build_type() → str

Return the build type of the masspcf backend.

Returns:

"CUDA" if built with GPU support, "CPU" otherwise.

Return type:

str

masspcf.system.force_cpu(on: bool)

Force execution on the CPU. By default, execution may happen on either CPU or GPU (if using a GPU-enabled build of masspcf).

Parameters:

on (bool) – If True, force execution on CPU for all operations. If False, execution may happen on either CPU or GPU (if using a GPU-enabled build of masspcf).

masspcf.system.get_parallel_eval_threshold() → int

Return the current parallel evaluation threshold.

masspcf.system.limit_cpus(n: int)

Set the upper limit on the number of CPU threads used for computations.

The default, which equals the number of hardware CPU threads, is usually a good choice, and we recommend it for normal use. Limiting the number of threads can be warranted in, e.g., multi-user environments.

Parameters:

n (int) – Maximum number of CPU threads to use.

masspcf.system.limit_gpus(n: int)

Sets the number of GPUs that can be used by masspcf. By default, all available GPUs are used.

This option only has an effect if masspcf is compiled with GPU support.

Parameters:

n (int) – Number of GPUs to use

masspcf.system.set_block_size(x: int, y: int)

Set CUDA block size for (GPU) matrix computations. This is an advanced option that should only be modified by expert users.

Parameters:
  • x (int) – Horizontal block size

  • y (int) – Vertical block size

masspcf.system.set_cuda_threshold(n: int)

Set the minimum number of PCFs a matrix computation must involve before it is moved from CPU to GPU. The default threshold is 500 PCFs.

Parameters:

n (int) – Number of PCFs required before (supported) matrix computations are moved to GPU

masspcf.system.set_device_verbose(on: bool)

Enable verbose device output. In this mode, when operations that may occur on GPU are invoked, a message is logged stating whether the operation will be performed on CPU or GPU.

Parameters:

on (bool) – Enable verbose device logging

masspcf.system.set_min_block_side(n: int)

Set the minimum block side length for the CUDA block scheduler.

This controls the minimum number of threads per GPU kernel launch, ensuring good GPU occupancy. A value of 0 (the default) auto-detects from the GPU hardware (SM count), targeting ~50% max occupancy.

This is an advanced option that should only be modified by expert users.

Parameters:

n (int) – Minimum block side length. 0 = auto-detect from GPU hardware.

masspcf.system.set_parallel_eval_threshold(n: int)

Set the minimum tensor size for parallel tensor evaluation.

When a tensor has at least n elements, tensor_eval distributes the work across threads. Below this threshold evaluation is sequential. The default is 500.

Parameters:

n (int) – Minimum number of elements to trigger parallel evaluation.

gpu

Detect CUDA-capable NVIDIA GPUs without requiring CUDA libraries.

Uses the C++ _gpu_detect module (direct OS API calls) when available, falling back to a pure-Python implementation using subprocess.

masspcf.gpu.detect_nvidia_gpus()

Detect NVIDIA GPUs present on the system.

Uses OS-level tools (lspci, sysfs, PowerShell, system_profiler). Does not require CUDA or any NVIDIA drivers/libraries.

Returns:

A list of dicts, each with a "name" key describing the GPU. An empty list means no NVIDIA GPUs were found.

Return type:

list[dict]

masspcf.gpu.has_nvidia_gpu()

Check whether the system has at least one NVIDIA GPU.

Returns:

True if at least one NVIDIA GPU is detected.

Return type:

bool

masspcf.gpu.nvidia_gpu_count()

Return the number of NVIDIA GPUs detected.

Returns:

Number of NVIDIA GPUs found.

Return type:

int

typing

masspcf.typing.Dtype

alias of dtype

class masspcf.typing.dtype(name: str, doc: str = '')

Describes the element type of a masspcf tensor.

Each dtype is a singleton instance (e.g. masspcf.pcf32, masspcf.float64). Use isinstance(x, masspcf.dtype) to check whether a value is a masspcf dtype.

property name: str

Short name of this dtype (e.g. 'pcf32').