======================
Working with Tensors
======================

This guide covers the practical details of creating, manipulating, and
persisting tensors in masspcf.

Creating tensors
================

Using zeros
-----------

The most common way to create a tensor is :py:func:`~masspcf.zeros`, which
allocates a tensor of a given shape filled with "zero" elements::

    import masspcf as mpcf

    # 1-D tensor of 100 PCFs (32-bit, the default)
    X = mpcf.zeros((100,))

    # 3-D tensor of 64-bit PCFs
    Y = mpcf.zeros((4, 10, 25), dtype=mpcf.pcf64)

    # 2-D tensor of 64-bit floats (scalar elements)
    Z = mpcf.zeros((5, 5), dtype=mpcf.float64)

For PCF dtypes, "zero" is the function that is identically zero. For numeric
dtypes, it is the number 0. For point cloud dtypes, it is an empty point cloud.

Generating random data
-----------------------

For quick experimentation, :py:mod:`masspcf.random` provides functions that
generate tensors of noisy trigonometric PCFs::

    from masspcf.random import noisy_sin, noisy_cos

    # 200 noisy sin(2*pi*t) functions, each with 100 breakpoints
    sines = noisy_sin((200,), n_points=100)

    # 2-D: 10 x 50 noisy cosine functions with 30 breakpoints each
    cosines = noisy_cos((10, 50), n_points=30)

These functions return a ``PcfTensor`` by default. Pass ``dtype=mpcf.pcf64``
for 64-bit precision.

From lists
----------

All tensor types can be constructed directly from Python lists or tuples::

    import masspcf as mpcf

    # Numeric tensors: same as wrapping in np.array()
    X = mpcf.FloatTensor([1.0, 2.0, 3.0])
    Y = mpcf.IntTensor([[1, 2], [3, 4]])
    Z = mpcf.BoolTensor([True, False, True])

For non-numeric tensors, nested lists define the shape::

    f = mpcf.Pcf([[0, 1.0], [1, 2.0]])
    g = mpcf.Pcf([[0, 3.0], [2, 4.0]])

    # 1-D tensor
    t = mpcf.PcfTensor([f, g])              # shape (2,)

    # 2-D tensor from nested lists
    t2 = mpcf.PcfTensor([[f, g], [g, f]])   # shape (2, 2)

This works the same way for ``IntPcfTensor`` and ``BarcodeTensor``. The
precision (32- or 64-bit) is inferred from the elements. An empty list produces
a tensor of shape ``(0,)``.

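The rule that nested lists define the shape can be modelled in a few lines of
plain Python. The sketch below is only an illustration of that rule, not
masspcf's implementation: ``nested_shape`` is a hypothetical helper, the strings
stand in for ``Pcf`` objects, and raising ``ValueError`` on ragged input is an
assumption rather than documented behaviour.

```python
def nested_shape(obj):
    """Infer the shape implied by nested lists/tuples (hypothetical helper)."""
    if not isinstance(obj, (list, tuple)):
        return ()                # a leaf element contributes no dimension
    if len(obj) == 0:
        return (0,)              # mirrors the empty-list rule described above
    inner = nested_shape(obj[0])
    for item in obj[1:]:
        if nested_shape(item) != inner:
            # Assumption: ragged nesting is treated as an error.
            raise ValueError("ragged nesting: sub-lists differ in shape")
    return (len(obj),) + inner


f, g = "f", "g"  # placeholders standing in for Pcf objects

print(nested_shape([f, g]))              # (2,)
print(nested_shape([[f, g], [g, f]]))    # (2, 2)
print(nested_shape([]))                  # (0,)
```
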

From serialized NumPy data
---------------------------

:py:func:`~masspcf.from_serial_content` constructs a tensor from PCF data
already stored in NumPy arrays: a flat content array and an enumeration array
that describes how to split it::

    import numpy as np
    import masspcf as mpcf

    # Three PCFs packed into a single content array
    content = np.array([
        [0.0, 2.5], [1.5, 1.2], [3.14, 0.0],               # PCF 0 (3 points)
        [0.0, 7.0], [3.8, 5.5], [4.5, 1.5], [7.0, 0.0],    # PCF 1 (4 points)
        [0.0, 3.0], [2.0, 0.0],                            # PCF 2 (2 points)
    ])

    # Each row gives (start, end) indices into content (end exclusive)
    enumeration = np.array([[0, 3], [3, 7], [7, 9]])

    F = mpcf.from_serial_content(content, enumeration)
    # F is a PcfTensor of shape (3,)

The enumeration array can be multidimensional. If it has shape
``(n1, n2, ..., nk, 2)``, the resulting tensor has shape ``(n1, n2, ..., nk)``.

Shape and copying
=================

Every tensor has a :py:attr:`shape` property, along with ``ndim``, ``size``,
and ``len()``, matching the NumPy interface::

    X = mpcf.zeros((10, 5, 4))

    X.shape    # (10, 5, 4)
    X.ndim     # 3
    X.size     # 200
    len(X)     # 10 (first axis)

To create an independent copy (not a view)::

    Y = X.copy()

To collapse all dimensions into one::

    flat = X.flatten()    # shape (200,)

To change the shape without changing the data, use ``reshape``. One dimension
may be ``-1`` to infer its size::

    X = mpcf.FloatTensor(np.arange(12, dtype=np.float32))

    X.reshape((3, 4))     # shape (3, 4)
    X.reshape((2, -1))    # shape (2, 6), inferred

For contiguous tensors, ``reshape`` returns a view (shared data). For
non-contiguous tensors (e.g. from slicing with a step), it copies first.

To reverse the order of axes, use the ``.T`` property.
For finer control, ``transpose`` accepts an explicit axis permutation::

    A = mpcf.FloatTensor(np.arange(12, dtype=np.float32).reshape(3, 4))
    A.T                      # shape (4, 3)
    A.transpose((1, 0))      # same as .T for 2-D

    B = mpcf.FloatTensor(np.arange(24, dtype=np.float32).reshape(2, 3, 4))
    B.transpose((2, 0, 1))   # shape (4, 2, 3)

Transpose always returns a view.

To swap exactly two axes, use ``swapaxes``::

    C = mpcf.FloatTensor(np.arange(24, dtype=np.float32).reshape(2, 3, 4))
    C.swapaxes(0, 2)      # shape (4, 3, 2)
    C.swapaxes(-1, -3)    # same: negative indices count from the last axis

To remove size-1 dimensions, use ``squeeze``. With no argument it removes all
of them; with an axis argument it removes only that one::

    X = mpcf.FloatTensor(np.arange(6, dtype=np.float32).reshape(1, 6, 1))
    X.squeeze()     # shape (6,)
    X.squeeze(0)    # shape (6, 1)

Squeeze always returns a view. Squeezing an axis whose size is not 1 raises
``ValueError``.

The inverse operation, ``expand_dims``, inserts a size-1 dimension at the given
position (negative indexing supported)::

    Y = mpcf.FloatTensor(np.arange(6, dtype=np.float32))
    Y.expand_dims(0)     # shape (1, 6)
    Y.expand_dims(-1)    # shape (6, 1)

Expand dims also returns a view.

Type casting
============

``astype`` converts a tensor to a different dtype. Same-family precision
changes (e.g. float32 to float64) and numeric cross-family casts (e.g. int to
float) are supported::

    X = mpcf.FloatTensor(np.array([1.5, 2.5, 3.5], dtype=np.float32))
    X.astype(mpcf.float64)    # FloatTensor, float64
    X.astype(mpcf.int32)      # IntTensor, int32 (truncates)

PCF and point cloud tensors support precision changes within their family::

    F = mpcf.zeros((5,), dtype=mpcf.pcf32)
    F.astype(mpcf.pcf64)      # PcfTensor, pcf64

``astype`` always returns a new tensor (copy).

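The "(truncates)" note on the float-to-int cast is easy to misread as rounding.
The plain-Python sketch below contrasts the two. It assumes truncation toward
zero, as in C-style conversions; the text above only says that the cast
truncates, so the behaviour for negative values is an assumption, and
``cast_float_to_int`` is a hypothetical stand-in rather than a masspcf function.

```python
import math


def cast_float_to_int(values):
    """Model a float -> int cast that truncates toward zero (assumed)."""
    return [math.trunc(v) for v in values]


print(cast_float_to_int([1.5, 2.5, 3.5]))     # [1, 2, 3]
print(cast_float_to_int([-1.5, -0.5]))        # [-1, 0]

# For contrast: Python's round() uses round-half-to-even, not truncation.
print([round(v) for v in [1.5, 2.5, 3.5]])    # [2, 2, 4]
```
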

Joining tensors
===============

``concatenate`` joins tensors along an existing axis::

    A = mpcf.FloatTensor(np.array([[1, 2], [3, 4]], dtype=np.float32))  # (2, 2)
    B = mpcf.FloatTensor(np.array([[5, 6]], dtype=np.float32))          # (1, 2)

    mpcf.concatenate((A, B), axis=0)    # (3, 2)

All tensors must have the same shape except along the join axis.

``stack`` joins tensors along a new axis (all shapes must match)::

    X = mpcf.FloatTensor(np.array([1, 2, 3], dtype=np.float32))  # (3,)
    Y = mpcf.FloatTensor(np.array([4, 5, 6], dtype=np.float32))  # (3,)

    mpcf.stack((X, Y), axis=0)    # (2, 3)
    mpcf.stack((X, Y), axis=1)    # (3, 2)

Splitting tensors
=================

``split`` divides a tensor into parts along an axis. Pass an integer for equal
splits, or a list of indices for custom split points::

    X = mpcf.FloatTensor(np.arange(12, dtype=np.float32).reshape(4, 3))

    # Equal split: 4 rows into 2 parts of 2 rows each
    a, b = mpcf.split(X, 2, axis=0)            # each shape (2, 3)

    # Index split: split at rows 1 and 3
    p, q, r = mpcf.split(X, [1, 3], axis=0)    # shapes (1, 3), (2, 3), (1, 3)

The returned parts are views sharing data with the original tensor. An equal
split raises ``ValueError`` if the axis size is not divisible by the number of
sections.

``array_split`` works the same way but allows uneven divisions: the first
sections get one extra element when the size is not evenly divisible::

    Y = mpcf.FloatTensor(np.arange(9, dtype=np.float32))
    parts = mpcf.array_split(Y, 4)    # sizes: 3, 2, 2, 2

Iterating
=========

Iterating over a tensor yields sub-tensors along the first axis, just like
NumPy::

    X = mpcf.FloatTensor(np.arange(12, dtype=np.float32).reshape(3, 4))

    for row in X:
        print(row.shape)    # (4,)

For a 1-D tensor, iteration yields scalar elements. This also enables
``list()``, ``tuple()``, and unpacking::

    a, b, c = X    # three rows

Nested iteration works as expected::

    Y = mpcf.FloatTensor(np.arange(24, dtype=np.float32).reshape(2, 3, 4))

    for matrix in Y:          # shape (3, 4)
        for row in matrix:    # shape (4,)
            print(row)
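
As a mental model for the iteration rules above, the following plain-Python
sketch walks a flat, row-major buffer along its first axis. It is a conceptual
illustration only: ``iter_first_axis`` is a hypothetical helper, and the
row-major layout is an assumption, not a statement about how masspcf stores or
iterates tensors internally.

```python
from math import prod


def iter_first_axis(flat, shape):
    """Yield sub-blocks along the first axis of a row-major buffer.

    For ndim > 1, yields (sub_flat, sub_shape) pairs; for 1-D input,
    yields the scalar elements themselves.
    """
    if len(shape) == 1:
        yield from flat               # 1-D: iteration yields scalars
        return
    step = prod(shape[1:])            # elements per sub-block
    for i in range(shape[0]):
        yield flat[i * step:(i + 1) * step], shape[1:]


data = list(range(24))

matrices = list(iter_first_axis(data, (2, 3, 4)))
print(len(matrices), matrices[0][1])           # 2 (3, 4)

rows = list(iter_first_axis(*matrices[0]))
print(len(rows), rows[0])                      # 3 ([0, 1, 2, 3], (4,))

print(list(iter_first_axis(*rows[0])))         # [0, 1, 2, 3]
```

Each level of nesting strips the leading dimension, which is why the nested
loop above sees shapes ``(3, 4)`` and then ``(4,)``.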