Quick start¶
HilbertSFC provides simple Hilbert and Morton encode/decode APIs for Python scalars, NumPy arrays, and PyTorch tensors.
Installation¶
Install the base package with either pip or uv:
pip install hilbertsfc
uv add hilbertsfc
PyTorch support¶
To enable the optional PyTorch extension, install with the torch extra:
pip install "hilbertsfc[torch]"
uv add "hilbertsfc[torch]"
Note
By default, installing hilbertsfc[torch] pulls in a platform-default PyTorch build:
- Windows: CPU-only
- Linux: CUDA-enabled
If you need a specific PyTorch, CUDA, or ROCm version, follow the official
PyTorch installation instructions. Then install hilbertsfc[torch] as shown above.
Alternatively, you can install it in one step by specifying the appropriate PyTorch wheel index, e.g., for CUDA 13.0:
pip install "hilbertsfc[torch]" --extra-index-url https://download.pytorch.org/whl/cu130
uv add "hilbertsfc[torch]" --index https://download.pytorch.org/whl/cu130 --index-strategy unsafe-best-match
First steps¶
Python scalars¶
Use hilbert_encode_2d and hilbert_decode_2d directly on Python integers:
from hilbertsfc import hilbert_decode_2d, hilbert_encode_2d
index = hilbert_encode_2d(17, 23, nbits=10) # index = 534
x, y = hilbert_decode_2d(index, nbits=10) # x = 17, y = 23
nbits controls the coordinate domain [0, 2**nbits) on each axis. For best performance, pass the smallest value that covers your input range.
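For example, the smallest sufficient nbits follows from the largest coordinate via int.bit_length (a quick sketch; the variable names are illustrative):
from hilbertsfc import hilbert_encode_2d
max_coord = 23                           # coordinates up to 23 need 5 bits, since 23 < 2**5 = 32
nbits = max(1, max_coord.bit_length())   # 5
index = hilbert_encode_2d(17, 23, nbits=nbits)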
The 3D API follows the same pattern via hilbert_encode_3d and hilbert_decode_3d. Morton/z-order functions mirror these names with morton_encode_2d, morton_decode_2d, morton_encode_3d, and morton_decode_3d.
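A quick sketch of the mirrored names, assuming they share the scalar calling convention shown above:
from hilbertsfc import hilbert_decode_3d, hilbert_encode_3d, morton_decode_2d, morton_encode_2d
idx3 = hilbert_encode_3d(5, 9, 12, nbits=4)  # 3D Hilbert round trip
x, y, z = hilbert_decode_3d(idx3, nbits=4)   # x = 5, y = 9, z = 12
mcode = morton_encode_2d(17, 23, nbits=10)   # Morton/z-order round trip
mx, my = morton_decode_2d(mcode, nbits=10)   # mx = 17, my = 23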
nbits compatibility
Hilbert indices computed with one nbits are compatible with indices computed with any other nbits, provided the coordinates fit within both ranges. The kernels resolve the starting state parity to guarantee this.
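For instance, a point that fits in both ranges encodes to the same index under either nbits:
from hilbertsfc import hilbert_encode_2d
# (17, 23) fits in [0, 2**5) and [0, 2**10), so both calls yield the same index.
assert hilbert_encode_2d(17, 23, nbits=5) == hilbert_encode_2d(17, 23, nbits=10)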
NumPy arrays¶
The same functions also accept NumPy integer arrays:
import numpy as np
from hilbertsfc import hilbert_decode_2d, hilbert_encode_2d
nbits = 10
shape = (256, 256)
rng = np.random.default_rng(0)
xs = rng.integers(0, 2**nbits, size=shape, dtype=np.uint32)
ys = rng.integers(0, 2**nbits, size=shape, dtype=np.uint32)
indices = hilbert_encode_2d(xs, ys, nbits=nbits) # indices.shape = (256, 256)
xs2, ys2 = hilbert_decode_2d(indices, nbits=nbits) # xs2 = xs, ys2 = ys
Arbitrary shapes are supported with zero-copy access. Strided views also work, but they can reduce performance because the kernels are close to memory-bandwidth bound. To reuse memory or write into memory-mapped arrays, you can optionally pass out=... buffers to encode and out_xs/out_ys/... buffers to decode.
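A minimal sketch of buffer reuse, assuming the kernels accept preallocated arrays of compatible shape (the output dtypes below are assumptions):
import numpy as np
from hilbertsfc import hilbert_decode_2d, hilbert_encode_2d
nbits = 10
shape = (256, 256)
rng = np.random.default_rng(0)
xs = rng.integers(0, 2**nbits, size=shape, dtype=np.uint32)
ys = rng.integers(0, 2**nbits, size=shape, dtype=np.uint32)
out = np.empty(shape, dtype=np.uint64)    # assumed index dtype
out_xs = np.empty(shape, dtype=np.uint32)
out_ys = np.empty(shape, dtype=np.uint32)
hilbert_encode_2d(xs, ys, nbits=nbits, out=out)
hilbert_decode_2d(out, nbits=nbits, out_xs=out_xs, out_ys=out_ys)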
Parallel execution
Use parallel=True to dispatch the parallel kernel. The number of threads can be controlled with the environment variable NUMBA_NUM_THREADS or at runtime via numba.set_num_threads().
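A sketch of the parallel path on a large input, assuming parallel is accepted by the array entry points as described above:
import numba
import numpy as np
from hilbertsfc import hilbert_encode_2d
numba.set_num_threads(8)  # or export NUMBA_NUM_THREADS=8 before starting Python
nbits = 10
rng = np.random.default_rng(0)
xs = rng.integers(0, 2**nbits, size=2**24, dtype=np.uint32)
ys = rng.integers(0, 2**nbits, size=2**24, dtype=np.uint32)
indices = hilbert_encode_2d(xs, ys, nbits=nbits, parallel=True)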
PyTorch tensors¶
Use the torch frontend API hilbertsfc.torch for PyTorch tensors on CPU and accelerator devices. By default on CUDA, contiguous tensors take the Triton path when available; otherwise the implementation falls back to the Torch backend.
import torch
from hilbertsfc.torch import hilbert_decode_2d, hilbert_encode_2d
device = "cuda" if torch.cuda.is_available() else "cpu"
nbits = 10
n = 4096
xs = torch.randint(0, 2**nbits, (n,), dtype=torch.int32, device=device)
ys = torch.randint(0, 2**nbits, (n,), dtype=torch.int32, device=device)
indices = hilbert_encode_2d(xs, ys, nbits=nbits)
xs2, ys2 = hilbert_decode_2d(indices, nbits=nbits)
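Continuing from the snippet above, the round trip can be checked with torch.equal:
assert torch.equal(xs2, xs) and torch.equal(ys2, ys)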
Use with torch.compile¶
If you plan to use torch.compile, call precache_compile_luts first so LUT materialization happens outside the compiled region. This avoids graph breaks and extra overhead, and is required for fullgraph=True.
import torch
from hilbertsfc.torch import hilbert_encode_2d, precache_compile_luts
device = "cuda" if torch.cuda.is_available() else "cpu"
precache_compile_luts(device=device, op="hilbert_encode_2d")
def encode_then_scale(x: torch.Tensor, y: torch.Tensor, nbits: int) -> torch.Tensor:
return hilbert_encode_2d(x, y, nbits=nbits) * 2
compiled_encode_then_scale = torch.compile(encode_then_scale, fullgraph=True)
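The compiled function is then called like the eager one:
nbits = 10
xs = torch.randint(0, 2**nbits, (4096,), dtype=torch.int32, device=device)
ys = torch.randint(0, 2**nbits, (4096,), dtype=torch.int32, device=device)
scaled = compiled_encode_then_scale(xs, ys, nbits)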
Free LUT cache
Torch-side LUT tensors are cached per device for reuse. The cache is only a couple of KiB, so clearing it is rarely necessary, but you can free the associated device memory with clear_torch_lut_caches.
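For example (the no-argument call below is an assumption; the function may also accept per-device options):
from hilbertsfc.torch import clear_torch_lut_caches
clear_torch_lut_caches()  # free cached LUT tensors (no-arg call is an assumption)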
Next steps¶
For advanced usage, including embedding scalar kernels into your own Numba code, see Advanced usage.