hilbertsfc.torch

PyTorch API for HilbertSFC.

This subpackage provides 2D/3D Hilbert and Morton encode/decode functions that operate on integer torch.Tensor inputs.

hilbert_encode_2d

hilbert_encode_2d(
    x: Tensor,
    y: Tensor,
    *,
    nbits: int | None = None,
    out: Tensor | None = None,
    lut_cache: TorchCacheMode = "device",
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> Tensor

Encode 2D integer coordinates to Hilbert indices.

This function provides a PyTorch equivalent of hilbert_encode_2d. It accepts integer torch.Tensor inputs of arbitrary shape on any device and dispatches to a backend-specific implementation based on the device and backend settings.
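For intuition about the mapping itself, here is a small pure-Python reference implementation of the classic per-bit rotate-and-accumulate Hilbert encoding. This is illustrative only: the library's kernels are LUT-based, and the curve orientation they produce may differ from this variant.

```python
def hilbert_encode_2d_ref(x: int, y: int, nbits: int) -> int:
    # Classic rotate-and-accumulate Hilbert mapping, one bit per axis
    # per iteration. Illustrative only; not the library's LUT kernels,
    # and the curve orientation may differ from the variant shown here.
    d = 0
    s = 1 << (nbits - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the next level lines up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s >>= 1
    return d

# With nbits=1 the four cells of a 2x2 grid are visited in Hilbert order:
# (0,0) -> 0, (0,1) -> 1, (1,1) -> 2, (1,0) -> 3
```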

Parameters:

Name Type Description Default
x Tensor

Integer coordinate tensors to encode.

Must have identical shape and be on the same device.

required
y Tensor

Integer coordinate tensors to encode.

Must have identical shape and be on the same device.

required
nbits int | None

Number of bits per coordinate axis. This defines a coordinate domain of [0, 2**nbits) on each axis. For inputs outside that domain, only the low nbits bits of each coordinate are used.

Must satisfy 1 <= nbits <= 32. If provided, it must also fit within the usable bits of the coordinate dtype.

If None:

  • Array mode: inferred from the coordinate dtype using its usable bit width, capped at 32. For example, uint16 -> 16, int16 -> 15 (sign bit excluded), and uint64/int64 -> 32.
  • Scalar mode: defaults to 32.

For best performance and tighter output dtypes, pass the smallest value that covers the input coordinate range.

None
out Tensor | None

Optional output tensor.

Must have the same shape and device as x and y and an integer dtype wide enough to hold 2 * nbits bits.

None
lut_cache TorchCacheMode

Cache mode for look-up tables (LUTs) used by the Torch/Triton kernels.

  • "device" (default): cache the converted LUT tensors per-device for reuse across calls.
  • "host_only": do not keep a torch-side LUT cache; materialize on demand from the (process-wide) NumPy LUT cache.

This setting is ignored by the CPU Numba path.

'device'
cpu_parallel bool | None

Controls whether the CPU Numba kernel may execute in parallel.

Only applies when dispatching to the CPU Numba backend and the input is not a scalar tensor. If None, a heuristic is used.

None
cpu_backend CPUBackend

CPU backend selection.

  • "auto" (default): use the Numba kernel unless inside torch.compile, in which case the torch backend is used.
  • "numba": always use the Numba kernel. This mode is not torch.compile-friendly.
  • "torch": always use the torch implementation.
'auto'
gpu_backend GPUBackend

GPU (accelerator) backend selection.

  • "auto" (default): on CUDA, use the Triton kernel when available and all tensors are contiguous; otherwise fall back to the Torch kernel. Fallbacks due to non-contiguity or Triton runtime failure emit a UserWarning.
  • "triton": force the Triton kernel. Requires CUDA tensors, Triton availability, and contiguous inputs/outputs; raises on violation or kernel failure.
  • "torch": force the Torch implementation.
'auto'
triton_tuning TritonTuningMode

Triton launch config selection policy.

  • "heuristic" (default): use static launch heuristics.
  • "autotune_bucketed": autotune from a fixed config set and cache by input size bucket.
  • "autotune_exact": autotune from the same config set and cache by exact input size.

Only applies when the Triton backend is used.

'heuristic'
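The low-bits reduction described for nbits above is plain bit masking, sketched here in pure Python (illustrative; the actual reduction happens inside the kernels):

```python
nbits = 4
mask = (1 << nbits) - 1   # coordinate domain is [0, 2**4) = [0, 16)

assert 9 & mask == 9      # in-domain coordinates pass through unchanged
assert 17 & mask == 1     # out of domain: only the low 4 bits are used
assert 16 & mask == 0     # 2**nbits wraps to 0
```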

Returns:

Type Description
Tensor

Hilbert indices.

  • Has the same shape/device as the inputs.
  • If out is provided, returns out.
  • Otherwise, chooses a minimal integer dtype that can represent 2 * nbits bits, preferring unsigned if all inputs are unsigned and a fitting unsigned dtype is available.
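The "minimal integer dtype" rule amounts to picking the smallest standard width that holds 2 * nbits value bits. A sketch of that rule (illustrative, not the library's code; it assumes an unsigned result dtype is available, since a signed dtype needs one extra bit for the sign):

```python
def minimal_bits(total_bits: int) -> int:
    # Smallest standard integer width with at least total_bits value bits.
    for width in (8, 16, 32, 64):
        if total_bits <= width:
            return width
    raise ValueError("more than 64 bits requested")

assert minimal_bits(2 * 4) == 8     # nbits=4  -> an 8-bit unsigned index fits
assert minimal_bits(2 * 16) == 32   # nbits=16 -> a 32-bit index
assert minimal_bits(2 * 32) == 64   # nbits=32 -> a 64-bit index
```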

Raises:

Type Description
TypeError

If a non-integer tensor is provided.

ValueError

If inputs are on different devices, have mismatched shapes, if nbits is invalid or does not fit in the input/output dtypes, or if backend arguments are invalid.

RuntimeError

If gpu_backend='triton' is requested but Triton is unavailable or the Triton kernel fails at runtime.

Notes

When using this function with torch.compile, call precache_compile_luts before compilation. This avoids materialization of LUTs inside the compiled region, which causes graph breaks, extra overhead, and failure with fullgraph=True.

hilbert_decode_2d

hilbert_decode_2d(
    index: Tensor,
    *,
    nbits: int | None = None,
    out_x: Tensor | None = None,
    out_y: Tensor | None = None,
    lut_cache: TorchCacheMode = "device",
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> tuple[Tensor, Tensor]

Decode Hilbert indices to 2D integer coordinates.

This function provides a PyTorch equivalent of hilbert_decode_2d. It accepts integer torch.Tensor inputs of arbitrary shape on any device and dispatches to a backend-specific implementation based on the device and backend settings.
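For intuition, here is a pure-Python reference for the inverse mapping, the classic per-bit decode. As with the encode sketch, this is illustrative only: the library's kernels are LUT-based, and the curve orientation may differ from this variant.

```python
def hilbert_decode_2d_ref(d: int, nbits: int) -> tuple[int, int]:
    # Classic per-bit inverse of the rotate-and-accumulate Hilbert
    # mapping. Illustrative only; not the library's LUT kernels.
    x = y = 0
    t = d
    s = 1
    while s < (1 << nbits):
        rx = 1 & (t >> 1)
        ry = 1 & (t ^ rx)
        if ry == 0:  # undo the quadrant rotation
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t >>= 2
        s <<= 1
    return x, y
```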

Parameters:

Name Type Description Default
index Tensor

Integer Hilbert index tensor to decode.

required
nbits int | None

Number of bits per coordinate axis. This defines a coordinate domain of [0, 2**nbits) on each axis and a Hilbert index range of [0, 2**(2 * nbits)). For indices outside that range, only the low 2 * nbits bits are used.

Must satisfy 1 <= nbits <= 32. If provided, it must also fit within the usable bits of the index dtype.

If None:

  • Inferred from the index dtype as half its usable bit width, capped at 32. For example, uint16 -> 8, uint64 -> 32, and int64 -> 31 (sign bit excluded).

For best performance and tighter output dtypes, pass the smallest value that covers the input index range.

None
out_x Tensor | None

Optional output coordinate tensors. Either provide both or neither.

Each must have the same shape and device as index and an integer dtype wide enough to hold nbits bits.

None
out_y Tensor | None

Optional output coordinate tensors. Either provide both or neither.

Each must have the same shape and device as index and an integer dtype wide enough to hold nbits bits.

None
lut_cache TorchCacheMode

Cache mode for look-up tables (LUTs) used by the Torch/Triton kernels.

  • "device" (default): cache the converted LUT tensors per-device for reuse across calls.
  • "host_only": do not keep a torch-side LUT cache; materialize on demand from the (process-wide) NumPy LUT cache.

This setting is ignored by the CPU Numba path.

'device'
cpu_parallel bool | None

Controls whether the CPU Numba kernel may execute in parallel.

Only applies when dispatching to the CPU Numba backend and the input is not a scalar tensor. If None, a heuristic is used.

None
cpu_backend CPUBackend

CPU backend selection.

  • "auto" (default): use the Numba kernel unless inside torch.compile, in which case the torch backend is used.
  • "numba": always use the Numba kernel. This mode is not torch.compile-friendly.
  • "torch": always use the torch implementation.
'auto'
gpu_backend GPUBackend

GPU (accelerator) backend selection.

  • "auto" (default): on CUDA, use the Triton kernel when available and all tensors are contiguous; otherwise fall back to the Torch kernel. Fallbacks due to non-contiguity or Triton runtime failure emit a UserWarning.
  • "triton": force the Triton kernel. Requires CUDA tensors, Triton availability, and contiguous inputs/outputs; raises on violation or kernel failure.
  • "torch": force the Torch implementation.
'auto'
triton_tuning TritonTuningMode

Triton launch config selection policy.

  • "heuristic" (default): use static launch heuristics.
  • "autotune_bucketed": autotune from a fixed config set and cache by input size bucket.
  • "autotune_exact": autotune from the same config set and cache by exact input size.

Only applies when the Triton backend is used.

'heuristic'
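The documented inference rule for nbits (half the usable bit width of the index dtype, capped at 32) is simple enough to state as code. This is a sketch of the rule as documented, not the library's implementation:

```python
def infer_nbits_2d_decode(usable_index_bits: int) -> int:
    # Half the usable bit width of the index dtype, capped at 32.
    return min(usable_index_bits // 2, 32)

assert infer_nbits_2d_decode(16) == 8    # uint16 -> 8
assert infer_nbits_2d_decode(64) == 32   # uint64 -> 32 (cap)
assert infer_nbits_2d_decode(63) == 31   # int64: 63 usable bits (sign bit excluded) -> 31
```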

Returns:

Type Description
tuple[Tensor, Tensor]

Decoded coordinates (x, y).

  • Each tensor has the same shape/device as index.
  • If out_x and out_y are provided, returns (out_x, out_y).
  • Otherwise, each result uses a minimal integer dtype that can represent nbits bits, preferring unsigned if the input is unsigned and a fitting unsigned dtype is available.

Raises:

Type Description
TypeError

If a non-integer tensor is provided.

ValueError

If nbits is invalid or does not fit in the input/output dtypes, if outputs are inconsistent or have incorrect shapes/devices, or if backend arguments are invalid.

RuntimeError

If gpu_backend='triton' is requested but Triton is unavailable or the Triton kernel fails at runtime.

Notes

When using this function with torch.compile, call precache_compile_luts before compilation. This avoids materialization of LUTs inside the compiled region, which causes graph breaks, extra overhead, and failure with fullgraph=True.

hilbert_encode_3d

hilbert_encode_3d(
    x: Tensor,
    y: Tensor,
    z: Tensor,
    *,
    nbits: int | None = None,
    out: Tensor | None = None,
    lut_cache: TorchCacheMode = "device",
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> Tensor

Encode 3D integer coordinates to Hilbert indices.

This function provides a PyTorch equivalent of hilbert_encode_3d. It accepts integer torch.Tensor inputs of arbitrary shape on any device and dispatches to a backend-specific implementation based on the device and backend settings.

Parameters:

Name Type Description Default
x Tensor

Integer coordinate tensors to encode.

Must have identical shape and be on the same device.

required
y Tensor

Integer coordinate tensors to encode.

Must have identical shape and be on the same device.

required
z Tensor

Integer coordinate tensors to encode.

Must have identical shape and be on the same device.

required
nbits int | None

Number of bits per coordinate axis. This defines a coordinate domain of [0, 2**nbits) on each axis. For inputs outside that domain, only the low nbits bits of each coordinate are used.

Must satisfy 1 <= nbits <= 21. If provided, it must also fit within the usable bits of the coordinate dtype.

If None:

  • Array mode: inferred from the coordinate dtype using its usable bit width, capped at 21. For example, uint16 -> 16, int16 -> 15 (sign bit excluded), and uint64/int64 -> 21.
  • Scalar mode: defaults to 21.

For best performance and tighter output dtypes, pass the smallest value that covers the input coordinate range.

None
out Tensor | None

Optional output tensor.

Must have the same shape and device as x, y, and z and an integer dtype wide enough to hold 3 * nbits bits.

None
lut_cache TorchCacheMode

Cache mode for look-up tables (LUTs) used by the Torch/Triton kernels.

  • "device" (default): cache the converted LUT tensors per-device for reuse across calls.
  • "host_only": do not keep a torch-side LUT cache; materialize on demand from the (process-wide) NumPy LUT cache.

This setting is ignored by the CPU Numba path.

'device'
cpu_parallel bool | None

Controls whether the CPU Numba kernel may execute in parallel.

Only applies when dispatching to the CPU Numba backend and the input is not a scalar tensor. If None, a heuristic is used.

None
cpu_backend CPUBackend

CPU backend selection.

  • "auto" (default): use the Numba kernel unless inside torch.compile, in which case the torch backend is used.
  • "numba": always use the Numba kernel. This mode is not torch.compile-friendly.
  • "torch": always use the torch implementation.
'auto'
gpu_backend GPUBackend

GPU (accelerator) backend selection.

  • "auto" (default): on CUDA, use the Triton kernel when available and all tensors are contiguous; otherwise fall back to the Torch kernel. Fallbacks due to non-contiguity or Triton runtime failure emit a UserWarning.
  • "triton": force the Triton kernel. Requires CUDA tensors, Triton availability, and contiguous inputs/outputs; raises on violation or kernel failure.
  • "torch": force the Torch implementation.
'auto'
triton_tuning TritonTuningMode

Triton launch config selection policy.

  • "heuristic" (default): use static launch heuristics.
  • "autotune_bucketed": autotune from a fixed config set and cache by input size bucket.
  • "autotune_exact": autotune from the same config set and cache by exact input size.

Only applies when the Triton backend is used.

'heuristic'

Returns:

Type Description
Tensor

Hilbert indices.

  • Has the same shape/device as the inputs.
  • If out is provided, returns out.
  • Otherwise, chooses a minimal integer dtype that can represent 3 * nbits bits, preferring unsigned if all inputs are unsigned and a fitting unsigned dtype is available.

Raises:

Type Description
TypeError

If a non-integer tensor is provided.

ValueError

If inputs are on different devices, have mismatched shapes, if nbits is invalid or does not fit in the input/output dtypes, or if backend arguments are invalid.

RuntimeError

If gpu_backend='triton' is requested but Triton is unavailable or the Triton kernel fails at runtime.

Notes

When using this function with torch.compile, call precache_compile_luts before compilation. This avoids materialization of LUTs inside the compiled region, which causes graph breaks, extra overhead, and failure with fullgraph=True.

hilbert_decode_3d

hilbert_decode_3d(
    index: Tensor,
    *,
    nbits: int | None = None,
    out_x: Tensor | None = None,
    out_y: Tensor | None = None,
    out_z: Tensor | None = None,
    lut_cache: TorchCacheMode = "device",
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> tuple[Tensor, Tensor, Tensor]

Decode Hilbert indices to 3D integer coordinates.

This function provides a PyTorch equivalent of hilbert_decode_3d. It accepts integer torch.Tensor inputs of arbitrary shape on any device and dispatches to a backend-specific implementation based on the device and backend settings.

Parameters:

Name Type Description Default
index Tensor

Integer Hilbert index tensor to decode.

required
nbits int | None

Number of bits per coordinate axis. This defines a coordinate domain of [0, 2**nbits) on each axis and a Hilbert index range of [0, 2**(3 * nbits)). For indices outside that range, only the low 3 * nbits bits are used.

Must satisfy 1 <= nbits <= 21. If provided, it must also fit within the usable bits of the index dtype.

If None:

  • Inferred from the index dtype as one third of its usable bit width, capped at 21. For example, uint16 -> 5, uint64 -> 21, and int64 -> 21 (sign bit excluded).

For best performance and tighter output dtypes, pass the smallest value that covers the input index range.

None
out_x Tensor | None

Optional output coordinate tensors. Either provide all three or none.

Each must have the same shape and device as index and an integer dtype wide enough to hold nbits bits.

None
out_y Tensor | None

Optional output coordinate tensors. Either provide all three or none.

Each must have the same shape and device as index and an integer dtype wide enough to hold nbits bits.

None
out_z Tensor | None

Optional output coordinate tensors. Either provide all three or none.

Each must have the same shape and device as index and an integer dtype wide enough to hold nbits bits.

None
lut_cache TorchCacheMode

Cache mode for look-up tables (LUTs) used by the Torch/Triton kernels.

  • "device" (default): cache the converted LUT tensors per-device for reuse across calls.
  • "host_only": do not keep a torch-side LUT cache; materialize on demand from the (process-wide) NumPy LUT cache.

This setting is ignored by the CPU Numba path.

'device'
cpu_parallel bool | None

Controls whether the CPU Numba kernel may execute in parallel.

Only applies when dispatching to the CPU Numba backend and the input is not a scalar tensor. If None, a heuristic is used.

None
cpu_backend CPUBackend

CPU backend selection.

  • "auto" (default): use the Numba kernel unless inside torch.compile, in which case the torch backend is used.
  • "numba": always use the Numba kernel. This mode is not torch.compile-friendly.
  • "torch": always use the torch implementation.
'auto'
gpu_backend GPUBackend

GPU (accelerator) backend selection.

  • "auto" (default): on CUDA, use the Triton kernel when available and all tensors are contiguous; otherwise fall back to the Torch kernel. Fallbacks due to non-contiguity or Triton runtime failure emit a UserWarning.
  • "triton": force the Triton kernel. Requires CUDA tensors, Triton availability, and contiguous inputs/outputs; raises on violation or kernel failure.
  • "torch": force the Torch implementation.
'auto'
triton_tuning TritonTuningMode

Triton launch config selection policy.

  • "heuristic" (default): use static launch heuristics.
  • "autotune_bucketed": autotune from a fixed config set and cache by input size bucket.
  • "autotune_exact": autotune from the same config set and cache by exact input size.

Only applies when the Triton backend is used.

'heuristic'

Returns:

Type Description
tuple[Tensor, Tensor, Tensor]

Decoded coordinates (x, y, z).

  • Each tensor has the same shape/device as index.
  • If out_x, out_y, and out_z are provided, returns (out_x, out_y, out_z).
  • Otherwise, each result uses a minimal integer dtype that can represent nbits bits, preferring unsigned if the input is unsigned and a fitting unsigned dtype is available.

Raises:

Type Description
TypeError

If a non-integer tensor is provided.

ValueError

If nbits is invalid or does not fit in the input/output dtypes, if outputs are inconsistent or have incorrect shapes/devices, or if backend arguments are invalid.

RuntimeError

If gpu_backend='triton' is requested but Triton is unavailable or the Triton kernel fails at runtime.

Notes

When using this function with torch.compile, call precache_compile_luts before compilation. This avoids materialization of LUTs inside the compiled region, which causes graph breaks, extra overhead, and failure with fullgraph=True.

morton_encode_2d

morton_encode_2d(
    x: Tensor,
    y: Tensor,
    *,
    nbits: int | None = None,
    out: Tensor | None = None,
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> Tensor

Encode 2D integer coordinate tensors to Morton (Z-order) indices.

API semantics for parameters, returns, and errors match hilbert_encode_2d, except that Morton kernels do not use lookup tables and therefore do not accept lut_cache.
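Morton encoding is plain bit interleaving, which a short pure-Python sketch makes concrete. This is illustrative only; which axis occupies the even bit positions is an assumption, not something this page specifies.

```python
def morton_encode_2d_ref(x: int, y: int, nbits: int) -> int:
    # Interleave the low nbits bits of x and y: x in the even bit
    # positions, y in the odd (axis-to-position convention assumed).
    idx = 0
    for i in range(nbits):
        idx |= ((x >> i) & 1) << (2 * i)
        idx |= ((y >> i) & 1) << (2 * i + 1)
    return idx

assert morton_encode_2d_ref(0b11, 0b00, 2) == 0b0101  # x bits -> even positions
assert morton_encode_2d_ref(0b00, 0b11, 2) == 0b1010  # y bits -> odd positions
```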

morton_decode_2d

morton_decode_2d(
    index: Tensor,
    *,
    nbits: int | None = None,
    out_x: Tensor | None = None,
    out_y: Tensor | None = None,
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> tuple[Tensor, Tensor]

Decode Morton (Z-order) index tensors to 2D integer coordinates.

API semantics for parameters, returns, and errors match hilbert_decode_2d, except that Morton kernels do not use lookup tables and therefore do not accept lut_cache.
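Morton decoding is the inverse de-interleave, sketched here in pure Python (illustrative; the even/odd axis-to-bit convention is an assumption):

```python
def morton_decode_2d_ref(idx: int, nbits: int) -> tuple[int, int]:
    # De-interleave: even bit positions -> x, odd bit positions -> y
    # (axis-to-position convention assumed, not taken from the library).
    x = y = 0
    for i in range(nbits):
        x |= ((idx >> (2 * i)) & 1) << i
        y |= ((idx >> (2 * i + 1)) & 1) << i
    return x, y

assert morton_decode_2d_ref(0b0101, 2) == (3, 0)
assert morton_decode_2d_ref(0b1010, 2) == (0, 3)
```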

morton_encode_3d

morton_encode_3d(
    x: Tensor,
    y: Tensor,
    z: Tensor,
    *,
    nbits: int | None = None,
    out: Tensor | None = None,
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> Tensor

Encode 3D integer coordinate tensors to Morton (Z-order) indices.

API semantics for parameters, returns, and errors match hilbert_encode_3d, except that Morton kernels do not use lookup tables and therefore do not accept lut_cache.
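The 3D case interleaves three bit streams instead of two; a pure-Python sketch (illustrative; the axis-to-position convention is an assumption):

```python
def morton_encode_3d_ref(x: int, y: int, z: int, nbits: int) -> int:
    # Three-way interleave: bit i of x, y, z lands at positions
    # 3*i, 3*i+1, 3*i+2 respectively (convention assumed).
    idx = 0
    for i in range(nbits):
        idx |= ((x >> i) & 1) << (3 * i)
        idx |= ((y >> i) & 1) << (3 * i + 1)
        idx |= ((z >> i) & 1) << (3 * i + 2)
    return idx

assert morton_encode_3d_ref(1, 0, 0, 1) == 0b001
assert morton_encode_3d_ref(0, 1, 0, 1) == 0b010
assert morton_encode_3d_ref(0, 0, 1, 1) == 0b100
```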

morton_decode_3d

morton_decode_3d(
    index: Tensor,
    *,
    nbits: int | None = None,
    out_x: Tensor | None = None,
    out_y: Tensor | None = None,
    out_z: Tensor | None = None,
    cpu_parallel: bool | None = None,
    cpu_backend: CPUBackend = "auto",
    gpu_backend: GPUBackend = "auto",
    triton_tuning: TritonTuningMode = "heuristic",
) -> tuple[Tensor, Tensor, Tensor]

Decode Morton (Z-order) index tensors to 3D integer coordinates.

API semantics for parameters, returns, and errors match hilbert_decode_3d, except that Morton kernels do not use lookup tables and therefore do not accept lut_cache.

precache_compile_luts

precache_compile_luts(
    device: TorchDeviceLike = None,
    *,
    op: TorchHilbertOp = "all",
) -> None

Pre-cache Torch LUT tensors for use with torch.compile.

When using HilbertSFC Torch functions with torch.compile, call this before compilation to avoid materializing LUT tensors inside the compiled region, which can cause graph breaks, extra overhead, and failure with fullgraph=True.

Parameters:

Name Type Description Default
device TorchDeviceLike

Device for which to cache LUT tensors.

None means CPU.

None
op TorchHilbertOp

Operation used to select which LUT tensors are pre-cached.

  • "all" (default): pre-cache all LUT tensors needed for supported operations.
  • Otherwise: pre-cache only the LUT tensors used by that operation.
'all'
Notes

Pre-caching with this function is generally not useful outside torch.compile: it eagerly materializes LUT tensors that may never be needed, whereas ordinary calls populate the per-device cache on demand.

clear_torch_lut_caches

clear_torch_lut_caches(
    device: TorchDeviceLike = None,
    *,
    op: TorchHilbertOp = "all",
) -> None

Clear Torch-side LUT caches.

Parameters:

Name Type Description Default
device TorchDeviceLike

Device whose cached LUT tensors should be cleared.

If None (default), clears cached LUT tensors for all devices.

None
op TorchHilbertOp

Operation used to filter which cached LUT tensors are cleared.

  • "all" (default): clear all cached LUT tensors.
  • Otherwise: clear only the cached LUT tensors used by that operation (for example, "hilbert_encode_2d").
'all'
Notes

This does not clear the root process-wide LUT cache. Use clear_lut_caches for that.