katsdpsigproc package
Subpackages
- katsdpsigproc.asyncio package
- katsdpsigproc.rfi package
- Submodules
- katsdpsigproc.rfi.device module
AbstractBackgroundDeviceAbstractBackgroundDeviceTemplateAbstractNoiseEstDeviceAbstractNoiseEstDeviceTemplateAbstractThresholdDeviceAbstractThresholdDeviceTemplateBackgroundFlagsBackgroundHostFromDeviceBackgroundMedianFilterDeviceBackgroundMedianFilterDeviceTemplateFlaggerDeviceFlaggerDeviceTemplateFlaggerHostFromDeviceNoiseEstHostFromDeviceNoiseEstMADDeviceNoiseEstMADDeviceTemplateNoiseEstMADTDeviceNoiseEstMADTDeviceTemplateThresholdHostFromDeviceThresholdSimpleDeviceThresholdSimpleDeviceTemplateThresholdSumDeviceThresholdSumDeviceTemplate
- katsdpsigproc.rfi.host module
- katsdpsigproc.rfi.twodflag module
- Module contents
- katsdpsigproc.test package
Submodules
katsdpsigproc.abc module
Abstract base classes for opencl and cuda.
- class katsdpsigproc.abc.AbstractCommandQueue(*args, **kwds)[source]
Bases:
ABC,Generic[_B,_C,_E,_K]Abstraction of a command queue.
- context: _C
- abstract enqueue_copy_buffer_rect(src_buffer: _B, dest_buffer: _B, src_origin: int, dest_origin: int, shape: Sequence[int], src_strides: Sequence[int], dest_strides: Sequence[int]) None[source]
Copy a subregion of one buffer to another.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
copy_region()for a high-level interface.- Parameters:
src_buffer – Source and destination buffers
dest_buffer – Source and destination buffers
src_origin – Offsets for the start of the copy, in bytes
dest_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
src_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
dest_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
- abstract enqueue_kernel(kernel: _K, args: Sequence[Any], global_size: Tuple[int, ...], local_size: Tuple[int, ...]) None[source]
Enqueue a kernel to the command queue.
Warning
It is not thread-safe to call this function in two threads on the same kernel at the same time.
- Parameters:
kernel – Kernel to run
args – Arguments to pass to the kernel. Refer to the PyOpenCL/CUDA documentation for details. Additionally, this function allows a low-level device array to be passed.
global_size – Number of work-items in each global dimension
local_size – Number of work-items in each local dimension. Must divide exactly into global_size.
- abstract enqueue_marker() AbstractEvent[source]
Create an event at this point in the command queue.
- abstract enqueue_read_buffer(buffer: _B, data: Any, blocking: bool = True) None[source]
Copy data from the device to the host.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Source
data – Target
blocking – If true (default) the call blocks until the copy is complete
- abstract enqueue_read_buffer_rect(buffer: _B, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of a buffer to host memory.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Source
data (array-like) – Target
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- abstract enqueue_wait_for_events(events: Sequence[_E]) None[source]
Enqueue a barrier to wait for all events in events.
- abstract enqueue_write_buffer(buffer: _B, data: Any, blocking=True) None[source]
Copy data from the host to the device.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Target
data (array-like) – Source
blocking – If true (default), the call blocks until the source has been fully read (it has not necessarily reached the device).
- abstract enqueue_write_buffer_rect(buffer: _B, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of host memory to a buffer.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Target
data (array-like) – Source
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- class katsdpsigproc.abc.AbstractContext(*args, **kwds)[source]
Bases:
ABC,Generic[_B,_RB,_RS,_D,_P,_Q,_TQ]Abstraction of an OpenCL/CUDA context.
- abstract allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: _RB | None = None) _B[source]
Create a typed buffer on the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
raw – Memory backing the array (automatically allocated if
None)
- abstract allocate_pinned(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any]) ndarray[source]
Create a buffer in host memory that can be efficiently copied to and from the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
- abstract allocate_svm(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: _RS | None = None) ndarray[source]
Allocate shared virtual memory.
- abstract allocate_svm_raw(n_bytes: int) _RS[source]
Allocate raw storage that can be passed to
allocate_svm().
- abstract compile(source: str, extra_flags: List[str] | None = None) AbstractProgram[source]
Build a program object from source.
- Parameters:
source – Source code
extra_flags – Extra parameters to pass to the compiler
- abstract create_command_queue(profile: bool = False) AbstractCommandQueue[source]
Create a new command queue associated with this context.
- Parameters:
profile – If true, the command queue will support timing kernels
- abstract create_tuning_command_queue() AbstractTuningCommandQueue[source]
Create a new command queue for doing autotuning.
- abstract property device: AbstractDevice
Return the device associated with the context (or the first device, if multiple).
- class katsdpsigproc.abc.AbstractDevice(*args, **kwds)[source]
-
Abstraction of a device.
- abstract classmethod get_devices() Sequence[_D][source]
Return a list of all devices on all platforms.
- abstract classmethod get_devices_by_platform() Sequence[Sequence[_D]][source]
Return a list of all devices, with a sub-list per platform.
- abstract property is_accelerator: bool
Whether device is an accelerator (as defined by OpenCL device types).
- abstract make_context() AbstractContext[source]
Create a new context associated with this device.
- class katsdpsigproc.abc.AbstractEvent[source]
Bases:
ABCAbstraction of an event.
This is more akin to a CUDA event than an OpenCL event, in that it is a marker in a command queue rather than associated with a specific command.
- abstract time_since(prior_event: _E) float[source]
Return the time in seconds from prior_event to self.
Unlike the PyCUDA method of the same name, this will wait for the events to complete if they have not already.
- abstract time_till(next_event: _E) float[source]
Return the time in seconds from this event to next_event.
See
time_since().
- class katsdpsigproc.abc.AbstractKernel(program: _P, name: str)[source]
-
Abstraction of a kernel object.
The object can be enqueued using
AbstractCommandQueue.enqueue_kernel().The recommended way to create this object is via
AbstractProgram.get_kernel().
- class katsdpsigproc.abc.AbstractProgram(*args, **kwds)[source]
-
Abstraction of a program object.
- abstract get_kernel(name: str) AbstractKernel[source]
Create a new kernel.
- Parameters:
name – Name of the kernel function
- class katsdpsigproc.abc.AbstractTuningCommandQueue(*args, **kwds)[source]
Bases:
AbstractCommandQueue[_B,_C,_E,_K]Command queue with extra facilities for autotuning.
It keeps track of kernels that are enqueued since the last call to
start_tuning(), and reports the total time they consume whenstop_tuning()is called.- context: _C
katsdpsigproc.accel module
Utilities for interfacing with accelerator hardware.
Both OpenCL and CUDA are supported, as well as multiple devices.
The modules cuda and opencl provide the abstraction layer,
but most code will not import these directly (and it might not be possible
to import them). Instead, use create_some_context() to set up a context
on whatever device is available.
- katsdpsigproc.accel.have_cuda
True if PyCUDA could be imported (does not guarantee any CUDA devices)
- Type:
- katsdpsigproc.accel.have_opencl
True if PyOpenCL could be imported (does not guarantee any OpenCL devices)
- Type:
- class katsdpsigproc.accel.AbstractAllocator(*args, **kwds)[source]
-
Interface for allocating device memory.
- abstract allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, raw: _RB | None = None) DeviceArray[source]
- context: AbstractContext
- class katsdpsigproc.accel.AliasIOSlot(children: Iterable[IOSlotBase])[source]
Bases:
IOSlotBaseSlot that allocates a single low-level buffer to back multiple children.
The child slots need not have the same type or shape. This is typically used when logically distinct slots can share memory because they do not contain live data at the same time.
- Parameters:
children –
- Raises:
ValueError – If children is empty or contains non-attachable elements
- class katsdpsigproc.accel.CompoundIOSlot(children: Iterable[IOSlot])[source]
Bases:
IOSlotIO slot that owns multiple child slots, and presents the combined requirement.
The children must all have the same type and shape. This is used for connecting a single buffer to multiple operations.
- Parameters:
children – Child slots
- Raises:
ValueError – If children is empty, or if elements have inconsistent shapes
ValueError – If any child is not attachable
TypeError – If children contains inconsistent data types
- class katsdpsigproc.accel.DeviceAllocator(context: AbstractContext[_B, _RB, _RS, _D, _P, _Q, _TQ])[source]
Bases:
Generic[_B,_RB,_RS,_D,_P,_Q,_TQ],AbstractAllocator[_RB]Allocate
DeviceArrayobjects from a context.- allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, raw: _RB | None = None) DeviceArray[source]
- context: AbstractContext
- class katsdpsigproc.accel.DeviceArray(context: AbstractContext, shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, raw: Any | None = None)[source]
Bases:
objectA light-weight array-like wrapper around a device buffer.
It that handles padding better than PyCUDA (which has very poor support).
It only supports C-order arrays where the inner-most dimension is contiguous. Transfers are designed to use an
HostArrayof the same shape and padding, but fall back to using a copy when necessary.- Parameters:
context – Context in which to allocate the memory
shape – Shape for the usable data
dtype – Data type
padded_shape – Shape for memory allocation (defaults to shape)
raw – If specified, provides the backing memory. It must be created from the allocate_raw method of an allocator.
- asarray_like(ary: ndarray) HostArray[source]
Return an array with the same content as ary, but the same memory layout as self.
- copy_region(command_queue: AbstractCommandQueue, dest: DeviceArray, src_region: int | slice | None | Tuple[int | slice | None, ...], dest_region: int | slice | None | Tuple[int | slice | None, ...]) None[source]
Perform a device-to-device copy of a subregion of self to dest.
If the source and destination memory overlap, the result is undefined.
The regions to copy are specified using a subset of numpy array indexing syntax. The following are supported:
slices with positive strides
integers
If fewer indices than axes are specified, all elements on the remaining axes are used.
Ellipses are not yet supported, but it would be straightforward to add support.
- Parameters:
- Raises:
TypeError – if the source and destination do not have the same dtype
ValueError – if the source and destination regions select different shapes
IndexError – if the source or destination regions are unsupported or out-of-range
- get(command_queue: AbstractCommandQueue, ary: ndarray | None = None) ndarray[source]
Copy synchronously from self to ary.
If ary is None, or if it is not suitable as a target, the copy is to a newly-allocated
HostArray. The actual target is returned.
- get_async(command_queue: AbstractCommandQueue, ary: ndarray | None = None) ndarray[source]
Copy asynchronously from self to ary (see
get()).
- get_region(command_queue: AbstractCommandQueue, ary: ndarray, device_region: int | slice | None | Tuple[int | slice | None, ...], ary_region: int | slice | None | Tuple[int | slice | None, ...], blocking: bool = True) None[source]
Perform a device-to-host copy of a subregion of self to ary.
See
copy_region()for a description of how regions are specified.- Parameters:
command_queue – Command queue for the operation.
ary – Target of the copy
device_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.ary_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.blocking – If false, the operation will be asynchronous.
- Raises:
TypeError – if the source and destination do not have the same dtype
ValueError – if the source and destination regions select different shapes
ValueError – if ary is not a
HostArrayor is a view of oneIndexError – if the source or destination regions are unsupported or out-of-range
- set(command_queue: AbstractCommandQueue, ary: ndarray) None[source]
Copy synchronously from ary to self.
- set_async(command_queue: AbstractCommandQueue, ary: ndarray) None[source]
Copy asynchronously from ary to self.
- set_region(command_queue: AbstractCommandQueue, ary: ndarray, device_region: int | slice | None | Tuple[int | slice | None, ...], ary_region: int | slice | None | Tuple[int | slice | None, ...], blocking: bool = True) None[source]
Perform a host-to-device copy of a subregion ary to self.
See
copy_region()for a description of how regions are specified.- Parameters:
command_queue – Command queue for the operation.
ary – Source of the copy
device_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.ary_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.blocking – If false, the operation will be asynchronous.
- Raises:
TypeError – if the source and destination do not have the same dtype
ValueError – if the source and destination regions select different shapes
IndexError – if the source or destination regions are unsupported or out-of-range
- zero(command_queue: AbstractCommandQueue) None[source]
Memset with zeros (asynchronously).
- class katsdpsigproc.accel.Dimension(size: int, min_padded_round: int | None = None, min_padded_size: int | None = None, alignment: int = 1, align_dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any] = None, exact: bool = False)[source]
Bases:
objectA single dimension of an
IOSlot.It represents padding and alignment requirements. Instances can be linked together, with all linked instances (whether directly or indirectly linked) exposing the intersection of their requirements. Internally this is represented using a union-find tree with path compression.
Note that min_padded_round and alignment provide similar but different requirements. The former is used to determine a minimum amount of padding, but any larger amount is acceptable, even if not a multiple of min_padded_round. The latter is stricter, always requiring a multiple of this value. The latter is not expected to be frequently used except when type-punning.
Dimensions can also be frozen, which is done by
IOSlotonce a buffer is bound. This prevents the requirements from changing later (via linking or otherwise) and hence invalidating the bound buffer, or allowing two buffers with a shared dimension but different strides to be used.- Parameters:
size – Size of the actual data, before padding
min_padded_round – The min_padded_size will be computed by rounding size up to a multiple of this.
min_padded_size – Minimum size of the padded allocation, overriding min_padded_round.
alignment – Padded size is required to be a multiple of this. This must be a power of 2.
align_dtype – If specified, it is a hint that this data type is the fastest-varying axis of a multidimensional array. The padded size may be chosen to be such that the stride is a multiple of ALIGN_BYTES, to ensure efficient access by GPUs. The hint will be ignored if exact is true.
exact – If true, padding is forbidden
- ALIGN_BYTES = 128
- add_align_dtype(dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any]) None[source]
Add an alignment hint.
Indicates that this will be used with an array whose fastest-varying dimension is of type dtype. If the size is not a power of 2, it is ignored.
- link(other: Dimension) None[source]
Make both self and other reference a single shared requirement.
- Raises:
ValueError – If the resulting requirement is exact and unsatisfiable
- class katsdpsigproc.accel.HostArray(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, context: AbstractContext | None = None)[source]
Bases:
ndarrayA restricted array class that can be used to initialise a
DeviceArray.It uses C ordering and allows padding, which is always in units of the dtype. It optionally uses pinned memory to allow fast transfer to and from device memory.
Because of limitations in numpy and PyCUDA (which do not support non-contiguous memory very well), it works by taking a slice from the origin of contiguous storage, and using that contiguous storage in host-device copies. While one can create views, those views cannot be used in fast copies because there is no way to know whether the view is anchored at the origin.
See the numpy documentation on subclassing for an explanation of why it is written the way it is.
- Parameters:
shape – Shape for the array
dtype – Data type for the array
padded_shape – Total size of memory allocation (defaults to shape)
context – If specified, the memory will be allocated in a way that allows efficient copies to and from this context.
- class katsdpsigproc.accel.IOSlot(dimensions: Tuple[Dimension | int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any])[source]
Bases:
IOSlotBaseAn input/output slot with type and shape information.
It contains a reference to a buffer (initially None) and the shape, type, padding and alignment requirements for that buffer. It can allocate the buffer on request. Each dimension is represesented by a
Dimension.Slots are arranged in a tree, and only the root can be manipulated directly. A slot cannot be reattached to a new parent.
- Parameters:
- allocate(allocator: AbstractAllocator[_RB], raw: _RB | None = None, *, bind: bool = True) DeviceArray[source]
Allocate and optionally bind a buffer satisfying the requirements.
Warning
When raw is provided, there is no check that the storage is large enough.
- Parameters:
allocator – Memory allocator from which to obtain the memory
raw (PyCUDA DeviceAllocation or PyOpenCL Buffer, optional) – Backing storage for the allocation
bind – If true (the default), the allocated buffer is immediately bound. If false, the buffer is returned and not bound, and can be bound later.
- Raises:
ValueError – If this is not a root slot
- bind(buffer: DeviceArray | None) None[source]
Set the internal buffer reference.
Always use this function rather than writing it directly.
If the buffer is not None, it is validated (see
validate()).- Parameters:
buffer – Buffer to store
- Raises:
ValueError – If this is not a root slot
- required_padded_shape() Tuple[int, ...][source]
Return padded shape required to satisfy only this slot.
- validate(buffer: DeviceArray) None[source]
Check that buffer is suitable for binding.
- Parameters:
buffer – Buffer to validate (must not be None)
- Raises:
TypeError – If the data type does not match
ValueError – If the dimensions or shape do not match
ValueError – If a padded size does not match
Dimension.required_padded_size()
- class katsdpsigproc.accel.IOSlotBase[source]
Bases:
ABCAn input/output slot of an operation.
A slot can be bound to storage, or can allocate storage itself. This base class is untyped and unshaped, so in most cases one will use
IOSlotinstead.Slots are arranged in a tree, and only the root can be manipulated directly. The entire tree shares the same storage. A slot cannot be reattached to a new parent.
- allocate(allocator: AbstractAllocator[_RB], raw: _RB | None = None, *, bind: bool = True) Any[source]
Allocate and optionally bind a buffer satisfying the requirements.
Warning
When raw is provided, there is no check that the storage is large enough.
- Parameters:
allocator – Memory allocator from which to obtain the memory
raw (PyCUDA DeviceAllocation or PyOpenCL Buffer, optional) – Backing storage for the allocation
bind – If true (the default), the allocated buffer is immediately bound. If false, the buffer is returned and not bound, and can be bound later.
- Raises:
ValueError – If this is not a root slot
- allocate_host(context: AbstractContext) HostArray[source]
Allocate a HostArray compatible with this slot.
- class katsdpsigproc.accel.LinenoLexer(*args, **kw)[source]
Bases:
LexerInsert #line directives into the source code.
It is used by passing lexer_cls to the mako template constructor.
- class katsdpsigproc.accel.Operation(command_queue: AbstractCommandQueue, allocator: AbstractAllocator | None = None)[source]
Bases:
ABCAn instance of a device operation.
Typically one first creates a template (which contains the program code, and is expensive to create) and then instantiates it for use with specific buffers.
An instance of this class contains slots, which are named instances of
IOSlot. The user binds specific buffers to these slots to specify the memory used in the operation.This class is only useful when subclassed. The subclass will populate the slots. Subclasses also provide a _run function that handles the implementation of __call__.
Operations are arranged in a tree, with internal nodes subclassing
OperationSequence. Internal nodes provide slots that proxy for the slots of their children; the children’s slots should not be manipulated directly. When a parent ceases to exist, its children should no longer be used i.e., they cannot be reattached to a new parent.- Parameters:
command_queue – Command queue for the operation
allocator – Allocator used to allocate unbound slots
- slots
Maps names to slot instances
- Type:
dictionary
Extra slots which are not root slots, and hence cannot have buffers bound to them, but whose buffers can still be referenced. This is generally used by
OperationSequencewhen several slots are aliased to the same memory.- Type:
dictionary
- is_root
True if this operation is not part of an
OperationSequence.- Type:
boolean
- bind(**kwargs) None[source]
Bind buffers to slots by keyword.
Each keyword argument name specifies a slot name.
- buffer(name: str) DeviceArray[source]
Retrieve the buffer bound to a slot.
It will consult both
slotsandhidden_slots.- Parameters:
name (str) – Name of the slot to access
- Returns:
Buffer bound to slot name.
- Return type:
- Raises:
KeyError – If no slot with this name exists
TypeError – If the slot exists but it is an alias slot
ValueError – If the slot exists but does not yet have a buffer bound
- ensure_all_bound() None[source]
Make sure that all slots have a buffer bound, allocating if necessary.
- ensure_bound(name: str) None[source]
Make sure that a specific slot has a buffer bound, allocating if necessary.
- class katsdpsigproc.accel.OperationSequence(command_queue: AbstractCommandQueue, operations: Iterable[Tuple[str, Operation]], compounds: Mapping[str, Iterable[str]] | None = None, aliases: Mapping[str, Iterable[str]] | None = None, allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConvenience class for an operation that is built up of smaller named operations.
Slots are mapped to share data between these smaller operations. Initially, each slot named slot in a child named op is remapped to a parent slot named op:slot. After this, each set provided in compounds is removed from the slots and combined into a single compound slot. Finally, each set in aliases is removed and combined into an alias slot.
For both compounds and aliases, if a child does not exist, it is skipped, and if none of the children exist, no action is taken.
- Parameters:
command_queue – Command queue for the operation
operations (sequence of 2-tuples) – Name, operation pairs to add. Calling the operation executes them in order
compounds (mapping of str to sequence of str, optional) – Names for compound slots, mapped to the original slot names that are replaced
aliases (mapping of str to sequence of str, optional) – Names for alias slots, mapped to the original slot names that are replaced
allocator (
DeviceAllocatororSVMAllocator, optional) – Allocator used to allocate unbound slots
- class katsdpsigproc.accel.SVMAllocator(context: AbstractContext[_B, _RB, _RS, _D, _P, _Q, _TQ])[source]
Bases:
Generic[_B,_RB,_RS,_D,_P,_Q,_TQ],AbstractAllocator[_RS]Allocate
SVMArrayobjects from a context.- allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, raw: _RS | None = None) SVMArray[source]
- context: AbstractContext
- class katsdpsigproc.accel.SVMArray(context: AbstractContext, shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape: Tuple[int, ...] | None = None, raw: Any | None = None)[source]
Bases:
HostArray,DeviceArrayAn array that uses shared virtual memory to be accessible from both the host and the device.
Shared virtual memory is also known as managed memory (in CUDA).
It should not be used as a source or target for copies to the device, since it already resides on the device.
Due to limitations in PyOpenCL, this is currently only available for CUDA, and CUDA’s restrictions on managed memory apply. Only the base array (not views) can be passed to kernels.
- Parameters:
context – Context in which to allocate the memory
shape – Shape for the array
dtype – Data type for the array
padded_shape – Total size of memory allocation (defaults to shape)
raw (PyCUDA DeviceAllocation or PyOpenCL Buffer, optional) – If specified, provides the backing memory
- get(command_queue: AbstractCommandQueue, ary: ndarray | None = None) ndarray[source]
Copy synchronously copy from self to ary.
If ary is None, or if it is not suitable as a target, the copy is to a newly allocated
HostArray. The actual target is returned. For SVMArray, this is a CPU copy.
- get_async(command_queue: AbstractCommandQueue, ary: ndarray | None = None) ndarray[source]
Copy asynchronously from self to ary (see get).
This is implemented synchronously for SVMArray, but exists for compatibility.
- get_region(command_queue: AbstractCommandQueue, ary: ndarray, device_region: int | slice | None | Tuple[int | slice | None, ...], ary_region: int | slice | None | Tuple[int | slice | None, ...], blocking: bool = True) None[source]
Perform a device-to-host copy of a subregion of self to ary.
See
copy_region()for a description of how regions are specified.- Parameters:
command_queue – Command queue for the operation.
ary – Target of the copy
device_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.ary_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.blocking – If false, the operation will be asynchronous.
- Raises:
TypeError – if the source and destination do not have the same dtype
ValueError – if the source and destination regions select different shapes
ValueError – if ary is not a
HostArrayor is a view of oneIndexError – if the source or destination regions are unsupported or out-of-range
- set(command_queue: AbstractCommandQueue, ary: ndarray) None[source]
Copy synchronously from ary to self.
For SVMArray, this is a CPU copy.
- set_async(command_queue: AbstractCommandQueue, ary: ndarray) None[source]
Copy asynchronously from ary to self.
This is implemented synchronously for SVMArray, but exists for compatibility.
- set_region(command_queue: AbstractCommandQueue, ary: ndarray, device_region: int | slice | None | Tuple[int | slice | None, ...], ary_region: int | slice | None | Tuple[int | slice | None, ...], blocking: bool = True) None[source]
Perform a host-to-device copy of a subregion ary to self.
See
copy_region()for a description of how regions are specified.- Parameters:
command_queue – Command queue for the operation.
ary – Source of the copy
device_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.ary_region – Index expressions constructed by
np.s_ornp.index_exp, to specify the source and target regions.blocking – If false, the operation will be asynchronous.
- Raises:
TypeError – if the source and destination do not have the same dtype
ValueError – if the source and destination regions select different shapes
IndexError – if the source or destination regions are unsupported or out-of-range
- katsdpsigproc.accel.all_devices() List[AbstractDevice][source]
Return a list of all discovered devices.
- katsdpsigproc.accel.build(context: AbstractContext, name: str, render_kws: Mapping[str, Any] | None = None, extra_dirs: List[str | PathLike] | None = None, extra_flags: List[str] | None = None, source: str | None = None) AbstractProgram[source]
Build a source module from a mako template.
- Parameters:
context – Context for which to compile the code
name – Source file name, relative to the katsdpsigproc module
render_kws – Keyword arguments to pass to mako
extra_dirs – Extra directories to search for source code
extra_flags – Flags to pass to the compiler
source – If specified, provides the source, overriding name
- Returns:
Compiled module
- Return type:
program
- katsdpsigproc.accel.candidate_devices(device_filter: Callable[[AbstractDevice], bool] | None = None) Sequence[AbstractDevice][source]
Get devices that are considered for
create_some_context().Refer to
create_some_context()for documentation of how this list is affected by device_filter and environment variables. If no matching devices are found, returns an empty list. If an environment variable is out of range, raisesRuntimeError.
- katsdpsigproc.accel.create_some_context(interactive: bool = True, device_filter: Callable[[AbstractDevice], bool] | None = None) AbstractContext[source]
Create a single-device context, selecting a device automatically.
This is similar to pyopencl.create_some_context. A number of environment variables can be set to limit the choice to a single device:
KATSDPSIGPROC_DEVICE: device number from amongst all devices
CUDA_DEVICE: CUDA device number (compatible with PyCUDA)
PYOPENCL_CTX: OpenCL platform and optionally device number (compatible with PyOpenCL)
The first of these that is encountered takes effect. If it does not exist, an exception is thrown.
- Parameters:
interactive – If true, and sys.stdin.isatty() is true, and there are multiple choices, it will prompt the user. Otherwise, it will choose the first available device, favouring CUDA over OpenCL, then GPU over accelerators over other OpenCL devices.
device_filter – If specified, each device in turn is passed to it, and it must return True to keep the device as a candidate or False to reject it.
- Raises:
RuntimeError – If no device could be found or the user made an invalid selection
- katsdpsigproc.accel.visualize_operation(operation: Operation, filename: str) None[source]
Write a visualization of an
Operationto file.This requires the graphviz package to be installed.
- Parameters:
operation – Operation or operation sequence to visualize
filename – Base filename to write. It should end in
.gv, and a rendered PDF will automatically be produced with a filename ending in.gv.pdf.
katsdpsigproc.cuda module
Abstraction layer over PyCUDA.
It implements the abstract interfaces defined in katsdpsigproc.abc.
- class katsdpsigproc.cuda.CommandQueue(context: Context, pycuda_stream: pycuda.driver.Stream | None = None, profile: bool = False)[source]
Bases:
AbstractCommandQueue[GPUArray,Context,Event,Kernel]- context: _C
- enqueue_copy_buffer_rect(src_buffer: pycuda.gpuarray.GPUArray | ndarray, dest_buffer: pycuda.gpuarray.GPUArray | ndarray, src_origin: int, dest_origin: int, shape: Sequence[int], src_strides: Sequence[int], dest_strides: Sequence[int]) None[source]
Copy a subregion of one buffer to another.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
copy_region()for a high-level interface.- Parameters:
src_buffer – Source and destination buffers
dest_buffer – Source and destination buffers
src_origin – Offsets for the start of the copy, in bytes
dest_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
src_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
dest_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
- enqueue_kernel(kernel: Kernel, args: Sequence[Any], global_size: Tuple[int, ...], local_size: Tuple[int, ...]) None[source]
Enqueue a kernel to the command queue.
Warning
It is not thread-safe to call this function in two threads on the same kernel at the same time.
- Parameters:
kernel – Kernel to run
args – Arguments to pass to the kernel. Refer to the PyOpenCL/CUDA documentation for details. Additionally, this function allows a low-level device array to be passed.
global_size – Number of work-items in each global dimension
local_size – Number of work-items in each local dimension. Must divide exactly into global_size.
- enqueue_read_buffer(buffer: pycuda.gpuarray.GPUArray, data: Any, blocking: bool = True) None[source]
Copy data from the device to the host.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Source
data – Target
blocking – If true (default) the call blocks until the copy is complete
- enqueue_read_buffer_rect(buffer: pycuda.gpuarray.GPUArray | ndarray, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of a buffer to host memory.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Source
data (array-like) – Target
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- enqueue_wait_for_events(events: Sequence[Event]) None[source]
Enqueue a barrier to wait for all events in events.
- enqueue_write_buffer(buffer: pycuda.gpuarray.GPUArray, data: Any, blocking: bool = True) None[source]
Copy data from the host to the device.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Target
data (array-like) – Source
blocking – If true (default), the call blocks until the source has been fully read (it has not necessarily reached the device).
- enqueue_write_buffer_rect(buffer: pycuda.gpuarray.GPUArray | ndarray, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of host memory to a buffer.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Target
data (array-like) – Source
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- enqueue_zero_buffer(buffer: pycuda.gpuarray.GPUArray | ndarray) None[source]
Fill a buffer with zero bytes.
- class katsdpsigproc.cuda.Context(pycuda_context: pycuda.driver.Context)[source]
Bases:
AbstractContext[GPUArray,DeviceAllocation,_RawManaged,Device,Program,CommandQueue,TuningCommandQueue]- allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: pycuda.driver.DeviceAllocation | None = None) pycuda.gpuarray.GPUArray[source]
Create a typed buffer on the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
raw – Memory backing the array (automatically allocated if
None)
- allocate_pinned(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any]) ndarray[source]
Create a buffer in host memory that can be efficiently copied to and from the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
- allocate_raw(n_bytes: int) pycuda.driver.DeviceAllocation[source]
Create an untyped buffer on the device.
- allocate_svm(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: _RawManaged | None = None) ndarray[source]
Allocate shared virtual memory.
- allocate_svm_raw(n_bytes: int) _RawManaged[source]
Allocate raw storage that can be passed to
allocate_svm().
- compile(source: str, extra_flags: List[str] | None = None) Program[source]
Build a program object from source.
- Parameters:
source – Source code
extra_flags – Extra parameters to pass to the compiler
- create_command_queue(profile: bool = False) CommandQueue[source]
Create a new command queue associated with this context.
- Parameters:
profile – If true, the command queue will support timing kernels
- create_tuning_command_queue() TuningCommandQueue[source]
Create a new command queue for doing autotuning.
- class katsdpsigproc.cuda.Device(pycuda_device: pycuda.driver.Device)[source]
Bases:
AbstractDevice[Context]- classmethod get_devices_by_platform() Sequence[Sequence[_D]][source]
Return a list of all devices, with a sub-list per platform.
- class katsdpsigproc.cuda.Event(pycuda_event: pycuda.driver.Event)[source]
Bases:
AbstractEvent- time_since(prior_event: Event) float[source]
Return the time in seconds from prior_event to self.
Unlike the PyCUDA method of the same name, this will wait for the events to complete if they have not already.
- time_till(next_event: Event) float[source]
Return the time in seconds from this event to next_event.
See
time_since().
- class katsdpsigproc.cuda.Kernel(program: Program, name: str)[source]
Bases:
AbstractKernel[Program]
- class katsdpsigproc.cuda.Program(pycuda_module: pycuda.driver.Module)[source]
Bases:
AbstractProgram[Kernel]
katsdpsigproc.fft module
Fast Fourier Transforms.
Currently this module only supports CUDA, using cuFFT.
It does not use scikit-cuda, because that insists on creating a cublas context to obtain a version number, which then permanently consumes GPU memory on some arbitrary GPU. It also seems the maintainer no longer has time.
Only Linux is supported. Windows and MacOS support would require changing the code for locating the library.
- class katsdpsigproc.fft.Fft(template: FftTemplate, command_queue: AbstractCommandQueue, mode: FftMode, allocator: AbstractAllocator | None = None)[source]
Bases:
OperationForward or inverse Fourier transformation.
Slots
- src
Input data
- dest
Output data
- work_area
Scratch area for work. The contents should not be used; it is made available so that it can be aliased with other scratch areas. If the implementation does not need any device memory for scratch space, this slot will not exist.
- Parameters:
template – Operation template
command_queue (katsdpsigproc.cuda.CommandQueue) – Command queue for the operation
mode – FFT direction
allocator – Allocator used to allocate unbound slots
- command_queue: CommandQueue
- class katsdpsigproc.fft.FftMode(value)[source]
Bases:
EnumAn enumeration.
- FORWARD = 0
- INVERSE = 1
- class katsdpsigproc.fft.FftTemplate(context: AbstractContext, N: int, shape: Tuple[int, ...], dtype_src: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], dtype_dest: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], padded_shape_src: Tuple[int, ...], padded_shape_dest: Tuple[int, ...], tuning: Dict[str, Any] | None = None)[source]
Bases:
objectOperation template for a forward or reverse FFT.
The transformation is done over the last N dimensions, with the remaining dimensions for batching multiple arrays to be transformed. Dimensions before the first N must not be padded.
This template bakes in more information than most (data shapes), which is due to constraints in CUFFT.
The template can specify real->complex, complex->real, or complex->complex. In the last case, the same template can be used to instantiate forward or inverse transforms. Otherwise, real->complex transforms must be forward, and complex->real transforms must be inverse.
For real<->complex transforms, the final dimension of the padded shape need only be \(\lfloor\frac{L}{2}\rfloor + 1\), where \(L\) is the last element of shape.
The transform is unnormalised: performing a forward followed by a reverse transform will scale the result by the number of elements.
- Parameters:
context – Context for the operation
N – Number of dimensions for the transform
shape – Shape of the data (N or more dimensions). For real->complex or complex->real transformation, this is the size of the real side of the transform.
dtype_src ({np.float32, np.float64, np.complex64, np.complex128}) – Data type for input
dtype_dest ({np.float32, np.float64, np.complex64, np.complex128}) – Data type for output
padded_shape_src – Padded shape of the input
padded_shape_dest – Padded shape of the output
tuning – Tuning parameters (currently unused)
katsdpsigproc.fill module
Fill device array with a constant value.
- class katsdpsigproc.fill.Fill(template: FillTemplate, command_queue: AbstractCommandQueue, shape: Tuple[int, ...], allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConcrete instance of
FillTemplate.Slots
- data
Array to be filled (padding will be filled too)
- Parameters:
template – Operation template
command_queue – Command queue for the operation
shape – Shape for the data slot
allocator – Allocator used to allocate unbound slots
- class katsdpsigproc.fill.FillTemplate(context: AbstractContext, dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], ctype: str, tuning: Mapping[str, Any] | None = None)[source]
Bases:
objectFill a device array with a constant value.
The pad elements are also filled with this value.
Note
To fill with zeros, use
DeviceArray.zero()- Parameters:
context – Context for which kernels will be compiled
dtype – Type of data elements
ctype – Type (in C/CUDA, not numpy) of data elements
tuning –
Kernel tuning parameters; if omitted, will autotune. The possible parameters are
wgs: number of workitems per workgroup
- classmethod autotune(context: AbstractContext, dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], ctype: str) Mapping[str, Any][source]
- instantiate(command_queue: AbstractCommandQueue, shape: Tuple[int, ...], allocator: AbstractAllocator | None = None) Fill[source]
katsdpsigproc.maskedsum module
Perform on-device percentile calculation of 2D arrays.
- class katsdpsigproc.maskedsum.MaskedSum(template: MaskedSumTemplate, command_queue: AbstractCommandQueue, shape: Tuple[int, int], allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConcrete instance of
MaskedSumTemplate.Slots
- src
Input type complex64 Shape is number of rows by number of columns, masked sum is calculated along the rows, independently per column.
- mask
Input type float32 Shape is (number of rows of input).
- dest
Output type complex64 Shape is (number of columns of input)
- class katsdpsigproc.maskedsum.MaskedSumTemplate(context: AbstractContext, use_amplitudes: bool = False, tuning: _TuningDict | None = None)[source]
Bases:
objectKernel for calculating masked sums of a 2D array of data.
Masked sums are calculated per column (along rows, independently per column).
- Parameters:
context – Context for which kernels will be compiled
use_amplitudes – If true, the amplitudes of the inputs rather than the inputs themselves will be summed.
tuning –
Kernel tuning parameters; if omitted, will autotune. The possible parameters are
size: number of workitems per workgroup
- classmethod autotune(context: AbstractContext, use_amplitudes: bool) _TuningDict[source]
- autotune_version = 1
- instantiate(command_queue: AbstractCommandQueue, shape: Tuple[int, int], allocator: AbstractAllocator | None = None) MaskedSum[source]
katsdpsigproc.opencl module
Abstraction layer over PyOpenCL.
It implements the abstract interfaces defined by katsdpsigproc.abc.
- class katsdpsigproc.opencl.CommandQueue(context: Context, pyopencl_command_queue: pyopencl.CommandQueue | None = None, profile: bool = False)[source]
Bases:
AbstractCommandQueue[Array,Context,Event,Kernel]Abstraction of a command queue.
If no existing command queue is passed to the constructor, a new one is created.
- Parameters:
context (katsdpsigproc.abc._C) – Context owning the queue
pyopencl_command_queue – Existing command queue to wrap
profile – If true and pyopencl_command_queue is omitted, enabling profiling (timing) on the queue.
- context: _C
- enqueue_copy_buffer(src_buffer: _DummyArray | pyopencl.array.Array, dest_buffer: _DummyArray | pyopencl.array.Array) None[source]
- enqueue_copy_buffer_rect(src_buffer: _DummyArray | pyopencl.array.Array, dest_buffer: _DummyArray | pyopencl.array.Array, src_origin: int, dest_origin: int, shape: Sequence[int], src_strides: Sequence[int], dest_strides: Sequence[int]) None[source]
Copy a subregion of one buffer to another.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
copy_region()for a high-level interface.- Parameters:
src_buffer – Source and destination buffers
dest_buffer – Source and destination buffers
src_origin – Offsets for the start of the copy, in bytes
dest_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
src_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
dest_strides – Strides for the source and destination memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
- enqueue_kernel(kernel: Kernel, args: Sequence[Any], global_size: Tuple[int, ...], local_size: Tuple[int, ...]) None[source]
Enqueue a kernel to the command queue.
Warning
It is not thread-safe to call this function in two threads on the same kernel at the same time.
- Parameters:
kernel – Kernel to run
args – Arguments to pass to the kernel. Refer to the PyOpenCL/CUDA documentation for details. Additionally, this function allows a low-level device array to be passed.
global_size – Number of work-items in each global dimension
local_size – Number of work-items in each local dimension. Must divide exactly into global_size.
- enqueue_read_buffer(buffer: pyopencl.array.Array, data: Any, blocking: bool = True) None[source]
Copy data from the device to the host.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Source
data – Target
blocking – If true (default) the call blocks until the copy is complete
- enqueue_read_buffer_rect(buffer: pyopencl.array.Array, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of a buffer to host memory.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Source
data (array-like) – Target
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- enqueue_wait_for_events(events: Sequence[Event]) None[source]
Enqueue a barrier to wait for all events in events.
- enqueue_write_buffer(buffer: pyopencl.array.Array, data: Any, blocking: bool = True) None[source]
Copy data from the host to the device.
Only whole-buffer copies are supported, and the shape and type must match. In general, one should use the convenience functions in
accel.DeviceArray.- Parameters:
buffer – Target
data (array-like) – Source
blocking – If true (default), the call blocks until the source has been fully read (it has not necessarily reached the device).
- enqueue_write_buffer_rect(buffer: pyopencl.array.Array, data: Any, buffer_origin: int, data_origin: int, shape: Sequence[int], buffer_strides: Sequence[int], data_strides: Sequence[int], blocking: bool = True) None[source]
Copy a region of host memory to a buffer.
This is a low-level interface that ignores the shape, strides etc of the buffers, and treats them as byte arrays. It also only supports 3 or fewer dimensions. Use
set_region()for a high-level interface.- Parameters:
buffer – Target
data (array-like) – Source
buffer_origin – Offsets for the start of the copy, in bytes
data_origin – Offsets for the start of the copy, in bytes
shape – Shape of the region to copy (1-3 elements). The first dimension is a byte count.
buffer_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
data_strides – Strides for the destination and source memory layout, with the same length as shape. The first element of each must be 1, and each element must be a factor of the next element.
blocking – If true, block until the transfer is complete.
- enqueue_zero_buffer(buffer: pyopencl.array.Array) None[source]
Fill a buffer with zero bytes.
- class katsdpsigproc.opencl.Context(pyopencl_context: pyopencl.Context)[source]
Bases:
AbstractContext[Array,Buffer,None,Device,Program,CommandQueue,TuningCommandQueue]Abstraction of an OpenCL context.
- allocate(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: pyopencl.Buffer | None = None) pyopencl.array.Array[source]
Create a typed buffer on the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
raw – Memory backing the array (automatically allocated if
None)
- allocate_pinned(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any]) ndarray[source]
Create a buffer in host memory that can be efficiently copied to and from the device.
- Parameters:
shape – Shape for the array
dtype – Type for the data
- allocate_raw(n_bytes: int) pyopencl.Buffer[source]
Create an untyped buffer on the device.
- allocate_svm(shape: Tuple[int, ...], dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], raw: None = None) ndarray[source]
Allocate shared virtual memory.
- allocate_svm_raw(n_bytes: int) None[source]
Allocate raw storage that can be passed to
allocate_svm().
- compile(source: str, extra_flags: List[str] | None = None) Program[source]
Build a program object from source.
- Parameters:
source – Source code
extra_flags – Extra parameters to pass to the compiler
- create_command_queue(profile: bool = False) CommandQueue[source]
Create a new command queue associated with this context.
- Parameters:
profile – If true, the command queue will support timing kernels
- create_tuning_command_queue() TuningCommandQueue[source]
Create a new command queue for doing autotuning.
- class katsdpsigproc.opencl.Device(pyopencl_device: pyopencl.Device)[source]
Bases:
AbstractDevice[Context]Abstraction of an OpenCL device.
- classmethod get_devices_by_platform() Sequence[Sequence[_D]][source]
Return a list of all devices, with a sub-list per platform.
- class katsdpsigproc.opencl.Event(pyopencl_event: pyopencl.Event)[source]
Bases:
AbstractEvent- time_since(prior_event: Event) float[source]
Return the time in seconds from prior_event to self.
Unlike the PyCUDA method of the same name, this will wait for the events to complete if they have not already.
- time_till(next_event: Event) float[source]
Return the time in seconds from this event to next_event.
See
time_since().
- class katsdpsigproc.opencl.Kernel(program: Program, name: str)[source]
Bases:
AbstractKernel[Program]
- class katsdpsigproc.opencl.Program(pyopencl_program: pyopencl.Program)[source]
Bases:
AbstractProgram[Kernel]
- class katsdpsigproc.opencl.TuningCommandQueue(*args, **kwargs)[source]
Bases:
CommandQueue,AbstractTuningCommandQueue[Array,Context,Event,Kernel]Command queue with extra facilities for autotuning.
It keeps track of kernels that are enqueued since the last call to
start_tuning(), and reports the total time they consume whenstop_tuning()is called.- context: _C
- enqueue_kernel(kernel: Kernel, args: Sequence[Any], global_size: Tuple[int, ...], local_size: Tuple[int, ...]) None[source]
Enqueue a kernel to the command queue.
Warning
It is not thread-safe to call this function in two threads on the same kernel at the same time.
- Parameters:
kernel – Kernel to run
args – Arguments to pass to the kernel. Refer to the PyOpenCL/CUDA documentation for details. Additionally, this function allows a low-level device array to be passed.
global_size – Number of work-items in each global dimension
local_size – Number of work-items in each local dimension. Must divide exactly into global_size.
katsdpsigproc.percentile module
Perform on-device percentile calculation of 2D arrays.
- class katsdpsigproc.percentile.Percentile5(template: Percentile5Template, command_queue: AbstractCommandQueue, shape: Tuple[int, int], column_range: Tuple[int, int] | None, allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConcrete instance of
PercentileTemplate.Warning
Assumes all values are positive when template.is_amplitude is True.
Slots
- src
Input type float32 or complex64. Shape is number of rows by number of columns, where 5 percentiles are computed along the columns, per row.
- dest
Output type float32. Shape is (5, number of rows of input)
- Parameters:
template – Operation template
command_queue – Command queue for the operation
shape – Shape of the source data
column_range – Half-open interval of columns that will be processed. If not specified, all columns are processed.
allocator – Allocator used to allocate unbound slots
- class katsdpsigproc.percentile.Percentile5Template(context: AbstractContext, max_columns: int, is_amplitude: bool = True, tuning: _TuningDict | None = None)[source]
Bases:
objectKernel for calculating percentiles of a 2D array of data.
5 percentiles [0,100,25,75,50] are calculated per row (along columns, independently per row). The lower percentile element, rather than a linear interpolation is chosen. WARNING: assumes all values are positive.
- Parameters:
context – Context for which kernels will be compiled
max_columns – Maximum number of columns
is_amplitude – If true, the inputs are scalar amplitudes; if false, they are complex numbers and the answers are computed on the absolute values
tuning –
Kernel tuning parameters; if omitted, will autotune. The possible parameters are
size: number of workitems per workgroup along each row
wgsy: number of workitems per workgroup along each column
- classmethod autotune(context: AbstractContext, max_columns: int, is_amplitude: bool) _TuningDict[source]
- autotune_version = 8
- instantiate(command_queue: AbstractCommandQueue, shape: Tuple[int, int], column_range: Tuple[int, int] | None = None, allocator: AbstractAllocator | None = None) Percentile5[source]
katsdpsigproc.pytest_plugin module
Plugin for testing with pytest.
See the pytest section of the documentation for details.
- katsdpsigproc.pytest_plugin.command_queue(context: AbstractContext) AbstractCommandQueue[source]
- katsdpsigproc.pytest_plugin.context(device: AbstractDevice, patch_autotune) Generator[AbstractContext, None, None][source]
katsdpsigproc.reduce module
Reduction algorithms.
- class katsdpsigproc.reduce.HReduce(template: HReduceTemplate, command_queue: AbstractCommandQueue, shape: Tuple[int, int], column_range: Tuple[int, int] | None = None, allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConcrete instance of
HReduceTemplate.In each row, the elements in the specified column range are reduced using the reduction operator supplied to the template.
Slots
- srcrows × columns
Input data
- destrows
Output reductions
- Parameters:
template – Operation template
command_queue – Command queue for the operation
shape – Shape for the source slot
column_range – Half-open range of columns to reduce (defaults to the entire array)
allocator – Allocator used to allocate unbound slots
- class katsdpsigproc.reduce.HReduceTemplate(context: AbstractContext, dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], ctype: str, op: str, identity: str, extra_code: str = '', tuning: _TuningDict | None = None)[source]
Bases:
objectPerforms reduction along rows in a 2D array.
Only commutative reduction operators are supported.
- Parameters:
context – Context for which kernels will be compiled
dtype – Type of data elements
ctype – Type (in C/CUDA, not numpy) of data elements
op – C expression to combine two variables, a and b
identity – C expression for an identity value for op
extra_code – Arbitrary C code to paste in (for use by op or identity)
tuning –
Kernel tuning parameters; if omitted, will autotune. The possible parameters are
wgsx: number of workitems per workgroup per row
wgsy: number of rows to handle per workgroup
katsdpsigproc.resource module
Utilities for scheduling device operations with asyncio.
- class katsdpsigproc.resource.JobQueue[source]
Bases:
objectMaintain a list of in-flight asynchronous jobs.
- class katsdpsigproc.resource.Resource(value: _T, loop: AbstractEventLoop | None = None)[source]
Bases:
Generic[_T]Abstraction of a contended resource, which may exist on a device.
Passing of ownership is done via futures. Acquiring a resource is a non-blocking operation that returns two futures: a future to wait for before use, and a future to be signalled with a result when done. The value of each of these futures is a (possibly empty) list of device events which must be waited on before more device work is scheduled.
- Parameters:
value (object) – Underlying resource to manage
loop – Event loop used for asynchronous operations. If not specified, defaults to
asyncio.get_event_loop().
- acquire() ResourceAllocation[_T][source]
Indicate intent to acquire the resource.
This does not actually acquire the resource, but instead returns a handle that can be used to acquire and release it later. Acquisitions always occur in the order in which calls to
acquire()are made.See
ResourceAllocationfor further details.
- class katsdpsigproc.resource.ResourceAllocation(start: asyncio.Future[List[AbstractEvent]], end: asyncio.Future[List[AbstractEvent]], value: _T, loop: AbstractEventLoop)[source]
Bases:
Generic[_T]A handle representing a future acquisition of a resource.
There are two ways to make the acquisition current:
Call
wait(), which returns a future. The result of this future is a list of device events that must complete before it is safe to use the resource; they can either be waited for on the host (for example, usingrun_in_executor), or by enqueuing a device wait before device operations on the resource.The above steps (with a blocking host wait) can be combined using
wait_events().
Instances of this class should never be constructed directly. Instead, use
Resource.acquire().This class implements the context manager protocol, providing the underlying resource as the return value. This handles some cleanup if the method using the resource raises an exception without releasing the resource cleanly.
- ready(events: List[AbstractEvent] | None = None) None[source]
Indicate that we are done with the resource, and that subsequent acquirers may use it.
Note that even if the caller decides that it doesn’t need to use the resource, it must not call this until the resource is ready.
- Parameters:
events – Device events that must be waited on before the resource is ready
- wait() asyncio.Future[List[AbstractEvent]][source]
Return a future that will be set to a list of device events to waited for.
katsdpsigproc.transpose module
Transpose 2D arrays on a device.
- class katsdpsigproc.transpose.Transpose(template: TransposeTemplate, command_queue: AbstractCommandQueue, shape: Tuple[int, int], allocator: AbstractAllocator | None = None)[source]
Bases:
OperationConcrete instance of
TransposeTemplate.Slots
- src
Input
- dest
Output
- class katsdpsigproc.transpose.TransposeTemplate(context: AbstractContext, dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], ctype: str, tuning: _TuningDict | None = None)[source]
Bases:
objectKernel for transposing a 2D array of data.
- Parameters:
context – Context for which kernels will be compiled
dtype – Type of data elements
ctype – Type (in C/CUDA, not numpy) of data elements
tuning –
Kernel tuning parameters; if omitted, will autotune. The possible parameters are
block: number of workitems per workgroup in each dimension
vtx, vty: elements per workitem in each dimension
- classmethod autotune(context: AbstractContext, dtype: dtype | None | type | _SupportsDType | str | Tuple[Any, int] | Tuple[Any, int | Sequence[int]] | List[Any] | _DTypeDict | Tuple[Any, Any], ctype: str) _TuningDict[source]
- autotune_version = 1
- instantiate(command_queue: AbstractCommandQueue, shape: Tuple[int, int], allocator: AbstractAllocator | None = None) Transpose[source]
katsdpsigproc.tune module
Tools for autotuning algorithm parameters and caching the results.
Note that this is intended to apply only to parameters that affect performance, such as work-group sizes, and not the outputs.
The design is that each computation class that implements autotuning will provide a class method (typically called autotune) that takes a context and user-supplied parameters (related to the problem rather than the implementation), and returns a dictionary of values. A decorator is applied to the autotuning method to cause the results to be cached.
The class may define an autotune_version class attribute to version the table.
At present, caching is implemented in an sqlite3 database. There is a table corresponding to each autotuning method. The table has the following columns:
device_name: string, name for the compute device
device_platform: string, name for the compute device’s platform
device_version: version string for the driver
arg_*: for each parameter passed to the autotuner
value_*: for each value returned from the autotuner
The database is stored in the user cache directory.
- katsdpsigproc.tune.adapt_value(value: Any) Any[source]
Convert value to a type that can be used in sqlite3.
This is not done through the sqlite3 adapter interface, because that is global rather than per-connection. This also only applies to lookup keys, not results, because it is not a symmetric relationship.
- katsdpsigproc.tune.autotune(generate: Callable[[...], Callable[[int], float] | None], time_limit: float = 0.1, threads: int | None = None, **kwargs: Any) Mapping[str, Any][source]
Run a number of tuning experiments and find the optimal combination of parameters.
Each argument is a iterable. The generate function is passed each element of the Cartesian product (by keyword), and returns a callable. This callable is passed an iteration count, and returns a score: lower is better. If either generate or the function it returns raises an exception, it is suppressed. Returns a dictionary with the best combination of values.
Instead of a callable, the generate function may return None to skip a configuration that it knows will be unsuitable. Throwing an exception has essentially the same effect (and is used in code written before returning None was allowed), but returning None is preferred since an exception just clutters the log.
The scoring function should not do a warmup pass: that is handled by this function.
- Parameters:
generate – function that creates a scoring function
time_limit – amount of time to spend testing each configuration (excluding setup time)
threads – number of parallel calls to generate to make at a time (defaults to CPU count)
- Raises:
Exception – if every combination throws an exception, the last exception is re-raised.
ValueError – if the Cartesian product is empty
- katsdpsigproc.tune.autotuner(test: Mapping[str, Any]) Callable[[_T], _T][source]
Decorate a function to make it an autotuning function that caches the result.
The function must take a class and a context as the first two arguments. The remaining arguments form a cache key, along with properties of the device and the name of the function.
Every argument to the function must have a name, which implies that the *args construct may not be used.
- Parameters:
test (dictionary) – A value that will be returned by
stub_autotuner().
- katsdpsigproc.tune.autotuner_impl(test: Mapping[str, Any], fn: _TuningFunc, *args: Any, **kwargs: Any) Mapping[str, Any][source]
Implement
autotuner().It is split into a separate function so that mocks can patch it.
- katsdpsigproc.tune.force_autotuner(test: Mapping[str, Any], fn: _TuningFunc, *args: Any, **kwargs: Any) Mapping[str, Any][source]
Drop-in replacement for
autotuner_impl()that does not do any caching.It is intended to be used with a mocking framework.
- katsdpsigproc.tune.make_measure(queue: AbstractTuningCommandQueue, function: Callable[[], None]) Callable[[int], float][source]
Generate a measurement function.
The result can be returned by the function passed to
autotune(). It calls function (with no arguments) the appropriate number of times and returns the averaged elapsed time as measured by queue.
Module contents
Karoo Array Telescope accelerated signal processing tools.