
GPU scalar indexing when transpose(::CuArray) #970

@mattsignorelli

Description


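Strangely, computing the Jacobian works when x is a 1×4 CuArray (x = CUDA.zeros(1,4)), but it triggers a scalar indexing error when x has the same 1×4 shape and is built with transpose instead (x = transpose(CUDA.zeros(4))).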
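One possible explanation, offered as an assumption rather than a confirmed diagnosis: transpose on a vector does not allocate a new array. It returns a lazy Transpose wrapper, so x is no longer a CuArray and may be routed through generic AbstractArray code paths that index element by element. The CPU-only sketch below uses plain Arrays to show the type difference. By contrast, reshape and permutedims return an ordinary matrix of the same shape.

```julia
using LinearAlgebra  # provides the Transpose wrapper type

v = zeros(Float32, 4)

# transpose of a vector is a lazy wrapper around v, not a new Matrix
t = transpose(v)
println(typeof(t))
println(t isa Matrix)        # false: Transpose is an AbstractMatrix, not a Matrix

# reshape returns a plain 1×4 array that shares memory with v
r = reshape(v, 1, 4)
println(typeof(r))

# permutedims on a vector also returns a plain 1×n matrix
p = permutedims(v)
println(size(p) == size(t))  # true: same 1×4 shape as t
```

Under the same assumption, building x as reshape(CUDA.zeros(4), 1, 4) should keep it a 1×4 CuArray. Whether that avoids the error with DifferentiationInterface has not been checked here.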

MWE:

using KernelAbstractions, CUDA
import DifferentiationInterface as DI
import ForwardDiff  # loads DI's ForwardDiff extension, needed for AutoForwardDiff

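# Each work item i fills one independent 2×2 block:
# (y[a], y[b]) depend only on (x[a], x[b]), so the Jacobian is block diagonal
@kernel function foo!(y, x)
    i = @index(Global)
    a = 2*i - 1
    b = 2*i
    offset = (i-1)*4
    y[a] = (offset+1)*x[a] + (offset+2)*x[b]
    y[b] = (offset+3)*x[a] + (offset+4)*x[b]
end

kernel! = foo!(CUDA.CUDABackend())
f!(y, x) = kernel!(y, x, ndrange=2)  # launch 2 work items, one per 2×2 block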

# This works fine:
x = CUDA.zeros(1,4)
y = CUDA.rand(1,4)
jac = CUDA.zeros(4,4)  # preallocated Jacobian, size length(y) × length(x)
prep = DI.prepare_jacobian(f!, y, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x);
DI.value_and_jacobian!(f!, y, jac, prep, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x)
#= Output:
4×4 CuArray{Float32, 2, CUDA.DeviceMemory}:
 1.0  2.0  0.0  0.0
 3.0  4.0  0.0  0.0
 0.0  0.0  5.0  6.0
 0.0  0.0  7.0  8.0
=#

# This triggers the scalar indexing error (same 1×4 shapes, but built with transpose):
x = transpose(CUDA.zeros(4))
y = transpose(CUDA.rand(4))
prep = DI.prepare_jacobian(f!, y, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x);
DI.value_and_jacobian!(f!, y, jac, prep, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x)
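Because foo! is linear in x, the expected block-diagonal Jacobian can be cross-checked on the CPU without CUDA or automatic differentiation. The sketch below uses two hypothetical helpers. foo_cpu! is a plain-loop copy of the kernel body. linear_jacobian applies a linear function to each unit vector to build the Jacobian one column at a time. The sketch also shows that reading from a transposed vector works on the CPU.

```julia
# Hypothetical CPU-only copy of the foo! kernel body, written as a plain loop
function foo_cpu!(y, x)
    for i in 1:2  # plays the role of ndrange=2 in the MWE
        a = 2i - 1
        b = 2i
        offset = (i - 1) * 4
        y[a] = (offset + 1) * x[a] + (offset + 2) * x[b]
        y[b] = (offset + 3) * x[a] + (offset + 4) * x[b]
    end
    return y
end

# Hypothetical helper: for a linear f!, column j of the Jacobian is f! applied to e_j
function linear_jacobian(f!, n)
    J = zeros(Float32, n, n)
    for j in 1:n
        e = zeros(Float32, n)
        e[j] = 1
        y = zeros(Float32, n)
        f!(y, e)
        J[:, j] = y
    end
    return J
end

J = linear_jacobian(foo_cpu!, 4)
display(J)  # block diagonal, matching the GPU output for the working case

# On the CPU, a transposed input vector is read with linear indexing without issue
yt = foo_cpu!(zeros(Float32, 4), transpose(Float32[1, 2, 3, 4]))
println(yt == J * Float32[1, 2, 3, 4])  # true
```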

Labels: gpu (Run GPU tests with buildkite)
