For example, I can use the following code to solve linear equation in GPU, but this package `Krylov.jl`

cannot support the complex matrix input, so when solving the complex linear equation, it has a poor efficiency. I found this package `ParallelStencil.jl`

had a high efficiency in GPU calculation. Therefore, can I use this package to solve complex linear equation in GPU?

```
# Example with a general square system
using CUDA, Krylov, LinearOperators, BenchmarkTools
using CUDA.CUSPARSE, SparseArrays, LinearAlgebra, IncompleteLU
# CPU Arrays
n=2000
A_cpu = sprand(n,n,0.3);
A_cpu = A_cpu'*A_cpu
b_cpu = rand(n)
# GPU Arrays
@time A_gpu = CuSparseMatrixCSC(A_cpu)
@time b_gpu = CuVector(b_cpu)
println("A_gpu=",typeof(A_gpu))
println("b_gpu=",typeof(b_gpu))
# LU â A for CuSparseMatrixCSC matrices
P = ilu02(A_gpu, 'O')
# Solve Py = x
function ldiv!(y, P, x)
copyto!(y, x) # Variant for CuSparseMatrixCSR
sv2!('N', 'L', 'N', 1.0, P, y, 'O') # sv2!('N', 'L', 'U', 1.0, P, y, 'O')
sv2!('N', 'U', 'U', 1.0, P, y, 'O') # sv2!('N', 'U', 'N', 1.0, P, y, 'O')
return y
end
# Operator that model Pâ»Âč
n = length(b_gpu)
T = eltype(b_gpu)
println(T)
opM = LinearOperator(T, n, n, false, false, (y, x, Î±, ÎČ) -> ldiv!(y, P, x))
println(typeof(opM))
# Solve an unsymmetric system with an incomplete LU preconditioner on GPU
@time (x, stats) = bicgstab(A_gpu, b_gpu, M=opM)
```