FUTURE plans for GraphBLAS:

    GrB_IndexBinaryOp : very important for many graph algorithms

    JIT package: don't just check 1st line of GraphBLAS.h when deciding to
        unpack the src in user cache folder. Use a crc test.

    cumulative sum (or other monoid)

    Raye: link-time optimization with binary for operators, for Julia

    pack/unpack COO

    kernel fusion

    CUDA kernels
        CUDA: finding src
        CUDA: kernel source location, and name

    distributed framework

    fine-grain parallelism for dot-product based mxm, mxv, vxm,
        then add GxB_vxvt (outer product) and GxB_vtxv (inner product)
        (or call them GxB_outerProduct and GxB_innerProduct?)

    aggregators

    GrB_extract with GrB_Vectors instead of (GrB_Index *) arrays for I and J

    iso: set a flag with GrB_get/set to disable iso.  useful if the matrix is
    about to become non-iso anyway. Pagerank does:

        r = teleport (becomes iso)
        r += A*x     (becomes non-iso)

    apply: C = f(A), A dense, no mask or accum, C already dense: do in place

    JIT: allow a flag to be set in a type or operator to selectively control
        the JIT

    JIT: requires GxB_BinaryOp_new to give the string that defines the op.
    Allow use of
        GrB_BinaryOp_new (...)
        GrB_set (op, GxB_DEFN, "string"
    also for all ops

    candidates for kernel fusion:
        * triangle counting: mxm then reduce to scalar
        * lcc: mxm then reduce to vector
        * FusedMM: see https://arxiv.org/pdf/2011.06391.pdf

    more:
        * consider algorithms where fusion can occur
        * performance monitor, or revised burble, to detect generic cases
        * check if vectorization of GrB_mxm is effective when using clang
        * see how HNSW vector search could be implemented in GraphBLAS

    for CUDA JIT:

        https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/
            Developer webpage talking about ways to do nvJit with link time
            optimization using CUDA 12.0  Shows precompiled path and JIT path to
            generate kernels

