A production fork of llama.cpp stripped to the CPU backend and optimized for ARM Android devices. All non-CPU backends (CUDA, Metal, Vulkan, OpenCL, etc.) are removed. Four custom engine layers are ...
```cpp
// the first instance is for the non-SWA layers of the model and the second
// instance is for the SWA layers
llama_memory_context_ptr init_full() override;
llama_memory_context_ptr ...
```