Parallelism
RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices