Serving
Empower Vision Applications with LoRA LMM
Elastic On-Device LLM Service
RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices
Efficiently Serving Large Multimodal Models Using EPD Disaggregation
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism