๐Ÿ“ Empirical Recipes for Efficient and Compact Vision-Language Models

March 24, 2026

๐Ÿ“ Empower Vision Applications with LoRA LMM

December 15, 2025

๐Ÿ“ Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding

December 12, 2025

๐Ÿ“ Elastic On-Device LLM Service

December 8, 2025

๐Ÿ“ RServe: Overlapping Encoding and Prefill for Efficient LMM Inference

December 6, 2025

๐Ÿ“ ModServe - Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving

November 27, 2025

๐Ÿ“ Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices

November 26, 2025

๐Ÿ“ Breaking the Wall: Unifying Edge GPUs and NPUs into Pipeline Parallelism for Efficient LLM Fine-Tuning

November 17, 2025

๐Ÿ“ Efficiently Serving Large Multimodal Models Using EPD Disaggregation

November 15, 2025

๐Ÿ“ ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism

November 13, 2025