System | JJ's Blog

📝 Empirical Recipes for Efficient and Compact Vision-Language Models

📝 Empower Vision Applications with LoRA LMM

📝 Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding

📝 Elastic On-Device LLM Service

📝 RServe: Overlapping Encoding and Prefill for Efficient LMM Inference

📝 ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving

📝 Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices

📝 Breaking the Wall: Unifying Edge GPUs and NPUs into Pipeline Parallelism for Efficient LLM Fine-Tuning

📝 Efficiently Serving Large Multimodal Models Using EPD Disaggregation

📝 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism