Shangming Cai
@ShangmingCai
Currently working at Alibaba Cloud Apsara Lab. Research interests: efficient LLM serving systems.
Language Breakdown
Lines of code distribution across 1 owned repository
Repos
18
PRs
0
Growth
+18%
Top Collaborators
No collaborator data yet.
Coding Streak
Contribution activity over the past year
Alison Shao
@alisonshao
Zhonghua Deng
@Abatom
Brayden Zhong
@b8zhong
dongjiyingdjy
@dongjiyingdjy
Yueming Yuan
@yueming-yuan
Top Repositories
A high-throughput and memory-efficient inference and serving engine for LLMs
Make SGLang go brrr
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Open Source Impact
Contributions to external projects