Shangming Cai

@ShangmingCai

Currently working at Alibaba Cloud Apsara Lab. Research interests: efficient LLM serving systems.

Alibaba Cloud

235 Followers

Language Breakdown

Lines of code distribution across 1 owned repository

0 Total LOC

Collaboration Network

0 active collaborators

Growth: +18%

Top Collaborators

No collaborator data yet.

Coding Streak

Contribution activity over the past year

29 days

1,088 Contributions

257 Commits

213 Pull Requests

Based on GitHub activity

Followers 235

Loggie

@Loggie666

Brandon Ye

@ysyisyourbrother

escflee

@escflee

Zhonghua Deng

@Abatom

Ray Cao

@RayaCoo

Following

45 total

Alison Shao

@alisonshao

Zhonghua Deng

@Abatom

Brayden Zhong

@b8zhong

dongjiyingdjy

@dongjiyingdjy

Yueming Yuan

@yueming-yuan

Top Repositories

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Make SGLang go brrr

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

SGLang is a fast serving framework for large language models and vision language models.

Python

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

C++

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++