vllm-ascend

vLLM inference engine for Huawei Ascend NPUs. Deploy LLMs with an OpenAI-compatible API, run offline batch inference, serve quantized models (W4A8, W8A8), scale out with tensor/pipeline parallelism for distributed inference, and tune performance. Supports Qwen, DeepSeek, GLM, and LLaMA models with Ascend-optimized kernels.
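Once a model is served, clients talk to it through the standard OpenAI-compatible endpoint. A minimal sketch of building such a request body, assuming the server listens at a hypothetical localhost URL and serves a hypothetical Qwen model (substitute your own host, port, and model name):

```python
import json

# Assumed endpoint and model name for illustration only; replace with the
# host/port and model you actually serve.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen2.5-7B-Instruct"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

payload = build_chat_request("What is an NPU?")
print(json.dumps(payload, indent=2))
```

POST this JSON to `BASE_URL` + `/chat/completions` with any HTTP client; because the API is OpenAI-compatible, existing OpenAI SDKs also work by pointing their base URL at the server.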

Installation

1. Add the marketplace:

   /plugin marketplace add ascend-ai-coding/awesome-ascend-skills

2. Install plugins:

   /plugin

Run these commands in Claude Code to add this plugin to your environment. The marketplace must be added before you can install its plugins.