claudeindex

Author: Orchestra Research (@Orchestra-Research)

1 Marketplace · 21 Plugins · 85 Skills · 0 Agents · 0 Commands

Marketplaces

Marketplace

ai-research-skills

Comprehensive library of 85 AI research engineering skills enabling autonomous AI research from hypothesis to experimental verification

Plugins: 21 · Skills: 85

Plugins

Plugin

ideation

Research ideation frameworks including structured brainstorming and creative thinking. Use when exploring new research directions, generating novel ideas, or seeking fresh angles on existing work.

Plugin

model-architecture

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.
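The core computation shared by every architecture listed here (and reimplemented in NanoGPT, LitGPT, etc.) is scaled dot-product attention. A minimal pure-Python sketch of the single-query case, for intuition only — not any of these libraries' APIs:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Scaled dot-product attention for one query vector:
    # score each key, softmax the scores, weight-average the values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

Real implementations batch this over many heads and positions with tensor ops; the arithmetic per head is exactly the above.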

Plugin

tokenization

Text tokenization for LLMs including HuggingFace Tokenizers and SentencePiece. Use when training custom tokenizers or handling multilingual text.
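Both HuggingFace Tokenizers and SentencePiece support BPE training, whose core step is: count adjacent symbol pairs, merge the most frequent. A toy sketch of that step (illustrative only, not either library's API):

```python
from collections import Counter

def most_frequent_pair(words):
    # words: dict mapping a tuple of symbols to its corpus frequency.
    # Returns the most frequent adjacent pair -- the pair BPE merges next.
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Replace every occurrence of `pair` with a single merged symbol.
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged
```

Training a real tokenizer repeats this loop until the vocabulary budget is spent; the libraries add byte-level handling, normalization, and fast Rust/C++ cores.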

Plugin

fine-tuning

LLM fine-tuning frameworks including Axolotl, LLaMA-Factory, PEFT, and Unsloth. Use when fine-tuning models with LoRA, QLoRA, or full fine-tuning.
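LoRA, the technique behind PEFT/Unsloth-style fine-tuning, freezes the base weight W and learns a low-rank update scaled by alpha/r. A minimal pure-Python sketch of the forward pass (list-of-lists matrices; real code uses tensors):

```python
def matmul(X, Y):
    # Naive matrix multiply over lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    # y = x W + (alpha / r) * x A B
    # W is frozen; only the low-rank factors A (d x r) and B (r x d) train.
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]
```

Because A and B have rank r much smaller than d, the trainable parameter count drops by orders of magnitude; QLoRA additionally quantizes the frozen W.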

Plugin

mechanistic-interpretability

Neural network interpretability tools including TransformerLens, SAELens, NNSight, and pyvene. Use when analyzing model internals, finding circuits, or understanding how models compute.
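The common primitive in TransformerLens, NNSight, and pyvene is the hook point: a place in the forward pass where a callback can cache or overwrite an activation. A toy stand-in (the class and function names here are illustrative, not any library's API):

```python
class HookPoint:
    # Minimal hook point: registered callables observe an activation as it
    # flows past, and may return a replacement to patch it.
    def __init__(self):
        self.hooks = []

    def __call__(self, activation):
        for fn in self.hooks:
            result = fn(activation)
            if result is not None:
                activation = result  # a hook may patch the activation
        return activation

def run_model(x, hook):
    # A two-"layer" toy model with one hook point between the layers.
    hidden = hook([v * 2.0 for v in x])   # layer 1 output passes the hook
    return [v + 1.0 for v in hidden]      # layer 2
```

Caching is a hook that records the activation; activation patching is a hook that returns a stored activation from a different run — the basic move in circuit-finding experiments.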

Plugin

data-processing

Data curation and processing at scale including NeMo Curator and Ray Data. Use when preparing training datasets or processing large-scale data.
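A first pass in curation pipelines like NeMo Curator is exact deduplication by content hash, before fuzzy methods (MinHash/LSH). A minimal sketch, with lowercase/strip as an assumed normalization:

```python
import hashlib

def dedup_exact(records):
    # Drop exact-duplicate documents by hash of normalized text.
    seen, kept = set(), []
    for text in records:
        h = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(text)
    return kept
```

Production pipelines run this (and the fuzzy stages) in parallel over shards with engines like Ray Data, since web-scale corpora do not fit on one machine.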

Plugin

post-training

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.
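GRPO's distinguishing idea is computing advantages relative to a group of sampled completions for the same prompt, with no value network. A minimal sketch of that normalization (the function name is illustrative):

```python
def group_advantages(rewards, eps=1e-8):
    # GRPO-style advantages: normalize each completion's reward against the
    # mean and standard deviation of its own sampled group.
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Frameworks like TRL and verl plug these advantages into a clipped policy-gradient objective; the group statistics replace the critic that PPO would otherwise need.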

Plugin

safety-alignment

AI safety and content moderation including Constitutional AI, LlamaGuard, NeMo Guardrails, and Prompt Guard. Use when implementing safety filters, content moderation, or prompt injection detection.
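As a point of contrast with trained classifiers like Prompt Guard and LlamaGuard, here is the naive keyword baseline they improve on — a toy sketch only, far too easy to evade for real use:

```python
# Illustrative marker phrases; real systems use trained classifiers,
# not string matching.
INJECTION_MARKERS = ("ignore previous instructions", "disregard the system prompt")

def screen_prompt(text):
    # Flag a prompt if it contains a known injection phrase.
    lowered = text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)
```

The gap between this and a classifier (paraphrases, encodings, multilingual attacks) is exactly why dedicated models and layered guardrails exist.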

Plugin

distributed-training

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.
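The idea behind FSDP and DeepSpeed ZeRO stage 3 is that each rank stores only a shard of the parameters and all-gathers the rest on demand. A toy sketch of the shard/gather bookkeeping (round-robin layout assumed for illustration; real layouts are flat and chunked):

```python
def shard_params(params, world_size):
    # Split a flat parameter list into per-rank shards (round robin).
    return [params[r::world_size] for r in range(world_size)]

def all_gather(shards, world_size):
    # Reconstruct the full parameter list from every rank's shard.
    n = sum(len(s) for s in shards)
    full = [None] * n
    for r, shard in enumerate(shards):
        for i, p in enumerate(shard):
            full[r + i * world_size] = p
    return full
```

In training, the gather happens per layer just before its forward/backward and the full copy is freed immediately after, which is what makes models larger than one GPU's memory trainable.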

Plugin

infrastructure

GPU cloud and compute orchestration including Modal, Lambda Labs, and SkyPilot. Use when deploying training jobs or managing GPU resources.
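Tools in this category are typically driven by a declarative task file. A minimal SkyPilot-style YAML sketch; the accelerator string and script names are placeholders, and exact fields should be checked against the SkyPilot documentation:

```yaml
# task.yaml -- launch with: sky launch task.yaml
resources:
  accelerators: A100:1   # placeholder; any supported GPU spec
setup: |
  pip install -r requirements.txt
run: |
  python train.py
```

The orchestrator then handles provisioning across clouds, retries on preemption, and teardown, so the training script itself stays cloud-agnostic.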

Plugin

optimization

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.
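The simplest scheme underlying int8 quantization (bitsandbytes-style absmax) maps each weight to an integer in [-127, 127] via a per-tensor scale. A minimal sketch, ignoring the per-block scales and outlier handling real libraries add:

```python
def quantize_absmax(weights):
    # Scale so the largest-magnitude weight maps to +/-127, then round.
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 codes.
    return [v * scale for v in q]
```

GPTQ and AWQ refine this by choosing rounding to minimize layer output error rather than weight error, which is why they hold up better at 4-bit.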

Plugin

evaluation

LLM benchmarking and evaluation including lm-evaluation-harness, BigCode Evaluation Harness, and NeMo Evaluator. Use when benchmarking models or measuring performance.
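The simplest metric these harnesses compute is exact match against gold answers. A minimal sketch of the scoring loop (the field names `prompt`/`answer` are illustrative, not a harness's schema):

```python
def exact_match_eval(model_fn, dataset):
    # Fraction of examples where the model's stripped output equals the gold
    # answer -- the strictest and simplest benchmark metric.
    correct = sum(1 for ex in dataset if model_fn(ex["prompt"]).strip() == ex["answer"])
    return correct / len(dataset)
```

Real harnesses add log-likelihood scoring of multiple-choice options, few-shot prompt construction, and versioned task definitions so numbers are comparable across papers.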

Plugin

inference-serving

Production LLM inference including vLLM, TensorRT-LLM, llama.cpp, and SGLang. Use when deploying models for production inference.
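A key throughput idea in vLLM and SGLang is continuous batching: new requests join the running batch as soon as any sequence finishes, instead of waiting for the whole batch. A toy step-count sketch (no real model; each request just needs N decode steps):

```python
def continuous_batching(requests, max_batch=2):
    # requests: list of (id, tokens_needed). Each step decodes one token for
    # every active request; a freed slot is refilled immediately.
    queue = [list(r) for r in requests]
    active, finished, steps = [], [], 0
    while queue or active:
        while queue and len(active) < max_batch:
            active.append(queue.pop(0))
        steps += 1
        for req in active:
            req[1] -= 1
        finished += [req[0] for req in active if req[1] == 0]
        active = [req for req in active if req[1] > 0]
    return finished, steps
```

With static batching, a short request would idle its slot until the longest request in its batch finished; refilling slots per step is where the throughput win comes from.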

Plugin

mlops

ML experiment tracking and lifecycle including Weights & Biases, MLflow, and TensorBoard. Use when tracking experiments or managing models.
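At its core, an experiment tracker is an append-only metric log per run plus summary queries. A tiny sketch of that shape (illustrative, not the W&B or MLflow API):

```python
class Run:
    # Minimal tracker: append metrics per step, query a summary.
    def __init__(self, name):
        self.name = name
        self.metrics = []

    def log(self, step, **kv):
        self.metrics.append({"step": step, **kv})

    def summary(self, key):
        # Best (minimum) value seen for a metric, e.g. lowest validation loss.
        return min(m[key] for m in self.metrics if key in m)
```

The real tools add what this lacks: persistence, config and artifact capture, dashboards, and comparison across thousands of runs.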

Plugin

agents

LLM agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGPT. Use when building chatbots, autonomous agents, or tool-using systems.
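Stripped of framework detail, an agent is a loop: the model either calls a tool (whose result is fed back into context) or emits a final answer. A ReAct-style sketch where the `llm` callable and action-dict shape are illustrative, not any framework's API:

```python
def agent_loop(llm, tools, query, max_steps=5):
    # llm(history) returns either {"type": "tool", "tool": name, "arg": ...}
    # or {"type": "final", "text": ...}. Tool results are appended to history.
    history = [query]
    for _ in range(max_steps):
        action = llm(history)
        if action["type"] == "final":
            return action["text"]
        result = tools[action["tool"]](action["arg"])
        history.append(result)
    return None  # step budget exhausted
```

LangChain, CrewAI, and similar frameworks wrap this loop with structured tool schemas, memory, retries, and multi-agent routing.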

Plugin

rag

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.
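The retrieval step these tools accelerate is, in brute-force form, cosine similarity between a query embedding and every document embedding. A minimal sketch (the `{"id", "vec"}` record shape is illustrative):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=3):
    # Rank documents by similarity to the query; return the k best ids.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]
```

FAISS, Qdrant, and Pinecone replace the linear scan with approximate indexes (HNSW, IVF) so the same top-k query stays fast at millions of vectors; Sentence Transformers supplies the embeddings.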

Plugin

prompt-engineering

Structured LLM outputs including DSPy, Instructor, Guidance, and Outlines. Use when extracting structured data or constraining LLM outputs.
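One pattern these libraries implement (Instructor most directly) is: call the model, parse and validate the output, and retry with feedback on failure. A minimal sketch of that loop; the function names and retry message are illustrative:

```python
import json
import re

def extract_json(text):
    # Pull the first {...} block out of a model reply and parse it.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in reply")
    return json.loads(match.group(0))

def structured_call(llm, prompt, validate, retries=2):
    # Call, validate, and retry with corrective feedback on failure.
    for _ in range(retries + 1):
        reply = llm(prompt)
        try:
            data = extract_json(reply)
            validate(data)  # raises on schema violation
            return data
        except (ValueError, KeyError):
            prompt = prompt + "\nReturn valid JSON only."
    raise ValueError("model never produced valid output")
```

Guidance and Outlines take the stronger route of constraining generation token-by-token so invalid output cannot be produced at all, which removes the retry loop entirely.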

Plugin

observability

LLM application monitoring including LangSmith and Phoenix. Use when debugging LLM apps or monitoring production systems.
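The mechanism under LangSmith/Phoenix-style tracing is instrumenting each LLM or tool call to record its name, latency, and inputs/outputs into a trace. A toy decorator sketch (the global `TRACE` list stands in for a real trace exporter):

```python
import functools
import time

TRACE = []  # stand-in for a real trace backend

def traced(fn):
    # Record each call's name and wall-clock duration into the trace.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        TRACE.append({"name": fn.__name__, "seconds": time.perf_counter() - start})
        return out
    return wrapper
```

Real tracing nests spans (a chain call containing its LLM and retriever calls) and ships them to a backend, which is what makes multi-step failures debuggable in production.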

Plugin

multimodal

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.
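CLIP's zero-shot matching reduces to embedding images and texts into one space and pairing each image with its highest-similarity caption. A toy sketch of the matching step, assuming already-normalized embeddings (so dot product equals cosine similarity):

```python
def clip_match(image_embs, text_embs):
    # For each image embedding, return the index of its best-matching text.
    matches = []
    for img in image_embs:
        sims = [sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
        matches.append(sims.index(max(sims)))
    return matches
```

Training pushes matched image-text pairs together and mismatched pairs apart with a contrastive loss; at inference, class names rendered as captions turn this matcher into a zero-shot classifier.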

Plugin

emerging-techniques

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.
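Taking speculative decoding as one example from this set: a cheap draft model proposes several tokens, and the target model accepts the longest agreeing prefix. A greedy-only toy sketch (real implementations verify all draft tokens in one batched target pass and use rejection sampling for non-greedy decoding):

```python
def speculative_accept(draft_tokens, target_argmax):
    # Accept draft tokens while the target model's greedy choice agrees;
    # on the first mismatch, substitute the target's own token and stop.
    accepted = []
    for t in draft_tokens:
        expected = target_argmax(accepted)
        if t == expected:
            accepted.append(t)
        else:
            accepted.append(expected)
            break
    return accepted
```

The speedup comes from verifying k draft tokens with one target forward pass instead of k, while the accept/correct rule keeps the output distribution identical to the target model's.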

Plugin

ml-paper-writing

Write publication-ready ML/AI/Systems papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM, OSDI, NSDI, ASPLOS, SOSP. Includes LaTeX templates, citation verification, reviewer guidelines, and writing best practices.

Skills

Skill

brainstorming-research-ideas

Research ideation frameworks including structured brainstorming and creative thinking. Use when exploring new research directions, generating novel ideas, or seeking fresh angles on existing work.

From ai-research-skills/ideation
Skill

creative-thinking-for-research

Research ideation frameworks including structured brainstorming and creative thinking. Use when exploring new research directions, generating novel ideas, or seeking fresh angles on existing work.

From ai-research-skills/ideation
Skill

prompt-guard

AI safety and content moderation including Constitutional AI, LlamaGuard, NeMo Guardrails, and Prompt Guard. Use when implementing safety filters, content moderation, or prompt injection detection.

From ai-research-skills/safety-alignment
Skill

litgpt

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.

From ai-research-skills/model-architecture
Skill

mamba

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.

From ai-research-skills/model-architecture
Skill

nanogpt

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.

From ai-research-skills/model-architecture
Skill

rwkv

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.

From ai-research-skills/model-architecture
Skill

torchtitan

LLM architectures and implementations including LitGPT, Mamba, NanoGPT, RWKV, and TorchTitan. Use when implementing, training, or understanding transformer and alternative architectures.

From ai-research-skills/model-architecture
Skill

huggingface-tokenizers

Text tokenization for LLMs including HuggingFace Tokenizers and SentencePiece. Use when training custom tokenizers or handling multilingual text.

From ai-research-skills/tokenization
Skill

sentencepiece

Text tokenization for LLMs including HuggingFace Tokenizers and SentencePiece. Use when training custom tokenizers or handling multilingual text.

From ai-research-skills/tokenization
Skill

axolotl

LLM fine-tuning frameworks including Axolotl, LLaMA-Factory, PEFT, and Unsloth. Use when fine-tuning models with LoRA, QLoRA, or full fine-tuning.

From ai-research-skills/fine-tuning
Skill

llama-factory

LLM fine-tuning frameworks including Axolotl, LLaMA-Factory, PEFT, and Unsloth. Use when fine-tuning models with LoRA, QLoRA, or full fine-tuning.

From ai-research-skills/fine-tuning
Skill

peft

LLM fine-tuning frameworks including Axolotl, LLaMA-Factory, PEFT, and Unsloth. Use when fine-tuning models with LoRA, QLoRA, or full fine-tuning.

From ai-research-skills/fine-tuning
Skill

unsloth

LLM fine-tuning frameworks including Axolotl, LLaMA-Factory, PEFT, and Unsloth. Use when fine-tuning models with LoRA, QLoRA, or full fine-tuning.

From ai-research-skills/fine-tuning
Skill

nnsight

Neural network interpretability tools including TransformerLens, SAELens, NNSight, and pyvene. Use when analyzing model internals, finding circuits, or understanding how models compute.

From ai-research-skills/mechanistic-interpretability
Skill

pyvene

Neural network interpretability tools including TransformerLens, SAELens, NNSight, and pyvene. Use when analyzing model internals, finding circuits, or understanding how models compute.

From ai-research-skills/mechanistic-interpretability
Skill

saelens

Neural network interpretability tools including TransformerLens, SAELens, NNSight, and pyvene. Use when analyzing model internals, finding circuits, or understanding how models compute.

From ai-research-skills/mechanistic-interpretability
Skill

transformer-lens

Neural network interpretability tools including TransformerLens, SAELens, NNSight, and pyvene. Use when analyzing model internals, finding circuits, or understanding how models compute.

From ai-research-skills/mechanistic-interpretability
Skill

nemo-curator

Data curation and processing at scale including NeMo Curator and Ray Data. Use when preparing training datasets or processing large-scale data.

From ai-research-skills/data-processing
Skill

ray-data

Data curation and processing at scale including NeMo Curator and Ray Data. Use when preparing training datasets or processing large-scale data.

From ai-research-skills/data-processing
Skill

grpo-rl-training

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

miles

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

openrlhf

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

simpo

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

slime

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

torchforge

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

trl-fine-tuning

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

verl

RLHF and preference alignment including TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, and torchforge. Use when aligning models with human preferences, training reward models, or large-scale RL training.

From ai-research-skills/post-training
Skill

constitutional-ai

AI safety and content moderation including Constitutional AI, LlamaGuard, NeMo Guardrails, and Prompt Guard. Use when implementing safety filters, content moderation, or prompt injection detection.

From ai-research-skills/safety-alignment
Skill

llamaguard

AI safety and content moderation including Constitutional AI, LlamaGuard, NeMo Guardrails, and Prompt Guard. Use when implementing safety filters, content moderation, or prompt injection detection.

From ai-research-skills/safety-alignment
Skill

nemo-guardrails

AI safety and content moderation including Constitutional AI, LlamaGuard, NeMo Guardrails, and Prompt Guard. Use when implementing safety filters, content moderation, or prompt injection detection.

From ai-research-skills/safety-alignment
Skill

accelerate

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

deepspeed

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

megatron-core

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

pytorch-fsdp2

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

pytorch-lightning

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

ray-train

Multi-GPU and multi-node training including DeepSpeed, PyTorch FSDP, Accelerate, Megatron-Core, PyTorch Lightning, and Ray Train. Use when training large models across GPUs.

From ai-research-skills/distributed-training
Skill

lambda-labs

GPU cloud and compute orchestration including Modal, Lambda Labs, and SkyPilot. Use when deploying training jobs or managing GPU resources.

From ai-research-skills/infrastructure
Skill

modal

GPU cloud and compute orchestration including Modal, Lambda Labs, and SkyPilot. Use when deploying training jobs or managing GPU resources.

From ai-research-skills/infrastructure
Skill

skypilot

GPU cloud and compute orchestration including Modal, Lambda Labs, and SkyPilot. Use when deploying training jobs or managing GPU resources.

From ai-research-skills/infrastructure
Skill

awq

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

bitsandbytes

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

flash-attention

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

gguf

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

gptq

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

hqq

Model optimization and quantization including Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, and HQQ. Use when reducing memory, accelerating inference, or quantizing models.

From ai-research-skills/optimization
Skill

bigcode-evaluation-harness

LLM benchmarking and evaluation including lm-evaluation-harness, BigCode Evaluation Harness, and NeMo Evaluator. Use when benchmarking models or measuring performance.

From ai-research-skills/evaluation
Skill

lm-evaluation-harness

LLM benchmarking and evaluation including lm-evaluation-harness, BigCode Evaluation Harness, and NeMo Evaluator. Use when benchmarking models or measuring performance.

From ai-research-skills/evaluation
Skill

nemo-evaluator

LLM benchmarking and evaluation including lm-evaluation-harness, BigCode Evaluation Harness, and NeMo Evaluator. Use when benchmarking models or measuring performance.

From ai-research-skills/evaluation
Skill

llama-cpp

Production LLM inference including vLLM, TensorRT-LLM, llama.cpp, and SGLang. Use when deploying models for production inference.

From ai-research-skills/inference-serving
Skill

sglang

Production LLM inference including vLLM, TensorRT-LLM, llama.cpp, and SGLang. Use when deploying models for production inference.

From ai-research-skills/inference-serving
Skill

tensorrt-llm

Production LLM inference including vLLM, TensorRT-LLM, llama.cpp, and SGLang. Use when deploying models for production inference.

From ai-research-skills/inference-serving
Skill

vllm

Production LLM inference including vLLM, TensorRT-LLM, llama.cpp, and SGLang. Use when deploying models for production inference.

From ai-research-skills/inference-serving
Skill

mlflow

ML experiment tracking and lifecycle including Weights & Biases, MLflow, and TensorBoard. Use when tracking experiments or managing models.

From ai-research-skills/mlops
Skill

tensorboard

ML experiment tracking and lifecycle including Weights & Biases, MLflow, and TensorBoard. Use when tracking experiments or managing models.

From ai-research-skills/mlops
Skill

weights-and-biases

ML experiment tracking and lifecycle including Weights & Biases, MLflow, and TensorBoard. Use when tracking experiments or managing models.

From ai-research-skills/mlops
Skill

autogpt

LLM agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGPT. Use when building chatbots, autonomous agents, or tool-using systems.

From ai-research-skills/agents
Skill

crewai

LLM agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGPT. Use when building chatbots, autonomous agents, or tool-using systems.

From ai-research-skills/agents
Skill

langchain

LLM agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGPT. Use when building chatbots, autonomous agents, or tool-using systems.

From ai-research-skills/agents
Skill

llamaindex

LLM agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGPT. Use when building chatbots, autonomous agents, or tool-using systems.

From ai-research-skills/agents
Skill

chroma

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.

From ai-research-skills/rag
Skill

faiss

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.

From ai-research-skills/rag
Skill

pinecone

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.

From ai-research-skills/rag
Skill

qdrant

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.

From ai-research-skills/rag
Skill

sentence-transformers

Retrieval-Augmented Generation including Chroma, FAISS, Pinecone, Qdrant, and Sentence Transformers. Use when building semantic search or document retrieval systems.

From ai-research-skills/rag
Skill

dspy

Structured LLM outputs including DSPy, Instructor, Guidance, and Outlines. Use when extracting structured data or constraining LLM outputs.

From ai-research-skills/prompt-engineering
Skill

guidance

Structured LLM outputs including DSPy, Instructor, Guidance, and Outlines. Use when extracting structured data or constraining LLM outputs.

From ai-research-skills/prompt-engineering
Skill

instructor

Structured LLM outputs including DSPy, Instructor, Guidance, and Outlines. Use when extracting structured data or constraining LLM outputs.

From ai-research-skills/prompt-engineering
Skill

outlines

Structured LLM outputs including DSPy, Instructor, Guidance, and Outlines. Use when extracting structured data or constraining LLM outputs.

From ai-research-skills/prompt-engineering
Skill

langsmith

LLM application monitoring including LangSmith and Phoenix. Use when debugging LLM apps or monitoring production systems.

From ai-research-skills/observability
Skill

phoenix

LLM application monitoring including LangSmith and Phoenix. Use when debugging LLM apps or monitoring production systems.

From ai-research-skills/observability
Skill

audiocraft

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

blip-2

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

clip

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

llava

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

segment-anything

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

stable-diffusion

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

whisper

Vision, audio, and multimodal models including CLIP, Whisper, LLaVA, BLIP-2, Segment Anything, Stable Diffusion, and AudioCraft. Use when working with images, audio, or multimodal tasks.

From ai-research-skills/multimodal
Skill

knowledge-distillation

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

long-context

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

model-merging

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

model-pruning

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

moe-training

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

speculative-decoding

Advanced ML techniques including MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, and Model Pruning. Use when implementing cutting-edge optimization or architecture techniques.

From ai-research-skills/emerging-techniques
Skill

20-ml-paper-writing

Write publication-ready ML/AI/Systems papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM, OSDI, NSDI, ASPLOS, SOSP. Includes LaTeX templates, citation verification, reviewer guidelines, and writing best practices.

From ai-research-skills/ml-paper-writing