Understanding disaggregated GenAI model serving with llm-d
What is llm-d? llm-d is an open source solution for managing high-scale, high-performance Large Language Model (LLM) deployments. LLMs are at the heart of generative AI – so when y...
Fresh curated links around model-serving are collected here so marketers can spot useful updates and turn timely ideas into posts faster.
Recent items include:
Part 3 of the Understanding LLM Serving series.
Multi-model LLM orchestration is the practice of routing AI requests to different models based on what each task needs — speed, cost, reasoning depth, or code quality. OpenRouter m...
In this post, we walk through how we fine-tuned Qwen 2.5 7B Instruct for tool calling using RLVR. We cover dataset preparation across three distinct agent behaviors, reward functio...
Preface: the model that scored 99% accuracy in the POC, then collapsed completely once it went live.
You’ve done it. You spent weeks cleaning messy data, tuning hyperparameters, and finally, you see that beautiful 95% accuracy score in…
Last week I shipped a Model Context Protocol (MCP) server for my analytics SaaS. Now Claude Desktop, Cursor, and any MCP compatible client can query traffic, revenue, and funnel da...
In this post, we demonstrate how to build a secure, complete LLM fine-tuning workflow that integrates Unity Catalog with Amazon SageMaker AI using Amazon EMR Serverless for preproc...
In this tutorial, we explore kvcached, a dynamic KV-cache implementation on top of vLLM, to understand how dynamic KV-cache allocation transforms GPU memory usage for large languag...