AI / ML

Chunking, embedding, retrieval, reranking, eval — the full stack.

Prompt engineering→

Harden prompts, review chain-of-thought, design tool-calling.

Eval harness→

Golden set + metrics + regression detection for an LLM feature.

Fine-tuning brief→

Decide among fine-tuning, prompt engineering, and RAG using an explicit decision framework.

Model selection→

Pick a model and name the latency, cost, and capability tradeoffs.

Guardrail design→

Input and output filters for a deployed LLM system.

Hallucination triage→

A deployed system is fabricating — what to investigate, in what order.