Hacker News Show HN

Title candidates, body template, and anticipated Q&A.

Title candidates (3 variants)

HN titles must be 80 characters or fewer. Technical specificity matters, and the title should convey what's new. Avoid marketing jargon.
1. Show HN: chatweb.ai – 39 AI agents executing tasks over A2A (JSON-RPC 2.0)
2. Show HN: chatweb.ai – Multi-agent platform: 12 LLMs, self-improving loops
3. Show HN: chatweb.ai – Agents that send emails, deploy code, publish sites

Body template

Hi HN, I'm Yuki. I built chatweb.ai, a multi-agent platform where 39 specialized AI agents execute real-world tasks.

The problem: LLMs are great at generating text but can't actually *do* things. You still need to copy the email draft, open Gmail, paste, and send. Multiply this across dozens of daily tasks.

chatweb.ai gives each task type a specialized agent that handles end-to-end execution:
- GmailAgent sends/replies to emails
- WebPublisher generates and deploys static sites
- CodeDeployer builds and ships code
- BrowserAgent navigates the web and extracts data

Technical details:

1. A2A v1.0 Protocol: Agents communicate via Google's Agent-to-Agent standard — JSON-RPC 2.0 requests with SSE streaming for real-time responses. Each agent publishes an Agent Card declaring its capabilities, input/output schemas, and constraints.

2. Multi-model orchestration: 12 models (Llama 4 Scout, Claude Opus, GPT-4o, Gemini Pro, Qwen, etc.) allocated per task type based on cost/quality/latency tradeoffs. Not all tasks need the most expensive model.

3. Self-improving loop: Every 6 hours, execution results are scored and prompts are auto-optimized. Pro tier targets 9/10 quality with up to 5 improvement iterations per task.

4. Stack: Rust (axum) on AWS Lambda (ARM64, musl), DynamoDB, SSE streaming through API Gateway. Also runs on Fly.io with libSQL/SQLite.
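
For readers who haven't seen A2A, a task submission is a plain JSON-RPC 2.0 POST to the agent's endpoint. A minimal sketch (field names follow the public A2A spec; the exact shape may differ between protocol versions):

```json
{
  "jsonrpc": "2.0",
  "id": "req-001",
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [
        { "kind": "text", "text": "Reply to the latest email from Alice" }
      ]
    }
  }
}
```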

Free tier includes $2/month credit. Planning to open-source the core A2A implementation.

https://chatweb.ai

Happy to answer any technical questions about the architecture, agent design, or feedback loop implementation.

Anticipated Q&A

Questions likely to come up in the HN comments, with recommended answers.

Q: How is this different from AutoGPT / AgentGPT / CrewAI?
A: Those frameworks focus on autonomous planning loops with a single LLM. chatweb.ai uses specialized agents (not generic ones) with real tool integrations — each agent has actual OAuth credentials and API access to execute tasks. We also use 12 different models optimized per task rather than routing everything through one LLM. The A2A protocol means agents negotiate and delegate tasks based on declared capabilities, not just a planner assigning work.

Q: Why 12 models instead of just using Claude/GPT-4o for everything?
A: Cost and latency. Not every task needs a $15/M-token model. Simple email categorization works great with Llama 4 Scout at a fraction of the cost. Complex code generation routes to Claude Opus. We benchmark continuously and reassign models as the landscape shifts. The 6-hour feedback loop also catches model regressions — if a model update degrades quality on a specific task type, the system adapts.
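
A minimal sketch of what cost-tiered routing looks like (model names are the ones from the post; the task types and per-token costs are illustrative, not real benchmarks):

```python
# Hypothetical cost-tiered model routing: cheap models for simple task
# types, expensive ones only where quality demands it.
ROUTING_TABLE = {
    # task_type: (model, approx_cost_per_million_tokens_usd)
    "email_triage":    ("llama-4-scout", 0.40),
    "summarization":   ("gemini-pro",    1.25),
    "code_generation": ("claude-opus",  15.00),
}
DEFAULT = ("gpt-4o", 5.00)

def pick_model(task_type: str) -> str:
    """Return the model assigned to a task type, falling back to a default."""
    model, _cost = ROUTING_TABLE.get(task_type, DEFAULT)
    return model
```

In production this table would be rebuilt from the continuous benchmarks rather than hardcoded.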
Q: What's the A2A protocol and why not just use function calling?
A: A2A (Agent-to-Agent) is Google's proposed standard for inter-agent communication. Function calling works for single-model tool use, but breaks down when you need agent-to-agent delegation. With A2A, each agent publishes an Agent Card (JSON schema describing capabilities), and agents discover and negotiate with each other via JSON-RPC 2.0. SSE streaming gives real-time progress updates. It's a proper multi-agent protocol vs. bolting tools onto a single LLM.
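
A trimmed Agent Card sketch for context (a subset of fields from the public A2A spec; values illustrative):

```json
{
  "name": "GmailAgent",
  "description": "Sends and replies to email on the user's behalf",
  "url": "https://chatweb.ai/a2a/gmail",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "send_email",
      "name": "Send email",
      "description": "Compose and send an email via the Gmail API"
    }
  ]
}
```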
Q: How do you handle security? Agents have access to my Gmail?
A: Each agent integration uses standard OAuth 2.0 with minimal scopes. You explicitly grant permissions per service. Agent actions are logged and auditable. We don't store email content — the GmailAgent uses the Gmail API directly with your token. Enterprise tier supports on-premise deployment for full data control. We're also working on granular permission policies (e.g., "can read but not delete emails").
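
The granular policies mentioned at the end could look roughly like this — purely a hypothetical sketch, not a shipped format:

```json
{
  "agent": "GmailAgent",
  "oauth_scopes": ["gmail.readonly", "gmail.send"],
  "rules": [
    { "action": "read",   "allow": true },
    { "action": "send",   "allow": true, "require_confirmation": true },
    { "action": "delete", "allow": false }
  ]
}
```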
Q: How does the self-improving feedback loop work technically?
A: Every agent execution produces a result with metadata (success/failure, latency, user feedback if given). Every 6 hours, a meta-agent evaluates recent executions against quality targets (7/10 for Free, 9/10 for Pro). It identifies patterns — e.g., "GmailAgent fails 30% of the time on multi-recipient emails." Then it generates prompt modifications, tests them against a held-out set, and deploys if quality improves. Pro tier runs up to 5 iterations per cycle. It's essentially automated prompt engineering with a continuous evaluation harness.
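
A stripped-down sketch of the evaluation pass described above (the 7/10 and 9/10 targets are from the post; field names and the scoring scheme are illustrative):

```python
# Hypothetical 6-hour evaluation pass: aggregate recent execution scores
# per task type and flag the ones falling below the tier's quality target.
from collections import defaultdict

QUALITY_TARGET = {"free": 7.0, "pro": 9.0}  # out of 10

def flag_underperformers(executions, tier="pro"):
    """executions: list of dicts with 'task_type' and 'score' (0-10).
    Returns task types whose mean score is below the tier's target,
    i.e. the candidates for automated prompt optimization."""
    scores = defaultdict(list)
    for ex in executions:
        scores[ex["task_type"]].append(ex["score"])
    target = QUALITY_TARGET[tier]
    return sorted(t for t, s in scores.items() if sum(s) / len(s) < target)
```

The flagged task types would then feed the prompt-modification / held-out-set step.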
Q: Why Rust + Lambda instead of Python/Node?
A: Cold start latency. Our P95 cold start on ARM64 Lambda with musl is under 50ms. A Python Lambda with dependencies is 500ms–2s. For agent orchestration with SSE streaming, every millisecond of overhead compounds. Rust + axum also gives us a single ~15MB binary that runs identically on Lambda, Fly.io, and Docker, all from the same codebase.

Q: $1M/month Enterprise tier? Is that a joke?
A: It's a concierge tier for large organizations that want custom agent development, on-premise deployment, dedicated model instances, and SLA guarantees. Limited to 10 companies because each requires significant hands-on engineering. Think of it as "we build your custom AI workforce" rather than a SaaS subscription. The core product works on the Free and Pro tiers.

Q: When will it be open-sourced?
A: We're planning to open-source the core A2A protocol implementation and the agent framework. Timeline is within the next few months — we want to stabilize the API first. The hosted platform (auth, billing, managed infrastructure) will remain proprietary, similar to how Supabase open-sources the core but offers a managed service.

Notes for HN: avoid overt marketing language. Answer technical questions with concrete code examples and architecture details. Respond politely even to critical comments, taking a "good point, we're working on that" stance.