I am building myberuf, a German-for-work learning product. The product is situation-first, not grammar-first: interviews, onboarding, workplace pragmatics, AI coaching, mistake review, and browser-local progress. This article is not a generic tool ranking. It is a field note from trying to build a real MVP on a budget.
1. I am building an MVP on a budget
myberuf exists because a learner can take many B2/C1 German classes and still freeze when the situation becomes real: introducing yourself in an interview, asking a polite question at the end, navigating onboarding, or responding to workplace feedback.
The product tries to turn studied German into retrieval under pressure. It has HR interview simulations, onboarding scenarios, AI coaching, mistake review, progress tracking, workplace pragmatics, and a private beta workflow built around fast preview testing.
I did not start by hiring a full engineering team or pretending I had a polished agency machine behind me. I used AI tools as a practical build team: ChatGPT, Claude, Claude Code, Codex, Mistral, Vercel, GitHub, project reports, handovers, and smoke tests.
2. The real lesson: AI tools need an operating system
The stack matters, but the operating system matters more. A strong model with a vague task can still make a mess. A smaller model with a crisp scope, the right files, explicit non-goals, and a smoke test can be surprisingly useful.
The operating system I keep coming back to is:
Think through the product problem before touching files.
Turn ambiguity into one reviewable sprint.
Use a repo-aware agent for targeted changes.
Run checks and manually smoke-test the user flow.
Write the memory so the next session starts clean.
That workflow sounds simple. It is also the difference between "AI made a cool demo" and "AI helped me keep building a product without losing the thread."
3. The stack I used
This is not a universal leaderboard. It is the division of labor that has been useful in my current builder workflow.
ChatGPT / OpenAI
Useful for brainstorming, product framing, content, messy notes, implementation prompts, and turning a long build journey into usable lessons.
Claude Opus
Useful when the problem is not just code: pedagogy, product architecture, tradeoffs, and deciding what the product should actually teach.
Claude Sonnet / Claude Code
Useful for implementation-heavy work when the slice is scoped and the agent needs to follow existing repo patterns.
Codex / OpenAI
Useful for targeted fixes, validation commands, debugging, and executing from project reports without carrying every old chat forward.
Mistral
Promising for myberuf's German coaching layer because coaching quality, language nuance, latency, cost, deployment trust, and European AI context matter.
Vercel + GitHub
Preview deployment, branches, commits, and rollback points turned AI work into something testable instead of just impressive in a chat.
Local and open models matter too. In the GOJA NLP project, local LLMs were evaluated for structured information extraction from German job ads under privacy and institutional constraints. Local inference, Ollama, schema validation, and evaluation reports are a different kind of learning lab from myberuf, but the lesson is related: tool choice is task choice.
4. What each tool is good for
The practical split is this: use stronger reasoning models when the work is ambiguous, and use repo-aware execution tools when the task is clear.
For myberuf, Claude and ChatGPT were useful for strategy, pedagogy, product thinking, and implementation prompts. A lot of the important work was not "write code"; it was deciding that Module 1 should feel like surviving an HR interview, not taking a grammar lesson.
Codex was useful when the target was concrete: inspect these files, fix this validation path, run these checks, update this handover, do not touch unrelated code. Claude Code and Sonnet were useful for implementation-heavy work when the product decision was already made.
Mistral is relevant for my current model layer, especially because myberuf is language-heavy and German workplace coaching has to be natural enough to be useful. The model layer is not only about raw intelligence; it is about coaching quality, language nuance, latency, cost, deployment trust, and whether the surrounding AI infrastructure fits the product's context.
That is why Mistral is interesting here. myberuf is a German/European workplace-language product, so European AI infrastructure and enterprise-trust questions are not decoration. They are part of the long-term product environment. Mistral fits naturally into a multi-model builder stack: not as a replacement for every other tool, but as a serious candidate for the coaching flow.
I am not claiming Mistral is objectively best. I am saying it is promising for this use case and relevant for European/privacy-conscious AI workflows. The practical builder lesson is to choose models by job-to-be-done, not by hype.
5. The handover system
The most underrated part of the stack is not a model. It is the handover.
When a project moves between ChatGPT, Claude, Claude Code, Codex, terminal sessions, project reports, and task files, the bottleneck is no longer only model capability. The bottleneck is whether the next agent receives the right context without dragging in irrelevant history.
In myberuf, files like `progress.md`, `next_sprint.md`, weekly handover notes, smoke checklists, and rollback notes became the shared memory layer. They reduced context pollution, token waste, repeated work, and stale assumptions.
A good handover says what changed, what files were touched, what checks ran, what was manually tested, what risks remain, what to do next, and what not to start.
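As a sketch of that idea, a handover can be modeled as a small record and rendered into a markdown note. The field names below are my own illustration, not a fixed myberuf schema:

```typescript
// Hypothetical handover record; field names are illustrative, not a fixed schema.
interface Handover {
  changed: string[];      // what changed this session
  filesTouched: string[]; // which files were edited
  checksRun: string[];    // automated checks that ran
  manualTests: string[];  // flows smoke-tested by hand
  risks: string[];        // known remaining risks
  nextSteps: string[];    // what to do next
  nonGoals: string[];     // what NOT to start
}

// Render the record as a markdown note, e.g. for next_sprint.md.
function renderHandover(h: Handover): string {
  const section = (title: string, items: string[]) =>
    `## ${title}\n` + items.map((i) => `- ${i}`).join("\n");
  return [
    section("Changed", h.changed),
    section("Files touched", h.filesTouched),
    section("Checks run", h.checksRun),
    section("Manually tested", h.manualTests),
    section("Risks", h.risks),
    section("Next", h.nextSteps),
    section("Do not start", h.nonGoals),
  ].join("\n\n");
}
```

The point is not the exact fields; it is that the next agent gets a bounded, current snapshot instead of a chat transcript.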
6. The Vercel and secrets lesson
Vercel made preview deployment fast. That mattered because several myberuf issues only became visible in a real browser: scroll behavior, confusing beta signup copy, and UI options that created expectations the model could not meet.
But fast deployment does not remove security basics. Hosted model calls require careful environment variable handling. API keys belong on the server, not in browser-exposed code. `.env.local` stays out of git. Vercel environment variables need to be set deliberately. Logs should be inspected without exposing secrets.
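A minimal sketch of the server-side discipline, assuming a Node/Next.js-style runtime; the variable name `MODEL_API_KEY` is made up for illustration:

```typescript
// Sketch: read a secret only in server-side code. In Next.js-style builds,
// only NEXT_PUBLIC_* variables reach the browser; everything else stays
// on the server. Fail fast so a misconfigured deploy is caught before a
// user hits the flow.
function requireServerEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Server-only usage (API route / server action), never in client components.
// The variable name is a placeholder, not a real myberuf key:
// const apiKey = requireServerEnv("MODEL_API_KEY");
```

Failing loudly on a missing variable is deliberate: a preview deploy with an unset key should break at startup, not silently return empty coaching responses.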
7. Automation lesson: voice changes the latency requirement
One myberuf experiment explored a more conversational direction using Vapi: what if the learner could speak to the platform instead of only typing?
That changes the product requirements. For text-based coaching, a short delay can be acceptable. For voice-based learning, latency becomes part of the simulation. If the response lags, the learner no longer feels like they are practicing a real workplace conversation.
Source video from the myberuf voice/automation experiment. The public page links to Loom instead of relying on an iframe player that can fail in local preview.
This is another reason model and infrastructure choice matters. The question is not only "which AI is smartest?" It is "which model fits the interaction?" For a European German-for-work product, Mistral is strategically interesting because model-layer choices affect latency, trust, deployment fit, cost, and privacy expectations. That is not a claim that Mistral is universally faster or best. It is a job-to-be-done argument.
8. The guardrails lesson
AI products need deterministic checks before expensive or subjective LLM evaluation.
myberuf taught this the uncomfortable way. Inputs like `test`, `asdf`, or `weiss nicht` should not advance an interview, call the LLM, or save to a mistake bank. That needs a cheap guard before the model is asked to coach anything.
Later, the same lesson appeared in a harder form: long abusive, non-German, or nonsense answers can still reach coaching if validation is too weak. If the coaching UI always expects "what was good," the model can become too generous. Sometimes the honest answer is: this was not a meaningful attempt.
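A sketch of such a deterministic pre-LLM guard. The thresholds and word lists are illustrative, not myberuf's actual rules, and a real build might swap the German-word heuristic for a proper language-detection library:

```typescript
// Cheap deterministic guard that runs before any LLM call.
// Thresholds and word lists are illustrative, not production rules.
const JUNK_INPUTS = new Set(["test", "asdf", "weiss nicht"]);

// A handful of very common German function words as a weak language signal.
const GERMAN_HINTS = ["ich", "und", "der", "die", "das", "nicht", "ist", "ein", "zu", "mit", "dem"];

function isMeaningfulAttempt(raw: string): boolean {
  const text = raw.trim().toLowerCase();
  if (text.length < 10) return false;        // too short to coach
  if (JUNK_INPUTS.has(text)) return false;   // known throwaway inputs
  const words = text.split(/\s+/);
  if (new Set(words).size < 3) return false; // e.g. "bla bla bla bla"
  // Require at least one common German function word before coaching.
  return words.some((w) => GERMAN_HINTS.includes(w));
}
```

If this returns `false`, the honest UI move is to say "this was not a meaningful attempt" instead of calling the model and letting it invent praise.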
Session 18 hardened the beta with a progress page, content validation, retry-coach rate limiting, cleanup, and clearer navigation. Session 19 fixed active-beat scrolling and produced the Round 2 plan. Session 19B shipped the optional `/modules/1/exercises?round=2` slice with its own progress key, `profile.modules["1-round2"]`, so it would not corrupt Module 1 or Module 2 progression. Session 20 added request guards, clarified beta copy so email capture did not sound like login, refined scroll behavior, and added a tiny Rückfragen slice.
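The key-separation idea from Session 19B can be sketched in miniature. The profile shape below is a simplification I am assuming for illustration, not myberuf's actual schema; the point is that Round 2 writes under `"1-round2"` instead of `"1"`:

```typescript
// Simplified sketch of browser-local progress storage. The real profile
// shape may differ; what matters is the key separation.
interface Profile {
  modules: Record<string, { completed: number[] }>;
}

// Round 2 writes under its own key ("1-round2" instead of "1"), so an
// optional extra round can never overwrite core Module 1 progression.
function markCompleted(profile: Profile, moduleKey: string, exercise: number): Profile {
  const current = profile.modules[moduleKey] ?? { completed: [] };
  if (current.completed.includes(exercise)) return profile; // idempotent
  return {
    ...profile,
    modules: {
      ...profile.modules,
      [moduleKey]: { completed: [...current.completed, exercise] },
    },
  };
}
```

Returning a new object instead of mutating in place also keeps rollback simple: the previous profile snapshot is still intact if a write needs to be discarded.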
9. What I would tell another builder
Start smaller than you want to. Pick one painful user situation. For myberuf, that was not "learn German"; it was "survive a German workplace situation without freezing."
Then ask AI for one slice. Not a platform. Not a complete product. One flow that can be tried, broken, improved, and documented.
Separate product thinking from repo execution. Let ChatGPT or Claude help you clarify the product decision. Let Codex or Claude Code execute the specific file-level task. Then smoke-test the result yourself.
End every session with a handover. A project built with AI agents is not one heroic prompt. It is a chain of scoped decisions.
10. Practical checklist
Before asking an AI agent to build
- Write the user situation in one sentence.
- Define the smallest useful vertical slice.
- List the files the agent should inspect.
- List explicit non-goals.
- Commit or identify a rollback point.
- Decide which checks and smoke tests must run.
Before showing users
- Run the app in a real browser.
- Try nonsense input and malicious-looking input.
- Confirm invalid input does not call expensive LLM paths.
- Check beta copy does not imply features that do not exist.
- Confirm secrets are server-side and out of git.
- Write the handover for the next session.
11. AI amplifies clarity or confusion
Building on a budget is now possible in a way that would have sounded unrealistic a few years ago. But it is not automatic.
AI does not remove the need for product judgment. It increases the value of clear thinking. If the task is vague, AI amplifies confusion. If the task is scoped, validated, and handed over properly, AI becomes a real building system.
The best AI stack is not just a list of tools. It is a way of working: strategy -> task -> execution -> validation -> handover.
Use the operating system, not just the tools.
Start with one narrow user situation, keep the build reviewable, and make the next AI session small enough to validate.