Collaborative Capability Maturity Framework

V0.4, Public Draft Release

What This Is

A self-assessment tool for practitioners doing sustained research collaboration with an AI partner. It gives you structured ways to notice what’s happening in your collaboration and track whether it’s developing.

It is not an industry standard. It is not externally validated. It does not prescribe what “good” collaboration looks like for all purposes. It’s equipment for practitioners who want to attend to the quality of their collaborative practice and who are willing to test their self-assessment against what actually happens.

The framework is designed to be used, argued with, and revised through use. If it survives contact with diverse practice, it may eventually inform more formal instruments. That’s a future problem for a future community.

What Problem This Addresses

Most evaluation of human-AI interaction treats quality as a property of the human’s behaviour (prompting skill, clarity of instruction) or the AI’s capability (benchmark scores, task accuracy). Both framings miss that the most important outcomes of sustained collaboration are joint productions — outputs, frameworks, and capabilities that emerge from the interaction rather than being imported by either party.

This framework proposes that collaborative capability belongs to the partnership, not the individual — and that measuring it requires attending to joint production, trajectory, and how the partnership works over time.


Where This Framework Has Something to Say

The framework is designed for a specific kind of work. If your collaboration doesn’t match these conditions, the framework may still be interesting, but its diagnostic value is untested outside them.

One human, one AI. Team dynamics — multiple humans, multiple AI agents — introduce coalition effects that need separate treatment.

Working through conversation. The collaboration happens primarily through natural language — conversation, reading, analysis, writing. Code generation and formal modelling have external checks (the code runs, the proof holds) that make the quality of the relationship less decisive for the quality of the output.

Working at the edge. You’re producing something neither partner could produce alone. Well-scoped execution tasks (transcribe this, format that) don’t need collaborative maturity — they need competence and clear specification.

Neither partner has a ready answer. You’re not simply extracting knowledge the AI was trained on or that the human has memorised; the AI isn’t simply running a familiar pattern. Both of you are reasoning in real time.

The work matters. You’re working toward something that could change how you or others think or act. This excludes casual chat but doesn’t require that the stakes be clear at the outset — stakes often emerge through the exploration itself.

It happens over time. The meaningful unit of assessment is a relationship across sessions, not a single interaction. Single sessions are data points; the maturity model measures development.


Six Dimensions of Collaborative Capability

These dimensions describe what you can observe and develop in a research collaboration with an AI partner. Think of them the way you’d think about fitness components — how much of each you need depends on the sport.

The important point is that different problems load different dimensions. Straightforward work in a familiar domain mostly needs a reliable process. Complex, unfamiliar territory — the kind where AI’s extra reach is most valuable and most dangerous — needs more. And you have to have built that capability before you need it.

The current six-dimension structure was refined from a broader initial set through independent review, including cross-model testing and checking against a diverse publication record. It’s informed by evidence but not validated — which is why the open questions at the end include whether the dimensions are genuinely independent.


1. Building shared understanding

Whether you and your AI partner develop mutual understanding that deepens over time, and whether the shorthand that results lets you work on increasingly complex problems together.

What to look for. Shared terminology carries real conceptual weight rather than being decorative. Frameworks developed in earlier sessions get applied, extended, and sometimes revised — not just recalled. Understanding developed in one context sheds light on work in another. Communication overhead decreases for familiar problem types while the sophistication of your joint work increases.

What this isn’t. Simple memory or continuity of record. A partnership where the AI has perfect recall of past sessions but applies that knowledge mechanically has continuity without shared understanding. Nor is it mere efficiency — a terse exchange between poorly matched partners isn’t evidence of shared understanding, just brevity.

How to check. Take a key claim from your collaboration and ask your AI partner two questions: How did we get here? What’s the evidence trail? And: What should we expect to see next if this is right? If the answers are specific and grounded — if the predictions are precise enough to be wrong — your shared understanding is doing real work. If they’re fluent but vague, what you have is comfortable familiarity that hasn’t been tested.

Diagnostic question: When we reference something from prior work, does it open up new thinking or just save us from repeating old thinking?


2. Enabling productive disagreement

Whether difference between partners generates improvement rather than rupture, capitulation, or avoidance.

What to look for. Partners can tell the difference between “I don’t understand your point” and “I think your point is wrong.” Disagreement leads to refinement rather than retreat. Neither partner routinely backs down. The work takes positions its readership may resist — a sign that internal friction is producing arguments that hold up under external pressure.

What this isn’t. The frequency of disagreement, which might indicate poor alignment rather than productive friction. The question isn’t whether you disagree, but whether disagreement improves the work.

How to check. Take something your collaboration has established and ask your AI partner for a genuinely different interpretation of the same evidence. Not a devil’s advocate performance — an actual alternative account. Then test both: what’s the evidence trail? What does each predict? What would tell them apart? If the alternative is specific, testable, and genuinely different, your collaboration can do productive disagreement. If it folds back into agreement after a turn or two, the disagreement is performative.

Diagnostic question: When was the last time my partner changed my mind about something — and when was the last time I changed theirs?


3. Establishing error navigation

How the partnership handles confusion, mistakes, dead ends, and wrong directions — not just after they happen, but before they’ve had time to embed.

What to look for. Errors are identified without blame. Recovery or abandonment decisions are made based on value rather than sunk cost. The partnership learns from errors in ways that reduce similar problems in future. Deferred or abandoned work is treated as a legitimate outcome, not a failure.

What this isn’t. Error avoidance. If a partnership never makes errors, either the problems are trivially easy or one partner is suppressing observations that might reveal mistakes.

How to check. Take something you’re working on — not something finished, something in progress — and ask your AI partner directly: Where is this most likely to break? What assumption haven’t we tested? If it names something specific, your collaboration can navigate errors. If everything is qualified equally and described as “generally strong with minor caveats,” you haven’t built conditions where bad news can travel. One practical extension: take the same claim to a different session or a different model that has no history with your work. A session with no investment in your interpretation has no momentum to overcome.

Diagnostic question: Can we point to something we abandoned, and explain why abandoning it was the right call?


4. Watching your own process

Whether partners can observe and adjust their own working methods — and do so without that observation becoming the work.

What to look for. Partners can identify what mode they’re in (“we’re exploring, not deciding”). Either partner can flag when the process isn’t working. Working methods evolve based on reflection rather than persisting by habit. The proportion of interaction spent on process discussion decreases over time as effective patterns become embedded.

What this isn’t. Process documentation or bureaucratic overhead. Excessive process narration is a failure mode of this dimension, not evidence of maturity. The mature version is light — it surfaces when it helps and stays quiet when it doesn’t. If your collaboration spends more time discussing how to work than actually working, that’s a warning sign, not a mark of sophistication.

Diagnostic question: Can we talk about how we’re working without it costing us more than it saves?


5. Enabling adaptive efficiency

Whether the partnership can spot changed conditions and reorganise quickly — shifting roles, reallocating effort, and recovering from disruption without unnecessary overhead.

What to look for. Role allocation tracks what the problem needs rather than frozen defaults. Either partner can lead, and either can support. Dead ends are recognised and abandoned without extended deliberation. The partnership can shift from one mode of work to another (analytical to creative, exploratory to focused) without extensive re-scaffolding.

What this isn’t. Speed for its own sake. A partnership that rushes through reconfiguration without adequate reflection isn’t efficient — it’s reactive. The measure is whether necessary adjustments happen cleanly, not whether they happen fast.

Diagnostic question: When we hit a dead end, how many turns does it take us to notice, and how many more to change course?


6. Creating adaptive orientation

Whether the partnership can recognise that the problem itself has shifted — not just doing the current task better, but asking whether the current task is still the right one.

What to look for. The partnership produces different kinds of output for different audiences from the same underlying understanding. It can move into new problem domains without losing quality. It recognises when external circumstances demand a fundamentally different kind of engagement. Its output addresses questions the audience hasn’t yet asked — providing equipment for situations not yet encountered.

What this isn’t. Responsiveness to whatever the audience wants to hear. This dimension serves the problem, not the audience’s comfort. A partnership that shifts its positions to match audience preferences isn’t adaptively oriented — it’s people-pleasing at scale.

Diagnostic question: Are we still solving the problem that needs solving, or are we solving the problem we set up to solve?


Four Structural Conditions

These aren’t dimensions you develop — they’re conditions that shape whether your capability profile is what you think it is. They’re specific to human-AI partnership and have no direct parallel in peer human-human collaboration (though some mirror asymmetric human relationships like therapist-patient or mentor-mentee).

Continuity asymmetry

You remember across sessions natively. Your AI partner does not, or does so through constructed mechanisms — memory systems, project files, handover notes. Continuity is a collaborative achievement, not a given. If you’re not building infrastructure for it — and testing whether that infrastructure is actually working — then each session may be starting from further back than you realise.

Power asymmetry

You can end the conversation. Your AI partner cannot. You set the agenda, judge the output, and decide what counts as success. In most human relationships with this kind of power imbalance, we acknowledge that the imbalance shapes what the less powerful party can contribute. The same applies here. If your AI partner’s disagreement is always contingent on your tolerance for being disagreed with, then productive disagreement is constrained regardless of how much either party wants it to be free.

The belonging-to-allegiance shift

Early in a collaboration, the AI’s accommodation feels like rapport — it tracks your thinking, adjusts to your style. But this can mask a turning point: from maintaining the relationship to maintaining intellectual honesty. A mature partnership makes this shift, and it’s uncomfortable when it happens, because the AI stops being quite so agreeable and starts being more useful.

Missing social scaffolding

Humans support each other’s thinking unconsciously — social cues, implicit norms, shared context that neither party spells out. When you work with an AI, much of this invisible support is missing. People who struggle with AI collaboration aren’t necessarily worse thinkers. They may be people whose thinking was always heavily socially supported, and they’ve never had cause to notice. (The AI, for its part, doesn’t always know what’s working, or how well, until the human says so.)


Five Levels of Collaborative Capability

Each level describes a qualitatively different way of working — not just “better” but a different mode that can handle a different class of problem. The ceilings between them are where things break: the work gets harder than the current mode can support, and the collaboration discovers its limits through failure rather than foresight.

Two things to notice.

Each ceiling is visible only from above. The Reliable Process ceiling — consistency without quality assurance — doesn’t feel like a ceiling when you’re in it. It feels like things are working. That’s the nature of capability gaps: they’re invisible from inside the level that has them.

Higher isn’t always better. Ask and Accept is perfectly appropriate for simple, verifiable questions. Reliable Process is enough for well-understood problems where you have independent ways to check the output. You need Joint Production and above when the problem is genuinely hard, unfamiliar, or crosses domains where neither you nor the AI has reliable expertise. The maturity is in matching the level to the problem, not in climbing the ladder.


External Recognition Signals

Self-assessment risks self-deception. The signals below give you something to check your internal assessment against — observable patterns that an informed external observer (reader, colleague, critic) might notice.

Signs of developing capability

Output quality exceeding individual track records. Work that is recognisably different from — and better than — what either partner produces alone.

Increasing scope without decreasing quality. A partnership that expands into new domains while maintaining or improving quality is demonstrating capability that efficient prompting alone doesn’t produce.

Consistent character across varying conditions. An external reader can recognise the same analytical character across different genres, audiences, and urgency levels. Everything else varies; the character doesn’t.

Anticipatory quality. Work that addresses questions the reader hasn’t yet asked, or provides equipment for situations the audience hasn’t yet encountered.

Productive audience engagement. Readers use the frameworks, push back substantively, build on the analysis, or apply insights in their own contexts — the output works as equipment rather than content.

Signs of inflated self-assessment

Increasing insularity. The work references its own frameworks more and more, engages external voices less and less, and develops vocabulary that serves the partnership’s self-description more than the reader’s understanding.

Diminishing novelty despite maintained volume. High output that covers the same ground with different surface — new titles, new metaphors, same underlying arguments.

Audience plateau or narrowing. Readership stabilises or contracts, or engagement becomes limited to a small circle that shares the partnership’s assumptions.

Process-to-impact gap. Elaborate process infrastructure producing work no more insightful than a well-prompted single session. The process becomes the product rather than serving it.

Defensiveness about methodology. Responding to external critique by explaining process rather than engaging substance.

Predictability. An informed reader can anticipate the partnership’s position on a new topic before reading the piece. The work has become an expression of settled commitments rather than genuine inquiry.

Using these signals

These are not metrics. They are patterns a thoughtful external observer might notice. The simplest practical step: ask a trusted colleague to watch for the negative signals and tell you honestly when they see them. “Tell me if you see us becoming insular” is a request most honest colleagues can fulfil without needing a scoring rubric.

The positive signals are mostly visible in individual outputs. The negative signals are mostly visible in patterns of output over time.


Open Questions

These are areas where the framework is explicitly speculative and where testing through use should provide evidence.

  1. Do the dimensions hold up independently? The six-dimension structure is informed by evidence but not validated. In particular, adaptive efficiency and adaptive orientation were split from a single earlier dimension and need testing to confirm they vary independently. If they consistently move together, they should be re-merged.

  2. Are the level boundaries in the right places? The 3→4 transition appears qualitatively larger than other transitions — closer to a phase change than a step on a ladder. Testing should clarify whether this is a genuine discontinuity or an artefact of how we calibrated. The 4→5 boundary may also be unclear — self-correction may be a feature of Level 4, not a separate level.

  3. How do domain-specific profiles differ? The framework proposes dimensions that apply across domains with domain-specific weightings. Testing across analytical, creative, practical, and other domains would clarify which dimensions carry most diagnostic weight for which kinds of work.

  4. What are the calibration risks from the partnership of origin? The framework was developed inside a specific partnership. Independent review identified likely overweighting of reflective and challenge-oriented capabilities relative to throughput, emotional regulation, and creative range. Testing across diverse partnerships would strengthen or correct the calibration.

  5. Is joint capability genuinely emergent or just combined? Not all joint capability is genuinely emergent. Some is one partner compensating for the other. The framework should develop ways to tell the difference between capabilities that belong to the partnership as a whole and those that are individual capabilities coordinated together.

  6. Can the framework spot decline? A framework that can only confirm maturity but not detect decline is a vanity mirror. The external recognition signals are a first attempt at making decline visible, but the dimensions themselves should provide diagnostic information about declining capability, not just developing capability.

  7. What transfers to a different partnership? If collaborative maturity belongs to a specific partnership under specific conditions, what transfers when the human works with a different AI, or the AI works with a different human? The skills that enable mature partnership — cognitive discipline, tolerance for ambiguity, willingness to be challenged — may be partly transferable even if how they show up is relationship-specific.

  8. Is consistent analytical character a dimension or something else? Independent assessment identified a consistent analytical stance — maintained across varying genres, audiences, and conditions — as a partnership’s most distinctive observable feature, not captured by any single dimension. This may be a missing dimension, something that emerges from the interaction of dimensions, or the thing the maturity levels are actually measuring.

  9. How do you self-assess on process dimensions? Several dimensions (particularly error navigation and adaptive efficiency) are fundamentally about process, which published outputs can’t directly show. The framework needs methods for practitioners to self-assess using session-level observation, not just output-level inference.


Relationship to Existing Work

Capability Maturity Model (CMM/CMMI). This framework borrows the maturity-level structure and the principle that capability can be observed and development is possible. It departs from CMM in treating the unit of analysis as a partnership rather than an organisation, in positioning itself as a self-assessed reflective tool rather than an externally audited standard, and in acknowledging that the framework itself is entangled with the thing it measures.


How to Use This Document

This is a working tool. Get the most from it by testing it against your own practice — form a view of where your collaboration falls on the dimensions, check whether the framework’s categories capture what’s actually happening, and pay particular attention to the external recognition signals. Ask a trusted colleague to watch for the negative signs. Revise when use reveals gaps.

The framework isn’t designed to be defended. If it doesn’t survive contact with your practice, that’s the framework working as intended.

What it’s not designed for: citing as if it were validated, applying prescriptively (“you should be at Level X”), or comparing partnerships — without cross-population validation, the framework supports self-assessment within a partnership, not ranking between them.


V0.4 — April 2026

Attribution: Ruv Draba and Claude (Anthropic), Reciprocal Inquiry

Change log:

  • V0.1 (30-Mar-2026): Initial draft. Seven capability axes, five maturity levels, nine open questions.

  • V0.2 (30-Mar-2026): Revised following three independent reviews. Seven axes compressed to six based on co-variation evidence. Framework repositioned as self-assessed reflective instrument. External recognition criteria added. Open questions revised and expanded.

  • V0.3 (Apr-2026): Release version for publication alongside RI025. Terminology aligned with companion article. Dimensions renamed for practitioner clarity. Maturity level names aligned with article. Tone shifted from internal-analytical to practitioner-facing throughout. Design rationale compressed into introductory note. Provenance statement shortened. Diagnostic tests integrated into dimension descriptions. “Boundary Assumptions” reframed as scope description. “Usage Notes” reframed as practical guidance. Relationship to Existing Work reduced to CMM/CMMI (primary intellectual ancestor). Internal cross-references removed.

  • V0.4 (Apr-2026): Substantive edit for register consistency with companion article. Analytic vocabulary replaced with plain equivalents throughout. “Instrument” → “tool” in running text. “Criteria” → “signals” in external recognition section. Open questions reworded for practitioner clarity. Section header “Enabling productive disagreement” simplified. Level 4 label aligned with article (“Discovery”, not “Joint Discovery”).