
How AI Mapped 20,000 Everyday Interactions—and What It Reveals About Social Life

What if you could press pause on the chaos of everyday conversation—the banter with a friend, the tense chat with your boss, the quick apology to a partner—and see the hidden structure underneath? That’s exactly what a new study has done, using generative AI to decode more than 20,000 two-person interactions and build a data-driven map of how social situations actually work.

Published in Psychological Science by Sudeep Bhatia and colleagues, this research turns messy human moments into measurable patterns—and the results are as practical as they are mind-expanding. Think: better coaching tools for mental health, more empathic chatbots, smarter training for managers, and fresh momentum for social science itself.

Here’s what the team found, why it matters, and how you can start using similar methods today.

Note: The study (DOI: 10.1177/09567976261418946) was reported by Phys.org and analyzed crowdsourced descriptions of two-person interactions across a wide range of contexts.

The study in a nutshell

  • Who: Sudeep Bhatia and colleagues, publishing in Psychological Science (journal site)
  • What: A large-scale, AI-powered taxonomy of two-person social interactions built from 20,000+ textual descriptions
  • How: Generative AI—large language models (LLMs) similar to those from OpenAI—coded each scenario on dimensions including conflict, cooperation, power dynamics, and emotional tone
  • Why: Manual social coding is slow and subjective; AI enables faster, more consistent analysis at scale
  • Headline result: AI coding reached 92% inter-rater reliability, outperforming human annotators and enabling scalable behavioral insights

The dataset, crowdsourced from diverse demographics, spanned everything from friendly check-ins to workplace negotiations—exactly the kinds of interactions that make up our daily lives.

Turning messy human moments into measurable data

From words to features: how LLMs “read between the lines”

Human language hides signals. A sentence that looks innocuous on the surface—“I’ll get that report to you soon”—can encode deference, duty, or power. The researchers used LLMs to detect precisely these subtleties, coding features like:

  • Conflict orientation: Is this a disagreement or an alignment?
  • Cooperation and mutuality: Are people working toward a shared goal?
  • Power asymmetry: Do interruptions, titles, or compliance indicate hierarchy?
  • Emotional tone: Warmth, tension, disappointment, relief, or pride

Because LLMs were trained on vast corpora of human text, they’re adept at spotting patterns we often intuit but rarely formalize. That’s why they could detect subtle cues—like who interrupts whom, or who softens requests—that often precede conflict or reveal implicit status.
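
The paper does not publish its exact prompts, but the coding step can be sketched as a prompt-plus-parse loop. Everything below (the `DIMENSIONS` schema, the prompt wording, the stubbed reply) is illustrative and not taken from the study:

```python
import json

# Hypothetical coding schema inspired by the article's four dimensions;
# neither the keys nor the prompt wording come from the study itself.
DIMENSIONS = ["conflict", "cooperation", "power_asymmetry", "emotional_tone"]

def build_coding_prompt(interaction: str) -> str:
    """Assemble an instruction prompt asking an LLM to code one interaction."""
    return (
        "Rate the following two-person interaction on each dimension from 0 "
        "(absent) to 1 (strong); for emotional_tone, give a one-word label. "
        "Respond with JSON only, using exactly these keys: "
        + ", ".join(DIMENSIONS) + ".\n\n"
        "Interaction: " + interaction
    )

def parse_coding(reply: str) -> dict:
    """Validate the model's JSON reply against the schema."""
    coded = json.loads(reply)
    missing = [d for d in DIMENSIONS if d not in coded]
    if missing:
        raise ValueError("Reply missing dimensions: " + ", ".join(missing))
    return coded

# Stubbed model reply; a real pipeline would call an LLM API here.
prompt = build_coding_prompt("I'll get that report to you soon.")
stub_reply = ('{"conflict": 0.1, "cooperation": 0.6, '
              '"power_asymmetry": 0.7, "emotional_tone": "deferential"}')
coded = parse_coding(stub_reply)
```

In practice each interaction would be coded several times, with disagreements flagged for human review.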

A taxonomy of everyday social life

From this granular coding, the team built a taxonomy: a structured classification system that groups interactions by their core dynamics. While the taxonomy spans a spectrum, several themes stood out:

  • Affiliative interactions: Bonding, support, humor, and mutual care—dominant in friendships
  • Instrumental interactions: Task-driven exchanges, duty, and role-based obligations—prevalent at work
  • Power-misaligned interactions: Deference, hierarchy, command-and-control tones—often predictors of conflict
  • Hybrid categories: Blends of warmth and duty, such as “reciprocal obligation” in family contexts (e.g., “I’ll take care of this because you did that for me”)

This structure doesn’t just explain individual cases—it reveals systemic patterns in how relationships operate.
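
As a toy illustration of how coded features might map onto these buckets, here is a threshold-based classifier. The thresholds and the extra `task_focus` feature are invented for the example; the study's taxonomy was derived from data, not hand rules:

```python
def classify_interaction(coded: dict) -> str:
    """Map coded feature scores (0-1) to a coarse taxonomy bucket.

    Thresholds here are illustrative, not taken from the study.
    """
    power = coded.get("power_asymmetry", 0.0)
    coop = coded.get("cooperation", 0.0)
    conflict = coded.get("conflict", 0.0)
    task = coded.get("task_focus", 0.0)

    if power > 0.6 and conflict > 0.3:
        return "power-misaligned"
    if coop > 0.5 and task > 0.5:
        return "hybrid (affiliative + instrumental)"
    if task > 0.5:
        return "instrumental"
    if coop > 0.5:
        return "affiliative"
    return "unclassified"

# A friendly check-in scores high on cooperation, low on everything else.
label = classify_interaction({"cooperation": 0.8, "task_focus": 0.2})  # → "affiliative"
```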

Key findings you can use

  • Friendships aren’t just about fun. Affiliative exchanges dominate friendships, clocking in at around 65% of interactions in these contexts. That’s a big validation of the role of consistent positive micro-moments—check-ins, jokes, small favors—in relationship health.
  • Workplaces run on instruments of duty. About 40% of workplace interactions are “instrumental”—centered on tasks, deadlines, roles, and hierarchy. Duty and hierarchy are built into many of these exchanges, which helps explain why conflict can feel impersonal but persistent in professional environments.
  • Conflict clusters where power is uneven. The models found a strong correlation between conflict-prone situations and power asymmetries. Small cues—interruptions, hedging, formal titles, or deferential language—signal where friction may flare.
  • Hybrid categories are common—and important. Family dynamics often blend care and obligation (e.g., “reciprocal obligation”), which can be both stabilizing and a source of quiet resentment if imbalances persist.
  • AI can outperform humans at scale. With 92% inter-rater reliability, the AI consistently matched or exceeded human coding performance, with fewer subjective drifts across raters. That makes large, ongoing observational studies far more feasible.

These are not just abstract findings. If you lead a team, negotiate with clients, or build products that talk to people, these patterns are actionable.

Why this matters for psychology, products, and people

Scalable, less-subjective social science

Traditional behavioral coding requires teams of trained annotators reading transcripts, debating edge cases, and tabulating scores—a months-long process prone to drift. The LLM approach standardizes that process, making it:

  • Faster: Thousands of interactions can be coded in hours
  • More consistent: Less noise from annotator mood, fatigue, or bias
  • More flexible: New dimensions (e.g., sarcasm or obligation) can be added without restarting from scratch

In other words, social psychology can finally keep pace with the richness and volume of real human behavior.

Mental health apps and coaching tools

From CBT chatbots to relationship coaching platforms, mental health tools depend on accurate detection of social context. This taxonomy helps:

  • Flag high-risk patterns (e.g., escalating conflict under hierarchy)
  • Suggest reframes (“shift from instrumental to affiliative language here”)
  • Personalize support (e.g., family “reciprocal obligation” tendencies vs. workplace duty scripts)

Imagine a coach that recognizes, in real time, whether your message will land as supportive or controlling—and suggests alternatives before you hit send.
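
A crude version of such a coach can be approximated with keyword heuristics alone. The cue lists below are invented for illustration; a production tool would use an LLM or a trained classifier rather than word matching:

```python
# Toy heuristic flagger; cue lists are illustrative, not from the study.
INSTRUMENTAL_CUES = {"need", "deadline", "asap", "deliver", "report", "queue"}
AFFILIATIVE_CUES = {"thanks", "appreciate", "great", "love", "glad", "help"}

def suggest_reframe(message: str):
    """Return a reframe tip if a message is all task and no warmth, else None."""
    words = set(message.lower().replace(",", " ").replace(".", " ").split())
    if words & INSTRUMENTAL_CUES and not words & AFFILIATIVE_CUES:
        return "Consider opening with affiliative cushioning before the request."
    return None
```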

Empathy training and education

Training LLMs to simulate empathic responses has been hit-or-miss. A structured understanding of social types can help models adapt tone and content to context. Teaching a system to “recognize it’s in an affiliative interaction vs. an instrumental one” could dramatically reduce awkward, tone-deaf outputs.

AI safety and better human–AI interactions

Understanding human social structures isn’t just useful for studying people—it’s essential for designing safer AI. If a system recognizes signs of power imbalance, distress, or conflict, it can:

  • De-escalate heated exchanges
  • Avoid manipulative or dismissive language
  • Make conservative choices about advice-giving and boundaries

That’s a recipe for more trustworthy, more aligned AI assistants.

Limitations and ethics you shouldn’t ignore

No model sees the world from nowhere. The team notes several constraints and open questions:

  • Cultural bias: The training data and crowdsourced scenarios skew Western. Social norms around deference, face-saving, humor, and obligation vary widely by culture. Expanding with multilingual models (e.g., Anthropic’s Claude) and region-specific datasets is essential.
  • Data privacy: Interaction data is sensitive. Even anonymized text can carry identifiers. Firms should secure consent, minimize sensitive collection, and adopt privacy-preserving analysis practices.
  • Hallucination risk: LLMs sometimes over-infer. Without proper guardrails, a model can confidently mislabel a neutral request as controlling or misread sarcasm as hostility.
  • Fairness across contexts: Workplaces with different power structures (e.g., flat startups vs. rigid bureaucracies) will show distinct patterns. Models need domain calibration to avoid one-size-fits-all conclusions.

Guardrails for responsible use

  • Obtain explicit, informed consent for any logging of interpersonal data
  • Anonymize and minimize; drop names and specifics that aren’t essential
  • Calibrate models on local context (industry, region, language norms)
  • Validate against human-coded subsets regularly; monitor drift
  • Disclose use of AI coding to stakeholders; enable appeal and oversight
  • Avoid high-stakes, automated decisions based purely on AI-coded social signals
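
As a concrete example of the "anonymize and minimize" guardrail, here is a minimal de-identification pass. Real deployments should rely on vetted PII-scrubbing tools rather than hand-rolled regexes like these:

```python
import re

def deidentify(text: str, names: list[str]) -> str:
    """Mask known names, email addresses, and phone numbers in a transcript.

    A sketch of the idea only; production pipelines need vetted PII tooling.
    """
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[PHONE]", text)
    return text
```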

What this means for your organization

Whether you’re a product leader, researcher, or HR executive, these insights can translate into concrete wins.

  • Product and UX: Build chat features that adapt tone (affiliative vs. instrumental), detect rising conflict, and suggest de-escalation strategies. Offer “tone rewrites” that respect power dynamics and intent.
  • HR and leadership: Train managers to recognize conversational markers of coercion or dismissal (e.g., chronic interruptions). Use anonymous, consented text snippets from retrospectives to spot systemic friction.
  • Customer success and sales: Coach teams on switching modes—start affiliative to build trust, then shift to instrumental when defining next steps. Surface “relationship health” indicators based on language.
  • Research teams: Replace months of manual coding with standardized, auditable pipelines. Replicate and extend studies faster, and apply the taxonomy to new domains (e.g., telehealth, education, online communities).
  • Brand and marketing: Align messaging with context. Don’t deploy instrumental, duty-laden scripts in spaces where audiences expect warmth and care.

A peek at the next frontier: multimodal social understanding

Text tells a lot—but it doesn’t capture tone of voice, pauses, facial expressions, or posture. The researchers plan to incorporate video and audio, enabling:

  • Prosody analysis: Pitch, pacing, and volume as early conflict/engagement signals
  • Turn-taking quality: Who yields, interrupts, or overlaps
  • Micro-expressions: Fleeting signals of discomfort, contempt, or relief
  • Gesture and space: Physical cues of authority, deference, or intimacy

Multimodal models could pinpoint when an “okay” is agreement, resignation, or sarcasm—and adjust accordingly. That’s a big step toward AI that can read the room, not just the words.

How to experiment with this today

You don’t need a research lab to benefit from this approach. Here’s a practical starting plan for teams:

  1. Define your goals – Examples: Detect early signs of conflict in support chats; improve manager feedback tone; segment conversation types for product analytics.
  2. Collect consented, de-identified text – Pull from helpdesk logs, coaching transcripts, surveys, or opt-in workplace tools. – Strip names, locations, and sensitive identifiers.
  3. Choose your models – Start with well-documented APIs such as OpenAI or Anthropic’s Claude. – For privacy or customization, consider hosting vetted open models via Hugging Face.
  4. Build a coding schema – Start simple: conflict vs. affiliation, instrumental vs. relational, power asymmetry, and emotional tone. – Use a few exemplar prompts per category for consistency.
  5. Validate reliability – Double-code a sample with humans. Track percent agreement and inter-rater metrics (see inter-rater reliability). – Iterate prompts and instructions until reliability stabilizes.
  6. Monitor and debias – Compare results across demographics and contexts. – Adjust prompts or add calibration layers where bias appears.
  7. Ship responsibly – Use the taxonomy to drive suggestions, not judgments. – Log model rationales and enable human review for edge cases.

Method spotlight: what does 92% inter-rater reliability mean?

Inter-rater reliability is a measure of consistency between coders. If two independent coders look at the same interaction and reach the same label most of the time, reliability is high. In this study, the AI’s coding agreed with reference standards 92% of the time—better than typical human-to-human agreement in similar tasks.
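
Percent agreement is easy to compute, and pairing it with a chance-corrected metric such as Cohen's kappa guards against inflated scores when one label dominates. A minimal sketch, with labels invented for the example:

```python
from collections import Counter

def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two coders assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement; more informative than raw agreement."""
    n = len(a)
    po = percent_agreement(a, b)                      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)  # expected by chance
    return (po - pe) / (1 - pe)

ai_codes = ["affiliative", "instrumental", "affiliative", "conflict", "affiliative"]
human    = ["affiliative", "instrumental", "instrumental", "conflict", "affiliative"]
# percent_agreement → 0.8; cohens_kappa → 0.6875
```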

Key takeaways:

  • It’s a strong signal that the model captures stable patterns across contexts.
  • It doesn’t mean the model is always “right,” just consistently applying the same criteria.
  • Human oversight still matters—especially for ambiguous or high-stakes cases.

Case vignettes (hypothetical) to make it concrete

  • The rushed request
      • “Hey, I need that deck by 3 p.m. today—can you move it up the queue?”
      • Likely classification: Instrumental; moderate power asymmetry; potential for conflict if repeated without reciprocity.
      • Suggestion: Add affiliative cushioning and context (“I appreciate you. This is urgent because…”).
  • The supportive nudge
      • “You’ve been making great progress. Want to pair up on the tricky parts?”
      • Likely classification: Affiliative-plus-instrumental hybrid; high cooperation; low conflict risk.
      • Suggestion: Keep it. This is gold for team cohesion.
  • The family favor
      • “I covered for you last week—could you pick up Mom’s meds today?”
      • Likely classification: Reciprocal obligation; affiliative foundation with duty overlay.
      • Suggestion: Acknowledge the relationship first (“Thanks again for last week; love you”), then request.
  • The masked disagreement
      • “If that’s what you think is best, sure.”
      • Likely classification: Latent conflict; deference; possible resentment if pattern persists.
      • Suggestion: Invite explicit preferences and reduce power distance (“I genuinely want your view—what’s your take?”).

The bigger shift: from theory-led to data-led psychology

Classic social psychology often starts with a theory, then looks for supporting evidence. This study flips that script: start with the data, detect patterns, and let theory catch up. That’s not a rejection of theory—it’s a richer partnership. With LLMs doing the heavy lifting on first-pass coding, researchers can:

  • Explore larger, more diverse samples
  • Discover hybrid or emergent categories (like “reciprocal obligation”)
  • Test theories against real-world data, not just lab setups

In short, we’re moving toward a social science that’s more empirical, more representative, and more useful.

Frequently asked questions

  • What exactly is a “taxonomy” of social interactions?
      • It’s a structured classification system that groups similar interaction types (e.g., affiliative vs. instrumental) based on features like tone, goals, and power dynamics.
  • Did AI really beat human coders?
      • The model reached 92% inter-rater reliability, which outperformed typical human annotators in similar tasks. That means it was highly consistent—crucial for large-scale studies.
  • Does this replace psychologists or counselors?
      • No. It augments them. AI can pre-code and surface patterns, but human judgment is essential for interpretation, ethics, and context.
  • Is the data private and safe?
      • It should be. Responsible use requires explicit consent, de-identification, and strict access controls. Never feed sensitive data into third-party models without robust privacy safeguards.
  • Will cultural biases skew results?
      • Yes, if not addressed. The current approach skews Western. Expanding datasets and using multilingual models (such as Anthropic’s Claude) are critical steps toward fairness.
  • Can small teams use this method?
      • Absolutely. With careful scoping, a small dataset, and a simple schema, you can get reliable insights quickly—especially if you validate with a human-coded subset.
  • What is “reciprocal obligation” in family dynamics?
      • It’s a hybrid category where care and duty intertwine—“I’ll help because you helped me.” It’s stabilizing but can generate pressure if the ledger feels unbalanced.
  • When will models understand audio and video well enough for this?
      • Multimodal models are advancing quickly. Incorporating tone, timing, and expression is on the near horizon for research-grade tools; widespread, dependable deployment will require careful validation.

Clear takeaway

Generative AI just gave social science a powerful new lens. By coding 20,000 everyday interactions with 92% reliability, researchers mapped the hidden structure of our daily conversations—revealing how affiliation, duty, power, and emotion interact to shape outcomes. The payoff is huge: more empathic products, smarter coaching, fairer workplaces, and a faster, more data-driven psychology.

Use this study as a template. Start small, code responsibly, validate often, and design with context in mind. The better we understand the patterns of social life, the better we can show up for each other—online, at work, and at home.

Discover more at InnoVirtuoso.com

I would love some feedback on my writing, so if you have any, please don’t hesitate to leave a comment here or on any platform that’s convenient for you.

For more on tech and other topics, explore InnoVirtuoso.com anytime. Subscribe to my newsletter and join our growing community—we’ll create something magical together. I promise, it’ll never be boring! 

Thank you all—wishing you an amazing day ahead!
