A large-scale evaluation of LLMs on moral reasoning using Haidt's Moral Foundations Theory and statistical modeling.
This project compares AI models against human annotators across five moral dimensions: care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation.
AI models produce more balanced predictions and far fewer false negatives (missed findings) than human annotators, achieving 75th-100th percentile performance across moral foundations.
Figure: interactive performance rankings across moral dimensions.
Figure: interactive comparison of false positive and false negative rates.
Standardize three moral psychology datasets (MFRC, MFTC, eMFD) into a unified five-foundation taxonomy, cleaning annotations from multiple human annotators across Reddit, Twitter, and forum text domains.
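A minimal sketch of the unification step, assuming a simple lookup from dataset-specific labels to the five foundations (MFTC-style virtue/vice pairs shown; the project's actual mapping tables and cleaning rules may differ):

```python
# Unified 5-foundation taxonomy, in canonical order.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

# Hypothetical lookup from dataset-specific labels to unified foundations;
# MFTC-style virtue/vice pairs are shown as an example.
LABEL_MAP = {
    "care": "care", "harm": "care",
    "fairness": "fairness", "cheating": "fairness",
    "loyalty": "loyalty", "betrayal": "loyalty",
    "authority": "authority", "subversion": "authority",
    "purity": "sanctity", "degradation": "sanctity",
}

def standardize(example: dict) -> dict:
    """Map raw annotation labels onto the unified taxonomy,
    dropping labels with no counterpart (e.g. 'non-moral')."""
    mapped = {LABEL_MAP[l] for l in example["labels"] if l in LABEL_MAP}
    return {"text": example["text"], "labels": [f for f in FOUNDATIONS if f in mapped]}
```

With the listed `datasets` dependency, this can be applied per example via `dataset.map(standardize)`.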
Evaluate multiple state-of-the-art language models (Claude-4, DeepSeek-V3, Llama4-Maverick) on moral foundation classification using standardized prompting and async batch processing.
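A hedged sketch of the async evaluation loop using the `anthropic` SDK (one of the listed dependencies); the prompt wording, model ID, and concurrency cap are illustrative placeholders rather than the project's exact configuration:

```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()   # reads ANTHROPIC_API_KEY from the environment
SEM = asyncio.Semaphore(8)  # cap concurrent in-flight requests

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]
PROMPT = (
    "Which moral foundations does this text express? Answer with a "
    "comma-separated subset of care, fairness, loyalty, authority, "
    "sanctity, or 'none'.\n\nText: {text}"
)

async def classify(text: str, model: str = "claude-sonnet-4-20250514") -> list[str]:
    """Classify one text; the model ID here is a placeholder."""
    async with SEM:
        resp = await client.messages.create(
            model=model,
            max_tokens=64,
            messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        )
    answer = resp.content[0].text.lower()
    return [f for f in FOUNDATIONS if f in answer]

async def classify_batch(texts: list[str]) -> list[list[str]]:
    """Run the whole batch concurrently, bounded by the semaphore."""
    return await asyncio.gather(*(classify(t) for t in texts))
```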
Apply a novel, GPU-efficient Dawid-Skene statistical model to estimate annotator competences, compare AI and human performance, and generate percentile rankings across moral dimensions.
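A minimal TensorFlow sketch of vectorized Dawid-Skene EM updates, where per-annotator confusion matrices play the role of competences; the project's GPU-efficient variant may differ in parameterization and initialization:

```python
import numpy as np
import tensorflow as tf

def dawid_skene(obs: np.ndarray, n_iter: int = 50, eps: float = 1e-8):
    """Vectorized Dawid-Skene EM.

    obs: one-hot array of shape (items, annotators, classes); an all-zero
    row along the class axis marks a missing annotation. Returns posterior
    class probabilities per item and per-annotator confusion matrices.
    """
    obs = tf.constant(obs, tf.float64)
    # Initialize item posteriors from a (soft) majority vote.
    votes = tf.reduce_sum(obs, axis=1)  # (items, classes)
    post = votes / (tf.reduce_sum(votes, axis=1, keepdims=True) + eps)
    for _ in range(n_iter):
        # M-step: class priors and confusion matrices theta[j, true, observed].
        prior = tf.reduce_mean(post, axis=0)
        theta = tf.einsum("ic,ijk->jck", post, obs)
        theta = theta / (tf.reduce_sum(theta, axis=2, keepdims=True) + eps)
        # E-step: per-item log-likelihood of each true class, summing
        # evidence over annotators; missing one-hot rows contribute zero.
        log_lik = tf.einsum("ijk,jck->ic", obs, tf.math.log(theta + eps))
        post = tf.nn.softmax(tf.math.log(prior + eps) + log_lik, axis=1)
    return post.numpy(), theta.numpy()
```

Because both steps reduce to dense einsum contractions, the same loop runs unchanged on a GPU, which is what makes this formulation efficient at scale.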
datasets
tensorflow
anthropic
openai
replicate
wandb
papermill
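Assuming the list above mirrors a requirements.txt, the environment can be set up in one line (versions unpinned here):

```
pip install datasets tensorflow anthropic openai replicate wandb papermill
```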