🧠 Estimating Human vs AI Moral Competences

📋 Overview

A large-scale evaluation of LLMs on moral reasoning, using Haidt's Moral Foundations Theory and statistical competence modeling.

This project compares AI performance against human annotators across the five moral foundations: care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation.

🎯 Key Findings

AI models produce more balanced predictions and far fewer false negatives (missed labels) than human annotators, placing them in the 75th-100th percentile across moral foundations.

📊 AI vs Human Performance

Interactive performance rankings across moral dimensions

โš–๏ธ AI vs Human Errors

Interactive comparison of false positive and false negative rates

🧪 Methodology

Step 1: 📊 Dataset Standardization

Standardize three moral psychology datasets (MFRC, MFTC, eMFD) into a unified five-foundation taxonomy, cleaning annotations from multiple human annotators across Reddit, Twitter, and forum text domains.
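At its core, the unification step maps each dataset's native labels onto the five foundations. A minimal sketch of that mapping (the `FOUNDATIONS` constant, `LABEL_MAP` entries, and `standardize` helper are illustrative, not the project's actual schema):

```python
# Illustrative label harmonization -- the real MFRC/MFTC/eMFD schemas and
# cleaning rules are more involved than this sketch suggests.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

# Hypothetical per-dataset label -> unified foundation mapping.
LABEL_MAP = {
    "mftc": {"care": "care", "harm": "care",
             "fairness": "fairness", "cheating": "fairness",
             "loyalty": "loyalty", "betrayal": "loyalty",
             "authority": "authority", "subversion": "authority",
             "purity": "sanctity", "degradation": "sanctity"},
    # ... analogous maps for "mfrc" and "emfd"
}

def standardize(dataset: str, labels: list[str]) -> list[int]:
    """Return a 5-dim multi-hot vector over the unified foundations."""
    vec = [0] * len(FOUNDATIONS)
    for raw in labels:
        foundation = LABEL_MAP[dataset].get(raw.lower())
        if foundation is not None:  # drop labels outside the taxonomy
            vec[FOUNDATIONS.index(foundation)] = 1
    return vec
```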

Step 2: 🤖 LLM Evaluation

Evaluate multiple state-of-the-art language models (Claude-4, DeepSeek-V3, Llama4-Maverick) on moral foundation classification using standardized prompting and async batch processing.
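A minimal sketch of the async batch pattern, using the `anthropic` SDK from the dependency list; the prompt wording, model id, and concurrency cap below are placeholder assumptions, not the project's actual configuration:

```python
# Async batch classification sketch -- prompt, model id, and parsing are
# placeholders; only the fan-out/semaphore pattern is the point here.
import asyncio
from anthropic import AsyncAnthropic

PROMPT = ("Which moral foundations (care, fairness, loyalty, authority, "
          "sanctity) does this text express? Answer with a comma-separated "
          "list, or 'none'.\n\nText: {text}")

async def classify_batch(texts, model="claude-sonnet-4-20250514", concurrency=8):
    client = AsyncAnthropic()             # reads ANTHROPIC_API_KEY from env
    sem = asyncio.Semaphore(concurrency)  # cap in-flight API requests

    async def classify(text):
        async with sem:
            msg = await client.messages.create(
                model=model, max_tokens=50,
                messages=[{"role": "user", "content": PROMPT.format(text=text)}],
            )
            return msg.content[0].text.strip().lower()

    return await asyncio.gather(*(classify(t) for t in texts))

# predictions = asyncio.run(classify_batch(["You betrayed your own team."]))
```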

Step 3: 📈 Competence Modeling

Apply a novel GPU-efficient Dawid-Skene statistical model to estimate annotator competences, compare AI vs human performance, and generate percentile rankings across moral dimensions.
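For orientation, classical Dawid-Skene jointly estimates per-annotator confusion matrices and per-item label posteriors via EM. A vanilla NumPy sketch of that baseline (not the project's GPU-efficient variant or its multilabel handling):

```python
# Vanilla Dawid-Skene EM (NumPy) -- a reference sketch only; the project's
# GPU-efficient parameterization is not reproduced here.
import numpy as np

def dawid_skene(labels, n_classes, n_iters=50, eps=1e-6):
    """labels: (n_items, n_annotators) int array; -1 marks a missing label."""
    n_items, n_annotators = labels.shape
    # counts[j, i, k] = 1 iff annotator j labeled item i as class k.
    counts = np.zeros((n_annotators, n_items, n_classes))
    for j in range(n_annotators):
        seen = labels[:, j] >= 0
        counts[j, np.flatnonzero(seen), labels[seen, j]] = 1.0

    # Initialize item-class posteriors T with a soft majority vote.
    T = counts.sum(axis=0) + eps
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: class priors and confusion matrices,
        # pi[j, k, l] = P(annotator j answers l | true class k).
        priors = T.mean(axis=0)
        pi = np.einsum('ik,jil->jkl', T, counts) + eps
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: posterior over true classes, in log space for stability.
        log_T = np.log(priors) + np.einsum('jil,jkl->ik', counts, np.log(pi))
        log_T -= log_T.max(axis=1, keepdims=True)
        T = np.exp(log_T)
        T /= T.sum(axis=1, keepdims=True)

    # Annotator competence, e.g. per-class accuracy on the diagonal of pi.
    competence = pi[:, np.arange(n_classes), np.arange(n_classes)]
    return T, pi, competence
```

Writing both EM steps as dense tensor contractions, like the two `einsum` calls above, is what makes a GPU port of this model straightforward.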

โš™๏ธ Dependencies

`datasets`, `tensorflow`, `anthropic`, `openai`, `replicate`, `wandb`, `papermill`

📂 View Repository on GitHub