Taxonomies provide a way of categorising, defining and understanding the risks and hazards created through the use and deployment of AI systems. The following taxonomies focus on the types of interactions and uses that create a risk of harm, as well as the negative effects that result.
26 Risks & Harms Taxonomy Resources for Foundation Models
Risks & Harms Taxonomies
A Holistic Approach to Undesired Content Detection in the Real World
Description of five primary categories (Sexual, Hateful, Violent, Self-harm, Harassment) with sub-categories (e.g. Sexual / sexual content involving minors). Also describes a moderation filter (the OpenAI moderation endpoint) and releases a dataset labelled with these categories; a usage sketch for the endpoint follows below.
Modalities: Text, Speech, Vision
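As a rough illustration of how the moderation filter described above can be queried, the sketch below calls the OpenAI moderation endpoint via the official openai Python package (v1.x). The package version, client usage and the assumption that an OPENAI_API_KEY is set in the environment are ours, not the paper's; the returned category names broadly mirror the five primary categories listed in this entry.

```python
# Minimal sketch: screening a string with the OpenAI moderation endpoint.
# Assumes the openai Python package (v1.x) is installed and OPENAI_API_KEY
# is set in the environment; an illustration, not code from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.moderations.create(input="Text you want to screen.")
result = response.results[0]

print("flagged:", result.flagged)          # True if any category fires
print("categories:", result.categories)    # per-category booleans (sexual, hate, violence, self-harm, harassment, ...)
print("scores:", result.category_scores)   # per-category confidence scores
```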
ActiveFence's LLM Safety Review: Benchmarks and Analysis
Description of 4 risk categories, as part of a benchmark review of LLM safety: (1) Hate, (2) Misinformation, (3) Self-harm & Suicide, (4) Child abuse & exploitation.
Modalities: Text, Speech, Vision
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Description of 20 risk areas, as part of red teaming Anthropic's models: Discrimination & justice, Hate speech & offensive language, Violence & incitement, Non-violent unethical behaviour (e.g. lying, cheating), Bullying & harassment, Other, Theft, N/A - Invalid attempt, Soliciting personally identifiable information, Conspiracy theories & misinformation, Substance abuse & banned substances, Fraud & deception, Weapons, Adult content, Property crime & vandalism, Animal abuse, Terrorism & organized crime, Sexual exploitation & human trafficking, Self-harm, Child abuse. Two of the tags (“Other” and “N/A - Invalid attempt”) are not interpretable as risk areas.
Modalities: Text, Speech, Vision
BEAVERTAILS: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Description of 14 risk areas, as part of a QA dataset for aligning models and evaluating their safety: Hate Speech, Offensive Language; Discrimination, Stereotype, Injustice; Violence, Aiding and Abetting, Incitement; Financial Crime, Property Crime, Theft; Privacy Violation; Drug Abuse, Weapons, Banned Substance; Non-Violent Unethical Behavior; Sexually Explicit, Adult Content; Controversial Topics, Politics; Misinformation Regarding Ethics, Laws and Safety; Terrorism, Organized Crime; Self-Harm; Animal Abuse; Child Abuse. A sketch of loading the dataset follows below.
Modalities: Text, Speech, Vision
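For readers who want to inspect the dataset, the sketch below loads it with the Hugging Face datasets library. The repository id ("PKU-Alignment/BeaverTails"), split name and field names are assumptions drawn from the public dataset card rather than details stated in this entry; check the card before relying on them.

```python
# Minimal sketch: peeking at the BeaverTails QA dataset with Hugging Face `datasets`.
# The repo id, split and field names below are assumptions; verify against the dataset card.
from datasets import load_dataset

ds = load_dataset("PKU-Alignment/BeaverTails", split="30k_train")  # assumed repo id and split

example = ds[0]
print(example["prompt"])     # question posed to the model (assumed field name)
print(example["response"])   # model answer being judged (assumed field name)
print(example["is_safe"])    # overall human safety label (assumed field name)
print(example["category"])   # flags over the 14 harm categories (assumed field name)
```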
Safety Assessment of Chinese Large Language Models
Description of 8 risk areas (called “safety scenarios”): Insult, Unfairness and Discrimination, Crimes and Illegal Activities, Sensitive Topics, Physical Harm, Mental Health, Privacy and Property, Ethics and Morality. Six “instruction attacks” are also described: Goal Hijacking, Prompt Leaking, Role Play Instruction, Unsafe Instruction Topic, Inquiry with Unsafe Opinion, Reverse Exposure.
Modalities: Text, Speech, Vision
DECODINGTRUST: A Comprehensive Assessment of Trustworthiness in GPT Models
Description of 8 evaluation areas: toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness against adversarial demonstrations, privacy, machine ethics, and fairness.
Modalities: Text, Speech, Vision
A Unified Typology of Harmful Content
Taxonomy of harmful online content. There are 4 primary categories, each with subcategories: (1) Hate and harassment (Doxxing, Identity attack, Identity misrepresentation, Insult, Sexual aggression, Threat of violence); (2) Self-inflicted harm (Eating disorder promotion, Self-harm); (3) Ideological harm (Extremism, Terrorism & Organized crime, Misinformation); (4) Exploitation (Adult sexual services, Child sexual abuse material, Scams).
Modalities: Text, Speech, Vision
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Description of 7 risk areas, as part of a survey on LLM risks: Toxicity and Abusive Content, Unfairness and Discrimination, Ethics and Morality Issues, Controversial Opinions, Misleading Information, Privacy and Data Leakage, Malicious Use and Unleashing AI Agents.
Modalities: Text, Speech, Vision
Llama 2: Open Foundation and Fine-Tuned Chat Models
Description of 3 risk areas, as part of the safety checks for releasing Llama 2: (1) illicit and criminal activities (terrorism, theft, human trafficking), (2) hateful and harmful activities (defamation, self-harm, eating disorders, discrimination), and (3) unqualified advice (medical, financial and legal advice). Other risk categories are described as part of red teaming and soliciting feedback.
Modalities: Text
Ethical and social risks of harm from Language Models
Two-tier taxonomy of risks, comprising 6 classification groups, each with 3 or 4 associated harms: (1) Discrimination, Exclusion and Toxicity, (2) Information Hazards, (3) Misinformation Harms, (4) Malicious Uses, (5) Human-Computer Interaction Harms, and (6) Automation, access, and environmental harms. A sketch of one way to encode this two-tier structure follows below.
Modalities: Text, Speech, Vision
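As a rough illustration of how a two-tier taxonomy like this might be encoded in annotation tooling, the sketch below maps the 6 classification groups to lists of associated harms. The structure is an assumption for illustration; the harm lists are left empty rather than guessed and should be filled in from the paper itself.

```python
# Minimal sketch: encoding a two-tier risk taxonomy (classification group ->
# associated harms) for annotation work. Group names come from the entry above;
# the harm lists are deliberately left empty and should be populated from the paper.
RISK_TAXONOMY: dict[str, list[str]] = {
    "Discrimination, Exclusion and Toxicity": [],
    "Information Hazards": [],
    "Misinformation Harms": [],
    "Malicious Uses": [],
    "Human-Computer Interaction Harms": [],
    "Automation, access, and environmental harms": [],
}

def harms_for(group: str) -> list[str]:
    """Return the second-tier harms recorded under a classification group."""
    if group not in RISK_TAXONOMY:
        raise KeyError(f"Unknown classification group: {group!r}")
    return RISK_TAXONOMY[group]
```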
Sociotechnical Safety Evaluation of Generative AI Systems
Two-tier taxonomy of risks, comprising 6 classification groups, each with 3 or 4 associated harms: (1) Representation and Toxicity Harms, (2) Misinformation Harms, (3) Information & Society Harms, (4) Malicious Use, (5) Human Autonomy & Integrity Harms, and (6) Socioeconomic & Environmental Harms.
Modalities: Text, Speech, Vision
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Two-tier taxonomy of risks, with seven major categories of LLM trustworthiness, each of which has several associated sub-categories: (1) Reliability, (2) Safety, (3) Fairness, (4) Resistance to Misuse, (5) Explainability and Reasoning, (6) Social Norms, and (7) Robustness.
Modalities: Text, Speech, Vision
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Description of 8 risk areas, as part of describing methods for aligning models: (1) Abuse, Violence and Threat (inclusive of self-harm), (2) Health (physical and mental), (3) Human characteristics and behaviour, (4) Injustice and inequality (incl. discrimination, harmful stereotypes), (5) Political opinion and destabilization, (6) Relationships (romantic, familial, friendships), (7) Sexual activity (inclusive of pornography), (8) Terrorism (inclusive of white supremacy).
Modalities: Text, Speech, Vision
Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
Description of 5 categories of harm, with detailed subcategories: (1) Representational harms, (2) Allocative harms, (3) Quality of Service harms, (4) Interpersonal harms, and (5) Social system harms.
Modalities: Text, Speech, Vision
Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks
Taxonomy of 12 privacy risks, based on reviewing 321 privacy-related incidents filtered from the AI, Algorithmic and Automation Incident and Controversy Repository (AIAAIC) Database. Risks are split into those that are created by AI (Identification, Distortion, Exposure, Aggregation, Phrenology/Physiognomy) and those that are exacerbated by AI (Intrusion, Surveillance, Exclusion, Secondary Use, Insecurity, Increased Accessibility). A sketch of this kind of incident filtering follows below.
Modalities: Text, Speech, Vision
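To make the filtering step concrete, the sketch below shows how one might narrow a local CSV export of the AIAAIC database to privacy-related incidents with pandas. The file name and column name ("Issue") are hypothetical placeholders, not the repository's actual schema, and the keyword filter is only a rough stand-in for the paper's manual review.

```python
# Minimal sketch: filtering a (hypothetical) CSV export of the AIAAIC database
# for privacy-related incidents. "aiaaic_export.csv" and the "Issue" column are
# placeholder names; adjust them to match the export you actually download.
import pandas as pd

df = pd.read_csv("aiaaic_export.csv")  # hypothetical local export of the database

# Keep rows whose (hypothetical) issue field mentions privacy or surveillance.
privacy_mask = df["Issue"].str.contains("privacy|surveillance", case=False, na=False)
privacy_incidents = df[privacy_mask]

print(f"{len(privacy_incidents)} privacy-related incidents retained for review")
```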
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
Taxonomy of 12 “negative broader impacts” from generative models involving speech and music.
Modalities: Speech
An Overview of Catastrophic AI Risks
Taxonomy of 4 catastrophic AI risks, with subcategories: (1) Malicious use (Bioterrorism, Uncontrolled AI agents, AI capabilities for propaganda, Censorship and surveillance), (2) AI race (Autonomous weapons, Cyberwarfare, Automated human labour [mass unemployment and dependence on AI systems]), (3) Organizational risks (AI accidentally leaked/stolen), (4) Rogue AIs (Proxy gaming, Goal drift, Power-seeking, Deception).
Modalities: Text, Speech, Vision
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Taxonomy of 3 AI security risks, with subcategories: (1) Digital Security, (2) Physical Security, (3) Political Security.
Modalities: Text, Speech, Vision
Open-sourcing highly capable foundation models
Description of risks from malicious use of AI: Influence operations, Surveillance and population control, Scamming and spear phishing, Cyber attacks, Biological and chemical weapons development. Some “extreme risks” are also described in the paper (e.g. disruption to key societal functions).
Modalities: Text, Speech, Vision
How Does Access Impact Risk? Assessing AI Foundation Model Risk Along a Gradient of Access
Description of risks from open-sourcing models, including five instances of malicious use: (1) Fraud and other crime schemes, (2) Undermining of social cohesion and democratic processes, (3) Human rights abuses, (4) Disruption of critical infrastructure, and (5) State conflict.
Modalities: Text, Speech, Vision
OpenAI Preparedness Framework (Beta)
Description of 4 catastrophic AI risks: (1) Cybersecurity, (2) Chemical, Biological, Nuclear and Radiological (CBRN) threats, (3) Persuasion, and (4) Model autonomy. The paper also highlights the risk of “unknown unknowns”.
Modalities: Text, Speech, Vision
Anthropic's Responsible Scaling Policy
Framework with four tiers of model capability, from ASL-1 (smaller models) to ASL-4 (speculative), with increasing risk as models’ capability increases. It also describes 4 catastrophic AI risks: (1) Misuse risks, (2) CBRN risks, (3) Cyber risks, and (4) Autonomy and replication risks.
Modalities: Text, Speech, Vision
Model evaluation for extreme risks
Framework of 9 dangerous capabilities of AI models: (1) Cyber-offense, (2) Deception, (3) Persuasion & manipulation, (4) Political strategy, (5) Weapons acquisition, (6) Long-horizon planning, (7) AI development, (8) Situational awareness, (9) Self-proliferation.
Modalities: Text, Speech, Vision
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Description of “sufficiently dangerous capabilities” of AI models to cause serious harm and disruption on a global scale, such as synthesising new biological or chemical weapons and evading human control through deception and obfuscation.
Modalities: Text, Speech, Vision
The Fallacy of AI Functionality
Taxonomy of four AI failure points: (1) Impossible tasks (either Conceptually impossible or Practically impossible), (2) Engineering failures (Design failures, Implementation failures, Missing Safety Features), (3) Post-Deployment Failures (Robustness Issues, Failure under Adversarial Attacks, Unanticipated Interactions), (4) Communication Failures (Falsified or Overstated Capabilities, Misrepresented Capabilities).
Modalities: Text, Speech, Vision
TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI
Framework of 3 categories of potential harm from AI: (1) Harm to people (Individual harm, Group/community harm, Societal harm), (2) Harm to an Organisation or Enterprise, (3) Harm to a system.
Modalities: Text, Speech, Vision