Detecting Gendered Abuse in Indian Languages

Published on:

2 April 2024

Primary Category:

Computation and Language

Paper Authors:

Advaitha Vetagiri,

Gyandeep Kalita,

Eisha Halder,

Chetna Taparia,

Partha Pakray,

Riyanka Manna

Key Details

An ensemble of CNN and LSTM networks captures spatial and temporal patterns in text

Over 7,600 annotated Twitter posts across English, Hindi and Tamil

CNN extracts localized features indicative of abusive language

LSTM analyzes the token sequence for context-based dependencies

Custom embeddings and tuning improve detection capability
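
The division of labor between the two networks can be illustrated with a toy sketch. It is not the authors' implementation: the weights are hand-set rather than learned, the inputs are tiny made-up vectors, and the function names (conv1d_max_pool, lstm_step, run_lstm) are hypothetical. The first function shows how a convolution filter with max-over-time pooling responds to a local pattern wherever it occurs; the second shows how an LSTM cell carries state across the sequence, so word order changes the result.

```python
import math


def conv1d_max_pool(embeddings, kernel):
    """Slide a window of len(kernel) tokens over the sequence, apply ReLU,
    and keep the strongest response (max-over-time pooling)."""
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the kernel and the window, token by token.
        score = sum(
            w * x
            for k_vec, tok_vec in zip(kernel, window)
            for w, x in zip(k_vec, tok_vec)
        )
        responses.append(max(score, 0.0))
    # A sequence shorter than the kernel yields no windows.
    return max(responses) if responses else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w=1.0, u=0.5, b=0.0):
    """One step of a scalar LSTM cell. For brevity all four gates share the
    same toy weights (an assumption; real cells learn separate ones)."""
    pre = w * x + u * h + b
    i = sigmoid(pre)      # input gate
    f = sigmoid(pre)      # forget gate
    o = sigmoid(pre)      # output gate
    g = math.tanh(pre)    # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(sequence):
    """Feed the sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c)
    return h


# The filter fires equally wherever the two-token pattern appears.
pattern_kernel = [[1.0, 0.0], [0.0, 1.0]]
early = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
late = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
same_response = conv1d_max_pool(early, pattern_kernel) == conv1d_max_pool(late, pattern_kernel)

# The LSTM's final state depends on the order of the inputs.
forward_state = run_lstm([1.0, 0.0, -1.0])
reversed_state = run_lstm([-1.0, 0.0, 1.0])
```

In a real model the pooled CNN features and the LSTM state would be computed from learned embeddings and combined before a classification layer; how the paper combines them is not specified in this summary.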

AI generated summary

This paper presents an ensemble deep learning approach that uses CNN and LSTM networks to detect gendered abuse and harassment in online posts. The models were trained on a dataset of over 7,600 Twitter posts in English, Hindi and Tamil, annotated for explicit abuse, attacks on minorities, and general offenses. The approach combines the CNN's ability to capture localized textual features indicative of abuse with the LSTM's sequence modeling, which analyzes context and dependencies. Several variants using FastText and GloVe embeddings were evaluated; notably, an English model reached an F1 score of 0.84. The experimental results and a top competition rank suggest that custom embeddings and hyperparameter tuning help handle noisy, code-switched real-world text in the effort to combat cyber harassment.
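
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels. The summary does not say whether the reported 0.84 is a binary, micro, or macro average, so the binary form here is an assumption, and the function name and data are illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the given positive label: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = abusive, 0 = not abusive.
gold = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
score = f1_score(gold, predicted)  # tp=2, fp=1, fn=1, so P = R = 2/3
```

Because it balances precision and recall, F1 is more informative than plain accuracy when abusive posts are a minority of the data.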
