Pushing the Boundaries of Deep Learning for Cyberbullying Detection

Sharing on the internet has been a hit ever since Facebook launched. Today, social media is a daily necessity, even an addiction, and it doubles as a source of income for anyone willing to put themselves out there. As the number of content creators across platforms has surged, the number of reported cyberbullying cases has risen sharply with it.

The concern is pressing because cyberbullying is not a single type of harassment: it cuts across culture, ethnicity and gender. That is why it is necessary to build a robust system that looks into each of these subcategories (six classes in this case) and classifies them correctly.

Machine learning has been explored extensively for NLP tasks such as sentiment analysis, emotion recognition and cyberbullying detection. What text analytics really needs, though, is deep learning models that are combined and tuned specifically for the task at hand.

The two models I picked for this task were Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks. Combined into a hybrid CNN-BiLSTM architecture, they leverage the feature extraction capabilities of CNNs for identifying relevant local patterns and the contextual understanding of BiLSTMs for capturing relationships over longer sequences.

This combined network was then trained on a dataset of 47,692 labelled social media comments categorized into 6 classes.

Let’s get into the implementation.

Step 1: Exploratory Data Analysis and Data Cleaning

An unbiased and robust training process begins with a close look at the data. Through EDA, we examine the patterns and distribution of all classes in the dataset. This step not only shows how prevalent each type of cyberbullying is but also lays the groundwork for a balanced dataset.
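To make this step concrete, here is a minimal sketch of a cleaning pass, a class-distribution check and a simple balancing strategy. Everything in it is an assumption for illustration: the toy rows, the class names, the cleaning rules in `clean_text` and the choice of random downsampling are not taken from the project itself.

```python
import random
import re
from collections import Counter

# Toy stand-in for the labelled comments; texts and class names are hypothetical.
rows = [
    ("You are too OLD for this app!!", "age"),
    ("go back to school kid @sam", "age"),
    ("Nice photo, love it", "not_cyberbullying"),
    ("Great game last night https://example.com/clip", "not_cyberbullying"),
    ("See you at the meetup", "not_cyberbullying"),
    ("placeholder comment one", "religion"),
    ("placeholder comment two", "religion"),
    ("placeholder comment three", "religion"),
    ("placeholder comment four", "religion"),
]


def clean_text(text):
    """Lowercase, drop URLs and @mentions, keep letters only, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def class_distribution(rows):
    """Return {label: (count, share of dataset)} for (text, label) rows."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def downsample_to_smallest(rows, seed=42):
    """Balance classes by keeping as many random rows per class as the rarest class has."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


cleaned = [(clean_text(text), label) for text, label in rows]
distribution = class_distribution(cleaned)
balanced = downsample_to_smallest(cleaned)
```

Downsampling is only one option; when the rarest class is very small, oversampling or class weights may preserve more data.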

Step 2: Prepare the Data for Training

The usual train/test split for machine learning models is 8:2, but for this hybrid deep learning model a ratio of 7:3 is used so that more data is held out for a more rigorous test. Once the data is split and separated into x (text) and y (class), the text is tokenized and the labels are one-hot encoded.
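The split, tokenization and one-hot encoding described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the toy texts, the six class names, the whitespace tokenizer, the `<pad>`/`<unk>` ids and `MAX_LEN` are all hypothetical choices.

```python
import random

CLASSES = ["age", "ethnicity", "gender", "religion", "other", "none"]  # hypothetical names
MAX_LEN = 6  # assumed sequence length after padding/truncation


def split_70_30(texts, labels, test_ratio=0.3, seed=42):
    """Shuffle indices once, then cut them into a 70% train and 30% test part."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [texts[i] for i in train_idx],
        [texts[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def build_vocab(texts):
    """Map each word to an integer id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Turn a text into a fixed-length list of word ids (truncate, then pad with 0)."""
    ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(labels, classes):
    """Encode each label as a 0/1 vector with a single 1 at the class index."""
    index = {name: i for i, name in enumerate(classes)}
    return [[1 if index[label] == i else 0 for i in range(len(classes))] for label in labels]


# Toy data standing in for the real comments.
texts = [f"sample comment number {i}" for i in range(10)]
labels = [CLASSES[i % len(CLASSES)] for i in range(10)]

x_train, x_test, y_train, y_test = split_70_30(texts, labels)
vocab = build_vocab(x_train)  # fit the vocabulary on training text only
x_train_ids = [encode(t, vocab, MAX_LEN) for t in x_train]
x_test_ids = [encode(t, vocab, MAX_LEN) for t in x_test]
y_train_1hot = one_hot(y_train, CLASSES)
y_test_1hot = one_hot(y_test, CLASSES)
```

Building the vocabulary from the training split only keeps test-set words from leaking into the model's inputs; unseen test words fall back to the `<unk>` id.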

Step 3: Word Embeddings

By integrating pre-trained GloVe vectors, we give the text rich semantic meaning and capture relationships between words. This step transforms text into numerical representations and improves the model’s ability to pick up linguistic nuances.
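As a rough sketch of how pre-trained vectors become an embedding matrix, the snippet below parses GloVe's plain-text format (a word followed by its vector components on each line) and copies each known word's vector into the row matching its token id. The three-dimensional vectors, the tiny vocabulary and the helper names are made up for illustration; which GloVe file the project uses is not stated, so the file name in the comment is only an example.

```python
import io

import numpy as np

# Made-up 3-dimensional vectors in GloVe's text format. A real file
# (for example glove.6B.100d.txt) would be read with open(path, encoding="utf8").
toy_glove = io.StringIO(
    "you 0.1 0.2 0.3\n"
    "are 0.4 0.5 0.6\n"
    "kind 0.7 0.8 0.9\n"
)

# Hypothetical vocabulary from the tokenization step (0 = padding, 1 = unknown).
vocab = {"<pad>": 0, "<unk>": 1, "you": 2, "kind": 3, "zzzz": 4}


def load_glove(handle):
    """Parse lines of 'word v1 v2 ...' into a {word: vector} dictionary."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the pre-trained vector of the word with id i, or zeros if unknown."""
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    for word, idx in vocab.items():
        vector = vectors.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix


glove_vectors = load_glove(toy_glove)
embedding_matrix = build_embedding_matrix(vocab, glove_vectors, dim=3)
```

Words without a pre-trained vector (here the padding token, the unknown token and "zzzz") keep an all-zero row; initialising them randomly instead is another common choice.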

Step 4: Hyperparameter Initialisation for CNN-BiLSTM

The model’s hyperparameters need to be set to suit the task. For the embedding layer, any pre-trained word embedding such as GloVe, Word2vec, or BERT can be used to transform input sequences into dense vectors.

CNN: Convolutional 1D layers followed by MaxPooling layers capture local patterns and downsample the features.

BiLSTM: Captures both past and future context of the sequences, enhancing the model’s ability to understand sequential data.
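To show how the pieces fit together, here is a NumPy-only sketch of a single forward pass through an embedding lookup, a 1D convolution, max pooling, a bidirectional LSTM and a softmax output over six classes. It uses random, untrained weights and hyperparameter values chosen purely for illustration; none of the sizes below are the project's actual settings, and in practice a deep learning framework would provide these layers and handle training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed hyperparameters, for illustration only.
VOCAB, EMBED_DIM, MAX_LEN = 50, 8, 12
FILTERS, KERNEL, POOL, UNITS, CLASSES = 16, 3, 2, 10, 6


def init(shape):
    return rng.normal(0.0, 0.1, size=shape)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def conv1d_relu(x, kernels, bias):
    """'Valid' 1D convolution plus ReLU. x: (steps, channels); kernels: (width, channels, filters)."""
    width = kernels.shape[0]
    steps_out = x.shape[0] - width + 1
    out = np.stack([
        np.tensordot(x[t:t + width], kernels, axes=([0, 1], [0, 1])) + bias
        for t in range(steps_out)
    ])
    return np.maximum(out, 0.0)


def max_pool1d(x, pool):
    """Keep the maximum of every non-overlapping window of `pool` steps."""
    steps_out = x.shape[0] // pool
    return x[:steps_out * pool].reshape(steps_out, pool, -1).max(axis=1)


def lstm_last_state(x, W, U, b):
    """Run an LSTM over x (steps, features) and return its final hidden state."""
    units = U.shape[0]
    h, c = np.zeros(units), np.zeros(units)
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def bilstm(x, fwd, bwd):
    """Read the sequence left-to-right and right-to-left, then concatenate both summaries."""
    return np.concatenate([lstm_last_state(x, *fwd), lstm_last_state(x[::-1], *bwd)])


def lstm_params(features, units):
    return init((features, 4 * units)), init((units, 4 * units)), np.zeros(4 * units)


embedding = init((VOCAB, EMBED_DIM))  # would be the pre-trained matrix in practice
kernels, conv_bias = init((KERNEL, EMBED_DIM, FILTERS)), np.zeros(FILTERS)
fwd, bwd = lstm_params(FILTERS, UNITS), lstm_params(FILTERS, UNITS)
W_out, b_out = init((2 * UNITS, CLASSES)), np.zeros(CLASSES)


def forward(token_ids):
    x = embedding[token_ids]                # (MAX_LEN, EMBED_DIM)
    x = conv1d_relu(x, kernels, conv_bias)  # (MAX_LEN - KERNEL + 1, FILTERS)
    x = max_pool1d(x, POOL)                 # sequence length halved when POOL is 2
    x = bilstm(x, fwd, bwd)                 # (2 * UNITS,)
    return softmax(x @ W_out + b_out)       # (CLASSES,) class probabilities


token_ids = rng.integers(0, VOCAB, size=MAX_LEN)
probs = forward(token_ids)
```

Placing the convolution and pooling before the BiLSTM means the recurrent layer reads a shorter sequence of local-pattern features rather than raw word vectors, which is the division of labour the hybrid design relies on.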

Step 5: Training CNN-BiLSTM for Results and Analysis

Accuracy and Loss plots

After training, the CNN-BiLSTM hybrid architecture achieves an accuracy of 83%. This result indicates that the model captures semantic relationships within text well and is well suited to Natural Language Processing (NLP) tasks.
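For readers who want to see what the accuracy and loss curves measure, the following sketch computes both from one-hot labels and predicted class probabilities. The numbers are a made-up three-class toy example, not results from this model, and it assumes the loss is categorical cross-entropy, the usual pairing with one-hot labels and a softmax output; the same functions work unchanged for six classes.

```python
import numpy as np


def accuracy(y_true_1hot, y_prob):
    """Share of examples whose highest-probability class matches the true class."""
    return float(np.mean(np.argmax(y_true_1hot, axis=1) == np.argmax(y_prob, axis=1)))


def categorical_cross_entropy(y_true_1hot, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true_1hot * np.log(y_prob), axis=1)))


# Made-up labels and predictions for four comments and three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],  # wrong: true class is the third one
    [0.2, 0.5, 0.3],
])

toy_accuracy = accuracy(y_true, y_prob)
toy_loss = categorical_cross_entropy(y_true, y_prob)
```

Accuracy only counts whether the top prediction is right, while the loss also penalises low confidence in the correct class, which is why the two curves can move differently during training.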

GUI for Cyberbullying Detection in Real Time

GitHub Link: https://github.com/PriyankaKatariya/Cyberbullying-Detection-using-ML-and-CNN-BiLSTM

Written by Priyanka Katariya
