Apr 25, 2024
Pushing the Boundaries of Deep Learning for Cyberbullying Detection
Sharing on the internet has been a hit ever since Facebook launched. Today, social media is a daily necessity, even an addiction, and it doubles as a source of income for anyone willing to put themselves out there. As the number of content creators across platforms has surged, the number of reported cyberbullying cases has risen sharply with it.
The concern is pressing because cyberbullying is not a single type of harassment: it cuts across cultural, ethnic and gender lines. That makes it necessary to build a robust system that covers all of these subcategories (six in this case) and classifies each one correctly.
Machine Learning has been explored extensively for NLP tasks such as sentiment analysis, emotion recognition and cyberbullying detection. What text analytics really needs, however, is Deep Learning models that are combined and curated specifically for the task at hand.

For this task I picked two models, Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks, and combined them into a hybrid CNN-BiLSTM architecture. The hybrid leverages the feature extraction capabilities of CNNs to identify relevant local patterns and the contextual understanding of BiLSTMs to capture intricate relationships over longer sequences.
This combined network was then trained on a dataset of 47,692 labelled social media comments categorized into 6 classes.

Let’s get into the implementation.
Step 1: Exploratory Data Analysis and Data Cleaning
An unbiased and robust training process begins with a careful examination of the data. Exploratory data analysis (EDA) reveals the patterns and distribution of every class in the dataset. This step not only shows how prevalent each kind of cyberbullying is but also lays the groundwork for a balanced dataset.
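As a rough illustration of this step, the sketch below cleans a few comments and counts how often each class occurs. The cleaning rules, the toy rows and the label names (`class_a`, `class_b`) are assumptions made for the example, not the article's actual pipeline or classes.

```python
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Lowercase a comment and strip URLs, @mentions and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def class_distribution(labels):
    """Return {label: (count, share)} so that class imbalance is easy to spot."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Toy rows standing in for the real dataset (hypothetical text and labels).
rows = [
    ("@user You are SO dumb!!! http://t.co/xyz", "class_a"),
    ("Great game last night :)", "class_b"),
    ("Nobody likes you, #loser", "class_a"),
]

cleaned = [clean_text(text) for text, _ in rows]
dist = class_distribution(label for _, label in rows)

for label, (count, share) in dist.items():
    print(f"{label}: {count} rows ({share:.0%})")
```

If one class dominates the printed shares, that is the cue to rebalance (for example by resampling) before training.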
Step 2: Prepare the Data for Training
The standard train/test split for machine learning models is typically 8:2, but for this hybrid deep learning model a ratio of 7:3 is used so that more data is held out for a more rigorous test. After the data is split and separated into x (text) and y (class), the text is tokenized and the labels are one-hot encoded.
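The three operations in this step (a 7:3 split, tokenization, one-hot encoding) can be sketched in plain Python as follows. This is a library-independent illustration: the helper names, the `<pad>`/`<unk>` tokens, the padded length and the six placeholder class names are all assumptions, and a real project would more likely use the equivalent utilities of its deep learning library.

```python
import random


def train_test_split(texts, labels, test_ratio=0.3, seed=42):
    """Shuffle the row indices once, then cut them at the 70:30 boundary."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(indices) * (1 - test_ratio)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [texts[i] for i in train_idx],
        [texts[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def build_vocab(texts):
    """Map each word to an integer id; 0 is padding and 1 is 'unknown word'."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Turn a comment into a fixed-length list of word ids (truncate, then pad)."""
    ids = [vocab.get(word, 1) for word in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(label, classes):
    """Return a vector with a single 1 at the position of the given class."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec


# Placeholder class names and toy comments (hypothetical).
classes = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]
texts = [f"sample comment number {i}" for i in range(10)]
labels = [classes[i % 6] for i in range(10)]

MAX_LEN = 6
x_train, x_test, y_train, y_test = train_test_split(texts, labels)
vocab = build_vocab(x_train)  # built from training text only, to avoid leakage
x_train_ids = [encode(t, vocab, MAX_LEN) for t in x_train]
x_test_ids = [encode(t, vocab, MAX_LEN) for t in x_test]
y_train_vecs = [one_hot(label, classes) for label in y_train]
```

Building the vocabulary from the training portion alone means words that only appear in the test set map to the unknown id, which mirrors what the model will face on unseen comments.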
Step 3: Word Embeddings
Integrating pre-trained GloVe vectors gives the text rich semantic meaning by capturing relationships between words. This step transforms the text into numerical representations and also improves the model’s ability to pick up linguistic nuances.
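To make the idea concrete, here is a minimal sketch of turning GloVe vectors into an embedding matrix indexed by word id. GloVe ships as plain text with one word per line followed by its vector; the tiny in-memory "file", the 3-dimensional vectors and the vocabulary below are made up for the example, and leaving unknown words as all-zero rows is one common choice rather than the article's confirmed approach.

```python
import io


def load_glove(handle):
    """Parse GloVe's text format: a word followed by space-separated floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector for word id i; words missing from GloVe stay zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Tiny stand-in for a real GloVe file (hypothetical 3-dimensional vectors).
fake_glove = io.StringIO("hate 0.1 -0.2 0.3\nyou 0.0 0.5 -0.5\n")
vocab = {"<pad>": 0, "<unk>": 1, "you": 2, "hate": 3, "xyzzy": 4}

vectors = load_glove(fake_glove)
matrix = build_embedding_matrix(vocab, vectors, dim=3)
```

With a real GloVe file, `load_glove` would be called on `open(path, encoding="utf-8")` instead of the `StringIO` object, and the resulting matrix would initialise the model's embedding layer.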
Step 4: Hyperparameter Initialisation for CNN-BiLSTM
It is essential to set the hyperparameters that each part of the model requires. The input sequences are first transformed into dense vectors, for which any pre-trained word embedding such as GloVe, Word2vec, or BERT can be used.
CNN: Convolutional 1D layers followed by MaxPooling layers capture local patterns and downsample the features.
BiLSTM: Captures both past and future context information of the sequences, enhancing the model’s ability to understand sequential data.
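One way to see what these hyperparameters control is to trace the shape of a single comment as it passes through the stack. The sketch below does only that arithmetic; it does not build or train a model. Every value except the six output classes is an illustrative assumption, as are the conventions used: unpadded ("valid") convolution with stride 1, non-overlapping pooling windows, and a BiLSTM that returns only its final output with both directions concatenated.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    # Illustrative values only; the article does not state its settings.
    max_len: int = 100       # tokens per padded comment
    embed_dim: int = 100     # size of each word vector
    conv_filters: int = 64   # number of Conv1D filters
    kernel_size: int = 5     # consecutive words seen by each filter
    pool_size: int = 2       # max-pooling window (stride equals the window)
    lstm_units: int = 64     # units per LSTM direction
    num_classes: int = 6     # number of classes, from the article


def layer_shapes(hp: HParams):
    """Per-sample output shape after each layer (batch dimension omitted)."""
    shapes = [("embedding", (hp.max_len, hp.embed_dim))]
    conv_len = hp.max_len - hp.kernel_size + 1   # no padding, stride 1
    shapes.append(("conv1d", (conv_len, hp.conv_filters)))
    pool_len = conv_len // hp.pool_size          # non-overlapping windows
    shapes.append(("max_pool", (pool_len, hp.conv_filters)))
    shapes.append(("bilstm", (2 * hp.lstm_units,)))  # forward + backward halves
    shapes.append(("softmax", (hp.num_classes,)))
    return shapes


for name, shape in layer_shapes(HParams()):
    print(f"{name:10s} {shape}")
```

The trace shows the downsampling described above: pooling halves the sequence length the BiLSTM has to read, while the number of filters sets how many local patterns are tracked at each position.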
Step 5: Training CNN-BiLSTM for Results and Analysis
Accuracy and Loss plots
After training, the CNN-BiLSTM hybrid architecture reaches an accuracy of 83%. This result shows that the architecture captures semantic relationships in text well and is a capable approach for Natural Language Processing (NLP) tasks.
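For readers unfamiliar with the two quantities tracked in such plots, the sketch below computes accuracy and categorical cross-entropy loss from one-hot labels and predicted class probabilities. The three-class toy data is invented for the example and has no connection to the 83% result reported above.

```python
import math


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true, y_prob):
    """Share of samples whose most probable class matches the one-hot label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability that the model assigns to the true class."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total -= math.log(max(p[argmax(t)], eps))  # eps guards against log(0)
    return total / len(y_true)


# Toy one-hot labels and predicted probabilities (hypothetical, 3 classes).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]

acc = accuracy(y_true, y_prob)
loss = categorical_cross_entropy(y_true, y_prob)
print(f"accuracy={acc:.2f} loss={loss:.4f}")
```

Accuracy only counts whether the top prediction is right, whereas the loss also penalises correct but unconfident predictions, which is why the two curves are usually read together.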

GUI for Cyberbullying Detection in Real Time
GitHub Link: https://github.com/PriyankaKatariya/Cyberbullying-Detection-using-ML-and-CNN-BiLSTM
Written by Priyanka Katariya
