Analyzing Sentiment in Twitter Airline Data with Python

Clara L.
5 min read · Apr 10, 2024

--


Purpose

The goal of this project is to classify airline tweets as positive or negative by building a text classifier with a straightforward LSTM (Long Short-Term Memory) architecture and a Dropout mechanism.

1. Data Preprocessing

Data from Kaggle

In the dataset, there are a total of 14,640 entries, each containing 15 variables.

These variables include tweet ID, airline sentiment, sentiment confidence, negative reasons, confidence in negative reasons, airline name, airline sentiment gold, user name, negative reason gold, retweet count, text content, tweet coordinates, tweet creation timestamp, tweet location, and user time zone.

Notably, there are no instances of null data within the dataset, ensuring the integrity and completeness of the information for analysis.

For the purposes of this project, I narrowed down the dataset to include only the ‘text’ and ‘sentiment’ columns, as they are the most relevant for sentiment analysis.

Given that this project focuses on binary classification, where the goal is to distinguish between positive and negative sentiments, I removed rows from the dataset containing ‘neutral’ reviews. This step ensures that our analysis is targeted towards classifying tweets into two distinct categories, enhancing the effectiveness of the sentiment analysis model.
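In pandas, this filtering step might look like the sketch below. The tiny inline DataFrame is a hypothetical stand-in for the Kaggle CSV, whose relevant columns are named 'text' and 'airline_sentiment':

```python
import pandas as pd

# Hypothetical mini-sample standing in for the full Kaggle dataset.
df = pd.DataFrame({
    "text": ["@united great crew!", "@united lost my bag", "@united on time"],
    "airline_sentiment": ["positive", "negative", "neutral"],
})

# Keep only the two relevant columns, then drop neutral rows so the
# task becomes a binary positive/negative classification.
df = df[["text", "airline_sentiment"]]
df = df[df["airline_sentiment"] != "neutral"].reset_index(drop=True)
```
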

In the dataset, there are 9,178 instances of negative sentiment and 2,363 instances of positive sentiment regarding airline experiences.

This class imbalance, with a higher frequency of negative sentiments, underscores the importance of appropriately handling class distribution during model training to ensure balanced performance and accurate classification results.

To prepare the data for modeling, all categorical sentiment values were transformed into numeric format. Specifically, ‘0’ was assigned to represent positive sentiment, while ‘1’ was designated for negative sentiment.
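This encoding amounts to a simple dictionary lookup, sketched here with a few hypothetical labels:

```python
# 0 = positive, 1 = negative, matching the encoding described above.
label_map = {"positive": 0, "negative": 1}

sentiments = ["negative", "positive", "negative"]  # hypothetical sample
labels = [label_map[s] for s in sentiments]        # -> [1, 0, 1]
```
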

The text data was then tokenized with a Keras tokenizer, using a predefined maximum vocabulary size of 5,000 words. The fit_on_texts() method builds a mapping between the words in the dataset and their numerical indices, stored as a dictionary in the tokenizer.word_index attribute, which allows text to be encoded efficiently into numerical sequences for subsequent processing and analysis.
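As an illustration, here is a minimal pure-Python sketch of the kind of mapping fit_on_texts() builds (the actual project used Keras's Tokenizer; the tiny corpus here is hypothetical):

```python
from collections import Counter

texts = ["the flight was great", "the flight was late", "great crew"]

# Count word frequencies across the corpus, then rank words by frequency.
# Index 0 is conventionally reserved for padding, so ranks start at 1,
# as in tokenizer.word_index.
counts = Counter(word for t in texts for word in t.lower().split())
max_words = 5000
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words))}

# Encode each sentence as a sequence of word indices.
sequences = [[word_index[w] for w in t.lower().split()] for t in texts]
```
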

To ensure uniformity in input dimensions for the model, I implemented padding on the sentences from the dataset. This process involved extending or truncating the sequences to achieve a consistent length of 200 tokens.
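The padding step can be sketched as follows; this mirrors the 'pre' padding and truncation defaults of Keras's pad_sequences, shown here with a toy maxlen for readability:

```python
def pad(seq, maxlen=200, value=0):
    # Truncate from the front, then pad at the front, mirroring the
    # Keras pad_sequences defaults (padding='pre', truncating='pre').
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

padded = pad([5, 9, 2], maxlen=6)  # -> [0, 0, 0, 5, 9, 2]
```
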

2. Build Text Classifier

In machine learning, LSTM layers are commonly used for sentiment analysis tasks because of their ability to capture long-term dependencies among the words in a sentence. The architecture of our model comprises an embedding layer, an LSTM layer, and a Dense layer at the end.

To mitigate the risk of overfitting, we incorporated a Dropout mechanism after the LSTM layer. Additionally, the vocabulary size of the embedding layer was set to 30,000 to accommodate a diverse range of words and help the model capture nuanced language patterns.
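A minimal Keras sketch of this architecture is below. Only the vocabulary size (30,000) and sequence length (200) come from the text; the embedding dimension, LSTM units, and dropout rate are my assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

vocab_size = 30000  # as stated above; sequences are padded to length 200

model = Sequential([
    Embedding(vocab_size, 128),      # embedding dimension assumed
    LSTM(64),                        # units assumed
    Dropout(0.5),                    # dropout rate assumed
    Dense(1, activation="sigmoid"),  # binary output: 0 = positive, 1 = negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```
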

3. Train the Sentiment Analysis Model

The model underwent training for 5 epochs using the entire dataset, employing a batch size of 32 and a validation split of 20%. Following the training process, the model achieved an impressive accuracy of 96% on the training set and 94% on the test set.
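The training setup can be sketched as below; toy random data stands in for the real padded tweets so the snippet runs quickly, and the layer sizes are scaled down accordingly, but the epochs, batch size, and validation split match the description above:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Toy stand-ins: the real project trained on ~11.5k tweets with
# vocabulary 30,000 and sequence length 200.
X = np.random.randint(1, 100, size=(64, 20))
y = np.random.randint(0, 2, size=(64,))

model = Sequential([Embedding(100, 16), LSTM(8), Dropout(0.5),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X, y, epochs=5, batch_size=32,
                    validation_split=0.2, verbose=0)
```
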

These high accuracy scores signify the effectiveness of the model in accurately classifying sentiments expressed in tweets about US airlines, thereby demonstrating its robust performance in sentiment analysis tasks.

The visualization of the accuracy plots for both training and validation sets over the 5 epochs reveals a noticeable disparity between the two. Such a discrepancy suggests the possibility of overfitting, wherein the model becomes overly specialized to the training data and consequently struggles to generalize well to unseen data.
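Such curves can be plotted with matplotlib roughly as follows; the per-epoch values here are hypothetical stand-ins for history.history["accuracy"] and history.history["val_accuracy"]:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical per-epoch accuracies standing in for the training history.
train_acc = [0.80, 0.90, 0.93, 0.95, 0.96]
val_acc = [0.85, 0.90, 0.92, 0.93, 0.94]

plt.plot(train_acc, label="train")
plt.plot(val_acc, label="validation")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("accuracy.png")
```
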

This phenomenon often occurs when the model is excessively complex or has been trained excessively on the training data.

I tested the model on 5 sample sentences, and it accurately classified both the positive and the negative tweets.
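A hypothetical inference helper along these lines, assuming the fitted tokenizer and trained model from the earlier steps (the function name and structure are mine, not from the project):

```python
def predict_sentiment(model, tokenizer, text, max_len=200):
    """Classify a single tweet with a fitted tokenizer and trained model."""
    seq = tokenizer.texts_to_sequences([text.lower()])[0]
    seq = seq[-max_len:]
    # Pre-pad to the training length of 200, matching the earlier padding step.
    padded = [[0] * (max_len - len(seq)) + seq]
    score = float(model.predict(padded, verbose=0)[0][0])
    # Labels were encoded 0 = positive, 1 = negative.
    return "negative" if score >= 0.5 else "positive"
```
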

Conclusion

In analyzing Twitter airline data for sentiment analysis, our project yielded promising results, showcasing high accuracy on both the training and test datasets. However, a noticeable gap between training and validation accuracies hints at potential overfitting. While the model effectively classified test sentences, its ability to generalize to unseen data warrants attention.

Suggestions for Improvement

  1. Embrace Regularization Techniques: Incorporate methods like L2 regularization or dropout regularization to counter overfitting by imposing penalties on the model’s weights or intermittently disconnecting connections during training.
  2. Adopt Cross-Validation: Employ k-fold cross-validation to gauge the model’s performance across multiple validation sets, offering a more comprehensive assessment of its generalization capacity.
  3. Leverage Data Augmentation: Enhance the training data by introducing variations in the text, such as synonyms, paraphrases, or back-translations. This approach diversifies the dataset, bolstering the model’s ability to generalize.
  4. Simplify Model Architecture: Streamline the model architecture by minimizing parameters or opting for simpler network structures. This simplification reduces complexity, alleviating the risk of overfitting.
  5. Fine-Tune Hyperparameters: Systematically experiment with hyperparameters like learning rate, batch size, and epoch count to optimize model performance.
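Suggestion 1 could be sketched in Keras as follows; all regularization coefficients and layer sizes here are illustrative assumptions, combining L2 penalties with dropout both inside and after the LSTM:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.regularizers import l2

model = Sequential([
    Embedding(30000, 128),
    # Dropout on inputs and recurrent connections, plus an L2 weight penalty.
    LSTM(64, dropout=0.2, recurrent_dropout=0.2, kernel_regularizer=l2(1e-4)),
    Dropout(0.5),
    Dense(1, activation="sigmoid", kernel_regularizer=l2(1e-4)),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```
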

By integrating these strategies, we can fortify the robustness and applicability of our sentiment analysis model, enhancing its reliability in real-world scenarios.


Clara L.

Content Analyst |Narrator of Data Chronicles | If you like my story, please support me➡️https://www.buymeacoffee.com/clarapinkdot