Through this activity, I gained hands-on experience implementing four fundamental Natural Language Processing (NLP) tasks: text classification, POS tagging, sentiment analysis, and text summarization, primarily using the spaCy library. My biggest takeaway was realizing the crucial impact of dataset quality and class distribution on model performance. For example, the SMS spam classification and POS tagging models achieved strong accuracy (0.93 and 0.90, respectively), while the sentiment analysis model, trained on a smaller and possibly imbalanced dataset, reached an accuracy of only 0.60. This contrast made it clear that a powerful model is only as good as the data it learns from, and that performance evaluation is not about a single number but about the balance between metrics like precision, recall, and F1-score. Likewise, the text summarization results, where ROUGE showed high recall but low precision, highlighted the practical challenge of generating concise, focused output rather than merely retaining broad content.
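To make that distinction concrete, the following is a minimal sketch of how those two kinds of evaluation could be run, assuming scikit-learn's classification_report and the rouge-score package are available; the labels, texts, and summaries here are made up for illustration and are not the actual activity data.

```python
from sklearn.metrics import classification_report
from rouge_score import rouge_scorer

# Classification: overall accuracy alone hides how the minority class fares.
# Hypothetical gold labels and predictions for an imbalanced spam set.
y_true = ["ham", "ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "ham", "ham", "ham",  "spam"]
print(classification_report(y_true, y_pred, digits=2))  # per-class precision/recall/F1

# Summarization: ROUGE reports precision and recall separately, so a long,
# unfocused summary can score high recall but low precision.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
reference = "The battery drains quickly and the screen is dim."
candidate = ("The phone has many features, the battery drains quickly, "
             "the screen is dim, and shipping was slow.")
for name, score in scorer.score(reference, candidate).items():
    print(f"{name}: P={score.precision:.2f} R={score.recall:.2f} F={score.fmeasure:.2f}")
```

In the toy spam example the overall accuracy looks respectable, yet the per-class report shows the spam class is missed half the time, which mirrors the gap I observed between accuracy and the precision/recall balance.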
This project provides a solid foundation for my future IT career, especially in roles related to data science, machine learning, and AI application development. The ability to implement and, more importantly, critically evaluate NLP models using metrics such as the F1-score for classification tasks and ROUGE for generative tasks is a highly marketable skill. Understanding where a model is strong (for example, predicting frequent POS tags) and where it struggles (less-represented classes, or generating concise summaries) allows me to troubleshoot real-world systems effectively. Whether I am building automated customer feedback analysis using sentiment analysis or better information retrieval tools through text classification, this hands-on experience means I can not only implement the code but also rigorously assess and improve a model's effectiveness and reliability for end users.
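As a small example of that per-class view, the sketch below computes per-tag accuracy for a spaCy POS tagger against a couple of hand-labelled sentences; the en_core_web_sm pipeline and the gold data are assumptions for illustration, not the activity's actual dataset.

```python
import spacy
from collections import Counter, defaultdict

# Assumed small English pipeline; any trained spaCy pipeline with a tagger works.
nlp = spacy.load("en_core_web_sm")

# Hypothetical hand-labelled sentences: (text, gold coarse POS tag per token).
gold_data = [
    ("I love this phone", ["PRON", "VERB", "DET", "NOUN"]),
    ("The battery dies fast", ["DET", "NOUN", "VERB", "ADV"]),
]

correct, total = defaultdict(int), Counter()
for text, gold_tags in gold_data:
    doc = nlp(text)
    for token, gold in zip(doc, gold_tags):
        total[gold] += 1
        if token.pos_ == gold:
            correct[gold] += 1

# Per-tag accuracy exposes which tags the model handles well and which it misses,
# rather than collapsing everything into a single overall score.
for tag, count in total.items():
    print(f"{tag}: {correct[tag] / count:.2f} over {count} tokens")
```

Breaking evaluation down this way is exactly the habit I want to carry into production work: knowing not just how well a system performs on average, but for whom and on what it fails.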