Midterm Overall Learning Reflection

ITC C508 · Midterms
Submitted: November 23, 2025, 10:00:00 PM


Week 8: Sentiment Analytics and Evaluation Metrics

The biggest takeaway from this week was realizing that a model's true performance is never captured by a single number like overall accuracy. My experience with sentiment analysis showed that class imbalance in real-world data demands a critical focus on metrics like F1-score, Precision, and Recall, which reveal whether the model is simply ignoring the minority classes. For instance, while some classification tasks achieved over 90% accuracy, the sentiment model only hit 60%, highlighting how sensitive performance is to dataset quality and class balance. This week taught me the non-negotiable skill of rigorous metric evaluation, including using ROUGE metrics for generative tasks like summarization, which clarified the trade-off between content coverage and conciseness. This critical skill is vital for deploying reliable, unbiased systems in the IT sector.
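To make this concrete, here is a minimal sketch of the kind of check I mean, using scikit-learn; the labels and predictions are illustrative stand-ins rather than the actual coursework data.

```python
# Illustrative check of why accuracy alone misleads on imbalanced sentiment data.
# The labels and predictions below are made-up stand-ins, not the coursework dataset.
from sklearn.metrics import accuracy_score, classification_report

# 10 reviews: 8 positive, 2 negative (imbalanced classes)
y_true = ["pos", "pos", "pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"]
# A lazy model that almost always predicts the majority class
y_pred = ["pos", "pos", "pos", "pos", "pos", "pos", "pos", "pos", "pos", "neg"]

print("Accuracy:", accuracy_score(y_true, y_pred))       # 0.9, looks great
print(classification_report(y_true, y_pred, digits=2))   # per-class precision/recall/F1
# The report exposes the problem: recall for "neg" is only 0.5,
# even though overall accuracy is 90%.
```

For the summarization side, the rouge-score package exposes a similar per-metric breakdown (ROUGE-1, ROUGE-L) for comparing generated summaries against references.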

Week 9: BERT-based Question Answering

Studying BERT-based Question Answering (QA) was essential for understanding how to leverage powerful pre-trained models for information retrieval. The central lesson here was managing the Accuracy vs. Computational Efficiency trade-off. By benchmarking different BERT variants, I learned that a slightly larger model did not always deliver a significant performance gain, yet it reliably added computational overhead, which is a key concern in production. The analysis also confirmed that cased models are generally more accurate for QA than uncased versions, showing the importance of capitalization for contextual understanding. This taught me how to use the Exact Match (EM) score alongside Average Inference Time to make informed, practical model selection decisions, a crucial skill for building fast, reliable enterprise search and support tools.
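To illustrate the kind of benchmark this involved, here is a minimal sketch using the Hugging Face transformers question-answering pipeline; the model name, the two QA pairs, and the simplified EM normalization are illustrative assumptions rather than the exact setup from the coursework.

```python
# Sketch of benchmarking a QA model on Exact Match (EM) and average inference time.
# The model name and the tiny evaluation set are illustrative placeholders.
import time
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",  # swap in each BERT variant here
)

eval_set = [
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet is a tragedy written by William Shakespeare.",
     "answer": "William Shakespeare"},
    {"question": "What is the capital of France?",
     "context": "Paris is the capital and largest city of France.",
     "answer": "Paris"},
]

def normalize(text: str) -> str:
    # Simplified normalization; SQuAD's official EM also strips articles/punctuation.
    return " ".join(text.lower().strip().split())

exact_matches, total_time = 0, 0.0
for ex in eval_set:
    start = time.perf_counter()
    pred = qa(question=ex["question"], context=ex["context"])["answer"]
    total_time += time.perf_counter() - start
    exact_matches += int(normalize(pred) == normalize(ex["answer"]))

print(f"EM: {exact_matches / len(eval_set):.2f}")
print(f"Avg inference time: {total_time / len(eval_set) * 1000:.1f} ms")
```

Running the same loop over each variant makes the accuracy-versus-latency trade-off visible side by side.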

Week 10: LLMs, LangChain, and RAG

This week was the most forward-looking, as it addressed the critical problem of hallucinations in Large Language Models (LLMs). I learned that Retrieval-Augmented Generation (RAG) is a state-of-the-art approach that grounds LLM outputs in verifiable, external documents, making them reliable and auditable. The introduction of LangChain was key, as it showed me a practical, modular framework for implementing the entire RAG pipeline—from connecting to a vector store to generating the final response—efficiently. This skill set—building an accurate and trustworthy LLM system—is arguably the most valuable for my IT career, directly preparing me for high-demand roles in modern AI application development.
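To show the structure of that pipeline, here is a framework-agnostic sketch of the RAG flow; TF-IDF retrieval stands in for a real embedding-based vector store, the documents and query are made up, and the final LLM call is only indicated in a comment. In the coursework, LangChain supplies modular components (embeddings, vector stores, retrievers, chains) for each of these steps.

```python
# Structural sketch of a RAG pipeline: retrieve relevant documents, then build a
# grounded prompt for the LLM. TF-IDF retrieval stands in for a real vector store.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative knowledge base (in practice: chunked enterprise documents)
documents = [
    "The ITC C508 midterm covers sentiment analysis and evaluation metrics.",
    "RAG grounds LLM answers in retrieved documents to reduce hallucinations.",
    "BERT variants trade accuracy against inference latency.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_k = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_k]

def build_prompt(query: str) -> str:
    """Assemble a prompt that forces the model to answer from retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_prompt("How does RAG reduce hallucinations?"))
# The final step (not shown) sends this prompt to an LLM; in LangChain this is a
# chain that wires the retriever and the model together.
```

Because the answer is constrained to retrieved context, the output can be audited against the source documents, which is exactly what makes RAG attractive for enterprise use.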

Biggest Technical Challenge

The single biggest technical challenge across these three weeks was consistently managing the Trade-off between Performance Accuracy and Computational Efficiency/Latency.

This challenge was evident across tasks:

  • In Week 9 (BERT-QA), the choice of model directly dictated the balance between the Exact Match (EM) score and Inference Time. Prioritizing a marginally higher EM score often meant sacrificing speed, which is unacceptable in a user-facing application.
  • Even when building advanced systems in Week 10 (RAG), the need for both high accuracy (avoiding hallucinations) and fast, real-time responses remained paramount.

In the professional IT world, resources like computing power and cloud costs are finite. The core engineering problem is not just achieving high performance, but achieving the optimal balance of accuracy and speed within those constraints. Learning to benchmark models using both quality metrics (like F1 or EM) and efficiency metrics (like inference time) is the key skill this coursework instilled, preparing me to make smart, cost-effective decisions in real-world MLOps.
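A small sketch of the selection logic this trade-off implies: choose the most accurate model that still fits a latency budget. The EM scores, latencies, and budget below are made-up placeholders, not my actual benchmark results.

```python
# Pick the most accurate model that fits within a latency budget.
# The EM scores and latencies below are illustrative placeholders, not real results.
benchmarks = {
    "bert-base":  {"em": 0.78, "latency_ms": 45},
    "bert-large": {"em": 0.81, "latency_ms": 130},
    "distilbert": {"em": 0.74, "latency_ms": 20},
}

LATENCY_BUDGET_MS = 60  # e.g. what a user-facing search box can tolerate

eligible = {name: b for name, b in benchmarks.items()
            if b["latency_ms"] <= LATENCY_BUDGET_MS}
best = max(eligible, key=lambda name: eligible[name]["em"])

print(f"Selected: {best} "
      f"(EM={eligible[best]['em']}, {eligible[best]['latency_ms']} ms)")
# The largest model wins on raw EM but blows the budget; the mid-sized one is the
# practical choice, which is the kind of decision this coursework prepared me for.
```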