Exercise F2 - Model Finetuning

Laboratory Exercise - ITC C508 - Finals
Submitted: November 04, 2025, 11:59:00 PM

Entries

File Entries:
- EF2_IEEE_Report_SarmientoCharlesAaron.docx
- ExerciseF2_SarmientoCharlesAaron_BERT_Finetuning.ipynb
- EF2_Experimentation_Logs_SarmientoCharlesAaron.xlsx
- Blaise_SciTLDR_Eval_500.xlsx
- training_data.csv

Learning Reflections

In this activity, the most significant thing I learned is how crucial hyperparameter optimization is when fine-tuning a powerful pre-trained model like MiniLM for a specific task such as semantic sentence similarity. It is not enough simply to use the model: systematic experimentation showed that achieving the best results (Spearman 0.8570, Pearson 0.9131) depended on finding the right combination of a 2e-5 learning rate, the AdamW optimizer, and a 0.2 warmup ratio. I found it particularly insightful that AdamW's decoupled weight decay provides stronger regularization, which helps prevent overfitting on smaller, task-specific datasets and is vital in transfer learning. The analysis also confirmed that the warmup phase is not just a detail; it is critical for stabilizing early training and preventing catastrophic forgetting of the general knowledge the model acquired during its massive pre-training.
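
The configuration described above can be captured in a short fine-tuning sketch using the sentence-transformers library. This is a minimal illustration, not the exercise's actual notebook code: the model checkpoint (all-MiniLM-L6-v2), the column names in training_data.csv, the epoch count, and the batch size are all assumptions made for the example, while the 2e-5 learning rate, AdamW optimizer (the library default), and 0.2 warmup ratio reflect the settings discussed in the reflection.

```python
# Minimal sketch of fine-tuning a MiniLM sentence model for semantic similarity.
# Assumed: columns sentence1, sentence2, score (normalized to [0, 1]) in
# training_data.csv; 4 epochs and batch size 16 are illustrative choices.
import pandas as pd
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

df = pd.read_csv("training_data.csv")
train_examples = [
    InputExample(texts=[row.sentence1, row.sentence2], label=float(row.score))
    for row in df.itertuples()
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_loss = losses.CosineSimilarityLoss(model)

# Translate the 0.2 warmup ratio into a warmup step count.
num_epochs = 4  # assumed
total_steps = len(train_dataloader) * num_epochs
warmup_steps = int(0.2 * total_steps)

# Reports Spearman and Pearson correlations; in practice a held-out
# split (not the training pairs) would be passed here.
evaluator = EmbeddingSimilarityEvaluator(
    df.sentence1.tolist(), df.sentence2.tolist(), df.score.tolist(), name="dev"
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,
    epochs=num_epochs,
    warmup_steps=warmup_steps,         # 0.2 warmup ratio
    optimizer_params={"lr": 2e-5},     # AdamW is the library's default optimizer
    output_path="minilm-sts-finetuned",
)
```

The warmup steps linearly ramp the learning rate from zero to 2e-5 before the scheduler decays it, which is the stabilizing effect on early training noted above.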

Looking ahead to my IT career, I believe this activity is incredibly valuable. Understanding the practical mechanics of fine-tuning and transfer learning with models like BERT and MiniLM directly translates into skills needed for real-world Natural Language Processing (NLP) and Educational Technology (EdTech) applications. Whether I end up developing tools for semantic search, building robust chatbots, or creating systems to tackle information overload by clustering concepts in dense academic material, the ability to systematically optimize a model for generalization and maximum performance is a core, high-demand skill. This hands-on knowledge in hyperparameter search and the nuances of optimizers and schedulers provides a strong foundation for any future role in data science, machine learning, or software engineering that involves large-scale language models.