The most significant thing I learned from this activity is the precision required to build a functional semantic similarity model for a real-world application, specifically in education. I realized that developing a tool to automatically assess the degree of similarity between academic source materials, such as lecture notes, and a student's written output, such as an essay or summary, hinges on fine-tuning an advanced model like Sentence-BERT (SBERT) or its efficient variant, MiniLM. It is not enough to simply use a pre-trained model; the system's ability to accurately capture semantic equivalence, achieving a Spearman correlation of approximately 0.8570, depended on careful hyperparameter optimization, balancing the learning rate, the choice of optimizer (such as AdamW), and regularization. This systematic approach showed me that high utility is tied directly to precise model configuration.
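To make this concrete, the sketch below shows the kind of fine-tuning loop this describes, using the sentence-transformers library's fit API. The base checkpoint, the toy sentence pairs, and the hyperparameter values are illustrative assumptions, not the exact configuration from the activity.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Assumed base checkpoint; any SBERT/MiniLM checkpoint could stand in here.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy pairs of (source-material sentence, student sentence) with 0-1 similarity labels.
train_examples = [
    InputExample(texts=["Mitosis produces two identical daughter cells.",
                        "Cell division by mitosis yields two identical cells."], label=0.95),
    InputExample(texts=["Mitosis produces two identical daughter cells.",
                        "The French Revolution began in 1789."], label=0.05),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# CosineSimilarityLoss regresses the cosine of the two embeddings toward the label.
train_loss = losses.CosineSimilarityLoss(model)

# The evaluator reports the Spearman correlation between predicted and gold scores,
# the metric cited above (made-up held-out pairs shown here).
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["Photosynthesis stores energy in glucose.",
                "Photosynthesis stores energy in glucose.",
                "Ohm's law relates voltage, current, and resistance."],
    sentences2=["Plants store light energy chemically as glucose.",
                "The mitochondria is the powerhouse of the cell.",
                "Voltage equals current times resistance."],
    scores=[0.9, 0.2, 0.85],
)

# AdamW is the default optimizer in fit(); learning rate, epochs, and
# weight decay (regularization) are the knobs the reflection refers to.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,
    epochs=4,
    warmup_steps=100,
    optimizer_params={"lr": 2e-5},
)
```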
This experience has equipped me with a valuable and marketable skill set for my IT career, especially in Machine Learning Engineering and Educational Technology (EdTech). The ability to build and deploy an accurate embedding model is the foundation for intelligent systems that simplify studying by clustering related concepts, addressing the information overload students face. Knowing how to use semantic similarity to measure the relevance and accuracy of students' responses, and to give them immediate, meaningful feedback, is a core skill for building the next generation of formative assessment tools. It demonstrates that I can translate complex Natural Language Processing (NLP) techniques into practical, user-centric applications that solve real-world problems and contribute to goals like quality education.
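As a minimal sketch of how such a feedback tool could score a student response against source material, assuming a pre-trained MiniLM checkpoint, made-up texts, and a hypothetical feedback threshold:

```python
from sentence_transformers import SentenceTransformer, util

# Assumed checkpoint; in practice this would be the fine-tuned model described above.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

lecture_note = "Photosynthesis converts light energy into chemical energy stored in glucose."
student_answer = "Plants use sunlight to make glucose, storing the energy chemically."

# Embed both texts and compare them with cosine similarity.
embeddings = model.encode([lecture_note, student_answer], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# A hypothetical threshold turns the raw score into immediate formative feedback.
feedback = "On topic and accurate." if similarity >= 0.7 else "Revisit the source material."
print(f"similarity={similarity:.3f} -> {feedback}")
```

The same embeddings could feed a clustering step to group related concepts across a set of notes, which is the information-overload use case mentioned above.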