# Importing Libraries

The following code cell imports all the necessary libraries for the project. These libraries include:

- `torch`: A deep learning framework.
- `time`: For measuring execution time.
- `random`: For generating random numbers, used here for shuffling data.
- `numpy` as `np`: For numerical operations, especially array manipulation.
- `pandas` as `pd`: For data manipulation and analysis, particularly with DataFrames.
- `load_dataset` from `datasets`: To load datasets from the Hugging Face datasets library.
- `SentenceTransformer`, `InputExample`, `losses`, `util`, `models` from `sentence_transformers`: For building and using Sentence-BERT models.
- `DataLoader` from `torch.utils.data`: To efficiently load and batch data during training.
- `spearmanr` and `pearsonr` from `scipy.stats`: For calculating correlation coefficients to evaluate model performance.
- `train_test_split` from `sklearn.model_selection`: To split data into training and testing sets.

In [1]:

import random
import time

import numpy as np
import pandas as pd
import torch
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, InputExample, losses, util, models
from torch.utils.data import DataLoader
from scipy.stats import spearmanr, pearsonr
from sklearn.model_selection import train_test_split


# Setting up Training Environment

This cell selects the device for training and inference. It checks whether a CUDA-enabled GPU is available and, if so, sets `device` to the GPU; otherwise it falls back to the CPU. Fine-tuning a transformer model is typically much faster on a GPU.

In [2]:

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU not available, using CPU.")


Using GPU: Tesla T4
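The same check can be wrapped in a small helper so that later cells do not depend on a global variable having been set. This is a minimal sketch rather than part of the original notebook: `pick_device` is a hypothetical name, and the fallback to `"cpu"` when `torch` is not installed is an assumption added so the snippet runs in any environment.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device()
print(f"Selected device: {device_name}")
```

The returned string can be passed wherever a device is expected, for example `torch.device(device_name)`.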


# Dataset Loading


## Loading Training Data

The following code loads and preprocesses the training data. It loads the `Blaise-g/scitldr` dataset, extracts up to 1,000 source and summary pairs, and builds positive and negative examples. Positive examples are the original source-summary pairs (label 1.0); negative examples pair each source with a randomly chosen summary from a different document (label 0.0). The combined data is split 80/20 with stratification, but only the 80% portion is kept for training, because evaluation uses a separate dataset loaded in the next section. The training pairs are then converted into `InputExample` objects, the input format expected by the `sentence_transformers` training API.

In [3]:
def normalize_text(x):
    return " ".join(x) if isinstance(x, list) else str(x)

print("Loading training data from Blaise-g/scitldr ...")
dataset = load_dataset("Blaise-g/scitldr")
ds = dataset.get("train", dataset[list(dataset.keys())[0]])

pos_A, pos_B = [], []
max_pos = 1000
for ex in ds:
    src = ex.get("source") or ex.get("document") or ex.get("text")
    summ = ex.get("tldr") or ex.get("summary") or ex.get("target") or ex.get("highlights")
    if src and summ:
        src, summ = normalize_text(src), normalize_text(summ)
        pos_A.append(src); pos_B.append(summ)
        if len(pos_A) >= max_pos: break

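# Build negatives: pair each source with another document's summary.
# The shuffle is unseeded, so the negatives differ from run to run.
neg_A, neg_B = [], []
idx = list(range(len(pos_B)))
random.shuffle(idx)  # `random` is already imported in the first cell
for i, j in enumerate(idx):
    if i == j:
        continue  # the shuffle left this index in place; skipping avoids a positive pair
    neg_A.append(pos_A[i])
    neg_B.append(pos_B[j])
    if len(neg_A) >= len(pos_A):
        break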

# Combine and label
all_A = pos_A + neg_A
all_B = pos_B + neg_B
all_y = [1.0]*len(pos_A) + [0.0]*len(neg_A)

# Keep the 80% training portion; the 20% hold-out is discarded because evaluation uses a separate manual dataset
train_A, _, train_B, _, y_train, _ = train_test_split(
    all_A, all_B, all_y, test_size=0.2, random_state=42, stratify=all_y
)

# Convert to InputExamples
train_examples = [InputExample(texts=[a, b], label=float(y)) for a, b, y in zip(train_A, train_B, y_train)]
print(f"✅ Training examples ready: {len(train_examples)}")

Loading training data from Blaise-g/scitldr ...


✅ Training examples ready: 1598
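The negative-sampling loop above skips every position that the shuffle leaves in place, so it can produce slightly fewer negatives than positives (the 1598 training examples reported above are consistent with 1000 positives and 998 negatives before the 80/20 split). The sketch below shows one alternative, offered as an assumption rather than the notebook's method: pairing each index with its successor in a shuffled cycle, which never matches an index with itself. `build_negative_pairs` and the toy strings are hypothetical.

```python
import random


def build_negative_pairs(sources, summaries, seed=42):
    """Pair every source with a summary that belongs to a different source."""
    if len(sources) != len(summaries):
        raise ValueError("sources and summaries must have the same length")
    n = len(sources)
    if n < 2:
        return [], []  # a mismatched pair needs at least two items

    order = list(range(n))
    random.Random(seed).shuffle(order)  # local RNG: the global random state is untouched

    neg_sources, neg_summaries = [], []
    for k, i in enumerate(order):
        j = order[(k + 1) % n]  # next index in the shuffled cycle, never equal to i
        neg_sources.append(sources[i])
        neg_summaries.append(summaries[j])
    return neg_sources, neg_summaries


docs = ["paper A", "paper B", "paper C", "paper D"]
tldrs = ["tldr A", "tldr B", "tldr C", "tldr D"]
neg_docs, neg_tldrs = build_negative_pairs(docs, tldrs)
for doc, tldr in zip(neg_docs, neg_tldrs):
    print(doc, "<->", tldr)
```

Using a seeded local generator also makes the negatives reproducible across runs, which the unseeded `random.shuffle` call does not.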


## Loading Evaluation Data

The next cell mounts your Google Drive to access the evaluation dataset stored there. It then loads an Excel file containing evaluation data into a pandas DataFrame. The "Sentence A", "Sentence B", and "Label" columns are extracted and converted into a list of `InputExample` objects for evaluation. Finally, it prints a summary of the training and evaluation data sizes. **Note:** You may need to update the `eval_data_path` variable to the correct location of your evaluation file in Google Drive.

In [5]:
from google.colab import drive
drive.mount('/content/drive')

# ⚠️ Update the path below if needed
eval_data_path = "/content/drive/MyDrive/ITC508_data/TheEnd/Blaise_SciTLDR_Eval_500.xlsx"

eval_df = pd.read_excel(eval_data_path)
print(f"✅ Evaluation dataset loaded successfully — {len(eval_df)} samples")
display(eval_df.head())

eval_A = eval_df["Sentence A"].tolist()
eval_B = eval_df["Sentence B"].tolist()
eval_scores = eval_df["Label"].tolist()

# Convert to SBERT InputExamples
test_examples = [InputExample(texts=[a, b], label=float(y)) for a, b, y in zip(eval_A, eval_B, eval_scores)]
print(f"✅ Evaluation examples ready: {len(test_examples)}")

print("\nSummary:")
print(f"Training data: {len(train_examples)} from Blaise-g/scitldr")
print(f"Evaluation data: {len(test_examples)} from manual IT-Education dataset")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Evaluation dataset loaded successfully — 500 samples


| | Sentence A | Sentence B | Label |
|---|---|---|---|
| 0 | Fine-grained Entity Recognition (FgER) is the ... | We present an agent that uses a beta-vae to ex... | 0 |
| 1 | Compression is a key step to deploy large neur... | This paper proves the universal approximabili... | 1 |
| 2 | Auto-encoding and generative models have made... | Packing region of Interest (ROI) such as cance... | 0 |
| 3 | Sequence-to-sequence (Seq2Seq) models with att... | We introduce a novel method to train Seq2Seq m... | 1 |
| 4 | Super Resolution (SR) is a fundamental and imp... | We introduce a Gaussian Process Prior over wei... | 0 |


✅ Evaluation examples ready: 500

Summary:
Training data: 1598 from Blaise-g/scitldr
Evaluation data: 500 from manual IT-Education dataset
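pandas represents empty Excel cells as `NaN` floats, and the conversion above would pass such a value straight into `InputExample` as text. A light validation pass before the conversion can catch this. The sketch below is a plain-Python illustration under stated assumptions: `clean_eval_rows` is a hypothetical helper, the sample rows are made up, and the rule that labels must lie in [0, 1] is an assumption based on the 0/1 labels shown in the preview.

```python
import math


def clean_eval_rows(rows):
    """Keep rows with two non-empty texts and a numeric label in [0, 1]."""
    kept, dropped = [], 0
    for text_a, text_b, label in rows:
        texts_ok = all(isinstance(t, str) and t.strip() for t in (text_a, text_b))
        try:
            value = float(label)
        except (TypeError, ValueError):
            value = math.nan  # non-numeric labels are treated as missing
        if texts_ok and not math.isnan(value) and 0.0 <= value <= 1.0:
            kept.append((text_a.strip(), text_b.strip(), value))
        else:
            dropped += 1
    return kept, dropped


rows = [
    ("Compression is a key step.", "This paper proves a bound.", 1),
    ("Super Resolution is fundamental.", float("nan"), 0),             # missing text cell
    ("Auto-encoding models.", "Packing regions of interest.", "n/a"),  # unusable label
]
kept, dropped = clean_eval_rows(rows)
print(f"kept={len(kept)}, dropped={dropped}")
```

In the notebook, the rows could come from `zip(eval_A, eval_B, eval_scores)`.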


# Preparing Base Model

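The code below defines `build_sbert_model`, which creates a Sentence-BERT model from a base model name, a pooling strategy (`"mean"` or `"max"`), and an optional dropout rate. It combines a transformer module with a pooling layer and places the model on the selected device (GPU or CPU). The cell only defines the function; each experiment calls it to start from a fresh copy of the pretrained weights, and the final `print` merely names the default base model. The dropout override writes to the model config after the weights are loaded, so it may not affect dropout layers that have already been constructed; none of the experiments below pass `dropout`.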

In [6]:

def build_sbert_model(base="sentence-transformers/all-MiniLM-L6-v2", pooling="mean", dropout=None):
    transformer = models.Transformer(base)
    if dropout is not None:
        try:
            transformer.auto_model.config.attention_probs_dropout_prob = dropout
            transformer.auto_model.config.hidden_dropout_prob = dropout
            print(f"Applied dropout={dropout}")
        except AttributeError:
            pass  # the model exposes no such config fields; keep its defaults
    dim = transformer.get_word_embedding_dimension()
    pooling_layer = models.Pooling(
        dim,
        pooling_mode_cls_token=False,
        pooling_mode_mean_tokens=(pooling=="mean"),
        pooling_mode_max_tokens=(pooling=="max")
    )
    return SentenceTransformer(modules=[transformer, pooling_layer], device=device)

print("Base model ready: sentence-transformers/all-MiniLM-L6-v2")


Base model ready: sentence-transformers/all-MiniLM-L6-v2
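The pooling layer is what turns a variable number of token embeddings into a single fixed-size sentence vector. The plain-Python sketch below illustrates the difference between mean and max pooling on made-up numbers. It is not how `models.Pooling` is implemented (that layer works on batched tensors); the `pool` function and the toy vectors are hypothetical.

```python
def pool(token_vectors, attention_mask, mode="mean"):
    """Collapse per-token vectors into one sentence vector, ignoring padding."""
    real = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not real:
        raise ValueError("attention_mask selects no tokens")
    columns = list(zip(*real))  # one tuple per embedding dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


tokens = [[1.0, 4.0], [3.0, 0.0], [9.0, 9.0]]  # last row stands in for a padding token
mask = [1, 1, 0]
print(pool(tokens, mask, "mean"))  # [2.0, 2.0]
print(pool(tokens, mask, "max"))   # [3.0, 4.0]
```

Mean pooling averages every real token, while max pooling keeps only the largest value per dimension, so a single token can dominate the result.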


# Preparing Fine-Tuning Helper Functions

This cell defines the `train_eval` function, which handles training and evaluation. It takes the model, the training examples, the evaluation data, and the training hyperparameters: batch size, number of epochs, learning rate, optimizer class, and warmup proportion (default 0.1, converted to a step count inside the function). It sets up a `DataLoader`, uses `CosineSimilarityLoss` as the loss function, and trains the model with `model.fit`.

After training, it encodes the evaluation sentence pairs, computes the cosine similarity of each pair, and reports the Spearman and Pearson correlations between these scores and the labels. The reported time covers training and evaluation together. The `log_result` function appends the results of each experiment to the `results` list.

In [7]:

def train_eval(model, train_examples, test_A, test_B, y_test, batch=16, epochs=2,
               lr=2e-5, opt=torch.optim.Adam, warmup=0.1, desc="Run"):
    start = time.time()
    train_loader = DataLoader(train_examples, shuffle=True, batch_size=batch)
    loss_fn = losses.CosineSimilarityLoss(model)
    # `warmup` is a proportion of all training steps; convert it to a step count.
    warmup_steps = int(len(train_loader) * epochs * warmup)
    model.fit(train_objectives=[(train_loader, loss_fn)], epochs=epochs, warmup_steps=warmup_steps,
              optimizer_class=opt, optimizer_params={'lr': lr}, show_progress_bar=True)
    # Encode both sides of every evaluation pair; the diagonal holds the pairwise scores.
    with torch.no_grad():
        emb_A = model.encode(test_A, convert_to_tensor=True, normalize_embeddings=True)
        emb_B = model.encode(test_B, convert_to_tensor=True, normalize_embeddings=True)
        cos = util.cos_sim(emb_A, emb_B).diag().cpu().numpy()
    # Rescale cosine from [-1, 1] to [0, 1]; this linear map leaves both correlations unchanged.
    y_true, y_pred = np.array(y_test), (cos + 1) / 2
    sp, pr = spearmanr(y_true, y_pred).correlation, pearsonr(y_true, y_pred)[0]
    elapsed = time.time() - start  # covers training and evaluation together
    print(f"{desc} | Spearman={sp:.4f}, Pearson={pr:.4f}, Time={elapsed/60:.1f}min")
    return sp, pr, elapsed

results = []
def log_result(exp, params, sp, pr, t):
    results.append({**params,"exp":exp,"spearman":round(sp,4),"pearson":round(pr,4),"time_min":round(t/60,2)})
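`train_eval` rescales the cosine scores with `(cos+1)/2` before computing the correlations. Because that mapping is linear and increasing, it changes neither Pearson nor Spearman. The sketch below checks this with small pure-Python implementations, with Spearman computed as Pearson on average ranks. The function names and the toy labels and scores are hypothetical; the notebook itself uses `scipy.stats`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


labels = [1, 0, 1, 0, 1, 0]                      # hypothetical binary labels
cosines = [0.82, 0.35, 0.64, 0.41, 0.77, 0.58]   # hypothetical cosine similarities
rescaled = [(c + 1) / 2 for c in cosines]

print(abs(pearson(labels, cosines) - pearson(labels, rescaled)) < 1e-9)
print(abs(spearman(labels, cosines) - spearman(labels, rescaled)) < 1e-9)
```

The rescaling therefore only affects how individual scores read, not the reported metrics.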


# Hyperparameter Experimentation

This section explores different hyperparameter settings for training the Sentence-BERT model. Each code cell below trains and evaluates a fresh model with one combination of learning rate, optimizer, and warmup proportion, and reports the Spearman and Pearson correlations on the evaluation set.

## 1. Low Learning Rate

The first experiment trains the model with a low learning rate of 1e-5 using the Adam optimizer. The call does not pass `warmup`, so `train_eval` applies its default warmup proportion of 0.1. The Spearman and Pearson correlations and the run time are then logged. The longer run time of this first run (4.9 min, against about 2.5 min for the later runs) likely reflects the one-time model download and the interactive Weights & Biases login that happen on the first call.

In [8]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=1e-5,desc="Low LR 1e-5")
log_result("Low LR 1e-5",{"lr":1e-5,"batch":16,"epochs":2,"optimizer":"Adam"},sp,pr,t)

Low LR 1e-5 | Spearman=0.8552, Pearson=0.9082, Time=4.9min
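The `warmup` argument of `train_eval` is a proportion, and calls that omit it fall back to the default of 0.1 rather than to zero. The sketch below reproduces the step arithmetic from `train_eval` for the 1598 training examples and the settings used in these experiments. `warmup_schedule` is a hypothetical helper, and it assumes one training step per batch with the `DataLoader` default of keeping the final partial batch.

```python
import math


def warmup_schedule(num_examples, batch_size, epochs, warmup_proportion):
    """Return (total_steps, warmup_steps) using the same arithmetic as train_eval."""
    # A DataLoader with the default drop_last=False keeps the final partial batch.
    batches_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = batches_per_epoch * epochs
    return total_steps, int(total_steps * warmup_proportion)


for proportion in (0.0, 0.1, 0.2, 0.4):
    total, warm = warmup_schedule(1598, 16, 2, proportion)
    print(f"warmup={proportion}: {warm} of {total} training steps")
```

With a batch size of 16 and two epochs, each run has 200 training steps, so the proportions 0.1, 0.2, and 0.4 correspond to 20, 40, and 80 warmup steps.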


## 2. Mid Learning Rate

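This experiment trains the Sentence-BERT model with a mid-range learning rate of 2e-5 using the Adam optimizer. Because the call does not pass `warmup`, the default warmup proportion of 0.1 applies, which makes this configuration identical to experiment 8. The results are logged for comparison with other learning rates.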

In [9]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=2e-5,desc="Mid LR 2e-5")
log_result("Mid LR 2e-5",{"lr":2e-5,"batch":16,"epochs":2,"optimizer":"Adam"},sp,pr,t)

Mid LR 2e-5 | Spearman=0.8562, Pearson=0.9114, Time=2.5min


## 3. AdamW Optimizer with Warmup of 0.2


Here, the Sentence-BERT model is trained using the AdamW optimizer with a learning rate of 2e-5 and a warmup proportion of 0.2. The results are logged to evaluate the effect of AdamW and warmup on performance.

In [10]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=2e-5,opt=torch.optim.AdamW,warmup=0.2,desc="AdamW + warmup 0.2")
log_result("AdamW + warmup 0.2",{"lr":2e-5,"batch":16,"epochs":2,"optimizer":"AdamW"},sp,pr,t)

AdamW + warmup 0.2 | Spearman=0.8570, Pearson=0.9131, Time=2.5min


## 4. High Learning Rate

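This cell trains with a high learning rate of 3e-5 using the Adam optimizer. The call does not pass `warmup`, so the default proportion of 0.1 is used, even though the logged parameters record `warmup` as 0.0. The results are logged to see whether a higher learning rate improves performance.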

In [11]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=3e-5,desc="High LR 3e-5")
log_result("High LR 3e-5",{"lr":3e-5,"batch":16,"epochs":2,"optimizer":"Adam","warmup":0.0},sp,pr,t)

High LR 3e-5 | Spearman=0.8563, Pearson=0.9124, Time=2.5min


## 5. Very Low Learning Rate

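This cell trains the Sentence-BERT model with a very low learning rate of 5e-6 using the Adam optimizer. The call does not pass `warmup`, so the default proportion of 0.1 is used, even though the logged parameters record `warmup` as 0.0. The results are logged to assess the impact of a very small learning rate.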

In [12]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=5e-6,desc="Very Low LR 5e-6")
log_result("Very Low LR 5e-6",{"lr":5e-6,"batch":16,"epochs":2,"optimizer":"Adam","warmup":0.0},sp,pr,t)

Very Low LR 5e-6 | Spearman=0.8535, Pearson=0.9027, Time=2.5min


## 6. Stochastic Gradient Descent Optimizer

Here, the Sentence-BERT model is trained using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 1e-4. The call does not pass `warmup`, so the default proportion of 0.1 is used, even though the logged parameters record `warmup` as 0.0. The results are logged to compare SGD against the Adam-based optimizers.

In [13]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=1e-4,opt=torch.optim.SGD,desc="SGD + LR 1e-4")
log_result("SGD + LR 1e-4",{"lr":1e-4,"batch":16,"epochs":2,"optimizer":"SGD","warmup":0.0},sp,pr,t)

SGD + LR 1e-4 | Spearman=0.8456, Pearson=0.8779, Time=2.5min


## 7. AdamW only (No Warmup)

Training with the AdamW optimizer and no warmup is performed in this cell, using a learning rate of 2e-5. The results are logged to compare the performance of AdamW with and without warmup.

In [14]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=2e-5,opt=torch.optim.AdamW,warmup=0.0,desc="AdamW No Warmup")
log_result("AdamW No Warmup",{"lr":2e-5,"batch":16,"epochs":2,"optimizer":"AdamW","warmup":0.0},sp,pr,t)

AdamW No Warmup | Spearman=0.8567, Pearson=0.9127, Time=2.5min


## 8. Adam with Warmup of 0.1

This cell trains the Sentence-BERT model using the Adam optimizer with a learning rate of 2e-5 and a warmup proportion of 0.1. The results are logged to evaluate the effect of Adam with a small warmup.

In [15]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=2e-5,opt=torch.optim.Adam,warmup=0.1,desc="Adam + Warmup 0.1")
log_result("Adam + Warmup 0.1",{"lr":2e-5,"batch":16,"epochs":2,"optimizer":"Adam","warmup":0.1},sp,pr,t)

Adam + Warmup 0.1 | Spearman=0.8562, Pearson=0.9114, Time=2.5min


## 9. AdamW with Warmup of 0.4

Here, the Sentence-BERT model is trained using the AdamW optimizer with a learning rate of 2e-5 and a higher warmup proportion of 0.4. The results are logged to see the impact of a larger warmup phase.

In [16]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=2e-5,opt=torch.optim.AdamW,warmup=0.4,desc="AdamW + Higher Warmup 0.4")
log_result("AdamW + Higher Warmup 0.4",{"lr":2e-5,"batch":16,"epochs":2,"optimizer":"AdamW","warmup":0.4},sp,pr,t)

AdamW + Higher Warmup 0.4 | Spearman=0.8569, Pearson=0.9134, Time=2.5min


## 10. AdaFactor Optimizer

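This cell trains with the AdaFactor optimizer (`torch.optim.Adafactor`) at a learning rate of 1e-3. The call does not pass `warmup`, so the default proportion of 0.1 applies, although the logged parameters record `warmup` as "N/A". The results are logged to evaluate the performance of AdaFactor.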

In [17]:
model = build_sbert_model()
sp,pr,t = train_eval(model,train_examples,eval_A,eval_B,eval_scores,lr=1e-3,opt=torch.optim.Adafactor,desc="AdaFactor Optimizer")
log_result("AdaFactor Optimizer",{"lr":1e-3,"batch":16,"epochs":2,"optimizer":"AdaFactor","warmup":"N/A"},sp,pr,t)

AdaFactor Optimizer | Spearman=0.8536, Pearson=0.9105, Time=2.6min


# Displaying Experimentation Results

This cell compiles all the logged results from the hyperparameter experiments into a pandas DataFrame and displays it. This provides a clear overview of how different hyperparameters affected the model's performance (Spearman and Pearson correlations) and training time, allowing for easy comparison and identification of the best-performing settings.

In [22]:

df = pd.DataFrame(results)
print("\n--- Combined Results ---")
display(df)



--- Combined Results ---


| | lr | batch | epochs | optimizer | exp | spearman | pearson | time_min | warmup |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1e-05 | 16 | 2 | Adam | Low LR 1e-5 | 0.8552 | 0.9082 | 4.9 | |
| 1 | 2e-05 | 16 | 2 | Adam | Mid LR 2e-5 | 0.8562 | 0.9114 | 2.52 | |
| 2 | 2e-05 | 16 | 2 | AdamW | AdamW + warmup 0.2 | 0.857 | 0.9131 | 2.5 | |
| 3 | 3e-05 | 16 | 2 | Adam | High LR 3e-5 | 0.8563 | 0.9124 | 2.55 | 0.0 |
| 4 | 5e-06 | 16 | 2 | Adam | Very Low LR 5e-6 | 0.8535 | 0.9027 | 2.47 | 0.0 |
| 5 | 0.0001 | 16 | 2 | SGD | SGD + LR 1e-4 | 0.8456 | 0.8779 | 2.51 | 0.0 |
| 6 | 2e-05 | 16 | 2 | AdamW | AdamW No Warmup | 0.8567 | 0.9127 | 2.5 | 0.0 |
| 7 | 2e-05 | 16 | 2 | Adam | Adam + Warmup 0.1 | 0.8562 | 0.9114 | 2.48 | 0.1 |
| 8 | 2e-05 | 16 | 2 | AdamW | AdamW + Higher Warmup 0.4 | 0.8569 | 0.9134 | 2.53 | 0.4 |
| 9 | 0.001 | 16 | 2 | AdaFactor | AdaFactor Optimizer | 0.8536 | 0.9105 | 2.57 | |
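To compare the runs programmatically rather than by eye, the logged rows can be sorted by score. The sketch below works on a plain list of dictionaries shaped like the `results` list; the values are copied from the table above, and `logged_results` is a hypothetical name chosen so the snippet does not overwrite the notebook's own variable.

```python
# Scores copied from the combined results table.
logged_results = [
    {"exp": "Low LR 1e-5", "spearman": 0.8552, "pearson": 0.9082},
    {"exp": "Mid LR 2e-5", "spearman": 0.8562, "pearson": 0.9114},
    {"exp": "AdamW + warmup 0.2", "spearman": 0.8570, "pearson": 0.9131},
    {"exp": "High LR 3e-5", "spearman": 0.8563, "pearson": 0.9124},
    {"exp": "Very Low LR 5e-6", "spearman": 0.8535, "pearson": 0.9027},
    {"exp": "SGD + LR 1e-4", "spearman": 0.8456, "pearson": 0.8779},
    {"exp": "AdamW No Warmup", "spearman": 0.8567, "pearson": 0.9127},
    {"exp": "Adam + Warmup 0.1", "spearman": 0.8562, "pearson": 0.9114},
    {"exp": "AdamW + Higher Warmup 0.4", "spearman": 0.8569, "pearson": 0.9134},
    {"exp": "AdaFactor Optimizer", "spearman": 0.8536, "pearson": 0.9105},
]

# Sort by Spearman first and use Pearson to break ties.
ranked = sorted(logged_results, key=lambda r: (r["spearman"], r["pearson"]), reverse=True)
for place, row in enumerate(ranked[:3], start=1):
    print(f"{place}. {row['exp']}: Spearman={row['spearman']:.4f}, Pearson={row['pearson']:.4f}")

spread = ranked[0]["spearman"] - ranked[-1]["spearman"]
print(f"Spearman spread across all runs: {spread:.4f}")
```

Apart from SGD, the runs differ by at most 0.0035 in Spearman correlation. Since each configuration was run once, differences of that size may fall within run-to-run variation.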


# Inference Process

This cell defines the `semantic_similarity` function, which takes a Sentence-BERT model and two text strings, encodes both into embeddings, computes their cosine similarity, and rescales it from [-1, 1] to a score between 0 and 1. The cell then builds a model with `build_sbert_model(pooling="mean")` and prints similarity scores for a few example text pairs. Note that `best_model` is a freshly built copy of the pretrained base model; none of the fine-tuned models from the experiments above is reused here.

In [19]:

def semantic_similarity(model: SentenceTransformer, text1: str, text2: str) -> float:
    with torch.no_grad():
        e1 = model.encode(text1, convert_to_tensor=True, normalize_embeddings=True)
        e2 = model.encode(text2, convert_to_tensor=True, normalize_embeddings=True)
        return float((util.cos_sim(e1,e2).item()+1)/2)

best_model = build_sbert_model(pooling="mean")
examples = [
    ("Artificial intelligence helps in education.", "AI can improve learning outcomes."),
    ("The weather today is cold.", "It is hot and sunny outside."),
    ("Deep learning enables better natural language understanding.", "Neural networks can understand text.")
]
print("\n--- Running Semantic Similarity Inference ---")
for s1,s2 in examples:
    score = semantic_similarity(best_model, s1, s2)
    print(f"\nText 1: {s1}\nText 2: {s2}\nSimilarity Score: {score:.4f}")



--- Running Semantic Similarity Inference ---

Text 1: Artificial intelligence helps in education.
Text 2: AI can improve learning outcomes.
Similarity Score: 0.8671

Text 1: The weather today is cold.
Text 2: It is hot and sunny outside.
Similarity Score: 0.8143

Text 1: Deep learning enables better natural language understanding.
Text 2: Neural networks can understand text.
Similarity Score: 0.7958
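The scores above look high even for the contradictory weather pair, partly because `semantic_similarity` maps cosine values from [-1, 1] onto [0, 1]. The sketch below inverts that mapping for the printed scores to recover the raw cosines, and shows that two orthogonal vectors land at 0.5 on the rescaled axis. The helper names and the toy vectors are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def to_unit_interval(cos_value):
    """The mapping used by semantic_similarity: [-1, 1] -> [0, 1]."""
    return (cos_value + 1) / 2


def to_cosine(score):
    """Inverse mapping: recover the raw cosine from a reported score."""
    return 2 * score - 1


for score in (0.8671, 0.8143, 0.7958):  # scores printed by the inference cell
    print(f"score={score:.4f} -> cosine={to_cosine(score):.4f}")

orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
print(to_unit_interval(orthogonal))  # 0.5
```

On this scale, 0.5 rather than 0 marks the absence of similarity, so a threshold for "similar" should be chosen with the rescaling in mind.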
