This code cell invokes `apt-get`, the system package manager of the Colab runtime, to install the system packages that the Python `textract-py3` library depends on.

In [None]:
! apt-get install -y  build-essential  python3-dev  libxml2-dev  libxslt1-dev \
    antiword  poppler-utils  pstotext tesseract-ocr flac ffmpeg lame libmad0 \
    libsox-fmt-mp3 sox libjpeg-dev swig libasound2-dev libpulse-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9ubuntu3).
libjpeg-dev is already the newest version (8c-2ubuntu10).
libjpeg-dev set to manually installed.
tesseract-ocr is already the newest version (4.1.1-2.1build1).
libxml2-dev is already the newest version (2.9.13+dfsg-1ubuntu0.10).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
The following additional packages will be installed:
  fonts-droid-fallback fonts-noto-mono fonts-urw-base35 ghostscript
  javascript-common libgs9 libgs9-common libid3tag0 libidn12 libijs-0.35
  libjbig2dec0 libjs-sphinxdoc libjs-underscore libopencore-amrnb0
  libopencore-amrwb0 libpulse-mainloop-glib0 libsox-fmt-alsa libsox-fmt-base
  libsox3 libwavpack1 poppler-data python3.10-dev swig4.0
Suggested packages:
  fonts-noto fonts-freefont-otf | fonts-freefont-ttf fonts-texgyre
  ghostscript-x apache2 | lighttpd | httpd lame-doc libasound2-do

This uses `pip`, the Python package manager, to install `optuna`, a library for hyperparameter tuning.

In [None]:
!pip install optuna

Collecting optuna
  Downloading optuna-4.6.0-py3-none-any.whl.metadata (17 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Downloading optuna-4.6.0-py3-none-any.whl (404 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 404.7/404.7 kB 10.1 MB/s eta 0:00:00
Downloading colorlog-6.10.1-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, optuna
Successfully installed colorlog-6.10.1 optuna-4.6.0


This installs `textract-py3`, a fork of the original `textract` library updated for modern Python versions. It extracts text from many file types; this project uses it to read PDF, DOCX, and PPT files.

In [None]:
!pip install textract-py3

Collecting textract-py3
  Downloading textract_py3-2.1.1-py3-none-any.whl.metadata (2.0 kB)
Collecting SpeechRecognition>=3.8.1 (from textract-py3)
  Downloading speechrecognition-3.14.3-py3-none-any.whl.metadata (30 kB)
Collecting argcomplete>=1.10.0 (from textract-py3)
  Downloading argcomplete-3.6.3-py3-none-any.whl.metadata (16 kB)
Collecting docx2txt>=0.8 (from textract-py3)
  Downloading docx2txt-0.9-py3-none-any.whl.metadata (529 bytes)
Collecting extract-msg>=0.30.11 (from textract-py3)
  Downloading extract_msg-0.55.0-py3-none-any.whl.metadata (15 kB)
Collecting pdfminer.six>=20221105 (from textract-py3)
  Downloading pdfminer_six-20251107-py3-none-any.whl.metadata (4.2 kB)
Collecting python-pptx>=0.6.18 (from textract-py3)
  Downloading python_pptx-1.0.2-py3-none-any.whl.metadata (2.5 kB)
Collecting xlrd<2.0.0,>=1.2.0 (from textract-py3)
  Downloading xlrd-1.2.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting olefile==0.47 (from extract-msg>=0.30.11->textract-py3)
  Downloa

The imported modules set up an environment for Natural Language Processing (NLP) work on sentence embedding models: developing, fine-tuning, and evaluating them. PyTorch provides the deep learning framework, and Sentence Transformers provides the tools for training models that map text to vectors capturing semantic meaning. Optuna and its samplers handle hyperparameter optimization through automated trials. NumPy and Pandas cover data manipulation, `textract` extracts text from documents, and the Hugging Face `datasets` library loads datasets. Finally, Spearman's $\rho$ and Pearson's $r$ from `scipy.stats` measure how well the model's similarity scores agree with the reference labels. Together, these imports cover the pipeline from data preparation and model training through to statistical evaluation.

In [None]:
import numpy as np
import optuna
import os
import pandas as pd
import random
import re
import textract
import time
import torch

from datasets import load_dataset
from itertools import product
from optuna.samplers import RandomSampler
from sentence_transformers import (
  SentenceTransformer,
  InputExample,
  losses,
  util,
  models
)
from torch.utils.data import DataLoader
from scipy.stats import spearmanr, pearsonr
from sklearn.model_selection import train_test_split

Setting the environment variable `WANDB_DISABLED` to `true` disables the Weights & Biases integration, so training runs do not log to it.

In [None]:
os.environ["WANDB_DISABLED"] = "true"

This checks whether the runtime supports Compute Unified Device Architecture (CUDA), which is available on GPU runtimes. CPU runtimes have no CUDA support, so machine learning workloads run more slowly on them.

In [None]:
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")
print(f"Using device: {device.type}")

Using device: cuda


This defines the `normalize_text` function, which takes a list of strings or any other object `x` and returns a single string: a list is joined with single spaces, and anything else is converted with `str`.

In [None]:
def normalize_text(x):
  return " ".join(x) if isinstance(x, list) else str(x)
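
As a quick illustration of the two branches, the sketch below repeats the function and applies it to a few made-up inputs. The sample values are hypothetical; the list case is meant to resemble a field stored as a list of sentences, which is an assumption about the dataset rather than something verified here.

```python
def normalize_text(x):
    # Same logic as the cell above: join lists with spaces, stringify anything else.
    return " ".join(x) if isinstance(x, list) else str(x)

# Hypothetical inputs for illustration only.
examples = [
    ["First sentence.", "Second sentence."],  # list of strings
    "Already a string.",                      # plain string passes through str()
    42,                                       # any other object is stringified
]
normalized = [normalize_text(e) for e in examples]
print(normalized)
```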

Using the Hugging Face `datasets` library, this loads the [Blaise-g/scitldr](https://huggingface.co/datasets/Blaise-g/scitldr) dataset, a fork of the original SciTLDR dataset converted to work with modern versions of the `datasets` library.

In [None]:
print("Loading training data from 'Blaise-g/scitldr'")
dataset = load_dataset("Blaise-g/scitldr")
train_data = dataset.get("train")
eval_data = dataset.get("validation")


Loading training data from 'Blaise-g/scitldr'


dataset_infos.json: 0.00B [00:00, ?B/s]

data/train-00000-of-00001-8e82bed0a659c9(…):   0%|          | 0.00/29.6M [00:00<?, ?B/s]

data/validation-00000-of-00001-1ef5839c3(…):   0%|          | 0.00/9.11M [00:00<?, ?B/s]

data/test-00000-of-00001-42dbf1b5d0e46ec(…):   0%|          | 0.00/9.67M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1992 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/619 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/618 [00:00<?, ? examples/s]

The code initializes four empty lists, `pos_A`, `pos_B`, `eval_pos_A`, and `eval_pos_B`, which store the source and summary texts for the training and evaluation datasets. The constant `max_pos = 1000` limits the training set to at most 1,000 source/summary pairs, which keeps processing time manageable during experimentation. The first loop iterates through `train_data`, reads the `source` (the input document) and `target` (the reference summary) fields, skips entries where either is missing or empty, and passes both through `normalize_text`, which joins a list of strings with spaces or converts any other object to a string, before appending the results to `pos_A` and `pos_B`. The loop breaks as soon as `pos_A` reaches `max_pos` entries. The second loop applies the same extraction and normalization to `eval_data`, filling `eval_pos_A` and `eval_pos_B` without a size limit.

In [None]:
pos_A, pos_B = [], []
eval_pos_A, eval_pos_B = [], []
max_pos = 1000

for entry in train_data:
  source = entry.get("source")
  summary = entry.get("target")

  if source and summary:
    pos_A.append(normalize_text(source))
    pos_B.append(normalize_text(summary))

    if len(pos_A) >= max_pos:
      break

for entry in eval_data:
  source = entry.get("source")
  summary = entry.get("target")

  if source and summary:
    eval_pos_A.append(normalize_text(source))
    eval_pos_B.append(normalize_text(summary))


This code block generates negative sample pairs, that is, sources paired with summaries that do not belong to them, so the model sees unrelated pairs as well as related ones. The lists `neg_A`, `neg_B`, `eval_neg_A`, and `eval_neg_B` hold these pairs for the training and evaluation sets. The index lists `train_idx` and `eval_idx` cover the positions of the positive pairs and are shuffled in place. Each loop then enumerates a shuffled list, so `i` is the position in the original order and `j` is the shuffled index found at that position; the source at `i` is paired with the summary at `j`. The check `if i == j: continue` skips positions where the shuffle left an index in place, since that would reproduce a positive pair. Because skipped positions are not replaced, each set can end up with slightly fewer negatives than positives, and the `break` on reaching the positive count acts only as an upper bound. The result is an approximately balanced dataset of related and unrelated pairs.

In [None]:
neg_A, neg_B = [], []
eval_neg_A, eval_neg_B = [], []

train_idx = list(range(len(pos_A)))
eval_idx = list(range(len(eval_pos_A)))

random.shuffle(train_idx)
random.shuffle(eval_idx)

for i, j in enumerate(train_idx):
  if i == j:
    continue

  neg_A.append(pos_A[i])
  neg_B.append(pos_B[j])

  if len(neg_A) >= len(pos_A):
    break

for i, j in enumerate(eval_idx):
  if i == j:
    continue

  eval_neg_A.append(eval_pos_A[i])
  eval_neg_B.append(eval_pos_B[j])

  if len(eval_neg_A) >= len(eval_pos_A):
    break
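
The shuffle-and-skip approach can drop a pair whenever an index stays in place. A minimal alternative sketch, not the notebook's method, pairs each item with its successor in a shuffled cycle, which guarantees a mismatch for every item when there are at least two. The function name `make_negative_pairs` and the toy data are hypothetical.

```python
import random

def make_negative_pairs(sources, summaries, seed=0):
    """Pair every source with a summary taken from a different position."""
    n = len(sources)
    if n < 2:
        return [], []  # a mismatch is impossible with fewer than two items
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    neg_a, neg_b = [], []
    for k, i in enumerate(order):
        # The next index in the shuffled cycle is always different from i,
        # because `order` is a permutation without repeated values.
        j = order[(k + 1) % n]
        neg_a.append(sources[i])
        neg_b.append(summaries[j])
    return neg_a, neg_b

# Toy data: the trailing digit marks which source and summary belong together.
sources = ["doc0", "doc1", "doc2", "doc3"]
summaries = ["sum0", "sum1", "sum2", "sum3"]
neg_a, neg_b = make_negative_pairs(sources, summaries)
print(list(zip(neg_a, neg_b)))
```

With this variant the number of negatives always equals the number of positives, at the cost of a slightly less uniform choice of mismatched partner.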

This cell assembles the final training and evaluation sets. `all_A` concatenates the positive and negative source texts (`pos_A` and `neg_A`), and `all_B` concatenates the corresponding summaries (`pos_B` and `neg_B`). The label list `all_y` assigns 1.0 to positive pairs and 0.0 to negative (mismatched) pairs, giving binary similarity targets. The combined data is then passed to `train_test_split` with `test_size=0.2`; only the 80% partition (`train_A`, `train_B`, `train_y`) is kept, and the remaining 20% is discarded. Passing `stratify=all_y` keeps the proportion of positive and negative samples in the kept partition as close as possible to that of the full set. Finally, the evaluation set is built the same way from `eval_pos_A`, `eval_neg_A`, and their counterparts, producing `eval_A`, `eval_B`, and the binary labels `eval_y`, which the hyperparameter search later uses to score each trained model.

In [None]:
all_A = pos_A + neg_A
all_B = pos_B + neg_B
all_y = [1.0] * len(pos_A) + [0.0] * len(neg_A)

train_A, _, train_B, _, train_y, _ = train_test_split(
    all_A, all_B, all_y,
    test_size=0.2,
    random_state=42,
    stratify=all_y,
)

eval_A = eval_pos_A + eval_neg_A
eval_B = eval_pos_B + eval_neg_B
eval_y = [1.0] * len(eval_pos_A) + [0.0] * len(eval_neg_A)
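
To make the effect of stratification concrete, here is a pure-Python sketch of the idea. It is not how scikit-learn implements `train_test_split`; the helper `stratified_split` and the toy labels are hypothetical. The sketch splits each label group separately so that both partitions keep the original class ratio.

```python
import random

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with matching label proportions in both parts."""
    rng = random.Random(seed)
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)  # held-out share of this label group
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 10 positive and 10 negative pairs.
labels = [1.0] * 10 + [0.0] * 10
train_idx, test_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), train_pos)
```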

This cell uses a list comprehension to iterate in parallel over `train_A` (source texts), `train_B` (summary texts), and `train_y` (the binary labels 1.0 or 0.0). For each triple `(a, b, y)` it creates an `InputExample`, the Sentence Transformers container for a training sample: `texts` holds the two paired strings (`[a, b]`) and `label` holds the target similarity as a float (`float(y)`). The resulting list, `train_examples`, is what the `DataLoader` consumes during training. The final print statement reports the number of training examples so the data volume can be checked before training starts.

In [None]:
train_examples = [
    InputExample(texts=[a, b], label=float(y))
    for a, b, y in zip(train_A, train_B, train_y)
]
print(f"Training data ready: {len(train_examples)}")

Training data ready: 1598


This code performs the initial loading of an external evaluation dataset from a local file, making the data accessible for model testing.

It uses the `load_dataset` function from the Hugging Face `datasets` library to read the CSV file `/content/test.csv`, selecting the `csv` loader and the `latin1` encoding, which decodes the file as Latin-1 rather than the default UTF-8. The file is registered as a split named `"eval"`, and the next line retrieves that split and assigns it to `test_data`. Two print statements then confirm the load by reporting the number of samples and the column names found in the file. This gives the notebook a separate dataset, not used for training, on which a model can be assessed.

In [None]:
test_dataset = load_dataset("csv", data_files={"eval": "/content/test.csv"}, encoding="latin1")
test_data = test_dataset.get("eval")
print(f"Evaluation dataset loaded: {len(test_data)} samples")
print(f"Columns in file: {test_data.column_names}")


Generating eval split: 0 examples [00:00, ? examples/s]

Evaluation dataset loaded: 500 samples
Columns in file: ['sentence_A', 'sentence_B', 'label']


This block reads three columns from `test_data` into separate variables: `test_A` receives the texts from `sentence_A`, `test_B` the texts from `sentence_B`, and `test_y` the ground-truth values from `label`. These columns are assumed to contain the paired sentences and their semantic similarity scores. The print statement zips the three lists together and reports the number of resulting triples, confirming that the columns line up and that the test data is ready.

In [None]:
test_A = test_data["sentence_A"]
test_B = test_data["sentence_B"]
test_y = test_data["label"]

print(f"Test data ready: {len(list(zip(test_A, test_B, test_y)))}")

Test data ready: 500


The provided function defines a reusable blueprint for constructing a Sentence Embedding model, allowing for easy modification of the base model and pooling strategy.

The function `build_model` takes three optional parameters: `base` (the pre-trained model to use, defaulting to `all-MiniLM-L6-v2`), `pooling` (how token embeddings are aggregated, defaulting to `mean`), and `dropout` (an optional float for regularization). It first instantiates a `models.Transformer` layer from the Sentence Transformers library with the specified base model; this layer produces one embedding per input token. If a dropout value is provided, the function then tries to set `attention_probs_dropout_prob` and `hidden_dropout_prob` on the transformer's configuration. Because the network has already been constructed at that point, this change likely does not reach dropout layers that were created at load time. Next, the code reads the embedding dimension (`dim`) of the transformer output and uses it to create a `models.Pooling` layer, which aggregates the sequence of token embeddings into a single fixed-size sentence vector using mean or max pooling, depending on the `pooling` argument. Finally, a `SentenceTransformer` is assembled from the `transformer` and `pooling_layer` modules, placed on the selected device, and returned.

In [None]:
def build_model(
    base="sentence-transformers/all-MiniLM-L6-v2",
    pooling="mean",
    dropout=None
):
  transformer = models.Transformer(base)

  if dropout is not None:
    try:
      transformer.auto_model.config.attention_probs_dropout_prob = dropout
      transformer.auto_model.config.hidden_dropout_prob = dropout
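    except Exception:
      # Best effort: keep the default dropout if the config cannot be modified.
      pass

  # Build the pooling layer and the model whether or not dropout was given.
  dim = transformer.get_word_embedding_dimension()
  pooling_layer = models.Pooling(
      dim,
      pooling_mode_mean_tokens=(pooling=="mean"),
      pooling_mode_max_tokens=(pooling=="max"),
  )

  return SentenceTransformer(
      modules=[transformer, pooling_layer],
      device=device
  )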
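
The difference between mean and max pooling can be shown without any model. The sketch below is a plain-Python illustration of the concept, not the `models.Pooling` implementation; the helper `pool`, the token vectors, and the mask are hypothetical. Padded positions (mask value 0) are ignored in both modes.

```python
def pool(token_vectors, mask, mode="mean"):
    """Collapse per-token vectors into one sentence vector, ignoring padded tokens."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m == 1]
    dim = len(kept[0])
    if mode == "mean":
        # Average each dimension over the real tokens.
        return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
    if mode == "max":
        # Take the largest value per dimension over the real tokens.
        return [max(vec[d] for vec in kept) for d in range(dim)]
    raise ValueError(f"unknown pooling mode: {mode}")

# Three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 4.0], [3.0, 0.0], [9.0, 9.0]]
mask = [1, 1, 0]
mean_vec = pool(tokens, mask, "mean")
max_vec = pool(tokens, mask, "max")
print(mean_vec, max_vec)
```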

The function `train_eval` runs the complete training and evaluation cycle for a sentence embedding model using the Sentence Transformers framework. It accepts the model, the prepared data, and several hyperparameters: batch size, number of epochs, learning rate, optimizer class, and warmup ratio. It first creates a `DataLoader` to batch and shuffle the training examples, and defines the training objective as `losses.CosineSimilarityLoss`, which trains the cosine similarity of each pair's embeddings toward its label. The number of warmup steps for the learning rate scheduler is computed as the given fraction of the total number of training steps (batches per epoch times epochs). Training then runs through the model's `fit` method with the training objective, optimizer configuration, and number of epochs, and its wall-clock duration is recorded.

After training, the model is evaluated on the provided test data: the `test_A` and `test_B` sentences are encoded as normalized embeddings, and the cosine similarity of each corresponding pair is read from the diagonal of the matrix returned by `util.cos_sim`. These similarities are rescaled from $[-1, 1]$ to $[0, 1]$ and compared against the true labels (`y_true`). Finally, the function computes Spearman's rank correlation ($\rho$) and Pearson's correlation coefficient ($r$) between the predicted similarities and the true labels, prints them together with the training duration, and returns all three values.

In [None]:
def train_eval(
    model,
    train_examples,
    test_A, test_B, test_y,
    batch=16,
    epochs=2,
    lr=2e-5,
    opt=torch.optim.Adam,
    warmup=0.1,
    desc="Run",
):
  train_loader = DataLoader(train_examples, shuffle=True, batch_size=batch)
  loss_fn = losses.CosineSimilarityLoss(model)
  warmup_steps = int(len(train_loader) * epochs * warmup)

  start = time.time()

  model.fit(
      train_objectives=[(train_loader, loss_fn)],
      epochs=epochs,
      warmup_steps=warmup_steps,
      optimizer_class=opt,
      optimizer_params={ 'lr': lr },
      show_progress_bar=True,
  )

  duration = time.time() - start

  with torch.no_grad():
    emb_A = model.encode(test_A, convert_to_tensor=True, normalize_embeddings=True)
    emb_B = model.encode(test_B, convert_to_tensor=True, normalize_embeddings=True)
    cos = util.cos_sim(emb_A, emb_B).diag().cpu().numpy()

  y_true = np.array(test_y)
  y_pred = (cos+1)/2

  sp = spearmanr(y_true, y_pred).statistic
  pr = pearsonr(y_true, y_pred).statistic

  print(f"{desc} | Spearman={sp:.4f}, Pearson={pr:.4f}, Duration={duration/60:.1f}min")

  return sp, pr, duration
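
The rescaling `(cos+1)/2` is a positive linear transformation, so it changes neither correlation coefficient. The sketch below checks this with small pure-Python implementations of both measures. The data is made up, and the ranking helper ignores ties, which `scipy.stats.spearmanr` handles and which matter for binary labels; this is an illustration, not a replacement for the SciPy functions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    # Simple 1-based ranking without tie handling; adequate for distinct values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical labels and cosine similarities.
y_true = [0.0, 0.2, 0.5, 0.9, 1.0]
cos = [-0.4, 0.1, 0.3, 0.2, 0.8]
rescaled = [(c + 1) / 2 for c in cos]

print(pearson(y_true, cos), pearson(y_true, rescaled))
print(spearman(y_true, cos), spearman(y_true, rescaled))
```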

At the top level, an empty list named `results` is initialized to collect the outcome of every experiment. The function `log_result` accepts the identifier strings `member` and `exp` (the experiment name), a dictionary `params` (the hyperparameters used in the run), and the performance metrics: the Spearman ($\rho$) and Pearson ($r$) correlation scores and the training duration `d` in seconds. Inside the function, a new dictionary is built for the current experiment; the dictionary unpacking operator (`**params`) copies all hyperparameters into the record. The correlation scores are rounded to four decimal places, the duration is converted to minutes and rounded to two decimal places, and the record is appended to the global `results` list. Every logged run is therefore stored with its configuration and performance, which makes it easy to compare hyperparameter combinations later.

In [None]:
results = []

def log_result(member, exp, params, sp, pr, d):
  results.append({
    **params,
    "member": member,
    "exp": exp,
    "spearman": round(sp, 4),
    "pearson": round(pr, 4),
    "duration_min": round(d/60, 2)
  })
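
A short usage sketch shows the shape of a logged record and one way to pick the best run afterwards. The function is repeated so the snippet runs on its own, and all values passed in are hypothetical, not results from this notebook.

```python
results = []

def log_result(member, exp, params, sp, pr, d):
    # Same record layout as the cell above.
    results.append({
        **params,
        "member": member,
        "exp": exp,
        "spearman": round(sp, 4),
        "pearson": round(pr, 4),
        "duration_min": round(d / 60, 2),
    })

# Hypothetical values for illustration only; durations are in seconds.
log_result("Optuna", "Trial_demo_a", {"batch_size": 16, "dropout": 0.2}, 0.85614, 0.91357, 150.0)
log_result("Optuna", "Trial_demo_b", {"batch_size": 8, "dropout": 0.3}, 0.8449, 0.9001, 132.0)

# Select the record with the highest Spearman correlation.
best = max(results, key=lambda r: r["spearman"])
print(best)
```

Because each record is a flat dictionary, the list can also be passed directly to `pd.DataFrame(results)` for tabular comparison.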

The function `objective` takes a `trial` object as input, which Optuna uses to sample and track the hyperparameter settings of each run. Inside the function, the trial suggests values for four hyperparameters: `batch_size` (a categorical choice of 8, 16, or 32), `learning_rate` (a log-scaled float between $10^{-5}$ and $10^{-4}$), `dropout` (a float between 0.1 and 0.5), and `warmup_ratio` (a float between 0.05 and 0.3). It first calls `build_model` with the suggested dropout value to create a fresh model. It then passes the remaining sampled parameters, along with the model, the training examples, and the evaluation set (`eval_A`, `eval_B`, `eval_y`), to `train_eval`, which trains for two epochs and returns the Spearman correlation (`sp`), Pearson correlation (`pr`), and duration (`d`). When the run succeeds, `log_result` records the configuration and metrics in the global `results` list, and the function returns `sp` as the score that Optuna maximizes. If training or evaluation raises an exception, the `except` block prints the error and returns 0 to penalize the failed trial; failed trials are not added to `results`.

In [None]:
def objective(trial):
  batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
  learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True)
  dropout = trial.suggest_float("dropout", 0.1, 0.5)
  warmup_ratio = trial.suggest_float("warmup_ratio", 0.05, 0.3)

  model = build_model(
      dropout=dropout
  )

  try:
    sp, pr, d = train_eval(
        model,
        train_examples,
        eval_A, eval_B, eval_y,
        batch=batch_size,
        epochs=2,
        lr=learning_rate,
        warmup=warmup_ratio,
        desc=f"Trial {trial.number}"
    )

    log_result(
        "Optuna", f"Trial_{trial.number}",
        {
            "batch_size": batch_size,
            "learning_rate": learning_rate,
            "dropout": dropout,
            "warmup_ratio": warmup_ratio
        },
        sp, pr, d,
    )

    return sp
  except Exception as e:
    print(f"Trial {trial.number} failed: {e}")
    return 0
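
The `log=True` option on the learning rate means values are spread evenly across orders of magnitude rather than across the raw interval. The sketch below illustrates that idea with the standard library only; it is an assumption-level illustration of log-uniform sampling, not Optuna's internal code, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
draws = [sample_log_uniform(1e-5, 1e-4, rng) for _ in range(10_000)]

# Under log-uniform sampling, about half the draws fall below the geometric
# midpoint sqrt(low * high), which is roughly 3.16e-5 for this range.
geometric_mid = math.sqrt(1e-5 * 1e-4)
below_midpoint = sum(d < geometric_mid for d in draws) / len(draws)
print(f"share below geometric midpoint: {below_midpoint:.2f}")
```

With plain uniform sampling on the same interval, only about a quarter of the draws would fall below that midpoint, so small learning rates would be explored less often.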

The final segment of the optimization script initiates the hyperparameter tuning process using the Optuna framework.

First, a `RandomSampler` is instantiated with a fixed `seed=42`, so the sequence of sampled hyperparameter combinations is reproducible, which helps when debugging and comparing results across experiments. Next, `optuna.create_study` creates the study object that holds the optimization experiment. The sampler is passed to the study, and `direction="maximize"` tells Optuna to look for the parameters that yield the highest score, here the Spearman correlation returned by `objective`. Finally, `study.optimize` runs the search with `objective` as the function to maximize. With `n_trials=60`, 60 independently sampled combinations of batch size, learning rate, dropout, and warmup ratio are tested. The `show_progress_bar=True` argument displays a progress bar as the trials run.

In [None]:
sampler = RandomSampler(seed=42)
study = optuna.create_study(sampler=sampler, direction="maximize")
study.optimize(objective, n_trials=60, show_progress_bar=True)

[I 2025-11-17 07:54:57,065] A new study created in memory with name: no-name-5b16e87e-5b8b-48c6-87cd-f9d6df285fc1


  0%|          | 0/60 [00:00<?, ?it/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Computing widget examples:   0%|          | 0/1 [00:00<?, ?example/s]

Step,Training Loss


Trial 0 | Spearman=0.8562, Pearson=0.9136, Duration=2.6min
[I 2025-11-17 07:57:43,454] Trial 0 finished with value: 0.856161354708948 and parameters: {'batch_size': 16, 'learning_rate': 3.968793330444374e-05, 'dropout': 0.1624074561769746, 'warmup_ratio': 0.08899863008405066}. Best is trial 0 with value: 0.856161354708948.


Trial 1 | Spearman=0.8573, Pearson=0.9162, Duration=2.2min
[I 2025-11-17 08:00:04,997] Trial 1 finished with value: 0.8573252951795175 and parameters: {'batch_size': 16, 'learning_rate': 5.105903209394759e-05, 'dropout': 0.10823379771832098, 'warmup_ratio': 0.2924774630404986}. Best is trial 1 with value: 0.8573252951795175.


Trial 2 | Spearman=0.8562, Pearson=0.9130, Duration=2.3min
[I 2025-11-17 08:02:29,455] Trial 2 finished with value: 0.856161354708948 and parameters: {'batch_size': 8, 'learning_rate': 1.5254729458052598e-05, 'dropout': 0.2216968971838151, 'warmup_ratio': 0.18118910790805948}. Best is trial 1 with value: 0.8573252951795175.


Trial 3 | Spearman=0.8552, Pearson=0.9071, Duration=2.2min
[I 2025-11-17 08:04:48,773] Trial 3 finished with value: 0.8552468300535004 and parameters: {'batch_size': 32, 'learning_rate': 1.378776461935377e-05, 'dropout': 0.21685785941408728, 'warmup_ratio': 0.14159046082342291}. Best is trial 1 with value: 0.8573252951795175.


Trial 4 | Spearman=0.8568, Pearson=0.9144, Duration=2.2min
[I 2025-11-17 08:07:09,815] Trial 4 finished with value: 0.8568264841131915 and parameters: {'batch_size': 16, 'learning_rate': 3.267641765781762e-05, 'dropout': 0.33696582754481696, 'warmup_ratio': 0.061612603179999434}. Best is trial 1 with value: 0.8573252951795175.


Trial 5 | Spearman=0.8516, Pearson=0.9015, Duration=2.3min
[I 2025-11-17 08:09:37,313] Trial 5 finished with value: 0.8515610185633636 and parameters: {'batch_size': 8, 'learning_rate': 8.889667907018936e-05, 'dropout': 0.4862528132298237, 'warmup_ratio': 0.2520993370291153}. Best is trial 1 with value: 0.8573252951795175.


Trial 6 | Spearman=0.8560, Pearson=0.9116, Duration=2.2min
[I 2025-11-17 08:11:56,227] Trial 6 finished with value: 0.8559950774988666 and parameters: {'batch_size': 32, 'learning_rate': 2.7551959649510774e-05, 'dropout': 0.14881529393791154, 'warmup_ratio': 0.17379422752781754}. Best is trial 1 with value: 0.8573252951795175.


Trial 7 | Spearman=0.8571, Pearson=0.9157, Duration=2.2min
[I 2025-11-17 08:14:17,309] Trial 7 finished with value: 0.8571035922327424 and parameters: {'batch_size': 16, 'learning_rate': 4.5975057847321686e-05, 'dropout': 0.2246844304357644, 'warmup_ratio': 0.1800170052944527}. Best is trial 1 with value: 0.8573252951795175.


Trial 8 | Spearman=0.8569, Pearson=0.9149, Duration=2.1min
[I 2025-11-17 08:16:35,390] Trial 8 finished with value: 0.856937315022661 and parameters: {'batch_size': 32, 'learning_rate': 5.958443469672525e-05, 'dropout': 0.4757995766256756, 'warmup_ratio': 0.2737068376069122}. Best is trial 1 with value: 0.8573252951795175.


Trial 9 | Spearman=0.8562, Pearson=0.9115, Duration=2.2min
[I 2025-11-17 08:18:55,233] Trial 9 finished with value: 0.8561890675772948 and parameters: {'batch_size': 16, 'learning_rate': 1.570300837880672e-05, 'dropout': 0.11809091556421523, 'warmup_ratio': 0.1313325826908161}. Best is trial 1 with value: 0.8573252951795175.


Trial 10 | Spearman=0.8560, Pearson=0.9106, Duration=2.1min
[I 2025-11-17 08:21:13,188] Trial 10 finished with value: 0.8559673646305198 and parameters: {'batch_size': 32, 'learning_rate': 2.2738055735631803e-05, 'dropout': 0.2123738038749523, 'warmup_ratio': 0.18567402078956213}. Best is trial 1 with value: 0.8573252951795175.


Trial 11 | Spearman=0.8564, Pearson=0.9131, Duration=2.2min
[I 2025-11-17 08:23:34,438] Trial 11 finished with value: 0.8563553447873763 and parameters: {'batch_size': 16, 'learning_rate': 9.70257339412074e-05, 'dropout': 0.40889790771866297, 'warmup_ratio': 0.0996789203835431}. Best is trial 1 with value: 0.8573252951795175.


Trial 12 | Spearman=0.8569, Pearson=0.9153, Duration=2.2min
[I 2025-11-17 08:25:54,202] Trial 12 finished with value: 0.8569096021543141 and parameters: {'batch_size': 16, 'learning_rate': 5.3580550092318687e-05, 'dropout': 0.4085081386743783, 'warmup_ratio': 0.06851116293352259}. Best is trial 1 with value: 0.8573252951795175.


Trial 13 | Spearman=0.8564, Pearson=0.9130, Duration=2.1min
[I 2025-11-17 08:28:12,371] Trial 13 finished with value: 0.8563553447873763 and parameters: {'batch_size': 32, 'learning_rate': 4.2004723167022006e-05, 'dropout': 0.23235920994105969, 'warmup_ratio': 0.06588958757150591}. Best is trial 1 with value: 0.8573252951795175.


Trial 14 | Spearman=0.8564, Pearson=0.9135, Duration=2.2min
[I 2025-11-17 08:30:30,674] Trial 14 finished with value: 0.856438483392417 and parameters: {'batch_size': 32, 'learning_rate': 4.3406770118893994e-05, 'dropout': 0.4548850970305306, 'warmup_ratio': 0.16805373129048734}. Best is trial 1 with value: 0.8573252951795175.


Trial 15 | Spearman=0.8562, Pearson=0.9129, Duration=2.2min
[I 2025-11-17 08:32:49,421] Trial 15 finished with value: 0.8562444933139888 and parameters: {'batch_size': 32, 'learning_rate': 3.641473866814996e-05, 'dropout': 0.40838687198182444, 'warmup_ratio': 0.17344889909109767}. Best is trial 1 with value: 0.8573252951795175.


Trial 16 | Spearman=0.8560, Pearson=0.9123, Duration=2.3min
[I 2025-11-17 08:35:14,953] Trial 16 finished with value: 0.8560227903672135 and parameters: {'batch_size': 8, 'learning_rate': 1.2820100418916903e-05, 'dropout': 0.11257167427469371, 'warmup_ratio': 0.20910260281594512}. Best is trial 1 with value: 0.8573252951795175.


Trial 17 | Spearman=0.8556, Pearson=0.9091, Duration=2.2min
[I 2025-11-17 08:37:33,268] Trial 17 finished with value: 0.855634810210357 and parameters: {'batch_size': 32, 'learning_rate': 1.775383703652225e-05, 'dropout': 0.2641531692142519, 'warmup_ratio': 0.23888778463576216}. Best is trial 1 with value: 0.8573252951795175.


Trial 18 | Spearman=0.8554, Pearson=0.9077, Duration=2.2min
[I 2025-11-17 08:39:51,585] Trial 18 finished with value: 0.855385394395235 and parameters: {'batch_size': 32, 'learning_rate': 1.4495102383254677e-05, 'dropout': 0.47187906093702925, 'warmup_ratio': 0.25203009489110423}. Best is trial 1 with value: 0.8573252951795175.


Trial 19 | Spearman=0.8562, Pearson=0.9115, Duration=2.2min
[I 2025-11-17 08:42:11,947] Trial 19 finished with value: 0.8562167804456419 and parameters: {'batch_size': 16, 'learning_rate': 1.5366326576125014e-05, 'dropout': 0.45702359939599113, 'warmup_ratio': 0.1848355604789127}. Best is trial 1 with value: 0.8573252951795175.


Trial 20 | Spearman=0.8561, Pearson=0.9105, Duration=2.2min
[I 2025-11-17 08:44:31,695] Trial 20 finished with value: 0.8561059289722542 and parameters: {'batch_size': 16, 'learning_rate': 1.2884035848463489e-05, 'dropout': 0.1911740650167767, 'warmup_ratio': 0.1567769471565641}. Best is trial 1 with value: 0.8573252951795175.


Trial 21 | Spearman=0.8569, Pearson=0.9146, Duration=2.2min
[I 2025-11-17 08:46:51,288] Trial 21 finished with value: 0.8569096021543141 and parameters: {'batch_size': 16, 'learning_rate': 3.2415095286272715e-05, 'dropout': 0.2669644012595116, 'warmup_ratio': 0.10552695261768257}. Best is trial 1 with value: 0.8573252951795175.


Trial 22 | Spearman=0.8559, Pearson=0.9102, Duration=2.2min
[I 2025-11-17 08:49:10,051] Trial 22 finished with value: 0.8558842260254791 and parameters: {'batch_size': 32, 'learning_rate': 2.104761698332611e-05, 'dropout': 0.30751624869734645, 'warmup_ratio': 0.22575473972379445}. Best is trial 1 with value: 0.8573252951795175.


Trial 23 | Spearman=0.8563, Pearson=0.9122, Duration=2.2min
[I 2025-11-17 08:51:32,573] Trial 23 finished with value: 0.8563276319190293 and parameters: {'batch_size': 16, 'learning_rate': 1.7855922645162265e-05, 'dropout': 0.2988994023569542, 'warmup_ratio': 0.12521957745419243}. Best is trial 1 with value: 0.8573252951795175.


Trial 24 | Spearman=0.8560, Pearson=0.9121, Duration=2.1min
[I 2025-11-17 08:53:51,404] Trial 24 finished with value: 0.8559950774988666 and parameters: {'batch_size': 32, 'learning_rate': 3.181845026156783e-05, 'dropout': 0.12059150049999574, 'warmup_ratio': 0.11966161605915286}. Best is trial 1 with value: 0.8573252951795175.


Trial 25 | Spearman=0.8564, Pearson=0.9138, Duration=2.3min
[I 2025-11-17 08:56:17,956] Trial 25 finished with value: 0.8563553447873763 and parameters: {'batch_size': 8, 'learning_rate': 3.0864039085286224e-05, 'dropout': 0.49426018164424035, 'warmup_ratio': 0.1105138178778751}. Best is trial 1 with value: 0.8573252951795175.


Trial 26 | Spearman=0.8572, Pearson=0.9158, Duration=2.2min
[I 2025-11-17 08:58:39,402] Trial 26 finished with value: 0.8572144437061301 and parameters: {'batch_size': 16, 'learning_rate': 5.348307249011095e-05, 'dropout': 0.2471132530877013, 'warmup_ratio': 0.20807645764839489}. Best is trial 1 with value: 0.8573252951795175.


Trial 27 | Spearman=0.8546, Pearson=0.9082, Duration=2.3min
[I 2025-11-17 09:01:06,586] Trial 27 finished with value: 0.854581721213175 and parameters: {'batch_size': 8, 'learning_rate': 6.843881726124972e-05, 'dropout': 0.22831202598869435, 'warmup_ratio': 0.09662962759996356}. Best is trial 1 with value: 0.8573252951795175.


Trial 28 | Spearman=0.8545, Pearson=0.9046, Duration=2.2min
[I 2025-11-17 09:03:26,934] Trial 28 finished with value: 0.8545262954764813 and parameters: {'batch_size': 32, 'learning_rate': 1.0389336884578947e-05, 'dropout': 0.3048372233197124, 'warmup_ratio': 0.10662394379948449}. Best is trial 1 with value: 0.8573252951795175.


Trial 29 | Spearman=0.8560, Pearson=0.9107, Duration=2.2min
[I 2025-11-17 09:05:46,964] Trial 29 finished with value: 0.8559950774988666 and parameters: {'batch_size': 32, 'learning_rate': 2.4363256990834954e-05, 'dropout': 0.4746919954946939, 'warmup_ratio': 0.08438023603649832}. Best is trial 1 with value: 0.8573252951795175.


Trial 30 | Spearman=0.8570, Pearson=0.9151, Duration=2.1min
[I 2025-11-17 09:08:04,764] Trial 30 finished with value: 0.8569650278910079 and parameters: {'batch_size': 32, 'learning_rate': 7.539444583132291e-05, 'dropout': 0.20317665108606225, 'warmup_ratio': 0.21499601150854475}. Best is trial 1 with value: 0.8573252951795175.


Trial 31 | Spearman=0.8563, Pearson=0.9137, Duration=2.3min
[I 2025-11-17 09:10:30,601] Trial 31 finished with value: 0.8562999190506825 and parameters: {'batch_size': 8, 'learning_rate': 1.7452284774749136e-05, 'dropout': 0.1372411071223597, 'warmup_ratio': 0.2743039394883317}. Best is trial 1 with value: 0.8573252951795175.


Trial 32 | Spearman=0.8563, Pearson=0.9143, Duration=2.2min
[I 2025-11-17 09:12:53,991] Trial 32 finished with value: 0.8563276319190293 and parameters: {'batch_size': 8, 'learning_rate': 2.2346503230945165e-05, 'dropout': 0.3903822715480958, 'warmup_ratio': 0.27427756498814426}. Best is trial 1 with value: 0.8573252951795175.


Trial 33 | Spearman=0.8560, Pearson=0.9122, Duration=2.3min
[I 2025-11-17 09:15:19,101] Trial 33 finished with value: 0.8559950774988666 and parameters: {'batch_size': 8, 'learning_rate': 1.2137799660358624e-05, 'dropout': 0.1646514856378455, 'warmup_ratio': 0.27463854713176983}. Best is trial 1 with value: 0.8573252951795175.


Trial 34 | Spearman=0.8559, Pearson=0.9124, Duration=2.3min
[I 2025-11-17 09:17:44,790] Trial 34 finished with value: 0.855911938893826 and parameters: {'batch_size': 8, 'learning_rate': 4.6078864571905945e-05, 'dropout': 0.10202463353848748, 'warmup_ratio': 0.09020201285437467}. Best is trial 1 with value: 0.8573252951795175.


Trial 35 | Spearman=0.8563, Pearson=0.9118, Duration=2.2min
[I 2025-11-17 09:20:04,307] Trial 35 finished with value: 0.8562722061823356 and parameters: {'batch_size': 16, 'learning_rate': 1.675981843521171e-05, 'dropout': 0.3848716885390143, 'warmup_ratio': 0.10931227187420002}. Best is trial 1 with value: 0.8573252951795175.


Trial 36 | Spearman=0.8568, Pearson=0.9153, Duration=2.2min
[I 2025-11-17 09:22:23,356] Trial 36 finished with value: 0.8568264635492735 and parameters: {'batch_size': 16, 'learning_rate': 7.066809927935712e-05, 'dropout': 0.3630451569201374, 'warmup_ratio': 0.1920771508338679}. Best is trial 1 with value: 0.8573252951795175.


Trial 37 | Spearman=0.8564, Pearson=0.9122, Duration=2.2min
[I 2025-11-17 09:24:44,594] Trial 37 finished with value: 0.8563830576557231 and parameters: {'batch_size': 16, 'learning_rate': 1.753838677550487e-05, 'dropout': 0.48920422190097823, 'warmup_ratio': 0.1482744311666901}. Best is trial 1 with value: 0.8573252951795175.


Trial 38 | Spearman=0.8564, Pearson=0.9140, Duration=2.3min
[I 2025-11-17 09:27:11,548] Trial 38 finished with value: 0.8563553447873763 and parameters: {'batch_size': 8, 'learning_rate': 3.181537841237907e-05, 'dropout': 0.3307615538505436, 'warmup_ratio': 0.173129423454716}. Best is trial 1 with value: 0.8573252951795175.


Trial 39 | Spearman=0.8556, Pearson=0.9090, Duration=2.2min
[I 2025-11-17 09:29:34,122] Trial 39 finished with value: 0.855634810210357 and parameters: {'batch_size': 16, 'learning_rate': 1.0575866655581705e-05, 'dropout': 0.35818891836286715, 'warmup_ratio': 0.09427766985176224}. Best is trial 1 with value: 0.8573252951795175.


Trial 40 | Spearman=0.8566, Pearson=0.9140, Duration=2.2min
[I 2025-11-17 09:31:55,582] Trial 40 finished with value: 0.8566324940301075 and parameters: {'batch_size': 16, 'learning_rate': 2.345085601922796e-05, 'dropout': 0.10618264661154697, 'warmup_ratio': 0.28207964064693136}. Best is trial 1 with value: 0.8573252951795175.


Trial 41 | Spearman=0.8567, Pearson=0.9152, Duration=2.2min
[I 2025-11-17 09:34:18,250] Trial 41 finished with value: 0.8566878992075391 and parameters: {'batch_size': 16, 'learning_rate': 7.128685505446137e-05, 'dropout': 0.2177795568278343, 'warmup_ratio': 0.1462744321504813}. Best is trial 1 with value: 0.8573252951795175.


Trial 42 | Spearman=0.8563, Pearson=0.9138, Duration=2.3min
[I 2025-11-17 09:36:42,313] Trial 42 finished with value: 0.8563276319190293 and parameters: {'batch_size': 8, 'learning_rate': 3.6041367634054393e-05, 'dropout': 0.47446190966431245, 'warmup_ratio': 0.22400744916874327}. Best is trial 1 with value: 0.8573252951795175.


Trial 43 | Spearman=0.8571, Pearson=0.9150, Duration=2.2min
[I 2025-11-17 09:39:00,671] Trial 43 finished with value: 0.8571313051010894 and parameters: {'batch_size': 32, 'learning_rate': 9.773584004575754e-05, 'dropout': 0.15603360609460962, 'warmup_ratio': 0.1795824130909342}. Best is trial 1 with value: 0.8573252951795175.


Trial 44 | Spearman=0.8556, Pearson=0.9119, Duration=2.3min
[I 2025-11-17 09:41:27,339] Trial 44 finished with value: 0.855634810210357 and parameters: {'batch_size': 8, 'learning_rate': 5.0406214590774e-05, 'dropout': 0.24379646048790207, 'warmup_ratio': 0.12339796106612334}. Best is trial 1 with value: 0.8573252951795175.


Trial 45 | Spearman=0.8569, Pearson=0.9150, Duration=2.2min
[I 2025-11-17 09:43:47,729] Trial 45 finished with value: 0.856937315022661 and parameters: {'batch_size': 32, 'learning_rate': 8.189182554057196e-05, 'dropout': 0.3045369595443751, 'warmup_ratio': 0.1753790736717999}. Best is trial 1 with value: 0.8573252951795175.




Trial 46 | Spearman=0.8550, Pearson=0.9096, Duration=2.3min
[I 2025-11-17 09:46:13,878] Trial 46 finished with value: 0.8549974142383785 and parameters: {'batch_size': 8, 'learning_rate': 6.24874308896045e-05, 'dropout': 0.4560021367270265, 'warmup_ratio': 0.13449878921288394}. Best is trial 1 with value: 0.8573252951795175.




Trial 47 | Spearman=0.8547, Pearson=0.9052, Duration=2.1min
[I 2025-11-17 09:48:31,689] Trial 47 finished with value: 0.8547479984232564 and parameters: {'batch_size': 32, 'learning_rate': 1.0862812260685506e-05, 'dropout': 0.2862392072529841, 'warmup_ratio': 0.18566115867689414}. Best is trial 1 with value: 0.8573252951795175.




Trial 48 | Spearman=0.8558, Pearson=0.9094, Duration=2.2min
[I 2025-11-17 09:50:52,498] Trial 48 finished with value: 0.8558288002887853 and parameters: {'batch_size': 16, 'learning_rate': 1.0898034759198183e-05, 'dropout': 0.42904022426386335, 'warmup_ratio': 0.14004766035281574}. Best is trial 1 with value: 0.8573252951795175.




Trial 49 | Spearman=0.8555, Pearson=0.9082, Duration=2.1min
[I 2025-11-17 09:53:10,571] Trial 49 finished with value: 0.8554685330002757 and parameters: {'batch_size': 32, 'learning_rate': 1.6436942181944063e-05, 'dropout': 0.34915619032760015, 'warmup_ratio': 0.071336866248442}. Best is trial 1 with value: 0.8573252951795175.




Trial 50 | Spearman=0.8566, Pearson=0.9139, Duration=2.1min
[I 2025-11-17 09:55:29,083] Trial 50 finished with value: 0.8566047606024983 and parameters: {'batch_size': 32, 'learning_rate': 4.339402166711995e-05, 'dropout': 0.3904365334890646, 'warmup_ratio': 0.29396301986563367}. Best is trial 1 with value: 0.8573252951795175.




Trial 51 | Spearman=0.8557, Pearson=0.9090, Duration=2.2min
[I 2025-11-17 09:57:46,744] Trial 51 finished with value: 0.855662523078704 and parameters: {'batch_size': 32, 'learning_rate': 1.8656589301678123e-05, 'dropout': 0.2755885682822544, 'warmup_ratio': 0.06961409533556649}. Best is trial 1 with value: 0.8573252951795175.




Trial 52 | Spearman=0.8571, Pearson=0.9153, Duration=2.2min
[I 2025-11-17 10:00:10,242] Trial 52 finished with value: 0.8570758793643956 and parameters: {'batch_size': 16, 'learning_rate': 4.9656282838521796e-05, 'dropout': 0.263581177765708, 'warmup_ratio': 0.09332358001771145}. Best is trial 1 with value: 0.8573252951795175.




Trial 53 | Spearman=0.8567, Pearson=0.9139, Duration=2.1min
[I 2025-11-17 10:02:28,850] Trial 53 finished with value: 0.8566878992075391 and parameters: {'batch_size': 32, 'learning_rate': 5.1831756025789e-05, 'dropout': 0.3640789506870925, 'warmup_ratio': 0.11998347423648571}. Best is trial 1 with value: 0.8573252951795175.




Trial 54 | Spearman=0.8561, Pearson=0.9131, Duration=2.3min
[I 2025-11-17 10:04:54,525] Trial 54 finished with value: 0.8561336418406011 and parameters: {'batch_size': 8, 'learning_rate': 4.089975874386939e-05, 'dropout': 0.267840024971116, 'warmup_ratio': 0.11193274737528937}. Best is trial 1 with value: 0.8573252951795175.




Trial 55 | Spearman=0.8559, Pearson=0.9103, Duration=2.2min
[I 2025-11-17 10:07:15,535] Trial 55 finished with value: 0.8559396517621728 and parameters: {'batch_size': 16, 'learning_rate': 1.3063893777852112e-05, 'dropout': 0.11840105680870111, 'warmup_ratio': 0.06018220057974254}. Best is trial 1 with value: 0.8573252951795175.




Trial 56 | Spearman=0.8559, Pearson=0.9121, Duration=2.3min
[I 2025-11-17 10:09:41,133] Trial 56 finished with value: 0.8559396517621728 and parameters: {'batch_size': 8, 'learning_rate': 1.2526627427914564e-05, 'dropout': 0.29664635004673295, 'warmup_ratio': 0.16836794269514144}. Best is trial 1 with value: 0.8573252951795175.




Trial 57 | Spearman=0.8570, Pearson=0.9150, Duration=2.2min
[I 2025-11-17 10:12:00,964] Trial 57 finished with value: 0.8569650278910079 and parameters: {'batch_size': 16, 'learning_rate': 4.129049582940027e-05, 'dropout': 0.3540374603470575, 'warmup_ratio': 0.06132600244301113}. Best is trial 1 with value: 0.8573252951795175.




Trial 58 | Spearman=0.8566, Pearson=0.9149, Duration=2.2min
[I 2025-11-17 10:14:21,508] Trial 58 finished with value: 0.8566324734708453 and parameters: {'batch_size': 16, 'learning_rate': 7.186043489298727e-05, 'dropout': 0.363477452647578, 'warmup_ratio': 0.09073360677035743}. Best is trial 1 with value: 0.8573252951795175.




Trial 59 | Spearman=0.8570, Pearson=0.9154, Duration=2.2min
[I 2025-11-17 10:16:41,289] Trial 59 finished with value: 0.8570204536277017 and parameters: {'batch_size': 16, 'learning_rate': 3.852792157972197e-05, 'dropout': 0.4760920965699831, 'warmup_ratio': 0.19386854446896973}. Best is trial 1 with value: 0.8573252951795175.


After `study.optimize` has finished all 60 trials, the first line retrieves the best hyperparameter set (the combination of batch size, learning rate, dropout, and warmup ratio that produced the highest objective value) and stores the dictionary in `best_params`. The remaining lines report the result: the first `print` outputs the header "Best Hyperparameters:", the second prints the contents of `best_params`, and the third prints the highest Spearman correlation found across the trials, which is stored in `study.best_value`. This configuration is the one used for the final training run of the sentence embedding model.

In [None]:
best_params = study.best_params  # hyperparameters of the best trial

print("Best Hyperparameters:")
print(best_params)
# The format specifier belongs inside the braces, after a colon.
print(f"Best Spearman Score: {study.best_value:.4f}")

Best Hyperparameters:
{'batch_size': 16, 'learning_rate': 5.105903209394759e-05, 'dropout': 0.10823379771832098, 'warmup_ratio': 0.2924774630404986}
Best Spearman Score: 0.8573
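
The score above is reported through an f-string format specification. As a minimal sketch of how the `:.4f` specifier behaves, the snippet below formats a hypothetical value (the variable names are illustrative and not part of the notebook) and contrasts it with a specifier written outside the braces, which Python treats as literal text.

```python
# Hypothetical score, used only to illustrate the formatting behaviour.
score = 0.8573252951795175

# Specifier inside the braces, after a colon: rounds to four decimal places.
formatted = f"Best Spearman Score: {score:.4f}"

# Specifier outside the braces: printed verbatim after the full-precision value.
literal = f"Best Spearman Score: {score}.4f"

print(formatted)
print(literal)
```

Only the first form shortens the number; the second appends the characters `.4f` to the unrounded value.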


First, a new model instance, `final_model`, is created by calling `build_model` with the dropout value from `best_params`, so the model uses the regularization setting selected by the Optuna study. Next, `train_eval` runs the final training with the remaining tuned hyperparameters (`batch_size`, `learning_rate`, and `warmup_ratio`), also read from `best_params`. Unlike the optimization trials, which ran for 2 epochs, this run uses 3 epochs to give the model more time to converge. The function returns the Spearman and Pearson correlations (`sp`, `pr`) and the duration (`d`) on the evaluation set. Finally, `log_result` records the run under the identifiers "Final" and "Best_Model", so the result is saved alongside the preceding trials for comparison.

In [None]:
final_model = build_model(dropout=best_params["dropout"])

sp, pr, d = train_eval(
    final_model,
    train_examples,
    eval_A, eval_B, eval_y,
    batch=best_params["batch_size"],
    epochs=3,
    lr=best_params["learning_rate"],
    warmup=best_params["warmup_ratio"],
    desc="Final Model"
)

log_result("Final", "Best_Model", best_params, sp, pr, d)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Final Model | Spearman=0.8569, Pearson=0.9164, Duration=3.1min


This saves the final model, including its configuration files and trained parameters, to a directory named `MiniLM-L6-SciSim-SciTLDR`.

In [None]:
final_model.save_pretrained("MiniLM-L6-SciSim-SciTLDR")

This converts the `results` list into a pandas `DataFrame` and saves it as a comma-separated values (CSV) file, which makes the results of the hyperparameter tuning process easier to view and analyze.

In [None]:
pd.DataFrame(results).to_csv("results.csv")

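The results are also displayed, sorted by Spearman and then Pearson correlation in descending order. Because the table is long, pandas prints only the first and last 5 rows, which here are the best and worst performing runs.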

In [None]:
print(pd.DataFrame(results).sort_values(by=["spearman", "pearson"], ascending=False).reset_index(drop=True))

    batch_size  learning_rate   dropout  warmup_ratio  member       exp  \
0           16       0.000051  0.108234      0.292477  Optuna   Trial_1   
1           16       0.000053  0.247113      0.208076  Optuna  Trial_26   
2           16       0.000046  0.224684      0.180017  Optuna   Trial_7   
3           16       0.000050  0.263581      0.093324  Optuna  Trial_52   
4           32       0.000098  0.156034      0.179582  Optuna  Trial_43   
..         ...            ...       ...           ...     ...       ...   
56           8       0.000062  0.456002      0.134499  Optuna  Trial_46   
57          32       0.000011  0.286239      0.185661  Optuna  Trial_47   
58           8       0.000068  0.228312      0.096630  Optuna  Trial_27   
59          32       0.000010  0.304837      0.106624  Optuna  Trial_28   
60           8       0.000089  0.486253      0.252099  Optuna   Trial_5   

    spearman  pearson  duration_min  
0     0.8573   0.9162          2.19  
1     0.8572   0.9158  
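
The two-key ordering used above can be sketched without pandas. The rows below are hypothetical stand-ins for entries of the `results` list, and the snippet only illustrates how Pearson acts as a tie-breaker when two runs share the same Spearman score.

```python
# Hypothetical rows standing in for entries of the notebook's `results` list.
rows = [
    {"exp": "Trial_A", "spearman": 0.8559, "pearson": 0.9103},
    {"exp": "Trial_B", "spearman": 0.8573, "pearson": 0.9162},
    {"exp": "Trial_C", "spearman": 0.8559, "pearson": 0.9121},
]

# Sort by Spearman first and Pearson second, both in descending order.
ranked = sorted(rows, key=lambda r: (r["spearman"], r["pearson"]), reverse=True)

for position, row in enumerate(ranked):
    print(position, row["exp"], row["spearman"], row["pearson"])
```

`Trial_C` is placed ahead of `Trial_A` because their Spearman scores are equal and `Trial_C` has the higher Pearson score.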

The saved model can be loaded into a new model instance by passing the directory name to the constructor of the `SentenceTransformer` class.

In [None]:
new_model = SentenceTransformer("MiniLM-L6-SciSim-SciTLDR")

The code reuses the previously defined `train_eval` function to score the reloaded model. It receives `new_model`, the `train_examples` list, and the `test_A`, `test_B`, and `test_y` lists that hold the unseen test data. No `epochs`, `lr`, or other training arguments are passed, so the call relies on the defaults defined in `train_eval`; the notebook treats `train_examples` as a placeholder on the assumption that those defaults leave the model unchanged. The function then computes embeddings for `test_A` and `test_B`, takes the cosine similarity of each pair, and compares the similarities against `test_y` using Spearman's $\rho$ (`sp`) and Pearson's $r$ (`pr`). The two final `print` statements output these correlations, giving a quantitative measure of the model's performance on the test set for the semantic similarity task.

In [None]:
sp, pr, d = train_eval(
    new_model,
    train_examples, # This is a placeholder, as the model is already trained, it won't be used in eval mode
    test_A, test_B, test_y,
    desc="New Model Evaluation"
)
print(f"Spearman correlation for new_model: {sp:.4f}")
print(f"Pearson correlation for new_model: {pr:.4f}")

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


New Model Evaluation | Spearman=0.8554, Pearson=0.9139, Duration=2.4min
Spearman correlation for new_model: 0.8554
Pearson correlation for new_model: 0.9139
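
To make the two reported metrics concrete, here is a small, dependency-free sketch of Pearson's $r$ and Spearman's $\rho$. It is not the notebook's `train_eval` implementation; the helper names and the toy scores are assumptions for illustration, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def ranks(values):
    """Rank positions 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the values."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gold labels and predicted similarities.
gold = [0.1, 0.4, 0.5, 0.9]
predicted = [0.2, 0.3, 0.6, 0.95]

print(f"Spearman: {spearman(gold, predicted):.4f}")
print(f"Pearson:  {pearson(gold, predicted):.4f}")
```

In this toy case the predictions preserve the ordering of the gold labels, so Spearman's $\rho$ is 1, while Pearson's $r$ is lower because the relationship is not exactly linear.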


The function `semantic_similarity` takes the trained model (a `SentenceTransformer` object), `text1`, and `text2` as inputs. The work happens inside the `torch.no_grad()` context manager, which prevents a computational graph from being built; this saves memory and speeds up inference, since no training is performed. The function first calls `model.encode` to convert both texts into fixed-size embeddings (`e1` and `e2`), normalized to unit length so that their dot product equals their cosine similarity. Next, `util.cos_sim(e1, e2)` computes the cosine similarity, a value from -1 (embeddings pointing in opposite directions) to +1 (embeddings pointing in the same direction). Finally, the code adds 1 to the score and divides by 2. This linear rescaling maps [-1, 1] onto [0, 1]: 1 means the embeddings coincide, 0.5 corresponds to orthogonal (unrelated) embeddings, and 0 to opposite ones. The result is returned as a standard Python float.

In [None]:
def semantic_similarity(model, text1, text2):
  with torch.no_grad():
    e1 = model.encode(text1, convert_to_tensor=True, normalize_embeddings=True)
    e2 = model.encode(text2, convert_to_tensor=True, normalize_embeddings=True)
    return float((util.cos_sim(e1, e2).item() + 1) / 2)
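
The rescaling step can be checked on vectors whose cosine similarity is known in advance. The sketch below uses plain Python lists rather than model embeddings; `cosine` and `rescale` are hypothetical helper names introduced only for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rescale(c):
    """Map a cosine similarity in [-1, 1] linearly onto [0, 1]."""
    return (c + 1) / 2

same = rescale(cosine([1.0, 0.0], [2.0, 0.0]))       # same direction -> 1.0
orthogonal = rescale(cosine([1.0, 0.0], [0.0, 3.0])) # orthogonal -> 0.5
opposite = rescale(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite -> 0.0

print(same, orthogonal, opposite)
```

Note that a rescaled score of 0.5, not 0, is what unrelated (orthogonal) embeddings produce.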

Using the `files` module of the `google.colab` library, documents in PDF, DOCX, or PPT format can be uploaded to the runtime. The uploaded files are stored in the `docs` directory.

In [None]:
from google.colab import files

docs_path = 'docs'

uploaded = list(files.upload(docs_path).keys())

Saving Introduction to Predictive Analytics.pdf to docs/Introduction to Predictive Analytics.pdf


This cell imports `ipywidgets` for the interactive components, `listdir` from the `os` module, and `isfile` and `join` from `os.path` for file system navigation. Two list comprehensions then walk the directory given by `docs_path` (set to 'docs' earlier): `docs` holds the filenames, and `full_path_docs` holds the full paths needed to open the files. The default `selected_doc` is meant to be the first file's full path, or `None` if the directory is empty. A small callback, `on_dropdown_change`, updates the global `selected_doc` whenever the dropdown selection changes. The `widgets.Dropdown` component is populated by zipping the filenames (`docs`) with their full paths (`full_path_docs`), so the user sees the filename while the widget's value is the path. Finally, `dropdown.observe` attaches `on_dropdown_change` to the dropdown's `value` property, and `display(dropdown)` renders the widget.

In [None]:
import ipywidgets as widgets

from os import listdir
from os.path import isfile, join

docs = [f for f in listdir(docs_path) if isfile(join(docs_path, f))]
full_path_docs = [join(docs_path, f) for f in docs]
# Indexing an empty list raises IndexError, so guard the default explicitly.
selected_doc = full_path_docs[0] if full_path_docs else None

def on_dropdown_change(change):
  global selected_doc
  selected_doc = change.new

print("Which file do you want to make a summary of?")
dropdown = widgets.Dropdown(
    options=list(zip(docs, full_path_docs)),
    description="File: ",
    value=selected_doc
)

dropdown.observe(on_dropdown_change, names='value')

display(dropdown)

Which file do you want to make a summary of?


Dropdown(description='File: ', options=(('Introduction to Business Analytics.pdf', 'docs/Introduction to Busin…
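
The file-listing logic behind the dropdown can be exercised without any widgets. The sketch below wraps it in a hypothetical helper, `list_documents`, and runs it against temporary directories, so it does not depend on the notebook's `docs` folder; it mainly shows the default falling back to `None` when no files exist.

```python
import tempfile
from os import listdir
from os.path import isfile, join

def list_documents(directory):
    """Return (filenames, full paths, default selection) for a directory."""
    names = sorted(f for f in listdir(directory) if isfile(join(directory, f)))
    paths = [join(directory, f) for f in names]
    # Fall back to None instead of indexing into an empty list.
    default = paths[0] if paths else None
    return names, paths, default

# Case 1: an empty directory yields no options and a None default.
with tempfile.TemporaryDirectory() as empty_dir:
    empty_names, empty_paths, empty_default = list_documents(empty_dir)

# Case 2: a directory with one (hypothetical) file selects that file.
with tempfile.TemporaryDirectory() as docs_dir:
    with open(join(docs_dir, "example.pdf"), "w") as handle:
        handle.write("placeholder")
    names, paths, default = list_documents(docs_dir)
    options = list(zip(names, paths))  # (label, value) pairs, as in the dropdown

print(empty_default)
print(options[0][0])
```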

With the document selected in the previous cell, the raw text is extracted using the `textract` library's `process` function. This function returns a byte string, which is decoded here as UTF-8. Each run of one or more consecutive whitespace characters (spaces, tabs, newlines) is then replaced with a single space, giving a cleaner text for presentation and analysis.

In [None]:
raw_text = textract.process(selected_doc)
decoded = raw_text.decode("utf-8")
cleaned_text = re.sub(r'\s+', ' ', decoded)

print(cleaned_text)

Introduction to Predictive Analytics Predictive analytics is a branch of advanced analytics that uses historical data, statistical algorithms, and machine learning techniques to predict future outcomes or trends. Predictive analytics focuses on modeling the relationships between predictors and outcomes. In simpler terms, we try to understand how one set of data (predictors or input variables) can help us forecast or estimate another set of values (outcomes or targets). Process Overview: • Data Collection Gather data from various sources (databases, sensors, CRMs, etc.). • Data Preprocessing Clean, transform, and prepare data for analysis. • Model Building Choose and train a model using historical data. • Model Evaluation Assess the model using metrics like accuracy, precision, recall, etc. • Prediction & Deployment Use the model to predict future trends and integrate it into decision-making systems. Common Predictive Techniques: • Linear/Logistic Regression • Decision Trees & Random Fo
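
The effect of the whitespace cleanup is easiest to see on a short string. The sample below is hypothetical text that imitates the line breaks and tabs typically left behind by PDF extraction.

```python
import re

# Hypothetical extracted text with double spaces, newlines, and a tab.
sample = "Process  Overview:\n\n\u2022 Data\tCollection \n Gather data"

# Replace every run of one or more whitespace characters with one space.
collapsed = re.sub(r"\s+", " ", sample)

print(collapsed)
```

Leading or trailing whitespace is reduced to a single space rather than removed; calling `.strip()` on the result would drop it if needed.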

The user is then prompted for input, which serves as their summary of the selected document.

In [None]:
print(f"Enter an essay talking about your understanding of the contents of file '{selected_doc}'")
summary = input("Summary: ")

Enter an essay talking about your understanding of the contents of file 'docs/Introduction to Predictive Analytics.pdf'
Summary: Predictive analytics uses historical data, statistical algorithms, and machine learning to forecast future outcomes or trends. It works by trying to figure out the relationship between predictors (input data) and outcomes (the targets we want to estimate). The whole process starts with collecting and cleaning the data, then you build and evaluate the model to make sure it's accurate. Key steps include using statistical analysis to find patterns, then training and testing the model before it can finally output useful predictions. Techniques used include things like Logistic Regression, Decision Trees, and Neural Networks.


First, a Python list named `examples` is initialized with a single tuple, `(cleaned_text, summary)`. The cell therefore measures how closely the user's summary relates to the text extracted from the document, which also serves as a quick functional check of the model. A `for` loop iterates over the pairs in `examples`, although only one is present here. Inside the loop, `semantic_similarity` is called with the previously evaluated `new_model` and the two strings; it returns the rescaled cosine similarity between their embeddings, a score between 0 and 1. Finally, three `print` statements show Text 1, Text 2, and the similarity score formatted to four decimal places.

In [None]:
examples = [
    (cleaned_text, summary),
]

for s1, s2 in examples:
  score = semantic_similarity(new_model, s1, s2)
  print(f"Text 1: {s1}")
  print(f"Text 2: {s2}")
  print(f"Similarity Score: {score:.4f}")

Text 1: Introduction to Predictive Analytics Predictive analytics is a branch of advanced analytics that uses historical data, statistical algorithms, and machine learning techniques to predict future outcomes or trends. Predictive analytics focuses on modeling the relationships between predictors and outcomes. In simpler terms, we try to understand how one set of data (predictors or input variables) can help us forecast or estimate another set of values (outcomes or targets). Process Overview: • Data Collection Gather data from various sources (databases, sensors, CRMs, etc.). • Data Preprocessing Clean, transform, and prepare data for analysis. • Model Building Choose and train a model using historical data. • Model Evaluation Assess the model using metrics like accuracy, precision, recall, etc. • Prediction & Deployment Use the model to predict future trends and integrate it into decision-making systems. Common Predictive Techniques: • Linear/Logistic Regression • Decision Trees & R