To implement Bidirectional Encoder Representations from Transformers (BERT) models and compare their performance, the following libraries were used:
- **transformers** is a library by Hugging Face that provides APIs for loading, running, and fine-tuning pre-trained machine learning models.
- **textwrap** is a built-in Python module that is primarily used to format text, particularly to wrap text to a specific width.
- **time** is a built-in Python module that provides various time-related functions.

In [None]:
from transformers import pipeline, BertForQuestionAnswering, BertTokenizer
import textwrap
import time

Pre-trained models are acquired from Hugging Face, a platform that allows people to share machine learning models for various operations. The following models will be analyzed for this implementation:

- [`deepset/bert-base-cased-squad2`](https://huggingface.co/deepset/bert-base-cased-squad2) is the baseline model for the analysis in this implementation.
- [`deepset/bert-base-uncased-squad2`](https://huggingface.co/deepset/bert-base-uncased-squad2) is the uncased version of the baseline model. This will be used to compare the performance between cased and uncased BERT models.
- [`google-bert/bert-large-cased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad) is a BERT model that is larger (more parameters) than the baseline model. This will be used to compare how much the parameter count affects the performance of a model.
- [`salti/bert-base-multilingual-cased-finetuned-squad`](https://huggingface.co/salti/bert-base-multilingual-cased-finetuned-squad) is a BERT model that is trained on a multilingual corpus. This will be used to compare the performance of an English-only model to that of a multilingual model.

The pre-trained question answering model and its tokenizer are loaded using the `from_pretrained` methods of the `BertForQuestionAnswering` and `BertTokenizer` classes from the `transformers` library. Each method accepts the identifier of a model on the Hugging Face hub.

In [None]:
model_name = 'deepset/bert-base-cased-squad2'
# model_name = 'deepset/bert-base-uncased-squad2'
# model_name = 'google-bert/bert-large-cased-whole-word-masking-finetuned-squad'
# model_name = 'salti/bert-base-multilingual-cased-finetuned-squad'

model = BertForQuestionAnswering.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
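
As an alternative to commenting the `model_name` assignments in and out, the four checkpoints could be kept in a dictionary and selected by a short label. This is only a sketch: the labels and the names `MODEL_NAMES` and `select_model` are hypothetical and not part of the original notebook.

```python
# Hypothetical labels mapped to the Hugging Face identifiers listed earlier.
MODEL_NAMES = {
    "base-cased": "deepset/bert-base-cased-squad2",
    "base-uncased": "deepset/bert-base-uncased-squad2",
    "large-cased": "google-bert/bert-large-cased-whole-word-masking-finetuned-squad",
    "multilingual": "salti/bert-base-multilingual-cased-finetuned-squad",
}


def select_model(label):
    """Return the hub identifier for a label, failing loudly on a typo."""
    try:
        return MODEL_NAMES[label]
    except KeyError:
        raise ValueError(f"Unknown model label: {label!r}") from None


model_name = select_model("base-cased")
print(model_name)
```

The resulting `model_name` can then be passed to `from_pretrained` exactly as in the cell above.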


The question answering pipeline is created using the `transformers` library's `pipeline` function, which takes in the following arguments:
- **task** is the first positional argument and names the task that the pipeline will perform. In this implementation, `question-answering` was provided, so the returned pipeline is a `QuestionAnsweringPipeline`.
- **model** is the pre-trained model that will be used by the pipeline. In this implementation, the pre-trained `BertForQuestionAnswering` model is used.
- **tokenizer** is the tokenizer that will be used by the pipeline. In this implementation, the pre-trained `BertTokenizer` is used.

Note that this implementation was run on a `T4 GPU` runtime in Google Colab.

In [None]:
qna_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

Device set to use cuda:0


The implementation then asks for an article that will act as the context for the question answering pipeline. This is done by asking for user input through Python's built-in `input()` function.

The article entered by the user is cleaned up for presentation. First, the text is dedented using the `textwrap` module's `dedent` function, which removes any leading whitespace that is common to every line of a multiline string. This leaves the text flush with the left margin.

To further clean the context, the string `strip` method is applied, which removes leading and trailing whitespace from the string as a whole.

Lastly, to present the text cleanly, it is wrapped using the `fill` function from the `textwrap` module. This function takes the text to be wrapped into a paragraph, as well as the preferred maximum width of each line. Any word that would push a line past this width is moved to the next line, improving the presentation of the context article.
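
The three clean-up steps can be tried in isolation on a small multiline string. The sample text and the narrow width of 40 below are illustrative assumptions chosen to make the effect of each step visible; they are not taken from the notebook.

```python
import textwrap

# Illustrative text only: both content lines share a four-space indent,
# and the string starts and ends with a newline.
raw = """
    MANILA - The government is arranging chartered flights
    for the repatriation of overseas Filipino workers.
"""

dedented = textwrap.dedent(raw)             # drops the indent common to all lines
cleaned = dedented.strip()                  # drops leading/trailing whitespace
wrapped = textwrap.fill(cleaned, width=40)  # re-wraps to lines of at most 40 chars

print(wrapped)
```

Note that `fill` collapses the original line breaks into spaces before wrapping, so the output is a single paragraph regardless of how the input was broken into lines.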

In [None]:
context = input("enter Context Article: ")
dedented_text = textwrap.dedent(context).strip()

print("Context Article:\n")
print(textwrap.fill(dedented_text, width=120))

enter Context Article: MANILA – The government is arranging chartered flights for the repatriation of more than 200 overseas Filipino workers in Beirut, Lebanon, the Department of Migrant Workers (DMW) said Wednesday.  “We are trying to provide for chartered flights. We’re talking to airline companies so that the chartered flights would be able to accommodate for example, no less than 300 overseas Filipino workers from Beirut,” DMW Undersecretary Bernard Olalia said in a Palace press briefing.  This was after the scheduled flights of around 15 OFWs on Sept. 25 were cancelled because of the recent bombings in Beirut.  Olalia said around 111 OFWs are staying in four temporary shelters in Beirut and waiting for their repatriation.  An additional 110 OFWs are applying for exit permits from the Lebanese government, Olalia said.  “Apart from the documented OFWs, we have undocumented OFWs who need to secure travel documents and once they’re given travel documents, we will help them in securin

Once the question answering pipeline is created and the context established, the user is asked for input which will serve as the question that the model will need to answer.

To facilitate this, the built-in Python function `input` is used. Once the user enters a question, the implementation processes it. The asterisk character (`*`) serves as a sentinel that tells the implementation to stop prompting the user for input.
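
The sentinel-controlled prompt can also be sketched with the two-argument form of the built-in `iter`, which keeps calling a function until it returns the sentinel. The helper name `ask_until_sentinel`, the scripted questions, and the use of `str.upper` as a stand-in for the model are all hypothetical, chosen so that the sketch runs without user input or a loaded pipeline.

```python
def ask_until_sentinel(read_question, answer_question, sentinel="*"):
    """Collect (question, answer) pairs until read_question returns the sentinel."""
    results = []
    # iter(callable, sentinel) calls read_question() repeatedly and stops
    # as soon as the returned value equals the sentinel.
    for question in iter(read_question, sentinel):
        results.append((question, answer_question(question)))
    return results


# Scripted questions stand in for input(); str.upper stands in for the model.
scripted = iter(["How many OFWs?", "Why cancelled?", "*", "never asked"])
pairs = ask_until_sentinel(lambda: next(scripted), str.upper)
print(pairs)
```

In interactive use, `read_question` could be something like `lambda: input("Type your question: ")`.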

Before the question is fed to the pipeline, the start time is recorded using the `time` function of Python's `time` module. This function returns a timestamp, which is the number of seconds that have passed since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time).

Once the start time is recorded, the question is fed to the pipeline by calling it with a dictionary as its argument. This dictionary has two keys: `question`, which is the inquiry that the model needs to answer, and `context`, which is the reference text that the model uses to answer it.

When the pipeline finishes and returns with an answer, the end time is recorded using the same method as the one used to get the start time. This will be used to get the amount of time the model has taken to answer the question, in seconds.
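
The start/stop pattern can be wrapped in a small helper. The sketch below swaps in `time.perf_counter()`, which is a monotonic clock intended for measuring short intervals, instead of the `time.time()` call used in this notebook; that swap is a suggestion, not what the notebook does. The names `timed_call` and `fake_answer` are hypothetical, and `fake_answer` merely sleeps to imitate a model call.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_answer(question):
    """Stand-in for the pipeline call so the sketch runs without a model."""
    time.sleep(0.05)
    return {"answer": "placeholder"}


result, elapsed = timed_call(fake_answer, "Who is arranging the flights?")
print(f"Time Elapsed: {elapsed:.4f} seconds")
```

Because `time.time()` follows the system clock, which can be adjusted while a program runs, a monotonic clock is generally the safer choice when only the difference between two readings matters.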

The model's results are then displayed to the user: the answer to the question (the returned dictionary's `answer` key), the start and end character indices where the answer is located in the context (the `start` and `end` keys), and the probability score, or the model's confidence in its answer (the `score` key). The elapsed time is also displayed, rounded to four decimal places.
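
To illustrate how these fields relate, the sketch below builds a result dictionary by hand rather than calling the model. The keys mirror those described above, but the context sentence is shortened and the `score` value is made up. It assumes that `start` and `end` are character offsets with an exclusive end, which is consistent with the recorded output in this notebook, where the answer `111` spans indices 621 to 624.

```python
context = "Olalia said around 111 OFWs are staying in four temporary shelters."

# Hand-built stand-in for a pipeline result; values are illustrative only.
answer_text = "111"
start = context.find(answer_text)
result = {
    "answer": answer_text,
    "start": start,
    "end": start + len(answer_text),  # exclusive end offset (assumption)
    "score": 0.5,
}

# Slicing the context with the two offsets recovers the answer text.
span = context[result["start"]:result["end"]]
print(span)

# The .4f format specifier rounds to four decimal places; it does not truncate.
formatted = f"{0.22738:.4f}"
print(formatted)
```

Under that assumption, the offsets can be used to highlight the answer inside the original article, for example by slicing out the text on either side of the span.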

Once the results have been displayed, the user is once again prompted for a question and the whole process starts over again.

In [None]:
inquiry = input("\nType your question: ")
while (inquiry != '*'):
  start_time = time.time()
  answer = qna_pipeline({ "question": inquiry, "context": context })
  end_time = time.time()

  elapsed = end_time - start_time
  print("Answer found: " + answer['answer'])
  print("At Index: ", answer['start'], " - ", answer['end'])
  print("With Probability:", answer['score'], "\n")
  print(f"Time Elapsed: {elapsed:.4f} seconds")

  inquiry = input("Enter another question (* to stop): ")


Type your question: Who is arranging chartered flights for overseas Filipino workers in Beirut?
Answer found: the Overseas Workers Welfare Administration
At Index:  1548  -  1591
With Probability: 0.09304330497980118 

Time Elapsed: 0.2273 seconds
Enter another question (* to stop): How many overseas Filipino workers are expected to be accommodated in the chartered flights from Beirut?
Answer found: no less than 300
At Index:  352  -  368
With Probability: 0.8396175503730774 

Time Elapsed: 0.2336 seconds
Enter another question (* to stop): Why were the scheduled flights of some OFWs on September 25 canceled?
Answer found: the recent bombings in Beirut.
At Index:  570  -  601
With Probability: 0.40956392884254456 

Time Elapsed: 0.2990 seconds
Enter another question (* to stop): How many OFWs are currently staying in temporary shelters in Beirut?
Answer found: 111
At Index:  621  -  624
With Probability: 0.5157351493835449 

Time Elapsed: 0.2288 seconds
Enter another question (* to st