Architecture

The architecture of the chatbot describes how it is built and laid out internally: in other words, how the different parts of the chatbot integrate and cooperate to power the whole system.

Backpropagation

This is the process, carried out during training, by which the model's weights are adjusted by propagating the error backwards through the network. During training, the model processes some input data, and its actual output is compared with the expected output. If the actual output is too far off from the expected output, backpropagation adjusts the model's weights in the direction that brings the actual output closer to the expected one. However, adjusting the weights may (and most likely will) inadvertently weaken the model's associations for other inputs (a two-steps-forward, one-step-back scenario). Training is all about changing these weights in such a way that the model outputs something close to the expected value most of the time (its training accuracy). The model can then be evaluated on the testing dataset.
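To make the idea concrete, here is a minimal sketch of backpropagation for a tiny one-hidden-layer network in NumPy. The toy data, layer sizes, and learning rate are illustrative assumptions, not anything prescribed by the text.

```python
import numpy as np

# Minimal sketch of backpropagation for a tiny one-hidden-layer network.
# The toy data, layer sizes, and learning rate are illustrative assumptions.
rng = np.random.default_rng(0)

X = rng.normal(size=(8, 3))              # 8 training examples, 3 features
y = rng.normal(size=(8, 1))              # expected outputs

W1 = rng.normal(scale=0.1, size=(3, 4))  # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(4, 1))  # hidden -> output weights
lr = 0.1

for step in range(100):
    # Forward pass: compute the actual output.
    h = np.tanh(X @ W1)
    y_hat = h @ W2

    # Compare the actual output with the expected output (mean squared error).
    error = y_hat - y
    loss = np.mean(error ** 2)

    # Backward pass: propagate the error back through the layers to get the
    # gradient of the loss with respect to each weight.
    grad_y_hat = 2 * error / len(X)
    grad_W2 = h.T @ grad_y_hat
    grad_h = grad_y_hat @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h ** 2))   # derivative of tanh is 1 - tanh^2

    # Adjust the weights a small step against the gradient.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```

Each pass nudges the weights so the actual output moves a little closer to the expected output, which is exactly the "adjustment" described above.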

Backpropagation through Time

Backpropagation through Time, as opposed to other variants (such as Backpropagation through Structure), is a version of the backpropagation algorithm that works by "unfolding" a recurrent neural network over time: each time step is treated as a layer of an ordinary feed-forward network, and the error is propagated backwards through those unfolded steps. It is a technical detail about which specific backpropagation algorithm is used; the general concept of adjusting model weights by propagating errors backwards is still backpropagation.
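The following sketch shows what "unfolding over time" looks like for a tiny vanilla RNN: the forward loop runs the same weights at every time step, and the backward loop walks those steps in reverse, carrying gradient from each step into the previous one. Sizes, sequence length, and targets are illustrative assumptions.

```python
import numpy as np

# Sketch of backpropagation through time for a tiny vanilla RNN.
# Layer sizes, sequence length, and toy data are illustrative assumptions.
rng = np.random.default_rng(0)

T, n_in, n_hid = 5, 3, 4                     # sequence length and layer sizes
xs = rng.normal(size=(T, n_in))              # one input vector per time step
targets = rng.normal(size=(T, n_hid))        # toy per-step targets

W_xh = rng.normal(scale=0.1, size=(n_in, n_hid))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))

# Forward pass: "unfold" the network over the T time steps.
hs = [np.zeros(n_hid)]
for t in range(T):
    hs.append(np.tanh(xs[t] @ W_xh + hs[-1] @ W_hh))

# Backward pass: walk the unfolded steps in reverse, accumulating the
# gradient that flows from each time step back into the previous one.
grad_W_xh = np.zeros_like(W_xh)
grad_W_hh = np.zeros_like(W_hh)
grad_h_next = np.zeros(n_hid)
for t in reversed(range(T)):
    h, h_prev = hs[t + 1], hs[t]
    grad_h = (h - targets[t]) + grad_h_next   # local error + gradient from later steps
    grad_raw = (1 - h ** 2) * grad_h          # back through the tanh nonlinearity
    grad_W_xh += np.outer(xs[t], grad_raw)
    grad_W_hh += np.outer(h_prev, grad_raw)
    grad_h_next = grad_raw @ W_hh.T           # gradient carried to the previous time step
```

The accumulated `grad_W_xh` and `grad_W_hh` would then be used to adjust the shared weights, exactly as in ordinary backpropagation.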

Vanishing Gradients and Long Short-Term Memory

The problem with classic RNNs is computational (or practical) in nature: when training a classic RNN with backpropagation, the long-term gradients that are propagated back through many time steps can "vanish", meaning they tend to zero. Each backward step multiplies the gradient by factors (recurrent weights and activation derivatives) that are often smaller than one, so over long sequences the product shrinks exponentially and the model effectively stops learning from distant time steps.

It is a problem with the falloff of the gradients used to train the network: the weight updates for long-range dependencies become vanishingly small, so training slows down drastically, and in some cases the network stops learning those dependencies altogether, because making any further progress would take an effectively unbounded amount of training time. Long Short-Term Memory (a type of recurrent neural network unit) was designed to mitigate this issue.
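A short numeric sketch makes the falloff visible: repeatedly multiplying the gradient by factors smaller than one, as happens step after step during backpropagation through time, drives it toward zero. The recurrent weight and activations below are illustrative assumptions.

```python
import numpy as np

# Sketch of why long-term gradients vanish: backpropagating through many time
# steps multiplies roughly one factor of the form w * tanh'(a) per step, and
# when each factor has magnitude below 1 the product shrinks exponentially.
# The recurrent weight and pre-activations here are illustrative assumptions.
rng = np.random.default_rng(0)

w = 0.5                                  # a small recurrent weight
activations = rng.normal(size=100)       # pre-activations at 100 time steps

gradient = 1.0
for step, a in enumerate(activations, start=1):
    gradient *= w * (1 - np.tanh(a) ** 2)   # one backprop step through tanh
    if step in (10, 50, 100):
        print(f"gradient after {step} steps: {gradient:.3e}")

# The printed gradient shrinks by many orders of magnitude, so the earliest
# time steps contribute essentially nothing to the weight update.
```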

RNNs built from LSTM units partially solve the vanishing gradient problem, because the LSTM cell state provides a path along which gradients can flow with little to no attenuation. However, LSTM networks can still suffer from the exploding gradient problem, where gradients grow very large instead of shrinking.
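As a sketch of how this is handled in practice, the snippet below trains a small LSTM with PyTorch and clips the gradient norm, a common guard against exploding gradients. The model sizes, toy data, and clipping threshold are illustrative assumptions rather than anything prescribed here.

```python
import torch
import torch.nn as nn

# Sketch: an LSTM sequence model trained with gradient clipping, a common
# guard against exploding gradients. Sizes, data, and the clipping
# threshold are illustrative assumptions.
torch.manual_seed(0)

lstm = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

x = torch.randn(4, 50, 3)     # batch of 4 sequences, 50 time steps, 3 features
y = torch.randn(4, 1)         # one target per sequence

for step in range(20):
    optimizer.zero_grad()
    outputs, (h_n, c_n) = lstm(x)       # gradients flow along the cell state
    loss = nn.functional.mse_loss(head(outputs[:, -1]), y)
    loss.backward()
    # Clip the overall gradient norm so an occasional exploding gradient
    # cannot blow up the weight update.
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()
```

Clipping does not change the direction of the update, only its size, which is why it is a popular complement to the LSTM's built-in protection against vanishing gradients.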