The dataset is, simply put, the set of data that a model works with. It's commonly split into two parts: training and testing. If you've worked with statistical models (e.g. kNN), it's essentially the same idea.
Essentially, first you need to go out and get a bunch of data from somewhere. In the case study specifically, they talk about real data and synthetic data.
An important principle to understand with anything related to AI is that its output is entirely dependent on the quality of its input. In other words: garbage in, garbage out. This means the training dataset has to be diverse and unbiased, so that it gives the chatbot a wide range of information to pull its answers from.
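As a rough illustration, here's one way you might sanity-check how balanced a dataset is before training. The file name and the "topic" column are made-up placeholders, not something from the case study:

```python
# A minimal sketch of a balance check on a (hypothetical) CSV of
# question/answer pairs that has a "topic" column.
import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical file

# Share of each topic in the dataset. If one topic dominates, the chatbot
# will be biased towards it: lopsided data in, lopsided answers out.
shares = df["topic"].value_counts(normalize=True)
print(shares)

# Flag any topic that makes up more than half of the dataset.
if shares.iloc[0] > 0.5:
    print(f"Warning: '{shares.index[0]}' dominates the dataset ({shares.iloc[0]:.0%})")
```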
Usually, when collecting data, it's split into "training" and "testing" datasets (say, 80% training, 20% testing). The reason for this is to check how the model responds to examples that specifically weren't in the training dataset (the "testing" dataset is never used during training). This ensures the model doesn't simply memorise and repeat parts of its training data verbatim, which can happen when the model has enough capacity to effectively fit the entire dataset (in other words, it overfits).
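Here's a minimal sketch of that 80/20 split using scikit-learn's `train_test_split`; the example question/answer pairs are made up purely for illustration:

```python
# An 80/20 train/test split of some toy question/answer pairs.
from sklearn.model_selection import train_test_split

questions = [
    "What are your opening hours?",
    "Do you ship overseas?",
    "Can I return an item?",
    "Where is my order?",
    "Do you offer gift wrapping?",
]
answers = ["9am-5pm", "Yes", "Within 30 days", "Check the tracking link", "Yes"]

# 80% of the pairs go to the training set, 20% are held back for testing.
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    questions, answers, test_size=0.2, random_state=42
)

print(len(X_train), "training examples,", len(X_test), "testing examples")
```

The model is then trained only on `X_train`/`y_train`, while `X_test`/`y_test` stay untouched until training is finished.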
The testing dataset is a smaller portion held back to verify that the model works correctly and to measure its accuracy on never-before-seen data. It isn't used for training, and the model doesn't see it at all until training is complete and the whole chatbot is verified top-to-bottom using the testing data.
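That final verification step might look something like the sketch below. It assumes a trained `chatbot` object with a `.reply()` method, which is a hypothetical interface rather than any real library:

```python
# A minimal sketch of evaluating a trained chatbot on the held-out test set.
def evaluate(chatbot, X_test, y_test):
    # The model never saw these pairs during training, so the score reflects
    # how well it generalises rather than how well it memorised.
    correct = sum(
        1 for question, expected in zip(X_test, y_test)
        if chatbot.reply(question) == expected
    )
    return correct / len(X_test)

# accuracy = evaluate(chatbot, X_test, y_test)
# print(f"Accuracy on never-before-seen data: {accuracy:.0%}")
```

Exact-match accuracy like this is a crude metric for a chatbot, but the principle is the same whatever metric you use: score the model only on data it has never seen.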