One-hot encoding

Data can be broadly divided into two types:

Categorical
Numerical

Categorical data can be further divided into:

Nominal: there is no implicit order between the categories, e.g. "bird", "dog", "cat".
Ordinal: there is an implicit order between the categories, e.g. "First", "Second", "Third".

Many machine learning algorithms require their input and output to be numeric, so categorical data must first be converted to a numeric format.

This can be done in two ways:

Integer or label encoding
One-hot encoding

When we have ordinal categorical data, we can assign each category a unique integer that follows the categories' order, e.g. "First" = 0, "Second" = 1, "Third" = 2. This is called integer or label encoding. It is also useful because the ML algorithm can exploit the implicit order among the categories.
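A minimal sketch of label encoding in plain Python, assuming we supply the category order ourselves (the helper `label_encode` and the list `ORDER` are hypothetical names for illustration, not a library API):

```python
# Hypothetical ordinal categories, listed in their natural order.
# The order is an assumption we supply; it is not inferred from the data.
ORDER = ["First", "Second", "Third"]

def label_encode(values, order):
    """Map each ordinal value to its position in `order` (0, 1, 2, ...)."""
    mapping = {category: index for index, category in enumerate(order)}
    # A KeyError here means a value was not listed in `order`.
    return [mapping[v] for v in values]

encoded = label_encode(["Third", "First", "Second", "First"], ORDER)
print(encoded)  # [2, 0, 1, 0]
```

Because the integers come from the listed order, "Third" > "Second" > "First" still holds after encoding, which is exactly the information an algorithm can exploit.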

On the other hand, if we have nominal categorical data, integer encoding can actually hurt, since the integers imply an order where none exists: encoding "bird" = 0, "dog" = 1, "cat" = 2 suggests that "dog" lies between "bird" and "cat". Instead, we create "dummy variables", one binary variable per category. For each data point, a dummy variable is 1 if the data point belongs to that category and 0 otherwise.
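The dummy-variable construction can be sketched in plain Python as below. The function name `one_hot_encode` is hypothetical, and sorting the distinct values to fix the column order is our own assumption; any fixed order works as long as it is reused consistently.

```python
def one_hot_encode(values, categories=None):
    """Return (categories, rows) where each row has one 0/1 dummy variable per category."""
    if categories is None:
        # Assumption: fix the column order by sorting the distinct values,
        # so the same input always yields the same columns.
        categories = sorted(set(values))
    rows = []
    for v in values:
        # 1 marks the presence of this data point's category, 0 its absence.
        rows.append([1 if v == c else 0 for c in categories])
    return categories, rows

cats, rows = one_hot_encode(["bird", "dog", "cat", "dog"])
print(cats)  # ['bird', 'cat', 'dog']
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
# [0, 0, 1]
```

Passing `categories` explicitly is useful when new data must be encoded with the same columns as earlier data.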

In essence, we convert each categorical value into a binary numeric vector that has exactly one 1, in the position of its category.
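To see why these binary vectors avoid the spurious order that integer codes introduce for nominal data, the sketch below compares distances under both encodings. It assumes Euclidean distance as the notion of "closeness" (a choice made for illustration; many models use distances or linear combinations of features) and Python 3.8+ for `math.dist`.

```python
import math

ANIMALS = ["bird", "dog", "cat"]

# Integer (label) encoding: arbitrary codes 0, 1, 2 in listing order.
int_code = {a: i for i, a in enumerate(ANIMALS)}

# One-hot encoding: one binary column per category.
one_hot = {a: [1 if a == b else 0 for b in ANIMALS] for a in ANIMALS}

def int_distance(a, b):
    """Distance between two categories under integer encoding."""
    return abs(int_code[a] - int_code[b])

def one_hot_distance(a, b):
    """Euclidean distance between two one-hot vectors."""
    return math.dist(one_hot[a], one_hot[b])

for a, b in [("bird", "dog"), ("dog", "cat"), ("bird", "cat")]:
    print(a, b, int_distance(a, b), round(one_hot_distance(a, b), 3))
# bird dog 1 1.414
# dog cat 1 1.414
# bird cat 2 1.414
```

Under integer encoding "bird" appears twice as far from "cat" as from "dog", an artifact of the arbitrary codes; under one-hot encoding every pair of distinct categories is the same distance apart (the square root of 2).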