Optimal coding of categorical data in Machine Learning

A very interesting online seminar will take place next Friday, Nov 5, at 12 pm. It has been organized by Prof. Roberto Rocci and will be given by Prof. Agostino Di Ciaccio, Department of Statistical Science at Sapienza University of Rome. Both teach modules within our Master Program.

Here is the link to access:

Passcode: 432940 

Here is a short introduction by Prof. Di Ciaccio:

"If we have to analyze large data sets with hundreds of features, we will generally have many quantitative variables and many qualitative variables, some of which may have many categories. In classical statistics these data are very difficult to analyze, but even in machine learning an optimal approach has not been proposed.
The purpose of this presentation is to suggest a method for handling categorical variables with many categories in machine learning methods.
Several approaches have been proposed in the literature; in this presentation we will focus on the problem of coding categorical data in order to apply neural networks.
The traditional methods that are used to encode categorical variables can be divided into three categories: methods that do not use the target variable or other variables; methods that use only the target variable; and One Hot Encoding based methods, which use a dummy variable for each category.
These methods have numerous drawbacks. Starting from a definition of optimal quantification, we will see that through a low-dimensional multiple quantification we can obtain a very effective coding that allows us to build more efficient Neural Networks with a low number of parameters. Some examples will show the usefulness of this method."
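
For readers new to the topic, the three traditional families mentioned in the introduction can be sketched in a few lines of plain Python. This is only an illustration and not the method presented in the seminar: the toy data, the function names, and the choice of ordinal coding and per-category target means as representatives of the first two families are all assumptions. The last helper, also hypothetical, simply counts first-layer weights to show why a low-dimensional code (here of assumed size `code_dim`) can need far fewer parameters than one dummy variable per category.

```python
from collections import defaultdict

# Toy data (made up for illustration): one categorical feature, one binary target.
colors = ["red", "green", "blue", "green", "red", "red"]
target = [1, 0, 1, 0, 1, 0]


def ordinal_encode(values):
    """Target-free coding: map each category to an arbitrary integer."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


def target_encode(values, y):
    """Target-only coding: replace each category by the mean of y within it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, y):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


def one_hot_encode(values):
    """One Hot Encoding: one 0/1 dummy variable per category."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]


def first_layer_params(n_categories, n_hidden, code_dim=None):
    """Count weights (biases ignored) linking one coded variable to n_hidden units.

    With code_dim=None the variable is one-hot coded. Otherwise each category
    is assumed to get a code_dim-dimensional numeric code, so we count the
    code table plus the smaller dense layer that follows it.
    """
    if code_dim is None:
        return n_categories * n_hidden
    return n_categories * code_dim + code_dim * n_hidden


print(ordinal_encode(colors))
print(target_encode(colors, target))
print(one_hot_encode(colors)[0])
print(first_layer_params(1000, 64), first_layer_params(1000, 64, code_dim=3))
```

With 1000 categories and 64 hidden units, the one-hot input needs 64000 weights, while a 3-dimensional code needs 3192 under the counting assumed above.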

Do not miss it!