Unsupervised Learning on the Health and Retirement Study using Geometric Data Analysis

Presented at the 2019 IEEE International Conference on Machine Learning and Applications

(with Roberto Williams Batista)

The main focus of this work is to show how multiple correspondence analysis (MCA) can discover response patterns in survey data in which most measurements are categorical variables. A lower-dimensional representation of both individuals and measured variables is used to detect and represent underlying structures in the US Health and Retirement Study. This is a longitudinal survey of a representative sample of Americans over age 50. It captures how changing health interacts with social, economic, and psychological factors and with retirement decisions.
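As a rough illustration of the idea, and not the authors' code, MCA can be computed as a correspondence analysis of the indicator (one-hot) matrix of respondents by answer categories. The sketch below makes several assumptions:

- numpy is available.
- The function names `indicator_matrix` and `mca` are hypothetical.
- The `survey` answers are invented toy data.

It returns principal coordinates for respondents (rows) and answer categories (columns), together with the principal inertias. It does not apply the Benzécri or Greenacre inertia adjustments often used in practice.

```python
import numpy as np


def indicator_matrix(records):
    """One-hot encode equal-length categorical records (hypothetical helper).

    Returns the indicator matrix Z, with one column per observed
    (variable index, category) pair, and the list of column labels.
    """
    n_vars = len(records[0])
    labels = []
    for q in range(n_vars):
        # Only categories that actually occur, so every column has nonzero mass.
        for cat in sorted({row[q] for row in records}):
            labels.append((q, cat))
    col_index = {lab: j for j, lab in enumerate(labels)}
    Z = np.zeros((len(records), len(labels)))
    for i, row in enumerate(records):
        for q, cat in enumerate(row):
            Z[i, col_index[(q, cat)]] = 1.0
    return Z, labels


def mca(records, n_components=2):
    """MCA sketch: correspondence analysis of the indicator matrix via SVD."""
    Z, labels = indicator_matrix(records)
    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses (each respondent gets 1/n)
    c = P.sum(axis=0)          # column masses (category frequencies)
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}, element-wise
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    eig = s ** 2               # principal inertias, in descending order
    # Principal coordinates: rows F = D_r^{-1/2} U diag(s), columns G = D_c^{-1/2} V diag(s)
    F = (U / np.sqrt(r)[:, None]) * s
    G = (Vt.T / np.sqrt(c)[:, None]) * s
    return F[:, :n_components], G[:, :n_components], eig, labels


if __name__ == "__main__":
    # Invented answers to three categorical items for six respondents
    survey = [
        ("yes", "low", "A"),
        ("yes", "low", "B"),
        ("no", "high", "B"),
        ("no", "high", "C"),
        ("yes", "mid", "A"),
        ("no", "mid", "C"),
    ]
    F, G, eig, labels = mca(survey)
    print("share of inertia on first two axes:", eig[:2] / eig.sum())
    for lab, (g1, g2) in zip(labels, G):
        print(lab, round(float(g1), 3), round(float(g2), 3))
```

Respondents and categories land in the same low-dimensional space, which is what allows response patterns to be read off jointly. A known sanity check: the total inertia of an indicator matrix with Q variables and J categories equals J/Q - 1.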

The unsupervised techniques presented in this work offer an opportunity to extract valuable insights from longitudinal datasets such as the one made available by the US Health and Retirement Study. MCA enables new interpretations and the discovery of patterns that take advantage of the qualitative nature of the data collected from survey respondents. MCA produces a lower-dimensional representation of participants, and hierarchical clustering applied to it suggested a reasonable separation of respondent profiles as characterized by a personality scale. The results of this approach may be used in three ways:

- to explore areas not yet captured by the questionnaire items;
- to inform the design of the survey and its sampling procedure;
- to support correlation studies with other physical and mental health indicators.
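The text does not state which linkage criterion the hierarchical clustering used. The sketch below assumes Ward's criterion, a common choice when clustering MCA coordinates. It implements Ward naively in numpy for readability rather than speed; a real analysis would more likely call a library routine. The function name `ward_clusters` and the toy `coords` are hypothetical. The coordinates stand in for the first two MCA coordinates of each respondent.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (illustrative sketch).

    Starts with every row of X as its own cluster and repeatedly merges the
    pair whose union gives the smallest increase in within-cluster sum of squares:
        delta(A, B) = |A| |B| / (|A| + |B|) * ||mean_A - mean_B||^2
    Returns one integer cluster label per row of X.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = clusters[a], clusters[b]
                diff = X[A].mean(axis=0) - X[B].mean(axis=0)
                sq_dist = float(diff @ diff)
                cost = len(A) * len(B) / (len(A) + len(B)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge B into A (a < b)
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D MCA coordinates for six respondents (two loose groups)
    coords = np.array([
        [-1.0, -1.1], [-0.9, -1.0], [-1.2, -0.8],
        [1.0, 1.1], [0.9, 1.2], [1.1, 0.9],
    ])
    print(ward_clusters(coords, n_clusters=2))
```

Once cluster labels exist, they can be cross-tabulated against an external measure such as a personality scale. This comparison is how a separation of respondent profiles can be assessed.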

This work was presented at the 18th IEEE International Conference on Machine Learning and Applications (ICMLA).

Reinaldo (Rei) Sanchez-Arias
Assistant Professor of Data Science