Data Integration
In the era of data science, the data needed to answer a scientific question are often
available from multiple sources. An analysis based on a single data source may yield
biased estimates or results that are not sufficiently precise.
Integrating data from multiple sources is therefore essential for combining different
pieces of information into a unified view, drawing more accurate conclusions, and making
better-informed decisions. The main challenges in this process arise from heterogeneity
across sources. Examples include data stored in different repositories with differing sets
of variables, measurements, and sample sizes, and results from different published
research papers that are summarized in varying forms. Faculty in Biostatistics are
developing new methodologies to address these data integration problems.
Faculty: M. Elliott, Peisong Han, G. Li, J. Taylor, X. Zhou