Biostatistics and Data Science Research Center

Personnel

Section Director: Jian Li, Ph.D.
Research Scientist: Jigang Zhang, Ph.D.
Postdoctoral Fellow: Jun Chen, Ph.D.
Postdoctoral Fellow: Fuping Zhao, Ph.D.
Research Analyst: Chao Xu
Senior Database Administrator: Qing Tian, M.S.

Research Focus 

Biostatistics and bioinformatics apply statistics and/or computer science to a variety of biological topics. In our center, the biostatistics and bioinformatics research focus is to develop new methodology for analyzing genetic and genomic data; to assist researchers in the center with statistical issues such as experimental design, power calculation, and data analysis; to maintain and manage data generated from various experiments; and to aid in publications and grant applications.

Studies

Utility of next-generation sequencing data in complex trait gene mapping

Improvements in technology and reductions in cost have made whole-genome sequencing more affordable and accessible to researchers in various biological studies. This has provided a tremendous amount of information for genetic studies. One of our research focuses is to design and develop statistical and bioinformatics methodology that uses genome-wide sequence data to map complex trait genes. We have proposed several strategies for improved detection of rare genetic variants and for efficient use of rare variants in detecting disease-related genome regions.
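As a rough illustration of how rare variants can be pooled to detect a disease-related region, the sketch below uses a generic "collapsing" approach: each individual is flagged as a carrier if they hold at least one rare allele in the region, and carrier proportions are then compared between cases and controls. This is not necessarily one of the strategies proposed by the center. The function names, the frequency threshold, and the toy data are assumptions made for the example.

```python
import math

def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants in a region into one carrier indicator per person.

    genotypes: list of per-individual lists of minor-allele counts (0, 1, or 2),
               one entry per variant in the region (hypothetical input layout).
    Returns a list of 0/1 flags: 1 if the individual carries any rare allele.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor-allele frequency of each variant, estimated from the sample itself.
    maf = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(n_variants)]
    rare = [j for j in range(n_variants) if 0 < maf[j] < maf_threshold]
    return [int(any(ind[j] > 0 for j in rare)) for ind in genotypes]

def carrier_z_test(carriers, is_case):
    """Two-proportion z-test comparing carrier rates in cases and controls.

    Returns (z, two-sided p-value) under a normal approximation, which is
    only reasonable for large samples.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    x_case = sum(c for c, k in zip(carriers, is_case) if k)
    x_ctrl = sum(carriers) - x_case
    pooled = (x_case + x_ctrl) / (n_case + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    if se == 0:
        return 0.0, 1.0  # nobody (or everybody) is a carrier: no signal
    z = (x_case / n_case - x_ctrl / n_ctrl) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Toy data: 6 individuals, 3 variants; the third variant is common.
toy_genotypes = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
toy_is_case = [1, 1, 1, 0, 0, 0]
toy_carriers = collapse_rare_variants(toy_genotypes, maf_threshold=0.1)
toy_z, toy_p = carrier_z_test(toy_carriers, toy_is_case)
```

The toy sample is far too small for the normal approximation to be trusted. With real data, an exact test or a regression model with covariates would usually be preferred.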

Methodology development and application for association study

An association study tests whether genetic variants, such as single nucleotide polymorphisms (SNPs), are associated with a complex trait or disease. In particular, a genome-wide association study (GWAS) uses a large number of genetic variants across the whole genome. Our research in this area focuses on methodology development for various situations, including using multiple correlated traits in association analysis, correcting for population substructure, and conducting association analysis using haplotypes.
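To make the basic idea concrete, the following minimal sketch tests a single SNP for association with case-control status using the standard allelic chi-square test on a 2x2 table of allele counts. It illustrates the general technique only, not the center's own methods, and it does not correct for population substructure. The function name and the genotype counts are hypothetical.

```python
import math

def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 degree of freedom) for one SNP.

    case_genotypes / control_genotypes: (AA, Aa, aa) genotype counts.
    Returns (chi-square statistic, p-value).
    """
    def allele_counts(genotype_counts):
        hom_major, het, hom_minor = genotype_counts
        # Each homozygote contributes two copies of its allele, each
        # heterozygote contributes one copy of each allele.
        return 2 * hom_major + het, 2 * hom_minor + het

    a, b = allele_counts(case_genotypes)     # cases: A alleles, a alleles
    c, d = allele_counts(control_genotypes)  # controls: A alleles, a alleles
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts (AA, Aa, aa) for 100 cases and 100 controls.
chi2, p_value = allelic_chi_square((30, 50, 20), (50, 40, 10))
```

In a GWAS the same test would be repeated across many variants, so the resulting p-values need a multiple-testing correction before any single SNP is declared significant.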

Methodology development and application for mRNA expression, proteomic, and epigenomic data

Disease status is affected both by DNA variants and by factors at the mRNA and protein levels, and/or through DNA modification. Studies at these biological levels provide insights into the genetic determination of complex traits and diseases. To analyze mRNA and protein expression data and epigenomic data such as genome-wide methylation, we have developed and continue to develop various approaches, such as classification of microarray expression data based on Bayes errors and multivariate profiling for detecting differential expression in microarray data.

Study and comparison of genetic variations within and between various ethnic groups

Genetic variations play important roles in disease determination, not just within a specific ethnic group but also among different ethnic groups. A number of complex diseases, such as osteoporosis, have been observed to occur at different rates in different populations. However, how much of these differences is due to environmental factors and how much to genetic factors is not yet fully understood. Using our data from various ethnic groups, we study and compare genetic variations both within and among these groups. Example topics include the diversity of DNA variants between Caucasians and Asians and the effects of differences in genetic variation on disease status.

Integrated data analysis for genetic and genomic studies

Due to the complexity of biological data, many analyses are conducted using only one level of data. For example, DNA variant data, mRNA expression levels, and protein expression data are usually analyzed separately. However, these data may have intrinsic relationships, and considering them simultaneously may provide greater power in gene detection. Through our own research and collaborations, we are working on combining data from different biological levels, such as DNA variants and mRNA expression levels, in a single analytical framework. The success of this work will provide increased power in genetic and genomic studies.