I am interested in applying machine learning methods to next-generation sequencing (NGS) data to extract the signals underlying complex diseases.
As NGS becomes more widespread, we now have access to large amounts of sequencing data from many cohorts, including the general population and populations with various traits. I have been working on sequencing data from Crohn’s disease (CD), Tourette’s disorder (TS), chronic obstructive pulmonary disease (COPD), and other cohorts to identify the disease signals within them. I examine the data in two main ways: (1) Traditional methods, such as genome-wide association analysis, which are powerful for detecting phenotype-associated markers in the genome when the sample size is large enough to yield statistically significant associations. (2) Alternatively, I use the predicted protein function changes caused by variants as features, apply feature selection methods to pick the top-ranked phenotype-related features (proteins/genes) or feature subsets, and learn the patterns behind these changes with machine learning methods (AVA,Dx).
I have built models that predict CD predisposition, bipolar disorder, warfarin dose, and venous thromboembolism from whole exome sequencing data, with or without accompanying clinical data. I am currently working on the TS data and putting together the AVA,Dx pipeline package.
M.Sc. in Clinical Pharmacokinetics, 2014
B.Sc. in Pharmacy (Clinical Pharmacy Track), 2011