My name is Alex and I love to explore the world through data.

About Me

I am a statistician experienced with genomics (population, neuroimaging, and single-cell genetics), biobank and phenotypic data, sports analytics, and administrative and census databases. I am currently a postdoctoral researcher at the Data Science Institute at Brown University, where I hold a Canada Postdoctoral Research Award from the Natural Sciences and Engineering Research Council of Canada (NSERC).

My research has included statistical machine learning methods for dimension reduction, visualization, and clustering of genomic data. I also have expertise in genome-wide association analysis, polygenic scoring, statistical genetics tools (ADMIXTURE, PLINK, etc.), and high-performance and cloud computing environments. I have studied biobanks including the 1000 Genomes Project, CARTaGENE, the UK Biobank, and All of Us.

My primary research interest is developing and applying methods for high-dimensional data. I like to study the distributions of phenotypes and environmental variables in large and diverse biobanks through the lens of genetics (e.g. through population structure) and to see how those distributions differ. I am also interested in studying the social impacts of human genetics research, such as how decades of genomics and consumer genetic testing shape how we think of and define populations.

I enjoy applying statistics and machine learning to personal projects, particularly in traffic safety. These include using computer vision to track traffic and collecting news stories about pedestrian collisions for further statistical analysis.

My main tools and skills:
  • R & Python
  • Statistical modelling
  • Machine learning
  • Dimension reduction
  • Clustering
  • Data visualization
  • Relational databases

Professional experience

Postdoctoral researcher - Brown University
2023 – present
I am currently working with Dr. Sohini Ramachandran, researching the theory and applications of nonlinear dimension reduction in biobanks to identify gene-environment interactions. I am also studying the social impact of human genetic research, with the aim of mitigating potential harms of genomic research. I currently hold a Canada Postdoctoral Research Award from NSERC.
Biobank research
  • Dimension reduction, clustering, modelling, and visualization of biobank data (1000 Genomes Project, UK Biobank, All of Us)
  • Analysis of gene-environment interaction
Social impact of human genetic research
  • Text-mining 25 years of Wikimedia data collected through a REST API
  • Quantifying uncertainty in visualizations of genetic data
Graduate researcher - McGill University
2017 – 2023
My doctoral research was in population and statistical genetics, studying how to apply nonlinear dimension reduction and density clustering to genomic data under the supervision of Dr. Simon Gravel. I worked with several biobanks (1000 Genomes Project, UK Biobank, CARTaGENE) and collaborated on projects in neuroimaging and single-cell genomics. I also rotated through Steph Weber's lab, where I wrote software to track extra-nucleolar droplets within C. elegans embryos, and the McGill Centre for Integrative Neuroscience, where I developed methods for neuroimaging genetics.
Graduate research
  • Developed machine learning and statistical methods for large-scale biobank data
  • Wrote software pipelines in R and Python
  • Genome-wide association studies and polygenic scores
Teaching
  • Graduate-level course Introduction to statistical programming
  • Workshops on linear/logistic regression, dimension reduction, data visualization, and statistical methods
Mathematical statistician - Statistics Canada
2010 – 2017

I worked as a methodologist on a variety of surveys and technical projects, including the International Travel Survey, the Canadian Income Survey, and the development of the Longitudinal Immigration Database. Among my responsibilities were:

  • Sample design
  • Variance and bias estimation
  • Imputation, weighting, and calibration
  • Record linkage
  • Database development and management
  • Statistical programming (SAS, R, SQL)
Technical consultant - The Co-operators
2008 – 2010

I worked in several roles on an ad-hoc basis. These included:

  • Developing internal apps in VB.NET
  • Database management and troubleshooting
  • Maintaining website content

Education

2017–2023
Doctor of Philosophy, Quantitative Life Sciences
McGill University, Montreal

The Entangled Biobank: On the Topology of High-Dimensional Human Genetic Data

2013–2015
Master of Science, Probability & Statistics
Carleton University, Ottawa

Data Mining the Play-by-Play: Assessing and Applying NHL Performance Metrics Using Statistical Methods

  • Master's thesis using machine learning, regression, and affinity analysis (association rule learning) to measure team and player performance in publicly available National Hockey League data.
2005–2010
Bachelor of Mathematics, Honours Statistics
University of Waterloo, Waterloo
Pure Mathematics minor

Projects and collaborations

Topstrat
Dimension reduction, Clustering, Biobanks, Genomics
A tool for dimension reduction and density-based clustering of diverse genomic data in large biobanks.
Visualizing complex biobank data
Dimension reduction, Data visualization, Biobanks, Genomics
A research project on applications of uniform manifold approximation and projection (UMAP) to biobank data.
Neuroimaging and gene expression
High-dimensional statistics, Neuroimaging, Single-cell genetics
A collaboration to predict the positions of cells in the hippocampus based on gene expression data.
PVD Streets
REST API, OCR, Public safety
Code for the Providence Streets Coalition, including OCR to extract data from police collision reports and collection of time-lapse Google Maps data through an API.