Jedidiah Carlson, PhD

Doomsayer: detection of batch effects and outliers in next-generation sequencing data using mutation signature analysis

Introduction

Whole-genome sequencing data must go through extensive quality control measures to ensure that the variants identified in an individual’s genome are true biological differences and not the result of errors that can occur throughout the many stages of sample preparation and sequencing. Many such errors can be avoided by collecting, storing, transferring, and preparing the biological samples according to established best practices. Human error is inevitable, however, and sometimes a few DNA samples will get degraded or oxidized or sloshed into another well of the plate, etc.

Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets

The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants. We introduce Helmsman, a program designed to perform mutation signature analysis on arbitrarily large sequencing datasets. Helmsman is up to 300 times faster than existing software.

Mutation Rate Browser

Introduction

The Mutation Rate Browser is a preconfigured UCSC Genome Browser track to explore and visualize patterns of fine-scale variability in human germline mutation rates alongside other genomic data.

Download the data

The raw data used to create these tracks can be downloaded at http://mutation.sph.umich.edu/hg19/. Tracks are only available for assembly GRCh37 (hg19) of the human reference genome, but can be converted to coordinates in other assemblies using the UCSC liftOver utility.

Need et al., 2009, Fig. 2

This post is part of a series examining figures from population genetics papers which are modified into white nationalist memes. See here for a brief overview of the project.

1 The figure

Here is a PCA plot from Need et al., Genome Biol, 2009, examining the population structure of a sample of individuals with self-reported Jewish ancestry. As indicated in the legend, points have been colored to indicate how many of each individual’s grandparents were Jewish:

2 The modification

Now compare this to the following modified figure, where the legend has been replaced by annotations near each cluster of points, but instead of stating the number of grandparents, clusters are identified as “Gentiles,” “Quadroon-Jews,” “Half-Jews,” “Almost Jews,” and “Full Jews”:

3 The problem

Tweaking the legends of figures for clarity is commonplace among scientists when giving talks, but what’s so hard to understand about having 0, 1, 2, 3, or 4 grandparents of a particular ethnicity?

Interview with Science Magazine

Introduction

I recently had the opportunity to chat with Science Magazine about a side project I’ve been working on, tracking how figures from population genetics studies get misappropriated by white nationalist groups. Over the next few weeks, I will be posting examples of these figures and discussing a bit of the science of the original paper, the specific ways in which figures are manipulated, and where the modified figures end up online.