Genomic Analysis In Spark

Tom White and Jonathan Keebler show off hail, a package to allow you to perform genomic analysis in Apache Spark:

One of the most important downstream analyses is finding genetic trait associations. Association studies look for statistical associations between genetic variation and phenotypic traits, that is, an observable characteristic of an individual, such as hair color or disease. With the increasing availability of whole-genome sequence data, it’s possible to look for variants from across the whole genome that may be associated with a disease, rather than heavily relying only on commonly known variants as in a traditional genome-wide association study (GWAS).

The challenge for downstream processing is scale. Tools that can cope with a few hundred or even a few thousand genomes, such as the well-known 1000 Genomes dataset, can’t handle datasets that are one or more orders of magnitude larger. These datasets are now becoming commonplace, thanks to the multiple sequencing efforts taking place around the world like the 100,000 Genomes Project in the UK and the Precision Medicine Initiative in the US.

Genomic analysis has been right in Hadoop’s wheelhouse for a while.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More