This project aims to identify and help address current problems in the
clinical diagnosis and treatment of Bipolar Disorder (BD) through the
application of bioinformatics tools. Machine learning and a new algorithm will
be employed not only to improve understanding of the underlying biology of BD
through the analysis of risk loci and co-expression/co-regulation interactions,
but also to create a model and genetic risk pathway to better predict an
individual’s association with BD.
Bipolar Disorder is a heritable mood disorder characterised by
recurring episodes of depression and mania. It affects 2% of the world’s
population, with an additional 2% affected by sub-threshold variants [1]. The
World Health Organisation lists BD as one of the leading causes of
disability-adjusted life years in young adults [2]. In addition to the other
socio-economic challenges the disorder carries with it, patients face an
increased mortality rate from suicide, at 7.8% in men and 4.9% in women [3].
Current methods for clinical diagnosis and treatment of BD are inadequate [4],
and these shortcomings can have a significant impact on a patient’s functioning
and quality of life [5]. A number of interconnected reasons explain these
inadequacies:
1. Diagnosis is based on observation of patient behaviour with
reference to criteria defined in manuals such as the ICD-10.
2. Treatment decisions are based on clinician and patient preferences [5].
3. Psychiatric disorders are often highly heterogeneous in aetiology and
symptomatic manifestation, and yet at no point is the biology that underpins
a patient’s own variant of the disorder taken into account.
4. Symptoms and genetic composition are shared with other psychiatric
disorders, notably Schizophrenia and Major Depressive Disorder [6], [7], [8].
Mis- and delayed diagnosis of BD patients is common, at 60%, and it can take
between 5 and 10 years for a patient to receive an accurate diagnosis [9].
5. There are currently no reliable and objective methods to predict which
patients will likely respond to which medication [5].
These challenges remain and are tightly coupled to shortfalls in our current
understanding of the biological aetiology of BD. Bioinformatics has assisted,
and continues to assist, with the challenges in this domain by applying a
variety of analytical tools to explain the causal relationship between genetic
variation within a population and the observed phenotypic differences between
its members. It also provides the foundations for predicting genetic risk by
identifying risk loci and explaining the genetic architecture of phenotypic
traits.
Genome-Wide Association Studies (GWAS) are useful as a foundational exercise to
explain the genetic architecture of a trait and have identified a number of SNP
risk loci for BD and other mood disorders. The Psychiatric Genomics Consortium
conducted meta-analyses comprising over 20,000 BD patients and 30,000 controls
and identified 21 such loci [4].
Table 1: Risk loci hits in BD from GWAS [10], [11]
MIR2113, POU3F2 (OTF7)
There are confounding factors in the success of GWAS, such as sample size,
disease heterogeneity and population stratification, and many more loci remain
undiscovered.
DNA microarrays allow us to perform gene expression profiling and implicate
genes by measuring their levels of expression. Gene expression data is of
particular interest to this project in examining BD because its format as a
vector of real numbers is amenable to the application of machine learning.
Clustering algorithms and other multivariate pattern analyses have been helpful
in understanding gene function and regulation and can also identify subgroups
within a population by identifying and separating sets of genes that play a
similar role in a disease. Co-expressed genes in a cluster are likely to be
involved in the same cellular processes, and shared expression patterns between
them imply co-regulation. Details of the transcriptional regulatory network can
be inferred.
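As a rough illustration of grouping co-expressed genes, the sketch below links genes whose expression profiles have a Pearson correlation above a cut-off and reports the resulting groups. The gene names, the toy values, the 0.9 threshold and the single-linkage grouping are all assumptions made for the example; the project itself may use a different clustering method.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, treat as none
    return cov / math.sqrt(var_x * var_y)


def coexpression_clusters(profiles, threshold=0.9):
    """Group genes linked by a correlation >= threshold (single linkage)."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest over gene names

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Toy data (hypothetical): one expression vector per gene, one value per sample.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.1, 2.2, 0.8],
}
clusters = coexpression_clusters(toy_profiles)
print(clusters)
```

In this toy data geneA and geneB rise together across samples while geneC and geneD fall together, so the two pairs end up in separate groups.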
While analyses of gene expression data with the aid of machine learning tools
can assist in understanding more about a disease, we can do much more with the
data and even employ methods to model it for prediction. This project will take
the resulting cluster data and organise it into a binary vector representing
the presence or absence of upregulation of genes in each cluster. We can use
this data as input for the HyperTraPS algorithm, a method for sampling paths on
a hypercubic transition network, to create a model of the progression pathways.
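The binary encoding described above could look something like the following sketch, which marks a cluster as upregulated when the mean difference between a sample and the control mean, taken over the cluster's genes, exceeds a cut-off. The function name, the gene names, the toy values and the threshold of 1.0 are hypothetical; choosing the threshold is one of the open decisions of the project.

```python
def cluster_upregulation_vector(sample, control_means, clusters, threshold=1.0):
    """Return one 0/1 entry per cluster for a single sample.

    A cluster is marked 1 when the mean of (sample - control mean) over its
    genes exceeds `threshold`. The threshold is an assumed cut-off on
    log-scale expression values, not a value taken from the project.
    """
    vector = []
    for genes in clusters:
        diffs = [sample[g] - control_means[g] for g in genes]
        mean_diff = sum(diffs) / len(diffs)
        vector.append(1 if mean_diff > threshold else 0)
    return vector


# Hypothetical clusters and expression values for one patient sample.
example_clusters = [["geneA", "geneB"], ["geneC", "geneD"], ["geneE"]]
control_means = {"geneA": 5.0, "geneB": 6.0, "geneC": 7.0,
                 "geneD": 4.0, "geneE": 3.0}
patient = {"geneA": 6.8, "geneB": 7.6, "geneC": 7.1,
           "geneD": 4.2, "geneE": 4.5}

binary_state = cluster_upregulation_vector(patient, control_means,
                                           example_clusters)
print(binary_state)
```

Each sample then becomes one vertex of a hypercube with one dimension per cluster, which is the kind of state a path-sampling method on a hypercubic network operates on.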
studies have set out to build a genetic risk model to aid in the diagnosis of
Providing ineffective therapies for patients has significant individual
and societal costs, especially considering the high prevalence of BD [5].
Evidence-based medicine has helped to further understanding of BD and its
prognosis and to provide optimal treatments, particularly when accounting for
heterogeneity [3], and machine learning tools are gaining traction in
psychiatric research [13]. Additionally, similar studies have suggested that
the current literature lacks the means to assess a patient’s genetic risk of
BD [12], so there is scope for additional contributions from bioinformaticians
within this problem domain.
Do some basic clustering here to show whether
Obtain microarray data for both sufferers of BD and healthy controls.
Perform clustering and normalise the data.
Convert the clustered data into a format suitable for the HyperTraPS algorithm,
deciding upon the threshold levels of upregulation in clusters.
Run the HyperTraPS algorithm.
Algorithm 1: Hypercubic Transition Path Sampling
1. Initialise a set of N_h trajectories.
2. For each trajectory i in the set of N_h:
a. Compute the probability of making a move to a t-compatible
next step (for the first step, all trajectories are at the same point and the
probability for each is thus the same); record this probability as α'_i.
b. If current state is s, set α_i = α'_i, otherwise set α_i = α_i α'_i.
c. Select one of the available t-compatible steps according to their
relative weight. Update trajectory i by making this move.
3. If current state is everywhere t, go to 4; otherwise go to 2.
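The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation:

- States are binary tuples, and a step switches one feature from 0 to 1.
- A step is taken to be "t-compatible" when it acquires a feature that the target state t has.
- `weight(state, feature)` is a hypothetical user-supplied function giving the relative weight of acquiring a feature.
- The final step, which is not shown in the listing above, is assumed to return the mean of the per-trajectory running products.

Each trajectory is advanced to completion in turn, rather than all trajectories being advanced one step at a time. The result is the same, because every trajectory needs the same number of steps to reach t.

```python
import random


def hypertraps_estimate(source, target, weight, n_traj=1000, rng=None):
    """Estimate the probability of passing from `source` to `target`.

    `weight(state, feature)` is a hypothetical callback returning the
    relative weight of acquiring `feature` from `state`.
    """
    rng = rng or random.Random(0)
    if any(s == 1 and t == 0 for s, t in zip(source, target)):
        raise ValueError("target must contain every feature set in source")
    target_state = list(target)
    n = len(target_state)
    alphas = []
    for _ in range(n_traj):
        state = list(source)
        alpha = 1.0  # running product of the per-step probabilities
        while state != target_state:
            available = [j for j in range(n) if state[j] == 0]
            compatible = [j for j in available if target_state[j] == 1]
            total = sum(weight(state, j) for j in available)
            comp_weights = [weight(state, j) for j in compatible]
            # Probability that an unconstrained step stays compatible with t.
            alpha *= sum(comp_weights) / total
            # Choose among the compatible steps by relative weight.
            j = rng.choices(compatible, weights=comp_weights, k=1)[0]
            state[j] = 1
        alphas.append(alpha)
    # Assumed final step: average the products over all trajectories.
    return sum(alphas) / len(alphas)


def uniform_weight(state, feature):
    """Toy weight function: every available feature is equally likely."""
    return 1.0


estimate = hypertraps_estimate((0, 0, 0), (1, 1, 0), uniform_weight)
print(estimate)
```

With uniform weights, every trajectory from (0, 0, 0) to (1, 1, 0) accumulates the product 2/3 × 1/2, so the estimate is close to 1/3. That matches the probability of an unconstrained random walk visiting (1, 1, 0) after two steps.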
Lifetime and 12-month prevalence of bipolar spectrum disorder in the
National Comorbidity Survey replication
Arch. Gen. Psychiatry, vol. 64, p. 543.
Adjusting for dependent
comorbidity in the calculation of healthy life expectancy
Popul. Health Metr., vol. 4, p. 4, 2006.
The impact of machine learning
techniques in the study of bipolar disorder: A systematic review
Neurosci. & Behav. Rev., vol. 90, pp. 538-554, 2007.
Harrison, J. Geddes and E. M. Tunbridge
The Emerging Neurobiology of
Trends in Neurosci, vol. 41, pp. 984-994, 2018.
DeQuevedo and L. Yatham
Biomarkers in Mood Disorders. Are
we there yet?
J. Affect. Disord.
Identification of shared risk loci
and pathways for bipolar disorder and schizophrenia
PLoS One, 2017.
S. H. Lee
Genetic relationship between five
psychiatric disorders estimated from genome-wide SNPs
Nat. Genet., vol. 45, pp. 984-994, 2013.
Individualized Prediction of
Euthymic Bipolar Disorder and Euthymic Major Depressive Disorder Patients
Using Neurocognitive scores, Neuroimaging Data and Machine Learning
Biological Psychiatry, vol. 81, p. 274, 2017.
L-type CaV1.2 channels: from in
vitro findings to in vivo function
Physiol. Rev. vol. 94, p. 303–326, 2014.
L. P. Hou
Genome-wide association study of
40000 individuals identifies two novel loci associated with bipolar disorder
Hum. Mol. Genet, vol. 25, pp. 3383-3394, 2016.