Project: Computational Analysis of microRNA Binding

The project aims to develop novel computational methods and tools to study microRNA binding interactions and the role of microRNAs in gene regulation. MicroRNAs are small (~22 nucleotide), non-coding RNAs known to regulate genes involved in key aspects of animal development and physiology through binding interactions with their mRNA targets. Since the first discovery of microRNAs in C. elegans in 1993, a large number of microRNAs have been discovered in metazoans, plants and viruses. Today, microRNAs are known to be expressed in almost all cell types, to be evolutionarily conserved in most metazoan and plant species, and to potentially regulate more than 30% of mammalian gene products. Understanding the regulatory functions of microRNAs in fundamental biological processes is thus essential for gaining a global view of gene regulation, but this understanding is still at an early stage despite rapid advances in microRNA biology.

Major Methods and Tools

miRModule

miRModule is a software tool for the systematic discovery of miRNA modules from a set of predefined miRNA target sites. Given a set of miRNA binding sites, miRModule efficiently identifies groups of miRNAs whose binding sites significantly co-occur in the same set of target mRNAs and reports them as putative miRNA modules. It works with both experimentally determined miRNA-mRNA binding sites (e.g. from CLASH) and computationally predicted ones (e.g. from miRanda); any source of miRNA-mRNA binding information can serve as input. We provide both Linux and Windows versions of the miRModule software.

Figure: The pipeline to predict miRNA modules. (A) miRNA-mRNA interaction data from CLASH. Each line represents a target mRNA and each box represents a miRNA target site, with different shapes representing different miRNAs. (B) Identify miRNA groups whose target sites frequently co-occur in common mRNAs. (C) Identify miRNA module candidates by binomial tests. (D) Predict miRNA modules based on hypergeometric tests.

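As a rough illustration of the co-occurrence idea behind steps (B) and (D) of the pipeline, the sketch below asks whether two miRNAs share more target mRNAs than expected by chance, using a hypergeometric tail probability. The function names, the toy target sets and the use of the total number of mRNAs as the background population are assumptions made for illustration; the exact test parameters used by miRModule are not stated here.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population that contains `successes` marked items."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def cooccurrence_pvalue(targets_a, targets_b, n_mrnas):
    """Chance of seeing at least the observed number of shared target mRNAs."""
    shared = len(targets_a & targets_b)
    return hypergeom_sf(shared, n_mrnas, len(targets_a), len(targets_b))


# Hypothetical toy data: target mRNAs of three miRNAs among 20 mRNAs.
targets = {
    "miR-A": {"g1", "g2", "g3", "g4"},
    "miR-B": {"g2", "g3", "g4", "g5"},
    "miR-C": {"g6", "g7"},
}
n_mrnas = 20

p_ab = cooccurrence_pvalue(targets["miR-A"], targets["miR-B"], n_mrnas)
p_ac = cooccurrence_pvalue(targets["miR-A"], targets["miR-C"], n_mrnas)
print(f"miR-A / miR-B: p = {p_ab:.4f}")
print(f"miR-A / miR-C: p = {p_ac:.4f}")
```

In this toy case miR-A and miR-B share three of their four targets and receive a small p-value, while miR-A and miR-C share none and receive a p-value of 1.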
We studied miRNA modules based on experimentally determined miRNA target sites. We predicted 181 miRNA modules and 306 potential miRNA modules. We demonstrated that miRNA modules preferred to bind weak sites and favoured a combination of all unconventional sites. We also observed that miRNA modules preferred to bind in CDSs and favoured the first and the last exons. We confirmed that more than 70% of miRNA modules bound sites within specific ranges, with enrichment in two previously known ranges. However, many more adjacent sites bound by miRNA modules were >130 nucleotides apart. We further showed that unconventional target sites of miRNA modules were often within shorter distances than other combinations of target sites. Our study shed new light on miRNA binding.

The majority of adjacent target sites of miRNA modules were >130 nucleotides apart, which contradicted previous observations (Brennecke et al., 2005; Doench and Sharp, 2004; Kloosterman et al., 2004; Saetrom et al., 2007; Vella et al., 2004). To understand what led to the different observations, we focused on the target sites of the 181 miRNA modules in 3′ UTRs. We found that even when we considered only target sites in 3′ UTRs, more than 75% of adjacent target sites of miRNA modules were >130 nucleotides apart. We also predicted miRNA module candidates using only the 6096 CLASH target sites in 3′ UTRs and then studied the distances between adjacent target sites of these candidates. We still observed that the majority of adjacent target sites of these candidates were >130 nucleotides apart (Supplementary File S4). Therefore, the difference is unlikely to have arisen because we used target sites in entire mRNA regions while previous studies used only target sites in 3′ UTRs. Instead, it may be due to the small number of experimentally determined sites in previous experimental studies and the limited quality of predicted sites in the previous computational study, compared with the 18,514 high-quality experimentally determined sites we used.

We predicted (potential) miRNA modules on the condition that they downregulated target genes significantly more than some of their miRNA subsets. We further checked whether these (potential) modules downregulated their target genes significantly more than any subset contained in the modules. We confirmed that, for all (potential) miRNA modules, their target genes were significantly more downregulated than the target genes of any of their subsets.

We discovered 201 non-synergistic modules. The non-synergistic modules may also play important roles in regulating target genes, as supported by GO and pathway analyses, order preference, and the literature. Moreover, these non-synergistic modules may be competitive miRNA modules that are worth further investigation (Khan et al., 2009).

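The distance analysis described above can be pictured with a small sketch: sort the sites bound by a module on each mRNA, measure the gaps between neighbouring sites, and report the fraction of gaps above 130 nucleotides. The coordinates below are made up, and measuring from the start of one site to the start of the next is an assumption, since the exact distance definition is not given here.

```python
def adjacent_distances(site_starts):
    """Gaps between neighbouring sites on one mRNA, after sorting by position."""
    ordered = sorted(site_starts)
    return [right - left for left, right in zip(ordered, ordered[1:])]


def fraction_far_apart(sites_by_mrna, threshold=130):
    """Fraction of adjacent site pairs whose gap exceeds `threshold` nucleotides."""
    gaps = [
        gap
        for starts in sites_by_mrna.values()
        for gap in adjacent_distances(starts)
    ]
    if not gaps:
        return 0.0
    return sum(gap > threshold for gap in gaps) / len(gaps)


# Hypothetical site start positions for one module on three mRNAs.
toy_sites = {
    "mRNA1": [40, 55, 400],
    "mRNA2": [900, 100],
    "mRNA3": [10],  # a single site contributes no adjacent pair
}

frac = fraction_far_apart(toy_sites)
print(f"{frac:.2%} of adjacent site pairs are >130 nt apart")
```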
TarPmiR is a software tool for predicting miRNA target sites from CLASH (cross-linking, ligation and sequencing of hybrids) data.

The identification of microRNA (miRNA) target sites is fundamentally important for studying gene regulation. Dozens of computational methods are available for miRNA target site prediction, yet we still cannot reliably identify miRNA target sites, partly because of our limited understanding of their characteristics. The recently published CLASH (cross-linking, ligation and sequencing of hybrids) data provide an unprecedented opportunity to study the characteristics of miRNA target sites and improve miRNA target site prediction methods. Applying four different machine learning approaches to the CLASH data, we identified seven new features of miRNA target sites. Combining these new features with those commonly used by existing miRNA target prediction algorithms, we developed an approach called TarPmiR for miRNA target site prediction. Testing on two human and one mouse non-CLASH datasets, we showed that TarPmiR predicted more than 74.2% of true miRNA target sites in each dataset. Compared with three existing approaches, TarPmiR achieved better recall and better precision. Although TarPmiR is based on the published CLASH data, users can easily apply TarPmiR to any new dataset by extending the 'binding' class. Please check 'How to extend TarPmiR' for more details.

We identified seven new features together with six conventional features of miRNA target sites. Based on these 13 selected features, we developed a new approach called TarPmiR to predict miRNA target sites. We tested TarPmiR on a human CLASH dataset, two human PAR-CLIP datasets, a mouse HITS-CLIP dataset and a general dataset from TarBase 7.0, and showed that TarPmiR performed as well as or better than three existing approaches.

Not all new features were completely new. We described some features as new because they were not used by most of the existing tools, such as miRanda (Enright et al., 2004), TargetScan (Friedman et al., 2009; Grimson et al., 2007), DIANA-microT-CDS (Maragkakis et al., 2009; Paraskevopoulou et al., 2013), rna22-gui (Loher and Rigoutsos, 2012), TargetMiner (Bandyopadhyay and Mitra, 2009), PITA (Kertesz et al., 2007) and RNAhybrid (Krüger and Rehmsmeier, 2006). However, several new features were mentioned in previous studies directly or indirectly. For instance, Thomson et al. (2011) stated that ‘some validated miRNA target sites do not have a complete seed match but instead exhibit 11–12 continuous base pairs in the central region of the miRNA’. We observed similar target sites in the CLASH dataset and proposed the feature ‘The length and position of the longest consecutive pairs’.

The selected new features significantly improved the prediction accuracy of TarPmiR. To show their contribution, we removed the seven new features and retrained the random forests in TarPmiR. Compared with the original TarPmiR with 13 features, the recall and precision of the modified TarPmiR dropped by 8.6% and 9.7%, respectively.

We also compared the true target sites predicted by the different approaches (Supplementary File S4). TarPmiR had the largest number of predicted true sites shared with other tools. However, the percentage of shared true target sites predicted by TarPmiR was lower than that of other tools, suggesting that TarPmiR complements existing tools by predicting sites that other tools cannot predict. In fact, there are 2090 ‘non-seed-matching’ sites in the first CLASH test dataset. TarPmiR was able to identify 1585 (75.8%) of those sites, whereas miRanda and TargetScan were only able to predict 173 (8.28%) and 34 (1.6%) sites, respectively. This suggests that traditional tools such as TargetScan and miRanda can hardly predict non-seed-matching binding sites.

It is also worth mentioning that CLASH experiments may pick up both direct and indirect miRNA target sites. Argonaute proteins are guided by miRNAs to bind mRNAs, which is referred to as miRNA-dependent recruitment and results in direct miRNA target sites. There is also a miRNA-independent Argonaute recruitment mechanism, in which Argonaute proteins are recruited to target mRNAs through protein–protein interactions with RNA-binding proteins, so that miRNAs do not interact with the mRNAs directly (Meister, 2013). In the future, one may want to distinguish these two types of target sites in CLASH experiments before training predictors for target site prediction. In this way, we may also obtain better features and improve the prediction accuracy.

Because of the existence of indirect target sites in CLASH data, the recall of TarPmiR on the CLASH testing datasets may be underestimated. In fact, TarPmiR had a much higher recall on the three independent human and mouse datasets, suggesting that TarPmiR may have a recall larger than 74%. On the other hand, TarPmiR had a much lower precision on the independent datasets, which may be underestimated as well. This was because we treated all segments other than the CCRs or identified miRNA target sites in these independent datasets as true negative target sites, which may not be the case.

At the time of this study, only one CLASH dataset was publicly available (Helwak et al., 2013). This human CLASH dataset was used to train TarPmiR. We applied TarPmiR to human and mouse datasets and demonstrated that it works well on these datasets. In the future, with more CLASH datasets available, more important miRNA target site features, including tissue-specific features, may be discovered and the accuracy of TarPmiR, especially its precision, may be further improved.

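To make the feature ‘The length and position of the longest consecutive pairs’ concrete, here is a minimal sketch of how such a value could be computed for one miRNA-target duplex. The input encoding is an assumption made for illustration: one character per miRNA position from the 5′ end, with "|" for a paired base and "." for an unpaired one. It does not reproduce TarPmiR's own implementation or its definition of position.

```python
def longest_consecutive_pairs(pairing):
    """Return (length, start) of the longest run of paired positions.

    `start` is the 1-based miRNA position where the run begins; (0, 0) is
    returned when nothing is paired. Ties go to the run closest to the 5' end.
    """
    best_len, best_start = 0, 0
    run_len, run_start = 0, 0
    for pos, mark in enumerate(pairing, start=1):
        if mark == "|":
            if run_len == 0:
                run_start = pos
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return best_len, best_start


# Hypothetical 22-nt duplex with an incomplete seed match but
# 11 continuous pairs in the central region (positions 5 to 15).
duplex = "|.|." + "|" * 11 + ".." + "|||" + ".."

length, start = longest_consecutive_pairs(duplex)
print(f"longest run: {length} pairs starting at miRNA position {start}")
```

A site like this one would be missed by a strict seed-match rule, which is the kind of case the quoted observation from Thomson et al. (2011) refers to.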
Features selected by four different methods

CCmiR is a software tool for predicting miRNA target sites by considering miRNA cooperation.

The identification of microRNA (miRNA) target sites is an important and challenging problem. In the past decade, dozens of computational methods have been developed to predict miRNA target sites. Nevertheless, few methods consider the well-known competition and cooperation among miRNAs when attempting to discover target sites. To fill this gap, we developed a new approach called CCmiR, which takes the cooperation and competition of multiple miRNAs into account in a statistical model to predict their target sites. Tested on four different types of datasets, CCmiR predicted miRNA target sites with high recall and reasonable precision. Moreover, we demonstrated that CCmiR identified known and new cooperative and competitive miRNAs supported by the literature. Compared with three state-of-the-art computational methods, CCmiR had higher recall and higher precision.

MDPS algorithm

Considering the position dependency of neighboring pairings, we used a Markov model to learn the position-wise binding patterns for a given miRNA and its targets. We first defined the five states for the pairings in the alignment of a given miRNA sequence and one of its target sequences: match […]

Figure: Five states in an miRNA-target interaction

With the five states, we designed a 5 by 5 transition matrix […]

We defined two types of models: miRNA-specific and miRNA-general. The miRNA-specific model was learned by calculating the transition and weight matrices from the pairing information of a specific miRNA and its targets. The miRNA-general model was trained on the pairing information of all available miRNAs and their targets. Note that a miRNA-general model was parametrized by only one transition matrix and one weight matrix, which were the unweighted averages of the transition and weight matrices of all the involved miRNA-specific models, respectively.

MDPS scoring strategy

MDPS selects miRNA target sites by scoring miRNA-target interactions using a dynamic programming (DP) algorithm. For a given miRNA and its calculated weight matrix and transition matrix, the following DP algorithm scores a target RNA sequence to determine whether it may be a potential target site of this miRNA. Here, we first define two notations, […] With the two notations, it is evident that […] Similarly, we calculate […] The iteration has the following initialization: […] Similarly, we initialize […] With the above three types of iterations, we obtain the maximum of […]

Using the above CLASH training datasets, we generated the MDPS models, which consisted of the average w and t matrices and a score cutoff that gave the best predictions in cross-validation on the corresponding CLASH training dataset. We generated these models for both the target-enriched dataset and the energy-filtered dataset using 10-fold cross-validation on the corresponding 80% training data. Since the column size of the w matrix was the length of the corresponding miRNA, the column size of the average w matrix in the models was the length of the longest miRNA in the training datasets. If the score of a sequence was larger than a given cutoff, this sequence was called a target of this miRNA. We tested five different cutoffs and chose Average score + 2 × Standard deviation as the cutoff for the final MDPS models, where the Average score and the Standard deviation are the mean and the standard deviation of the alignment scores of the miRNA-target duplexes in the training datasets, respectively.

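Two parts of the MDPS description above are specified well enough to sketch: the miRNA-general transition matrix as the unweighted average of the miRNA-specific matrices, and the score cutoff of mean plus two standard deviations. Everything else in the sketch is an assumption: the five states are encoded as the integers 0 to 4 without naming them, each miRNA-specific transition matrix is estimated as row-normalized counts of consecutive state pairs, the population standard deviation is used, and all data are made up. The weight matrix and the DP recurrences are not covered.

```python
from statistics import mean, pstdev

N_STATES = 5  # the five pairing states, encoded here as integers 0..4


def transition_matrix(state_paths):
    """Row-normalized counts of consecutive state pairs (assumed estimator)."""
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for path in state_paths:
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def average_matrices(matrices):
    """Unweighted element-wise average of miRNA-specific matrices."""
    n = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(N_STATES)]
        for i in range(N_STATES)
    ]


def score_cutoff(training_scores, n_sd=2.0):
    """Average score + n_sd * standard deviation of training duplex scores."""
    return mean(training_scores) + n_sd * pstdev(training_scores)


# Hypothetical state paths for the duplexes of two miRNAs.
paths_mir1 = [[0, 0, 1, 0], [0, 1, 1, 0]]
paths_mir2 = [[0, 0, 0, 2]]

t_mir1 = transition_matrix(paths_mir1)   # miRNA-specific model
t_mir2 = transition_matrix(paths_mir2)   # miRNA-specific model
t_general = average_matrices([t_mir1, t_mir2])  # miRNA-general model

# Hypothetical alignment scores of training duplexes.
cutoff = score_cutoff([10.0, 12.0, 14.0])
print(f"t_general[0][0] = {t_general[0][0]:.3f}, cutoff = {cutoff:.3f}")
```

A candidate sequence would then be called a target when its alignment score exceeds `cutoff`.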
Educational Materials

References:

· Li X, Hu H. Improving miRNA target prediction using CLASH data. In A. Lagana (Ed.): microRNA Target Identification, Springer Nature, New York, NY, pp. 75-83. DOI: 10.1007/978-1-4939-9207-2_6. 2019.
· Ding J, Li X, Hu H. CCmiR: a computational approach for competitive and cooperative microRNA binding prediction. Bioinformatics. DOI: 10.1093/bioinformatics/btx606. 2017.
· Wang Y, Goodison S, Li X, Hu H. Prognostic cancer gene signatures share common regulatory motifs. Scientific Reports. DOI: 10.1038/s41598-017-05035-3. 2017.
· Ding J, Li X, Hu H. TarPmiR: a new approach for microRNA target site prediction. Bioinformatics. DOI: 10.1093/bioinformatics/btw318. 2016.
· Ding J, Li X, Hu H. MicroRNA modules prefer to bind weak and unconventional target sites. Bioinformatics, 31(9): 1366-1374. DOI: 10.1093/bioinformatics/btu833. 2015.

Acknowledgement