<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Girke Lab</title>
 <link href="https://tgirke.github.io/atom.xml" rel="self"/>
 <link href="https://tgirke.github.io/"/>
 <updated>2026-01-02T20:20:38+00:00</updated>
 <id>https://tgirke.github.io</id>
 <author>
   <name>Thomas Girke</name>
   <email>thomas.girke@ucr.edu</email>
 </author>

 
 <entry>
   <title>Research Page</title>
   <link href="https://tgirke.github.io/2015/11/12/research/"/>
   <updated>2015-11-12T00:00:00+00:00</updated>
   <id>https://tgirke.github.io/2015/11/12/research</id>
   <content type="html">&lt;p&gt;To navigate this site, please click the &lt;strong&gt;☰&lt;/strong&gt; symbol to the left.&lt;/p&gt;

&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;

&lt;blockquote&gt;
  &lt;p&gt;The Girke lab focuses on fundamental research questions at the intersection
of genome biology and chemical genomics. These include: Which factors in genomes,
proteomes and metabolomes are functionally relevant and perturbable by small
molecules? What properties of small molecules and their targets are the main
drivers for compound-target interactions? How can these insights be used to
develop precision perturbation strategies for biological processes with
translational applications in both agriculture and human health? To address
these questions, the group develops computational methods for analyzing both
large-scale omics and small molecule bioactivity data. This includes
discovery-oriented projects, as well as algorithm and software development
projects for data types from a variety of Big Data technologies, such as NGS,
genome-wide profiling approaches and chemical genomics. Given the
multidisciplinary nature of this field, the group frequently collaborates with
experimental scientists on data analysis projects of complex biological
problems. Another important activity is the development of integrated data
analysis systems for the open source software projects R and Bioconductor. The
following gives a short summary of a few selected projects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;selected-projects&quot;&gt;Selected Projects&lt;/h1&gt;

&lt;h2 id=&quot;table-of-content&quot;&gt;Table of Contents&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;#signaturesearch&quot;&gt;&lt;em&gt;signatureSearch&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#spatialheatmap&quot;&gt;&lt;em&gt;spatialHeatmap&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#longevity&quot;&gt;Longevity&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#longevitygenomics&quot;&gt;Longevity Genomics&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#longevityconsortium&quot;&gt;Longevity Consortium&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#workflowsoftware&quot;&gt;Workflow Software&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#systempiper&quot;&gt;&lt;em&gt;systemPipeR&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#systempipeshiny&quot;&gt;&lt;em&gt;systemPipeShiny&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#ngsassembly&quot;&gt;Assembly of NGS Data&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#refgenomeassembly&quot;&gt;Reference-Assisted Genome Assembly&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#reftxassembly&quot;&gt;Reference-Assisted Transcriptome Assembly&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#geneexpressionanalysis&quot;&gt;Gene Expression Networks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#cheminformatics&quot;&gt;Cheminformatics Software&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#chemminer&quot;&gt;&lt;em&gt;ChemmineR&lt;/em&gt; and ChemMine Tools&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#fmcsr&quot;&gt;&lt;em&gt;fmcsR&lt;/em&gt; and &lt;em&gt;eiR&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#proteinfct&quot;&gt;Protein Function&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#subhmm&quot;&gt;&lt;em&gt;sub-HMM&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;div id=&quot;signaturesearch&quot; /&gt;

&lt;h2 id=&quot;1-gene-expression-searching-with-signaturesearch&quot;&gt;1. Gene Expression Searching with &lt;em&gt;signatureSearch&lt;/em&gt;&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://bioconductor.org/packages/release/bioc/html/signatureSearch.html&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;signatureSearch&lt;/em&gt;&lt;/a&gt; 
is an R/Bioconductor package that integrates a suite of
existing and novel algorithms into an analysis environment for gene expression
signature (GES) searching combined with functional enrichment analysis (FEA)
and visualization methods to facilitate the interpretation of the search
results &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/33068417/&quot; target=&quot;_blank&quot;&gt;(Duan et al., 2020)&lt;/a&gt;. 
In a typical GES search (GESS), a query GES is searched against a
database of GESs obtained from large numbers of measurements, such as different
genetic backgrounds, disease states and drug perturbations. Database matches
sharing correlated signatures with the query indicate related cellular
responses frequently governed by connected mechanisms, such as drugs mimicking
the expression responses of a disease. To identify which processes are
predominantly modulated in the GESS results, we developed specialized FEA
methods combined with drug-target network visualization tools. The provided
analysis tools are useful for studying the effects of genetic, chemical and
environmental perturbations on biological systems, as well as searching single
cell GES databases to identify novel network connections or cell types. The
&lt;em&gt;signatureSearch&lt;/em&gt; software is unique in that it provides access to an integrated
environment for GESS/FEA routines that includes several novel search and
enrichment methods, efficient data structures, and access to pre-built GES
databases, while also allowing users to work with custom databases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/33068417/&quot;&gt;&lt;img src=&quot;/public/images/signatureSearch_vis_abstract.png&quot; alt=&quot;image&quot; style=&quot;width:500px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 1:&lt;/b&gt; Overview of &lt;i&gt;signatureSearch&lt;/i&gt; environment.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;spatialheatmap&quot; /&gt;

&lt;h2 id=&quot;2-visualizing-spatial-assays-in-anatomical-images&quot;&gt;2. Visualizing Spatial Assays in Anatomical Images&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://bioconductor.org/packages/release/bioc/html/spatialHeatmap.html&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;spatialHeatmap&lt;/em&gt;&lt;/a&gt; 
package provides functionalities for visualizing cell-,
tissue- and organ-specific data of biological assays by coloring the
corresponding spatial features defined in anatomical images according to a
numeric color key. The color scheme used to represent the assay values can be
customized by the user. This core functionality of the package is called a
spatial heatmap (SHM) plot. It is enhanced with visualization tools for groups
of measured items (e.g. gene modules) sharing related abundance profiles,
including matrix heatmaps combined with hierarchical clustering dendrograms and
network representations. The functionalities of spatialHeatmap can be used
either in a command-driven mode from within R or a graphical user interface
(GUI) provided by a Shiny App that is also part of this package. While the
R-based mode provides flexibility to customize and automate analysis routines,
the Shiny App includes a variety of convenience features that will appeal to
experimentalists and other users less familiar with R. Moreover, the Shiny App
can be used on both local computers and centralized server-based
deployments (e.g. cloud-based or custom servers) that can be accessed remotely
as a public web service for using spatialHeatmap’s functionalities with
community and/or private data. The functionalities of the spatialHeatmap
package are illustrated in Figure 2.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;a href=&quot;http://bioconductor.org/packages/release/bioc/vignettes/spatialHeatmap/inst/doc/spatialHeatmap.html&quot;&gt;&lt;img src=&quot;/public/images/spatialHeatmap_vis_abstract.jpeg&quot; alt=&quot;image&quot; style=&quot;width:500px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 2:&lt;/b&gt; Overview of &lt;i&gt;spatialHeatmap&lt;/i&gt; environment.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;longevity&quot; /&gt;

&lt;h2 id=&quot;3-projects-related-to-longevity-and-healthy-aging&quot;&gt;3. Projects Related to Longevity and Healthy Aging&lt;/h2&gt;

&lt;p&gt;Human longevity is heritable, and statistically and biologically compelling
genetic associations with longevity and age-related traits have been
identified. The translation of these genetic associations into insights that
can lead to pharmacological interventions designed to promote healthy aging
requires an approach and infrastructure that integrates many genomic resources.&lt;/p&gt;

&lt;div id=&quot;longevitygenomics&quot; /&gt;

&lt;h3 id=&quot;31-longevity-genomics&quot;&gt;3.1 Longevity Genomics&lt;/h3&gt;

&lt;p&gt;To address this challenge, the &lt;a href=&quot;http://www.longevitygenomics.org/&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;Longevity Genomics&lt;/em&gt;&lt;/a&gt; research group
has been established as an NIA-funded research project that builds an integrative
genomic resource and infrastructure for developing translational strategies to
promote human longevity. The infrastructure will include data from longitudinal 
cohort studies with genome-wide genotype and sequence data, computational methods 
for annotating genetic variants, information from tissue-specific expression quantitative
trait locus (eQTL) studies, and datasets of chemical properties and protein
targets of small molecule compounds.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;a href=&quot;https://www.longevitygenomics.org/&quot;&gt;&lt;img src=&quot;/public/images/lg_visual_abstract.png&quot; alt=&quot;image&quot; style=&quot;width:500px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 3:&lt;/b&gt; Drug-target network of the small-molecule assayed proteome (&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/28178331/&quot; target=&quot;_blank&quot;&gt;Backman et al., 2017&lt;/a&gt;).&lt;/font&gt;

&lt;div id=&quot;longevityconsortium&quot; /&gt;

&lt;h3 id=&quot;32-longevity-consortium&quot;&gt;3.2 Longevity Consortium&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://www.longevityconsortium.org/&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;Longevity Consortium
(LC)&lt;/em&gt;&lt;/a&gt; aims to
integrate analyses of the genomic, proteomic, and metabolomic bases of human
longevity and the lifespans of animal species into models of the molecular
pathways that contribute to human longevity. An important goal is to identify
pathways that are amenable to pharmacologic intervention. The Girke group is
leading the drug-discovery aspects for the LC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;a href=&quot;https://www.longevityconsortium.org/&quot;&gt;&lt;img src=&quot;/public/images/LC_visual_abstract.png&quot; alt=&quot;image&quot; style=&quot;width:500px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 4:&lt;/b&gt; Drug-target data mining strategy (&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/31724058/&quot; target=&quot;_blank&quot;&gt;McCorrison et al., 2019&lt;/a&gt;).&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;workflowsoftware&quot; /&gt;

&lt;h2 id=&quot;4-workflow-environment-for-large-scale-data-analysis&quot;&gt;4. Workflow Environment for Large-scale Data Analysis&lt;/h2&gt;

&lt;div id=&quot;systempiper&quot; /&gt;

&lt;h3 id=&quot;41-systempiper-ngs-workflow-and-report-generation-environment&quot;&gt;4.1 &lt;em&gt;systemPipeR&lt;/em&gt;: NGS workflow and report generation environment&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;http://bioconductor.org/packages/devel/systemPipeR/&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;systemPipeR&lt;/em&gt;&lt;/a&gt; is an
R/Bioconductor package for building and running automated analysis workflows
for a wide range of next-generation sequencing (NGS) applications. Important
features include a uniform workflow interface across different NGS
applications, automated report generation, and support for running both R and
command-line software, such as NGS aligners or peak/variant callers, on local
computers or compute clusters. Efficient handling of complex sample sets and
experimental designs is facilitated by a consistently implemented sample
annotation infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/systempiper.png&quot; alt=&quot;systempiper&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 5:&lt;/b&gt; Workflow design structure of &lt;i&gt;systemPipeR&lt;/i&gt;.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;systempipeshiny&quot; /&gt;

&lt;h3 id=&quot;42-shiny-app-for-systempiper-workflows&quot;&gt;4.2 Shiny App for &lt;em&gt;systemPipeR&lt;/em&gt; workflows&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://bioconductor.org/packages/release/bioc/html/systemPipeShiny.html&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;systemPipeShiny&lt;/em&gt;&lt;/a&gt; 
(SPS) package extends the widely used systemPipeR (SPR) workflow
environment with a versatile graphical user interface provided by a Shiny App.
This allows non-R users, such as experimentalists, to run many of systemPipeR’s
workflow design, control, and visualization functionalities interactively
without requiring knowledge of R. Most importantly, SPS has been designed as a
general purpose framework for interacting with other R packages in an intuitive
manner. Like most Shiny Apps, SPS can be used on both local computers and
centralized server-based deployments that can be accessed remotely as a
public web service for using SPR’s functionalities with community and/or
private data. The framework can integrate many core packages from the
R/Bioconductor ecosystem. Examples of SPS’ current functionalities include: (a)
interactive creation of experimental designs and metadata using an easy-to-use
tabular editor or file uploader; (b) visualization of workflow topologies
combined with auto-generation of R Markdown previews for interactively designed
workflows; (c) access to a wide range of data processing routines; and (d) an
extendable set of visualization functionalities. Complex visual results can be
managed on a ‘Canvas Workbench’ that allows users to organize and compare plots
efficiently, combined with a session snapshot feature for continuing
work at a later time. The present suite of pre-configured visualization
examples includes different methods to plot a count table. The modular design of
SPR makes it easy to design custom functions without any knowledge of Shiny, and
to extend the environment in the future with contributions from the
community.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;a href=&quot;https://bioconductor.org/packages/release/bioc/vignettes/systemPipeShiny/inst/doc/systemPipeShiny.html&quot;&gt;&lt;img src=&quot;/public/images/systemPipeShiny_vis_abstract.png&quot; alt=&quot;image&quot; style=&quot;width:500px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 6:&lt;/b&gt; Snapshot of &lt;i&gt;systemPipeShiny&lt;/i&gt; environment.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;ngsassembly&quot; /&gt;

&lt;h2 id=&quot;5-assembly-of-next-generation-sequence-data&quot;&gt;5. Assembly of Next Generation Sequence Data&lt;/h2&gt;

&lt;div id=&quot;refgenomeassembly&quot; /&gt;

&lt;h3 id=&quot;51-reference-assisted-genome-assembly&quot;&gt;5.1 Reference-Assisted Genome Assembly&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;De novo&lt;/em&gt; assemblies of genomes remain one of the most challenging applications
in next-generation sequencing. Usually, their results are incomplete and
fragmented into hundreds of contigs. Repeats in genomes and sequencing errors
are the main reasons for these complications. With the rapidly growing number
of sequenced genomes, it is now feasible to improve assemblies by guiding them
with genomes from related species. This project introduces &lt;em&gt;AlignGraph&lt;/em&gt;, an
algorithm for extending and joining &lt;em&gt;de novo&lt;/em&gt;-assembled contigs or scaffolds
guided by closely related reference genomes &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/24932000&quot;&gt;(Bao et al.,
2014)&lt;/a&gt;.  It aligns paired-end
(PE) reads and preassembled contigs or scaffolds to a close reference. From the
obtained alignments, it builds a novel data structure, called the PE
multipositional de Bruijn graph. The incorporated positional information from
the alignments and PE reads allows us to extend the initial assemblies, while
avoiding incorrect extensions and early terminations. In our performance tests,
AlignGraph was able to substantially improve the contigs and scaffolds from
several assemblers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/aligngraph.png&quot; alt=&quot;aligngraph&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 7:&lt;/b&gt; Overview of the &lt;i&gt;AlignGraph&lt;/i&gt; algorithm. Panel (A)
shows &lt;i&gt;AlignGraph&lt;/i&gt; in the context of common genome assembly workflows, and panel (B),
at the bottom, illustrates its three main processing steps.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;reftxassembly&quot; /&gt;

&lt;h3 id=&quot;52-reference-assisted-transcriptome-assembly&quot;&gt;5.2 Reference-Assisted Transcriptome Assembly&lt;/h3&gt;

&lt;p&gt;Owing to the complexity and often incomplete representation of transcripts in
RNA-Seq libraries, the assembly of high-quality transcriptomes can be extremely
challenging. To improve this, my group is developing 
algorithms for guiding these assemblies with genomic sequences of related organisms as
well as reducing the complexity in NGS libraries. The software tools we have published for this
purpose so far include &lt;em&gt;SEED&lt;/em&gt; &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/21810899&quot;&gt;(Bao et al., 2011)&lt;/a&gt;
and &lt;em&gt;BRANCH&lt;/em&gt; &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23493323&quot;&gt;(Bao et al., 2013)&lt;/a&gt;. &lt;em&gt;BRANCH&lt;/em&gt;
is a reference-assisted post-processing method for enhancing &lt;em&gt;de novo&lt;/em&gt; 
transcriptome assemblies (Figure 8). It can be used in combination with most &lt;em&gt;de novo&lt;/em&gt;
transcriptome assembly software tools. The assembly improvements are achieved
with help from partial or complete genomic sequence information. Such information can be
obtained by sequencing and assembling a genomic DNA sample in addition to the
RNA samples required for a transcriptome assembly project. This approach is
practical because it requires only preliminary genome assembly results in the form
of contigs. Nowadays, the latter can be generated with very reasonable cost and
time investments. If the genome sequence of a closely related organism is
available, one can skip the genome assembly step and use the related gene
sequences instead. This type of reference-assisted assembly approach provides
many attractive opportunities for improving &lt;em&gt;de novo&lt;/em&gt; NGS assemblies in the
future by making use of the rapidly growing amount of reference genome
information available to us.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/branch.jpg&quot; alt=&quot;BRANCH Image&quot; /&gt;&lt;/p&gt;
&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 8:&lt;/b&gt; Outline of the &lt;i&gt;BRANCH&lt;/i&gt; algorithm published in &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23493323&quot;&gt;Bao et al., 2013&lt;/a&gt;. (a) Read alignments against preassembled transcripts and a closely related genomic reference. (b) Junction graph based on this alignment result. (c) Assembly of extended transcripts.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;geneexpressionanalysis&quot; /&gt;

&lt;h2 id=&quot;6-modeling-gene-expression-networks-from-rna-seq-and-chip-seq-data&quot;&gt;6. Modeling Gene Expression Networks from RNA-Seq and ChIP-Seq Data&lt;/h2&gt;

&lt;p&gt;As part of several collaborative research projects, my group has developed a
variety of data analysis pipelines for profiling data from next generation
sequencing projects (e.g. RNA-Seq and ChIP-Seq), microarray experiments and
high-throughput small molecule screens. Most of the data analysis resources
developed by these projects are described in the associated online manuals for
&lt;a href=&quot;http://manuals.bioinformatics.ucr.edu/home&quot;&gt;next generation data analysis&lt;/a&gt;.
Recent research publications of these projects include: 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23793751&quot; target=&quot;_blank&quot;&gt;Yang et al., 2013&lt;/a&gt;; 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23633570&quot; target=&quot;_blank&quot;&gt;Zou et al., 2013&lt;/a&gt;; 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23633570&quot; target=&quot;_blank&quot;&gt;Yadav et al., 2013&lt;/a&gt;; 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/21979915&quot; target=&quot;_blank&quot;&gt;Yadav et al., 2011&lt;/a&gt;; 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/19843695&quot; target=&quot;_blank&quot;&gt;Mustroph et al., 2009&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;cheminformatics&quot; /&gt;

&lt;div id=&quot;chemminer&quot; /&gt;

&lt;h2 id=&quot;7-software-for-small-molecule-discovery-and-chemical-genomics&quot;&gt;7. Software for Small Molecule Discovery and Chemical Genomics&lt;/h2&gt;

&lt;p&gt;Software tools for modeling the similarities among drug-like small molecules
and high-throughput screening data are important for many applications in drug
discovery and chemical genomics. In this area we are working on the development
of the &lt;a href=&quot;http://manuals.bioinformatics.ucr.edu/home/chemminer&quot; target=&quot;_blank&quot;&gt;ChemmineR&lt;/a&gt;
environment (&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/18596077&quot; target=&quot;_blank&quot;&gt;Cao et al., 2008&lt;/a&gt;; 
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/21576229&quot; target=&quot;_blank&quot;&gt;Backman et al., 2011&lt;/a&gt;). This modular
software infrastructure currently consists of five R/Bioconductor packages along with a
user-friendly web interface, named &lt;a href=&quot;https://chemminetools.ucr.edu/&quot;&gt;&lt;em&gt;ChemMine Tools&lt;/em&gt;&lt;/a&gt;, that
is intended for non-expert users (Figures 9-10). The integration of cheminformatic 
tools with the R programming environment has many advantages for small molecule discovery, such as easy access to a wide spectrum
of statistical methods, machine learning algorithms and graphic utilities.
Currently, the ChemmineR toolkit
provides utilities for processing large numbers of molecules,
physicochemical/structural property predictions, structural similarity
searching, classification and clustering of compound libraries and screening
results with a wide spectrum of algorithms. More recently, we have extended
this infrastructure with the &lt;em&gt;fmcsR&lt;/em&gt; algorithm, which is the first mismatch-tolerant
maximum common substructure search tool in
the field (&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23962615&quot; target=&quot;_blank&quot;&gt;Wang et al., 2013&lt;/a&gt;).
In our comparisons with related structure similarity search tools, &lt;em&gt;fmcsR&lt;/em&gt;
showed the best virtual screening (VS) performance.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/ChemmineR.png&quot; alt=&quot;chemm&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/fig1b.png&quot; alt=&quot;fig1&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 9:&lt;/b&gt; &lt;i&gt;ChemmineR&lt;/i&gt; small molecule modeling environment with its add-on packages and selected functionalities.&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;fmcsr&quot; /&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/crosstarget.png&quot; alt=&quot;crosstarget&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 10:&lt;/b&gt; Selectivity Analysis with &lt;i&gt;ChemmineR&lt;/i&gt; and &lt;i&gt;bioassayR&lt;/i&gt;&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div id=&quot;proteinfct&quot; /&gt;

&lt;h2 id=&quot;8-function-prediction-of-gene-and-protein-sequences&quot;&gt;8. Function Prediction of Gene and Protein Sequences&lt;/h2&gt;

&lt;div id=&quot;subhmm&quot; /&gt;

&lt;p&gt;Computational methods for characterizing the functions of protein sequences
play an important role in the discovery of novel molecular, biochemical and
regulatory activities. To facilitate this process, we have developed the
sub-HMM algorithm that extends the application spectrum of profile HMMs to
motif discovery and active site prediction in protein sequences (&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/20420695&quot; target=&quot;_blank&quot;&gt;Horan et al.,
2010&lt;/a&gt;). Its most interesting
utility is the identification of the functionally relevant residues in proteins
of known and unknown function (Figure 11). Additionally, sub-HMMs can be used
for highly localized sequence similarity searches that focus on shorter
conserved features rather than entire domains or global similarities. As part
of this study, we have predicted a comprehensive set of putative active sites
for all protein families available in the Pfam database. This set provides a
valuable knowledge resource for characterizing protein functions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/public/images/hmmlogo.jpg&quot; alt=&quot;hmmlogo&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;font size=&quot;3&quot;&gt;&lt;b&gt;Figure 11:&lt;/b&gt; Illustration of the sub-HMM extraction process from conserved protein domains, here the Pfam desaturase domain (PF00487).&lt;/font&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;see &lt;a href=&quot;/pubs/&quot;&gt;Publication List&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</content>
 </entry>
 

</feed>
