HealthPredict

Tools for Life Sciences

TOOLS AND PROJECTS

1. Discovery platform for life sciences
2. Specialized search for bioinformatics
3. Semantic representation of clinical trial data and other trusted sources of medical knowledge
4. Processing Life Science Data Analysis at Scale – using Semantic Web Technologies


1. A discovery platform for bioinformatics

The Healthpredict Discovery Platform supports the exploration and analysis of human diseases.

A) The platform assigns a confidence score to each disease-rule association based on the supporting evidence, which makes it useful for biomedical researchers.

B) The platform helps answer complex biomedical questions.

C) The platform provides a suite of tools for knowledge discovery: 1) a Web interface that supports user-friendly data exploration and downloading; 2) an app for network analysis of the data; 3) a SPARQL endpoint to explore and query the data and to link out to a variety of external RDF resources already present in the Linked Open Data cloud using Semantic Web technologies; 4) ontology-based queries using disease ontologies.
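
As an illustration of the SPARQL endpoint and ontology-based queries listed above, the following sketch builds a SPARQL 1.1 query string in Python. The `hp:` namespace, the predicates `hp:disease` and `hp:associatedWith`, and both URLs are hypothetical placeholders, since the platform's actual vocabulary is not described here. The sketch only builds the query text and does not contact any endpoint.

```python
from textwrap import dedent

# Hypothetical namespace and endpoint; the platform's real vocabulary is not given.
HP = "http://example.org/healthpredict/"
EXTERNAL_ENDPOINT = "http://example.org/external/sparql"


def build_disease_query(disease_class_iri: str, limit: int = 25) -> str:
    """Return a SPARQL 1.1 query for a disease class and all of its subclasses."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return dedent(f"""\
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX hp: <{HP}>
        SELECT ?disease ?entity ?label WHERE {{
          # Ontology-based step: match the class itself and every descendant.
          ?disease rdfs:subClassOf* <{disease_class_iri}> .
          ?assoc hp:disease ?disease ;
                 hp:associatedWith ?entity .
          # Federation step: fetch labels from an external Linked Open Data endpoint.
          SERVICE <{EXTERNAL_ENDPOINT}> {{
            ?entity rdfs:label ?label .
          }}
        }}
        LIMIT {limit}
        """)


if __name__ == "__main__":
    print(build_disease_query("http://example.org/onto/Asthma", limit=5))
```

The property path `rdfs:subClassOf*` is what makes the query ontology-based, and the `SERVICE` clause is the standard SPARQL 1.1 mechanism for reaching an external RDF resource.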

Aim

The platform is aimed at a variety of users: bioinformaticians who interrogate the database, systems biology experts who explore and analyze network representations of the information, and healthcare practitioners who query the database through its user-friendly Web interface. It also allows specific applications to be built on top of the data.

The platform consists of a comprehensive knowledge base of disease associations arising from both expert-curated databases and information extracted from the scientific literature using text mining, with special attention paid to the explicit provenance of the association.
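
One way a confidence score could combine curated and text-mined evidence while keeping provenance attached is sketched below. This is not the platform's actual scoring method, which is not described here. The weights, the `Evidence` fields, and the noisy-OR style combination rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: curated sources are trusted more than text-mined ones.
SOURCE_WEIGHTS = {"curated": 0.6, "text_mining": 0.2}


@dataclass(frozen=True)
class Evidence:
    source_type: str   # "curated" or "text_mining"
    source_name: str   # provenance: which database or corpus reported it
    publications: int  # number of supporting publications


def confidence(evidence: list[Evidence]) -> float:
    """Combine evidence into a score in [0, 1] with a noisy-OR style rule."""
    remaining_doubt = 1.0
    for ev in evidence:
        weight = SOURCE_WEIGHTS[ev.source_type]
        # Each supporting publication independently reduces the remaining doubt.
        remaining_doubt *= (1.0 - weight) ** max(ev.publications, 0)
    return 1.0 - remaining_doubt


if __name__ == "__main__":
    support = [
        Evidence("curated", "expert database", 1),
        Evidence("text_mining", "literature corpus", 2),
    ]
    print(round(confidence(support), 3))
```

With these assumed weights, an association with no evidence scores 0.0, and adding independent evidence can only raise the score, never lower it.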


2. Building specialized search engines for bioinformatics

Data retrieval is common in bioinformatics. However, it is very important to address the quality of the retrieved information. Healthpredict™’s specialized search engines offer optimal strategies for indexing rich biomedical data, an area that is less well understood. Biomedical data often contain complex annotations (to ontologies) and therefore do not conform to the model imposed by most search technologies.

Healthpredict offers state-of-the-art indexing and querying of biomedical data. Our retrieval systems are built with open source software and based on proprietary (patent pending) algorithms. The prototype engines were built in collaboration between the software community and medical experts, bringing together experts in biological data and experts in the world-leading Apache Lucene/Solr search engine framework to address the challenges of making biomedical data more accessible.

Challenges include integrating ontology-enabled search and searching by common classification systems (taxonomy, enzyme classifications, protein families, etc.). Healthpredict is developing software to facilitate indexing of ontologies, ontology-driven faceting, and other search techniques that use the additional semantics provided in data sets that have been enriched with ontology annotations. Customized search features include:

  • Entity extraction and indexing of web sources with ontology annotations
  • Using a web application to search this data
  • Performing ontology-powered searches
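
The idea behind ontology-powered search and ontology-driven faceting can be sketched in plain Python. This is a toy in-memory illustration, not the Lucene/Solr API and not Healthpredict's algorithm. The mini-ontology, the documents, and the `family` facet field are hypothetical.

```python
from collections import Counter

# Hypothetical mini-ontology: child term -> parent term.
PARENT = {
    "asthma": "respiratory disease",
    "copd": "respiratory disease",
    "allergic asthma": "asthma",
}

# Hypothetical documents already annotated with ontology terms.
DOCS = [
    {"id": "d1", "terms": {"allergic asthma"}, "family": "kinase"},
    {"id": "d2", "terms": {"copd"}, "family": "protease"},
    {"id": "d3", "terms": {"diabetes"}, "family": "kinase"},
]


def descendants(term: str) -> set[str]:
    """Return the term plus every term below it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(term: str) -> tuple[list[str], Counter]:
    """Match documents annotated with the term or a descendant; count facet values."""
    wanted = descendants(term)
    hits = [doc for doc in DOCS if doc["terms"] & wanted]
    facets = Counter(doc["family"] for doc in hits)
    return [doc["id"] for doc in hits], facets


if __name__ == "__main__":
    print(search("respiratory disease"))
```

A search for "respiratory disease" returns documents d1 and d2 even though neither is annotated with that exact term, because the query is expanded to its descendant terms before matching.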

3. Semantic Representation of Clinical Trial Data and other trusted sources of medical knowledge

Joint industry and academic projects in the USA and Europe express clinical data in RDF for drug research and translational medicine. Semantic representation of clinical data in RDF can improve its availability and utility for research, decision support, adverse event detection, outbreak surveillance, etc. Healthpredict offers tools to interpret and reason about clinical trial data. This will include downloadable personal health care data, such as that mandated by the US’s “Blue Button”, and genomic data, which can affect diagnostic decisions and medication efficacy. This technology will enable the translational medicine systems of tomorrow.

The Healthpredict offering includes:

  1. (Clinical) terminology and rule models for various diseases. These describe the way clinical data is modeled, addressing the differences between terminology models (meta-data)
  2. Mapping of source data to knowledge representation systems (OWL) and validation schemas (ShEx) for practical use
  3. Mapping between conventional XML clinical data and the RDF graph conveying the same information
  4. Transformation of conventional clinical data to RDF with validation on the generated data
  5. Application for exploration of semantic clinical data for:
    • Evaluation of treatment efficacy
    • Drug surveillance

Semantic annotation of information sources in life sciences

Healthpredict’s semantic annotation includes using existing ontologies and cross-referencing to other RDF-based biomedical resources. In addition to RDF data modeling, we offer search across interconnected databases using Semantic Web technologies, i.e. SPARQL queries and logic-based inference, to address complicated questions in biomedical research. The project is aimed at contributors across biological domains (disease) who are interested in publishing their data for integration with other resources to enhance knowledge-based discovery.


4. Life Science Data Analysis at Scale – using Semantic Web Technologies

The life sciences domain has been one of the early adopters of linked data, and a considerable portion of the Linked Open Data cloud consists of datasets from Life Sciences Linked Open Data (LSLOD). The growth in the size of data sets leads to the need to integrate multiple data sets. This growth requires large-scale distributed infrastructure and specific techniques for managing large linked data graphs. In combination, Semantic Web and Linked Data technologies enable the processing of large and semantically heterogeneous data sources and the capture of new knowledge from them.

Healthpredict offers a state-of-the-art technology set for knowledge acquisition and discovery in targeted applications, including:

  • Classical Querying: SPARQL query federation to access multiple heterogeneous biological datasets and draw meaningful correlations. Real life sciences datasets, e.g. DrugBank and DailyMed, can be queried to gain specific insights.
  • Scalable Infrastructure: Healthpredict focuses on specific data pipelines that address particular medical concerns. We have built our own Big Linked Data knowledge pipelines to meet these concerns.
  • Visualisation: Healthpredict has built applications to visualize Big RDF Data that augment discovery in cancer research and allergy research.
  • Large data-set research: Healthpredict offers practical tools and technologies to create a graph summary of a large RDF dataset. Using a simplified graph makes it possible to explore larger data sets in a meaningful way.
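
The graph summary idea in the last bullet can be illustrated with a small sketch that collapses instance-level RDF edges into class-level edges with counts. This is one common way to summarize a graph and is an assumption here, not a description of Healthpredict's tools. The triples and class names are hypothetical, and each node is assumed to have at most one declared type.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical triples in a compact prefixed form.
TRIPLES = [
    ("drug:1", RDF_TYPE, "Drug"),
    ("drug:2", RDF_TYPE, "Drug"),
    ("dis:1", RDF_TYPE, "Disease"),
    ("drug:1", "treats", "dis:1"),
    ("drug:2", "treats", "dis:1"),
    ("drug:1", "interactsWith", "drug:2"),
]


def summarize(triples) -> Counter:
    """Collapse instance edges into (subject class, predicate, object class) counts."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    summary = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Nodes without a declared type are grouped under "Untyped".
        summary[(types.get(s, "Untyped"), p, types.get(o, "Untyped"))] += 1
    return summary


if __name__ == "__main__":
    for edge, count in summarize(TRIPLES).items():
        print(edge, count)
```

Here six triples reduce to two summary edges, `Drug treats Disease` (count 2) and `Drug interactsWith Drug` (count 1), which is the kind of simplified graph that stays readable as the underlying dataset grows.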
