Since the sequencing of the human genome, new approaches for studying disease systems at the genomic, epigenetic, proteomic and metabolomic levels have been continually developed. The twin challenges in analysing such data are the volume, resolution and complexity of the data generated, and the quality of that data. In the past, analysis of different data sets for a given clinical question yielded different results. We now know that any such inconsistency is driven either by the level of ‘noise’ in the data or by shortfalls in the number of cases, resulting in insufficient statistical power. We also now know that analysis of biological data requires acceptance of the non-linearity of biological systems, the interaction of molecular entities within pathways, and the fluidity of biological systems.

Intelligent OMICS has developed a systems biology and bioinformatics approach based on artificial (computational) intelligence. It identifies robust non-linear biomarkers that are associated with clinical features and are shown to be consistently important across multiple data sets. The methodology studies interactions between key features, determining the identity of important biomarkers and the level of influence of a set of driver markers in a given biological system.

Based on our patented technology we have developed an approach that addresses the challenges of “Big Data” whilst ensuring that results are relevant and validated in terms of each question under study. Intelligent OMICS determines the molecular drivers of a system that govern phenotype.

1. Single target data mining

Initial analysis mines the data to identify a rank order of targets that predict the appropriate clinical outcome.  This ranking is based on the statistical error of predictions, averaged across multiple repeats.  An example distribution of markers identified from this analysis is presented in Figure 1.  The algorithm ranks biomarkers by the level of predictive relevance.
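
The ranking step described above can be illustrated with a minimal sketch. It is not the Intelligent OMICS algorithm: as a stand-in for the predictive model it assumes a simple nearest-class-mean rule fitted to one marker at a time, and the names `single_marker_error` and `rank_markers`, the number of repeats and the test fraction are all hypothetical choices made for illustration. What it does share with the description is the principle of scoring each marker by its prediction error on held-out cases, averaged across multiple random repeats.

```python
import random
from statistics import mean


def single_marker_error(values, labels, train_idx, test_idx):
    """Misclassification rate of a nearest-class-mean rule using one marker.

    This simple rule is an assumed stand-in for the real predictive model.
    """
    # Fit: mean marker value per class, using training cases only.
    class_means = {}
    for c in set(labels[i] for i in train_idx):
        class_means[c] = mean(values[i] for i in train_idx if labels[i] == c)
    # Predict: assign each held-out case to the class with the closest mean.
    errors = 0
    for i in test_idx:
        predicted = min(class_means, key=lambda c: abs(values[i] - class_means[c]))
        errors += predicted != labels[i]
    return errors / len(test_idx)


def rank_markers(data, labels, repeats=50, test_fraction=0.3, seed=0):
    """Rank markers by held-out error averaged over repeated random splits.

    data: dict mapping marker name -> list of values (one per case).
    Returns a list of (marker, mean_error), lowest error first.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_test = max(1, int(n * test_fraction))
    # Draw the splits once so every marker is scored on the same repeats.
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[n_test:], idx[:n_test]))
    scores = {
        marker: mean(single_marker_error(values, labels, tr, te) for tr, te in splits)
        for marker, values in data.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_markers({"probe_a": [...], "probe_b": [...]}, labels)` returns the markers ordered from most to least predictive under these assumptions. Averaging over many splits is what makes the ranking less sensitive to any single partition of the cases.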

2. Integrated analysis

Expression array studies are frequently underpowered (that is, they do not achieve statistical validity) if considered in isolation. Intelligent OMICS applies its data mining algorithms across multiple data sets, and the top-ranked probes are then cross-compared to find commonality (Figure 2). Probes that are found in common between multiple data sets are unlikely to have occurred by random chance. This concordance approach increases the reliability of the markers discovered, increases the statistical power of the marker set and significantly reduces the risk of false discovery. Furthermore, the approach increases the generality of the markers discovered, so that Intelligent OMICS markers are more likely to be reliable when applied to the general population.
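
The claim that shared top-ranked probes are unlikely to arise by chance can be made concrete with a small sketch. It assumes a deliberately simple null model in which each data set's top-k list is an independent random draw from the same pool of probes, so the overlap between two lists follows a hypergeometric distribution. The function names `concordant_markers` and `overlap_p_value` are hypothetical, and this is not necessarily how the false discovery probabilities quoted elsewhere in this text were calculated.

```python
from math import comb


def concordant_markers(ranked_lists, top_k):
    """Return the probes that appear in the top_k of every ranked list."""
    top_sets = [set(ranking[:top_k]) for ranking in ranked_lists]
    return set.intersection(*top_sets)


def overlap_p_value(n_probes, top_k, observed_overlap):
    """P(overlap >= observed_overlap) for two independent random top-k lists.

    Assumed null model: both lists are random draws of top_k probes from
    the same n_probes, so the overlap is hypergeometric.
    """
    total = comb(n_probes, top_k)
    tail = sum(
        comb(top_k, x) * comb(n_probes - top_k, top_k - x)
        for x in range(observed_overlap, top_k + 1)
    )
    return tail / total
```

For example, `overlap_p_value(20000, 100, 10)` gives the probability that two random top-100 lists drawn from 20,000 probes share ten or more probes. Under this null model the expected overlap is only 0.5 probes, so an overlap of that size would be very hard to explain by chance alone.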

The concordance approach was demonstrated in the Abdel-Fatah … Ball et al study (2016, Lancet Oncology), which identified a key set of markers for proliferation in breast cancer by integrating data mining across 4 data sets. The biomarker panel identified in the study exhibited a very low probability of false discovery (2 × 10⁻⁷⁴) and has been validated using immunohistochemistry in over 15,000 cases.

3. NIM: Network Inference Modelling

Network Inference Modelling is a systems biology and pathway analysis methodology. The biomarkers that form the output of the PANN analysis described above become the inputs to the network inference algorithm, which identifies a network of interactions. Each network is analysed to identify the key molecular drivers, ranked by their influence in a given system. The approach has been used in a commercial contract with Syngenta to successfully identify transcriptomic regulators of phenotype, and is described in Pan Y … Ball et al, 2013 (Patent US20140137296 A1, PCT/EP2011/066773). The NIM approach was also used in the Abdel-Fatah … Ball et al (2016) Lancet Oncology study. The application of a systems approach provides greater clinical utility than a simple list of biomarkers because biology is defined by the interaction of molecular markers. NIM can thus be used to identify molecular-based disease processes that differentiate between healthy individuals and disease carriers within a population (therapeutic target identification in silico), to identify previously unknown processes associated with therapeutic response, or to describe pharmacodynamics.
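
The idea of ranking drivers by their influence in a network can be sketched as follows. This is an illustration only: it assumes the signed, directed interaction weights have already been inferred by some upstream model, and it assumes one simple definition of influence, the sum of absolute outgoing weights. Neither the function name `rank_drivers` nor this influence score is taken from the NIM method itself.

```python
from collections import defaultdict


def rank_drivers(edges, keep_top=None):
    """Rank markers by total absolute outgoing interaction weight.

    edges: dict mapping (source, target) -> signed weight, where the sign
    is read as stimulation (+) or inhibition (-). Assumed to be inferred
    upstream; this function only summarises the network.
    keep_top: optionally keep only the strongest interactions, mirroring
    the common practice of filtering a dense network before inspecting it.
    """
    strongest = sorted(edges.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if keep_top is not None:
        strongest = strongest[:keep_top]
    influence = defaultdict(float)
    for (source, target), weight in strongest:
        influence[source] += abs(weight)
        influence[target] += 0.0  # ensure pure targets still appear in the ranking
    return sorted(influence.items(), key=lambda kv: kv[1], reverse=True)
```

With edges such as `{("G1", "G2"): 0.9, ("G1", "G3"): -0.8, ("G2", "G3"): 0.3}`, marker `G1` is ranked first because it exerts the strongest combined effect on the other markers, regardless of whether each effect is positive or negative.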

A pictorial representation of the NIM results is presented in Figure 3, showing the interactions between markers, the direction of each interaction and the importance of central nodes. These simplified graphics summarise significant advances in science.