TCOF4_Project_Introduction

Today, the digital world generates vast amounts of data, which have become a valuable asset, and data-driven methods have become modern tools for extracting trends and patterns from huge data sets. Given this data deluge, more sophisticated algorithms and tools still need to be designed to carry out big data analysis in an insightful way.
Cheminformatics, the domain concerned with the collection, management, analysis and interpretation of chemical data, has developed within a computational framework. It is closely interlinked with, and overlaps, bioinformatics and other omics domains, which together seek to understand deeper relationships in data-intensive fields. Our main aim is to address the question: “How can big data applications be applied in advanced analysis to decipher biological and chemical networks?”

 

Data Classification

The need to design and apply complex analyses arises as the complexity of data types increases. When the sample size is small, data points from small sub-populations are generally categorized as “outliers”, and it is hard to model them systematically because there are too few observations. In large samples, by contrast, heterogeneity can be better understood, which makes it possible to explore associations between certain covariates and rare outcomes [1]. In this context, unstructured data lack a predefined data model or traditional data structure; examples include text, images, biomedical literature and electronic records.
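The small-sample versus large-sample argument above can be made concrete with simple binomial arithmetic. The sketch below is illustrative only: it assumes independent draws and a hypothetical rare sub-population with a prevalence of 0.1% (1 in 1,000), and the function name `rare_subpopulation_stats` is made up for this example. It reports the expected number of rare data points in a sample of size n and the probability that the sample contains none at all.

```python
import math


def rare_subpopulation_stats(n: int, prevalence: float) -> tuple[float, float]:
    """Return (expected count, probability of observing none) for a
    sub-population with the given prevalence in a sample of size n.

    Assumes independent draws, i.e. a binomial model (an assumption made
    for illustration, not a property of any particular data set).
    """
    expected = n * prevalence
    # (1 - p)^n computed in log space for numerical stability;
    # math.exp simply returns 0.0 if the result underflows.
    p_none = math.exp(n * math.log1p(-prevalence))
    return expected, p_none


# Hypothetical prevalence chosen only for illustration.
PREVALENCE = 0.001

for n in (100, 10_000, 1_000_000):
    expected, p_none = rare_subpopulation_stats(n, PREVALENCE)
    print(f"n={n:>9,}: expected rare points = {expected:g}, "
          f"P(none observed) = {p_none:.4f}")
```

Under these assumptions, a sample of 100 is expected to contain only 0.1 rare points and has roughly a 90% chance of containing none, so any rare point that does appear looks like an outlier. A sample of one million is expected to contain about 1,000 such points, enough to start modelling the sub-population directly.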

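Structured data conform to a predefined schema, such as the tables of a database management system; examples include data from clinical, banking and research databases. Semi-structured data, such as XML or JSON records, do not follow a rigid schema but carry tags or keys that organize their content.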
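To illustrate the three categories of data discussed in this section, here is a minimal heuristic sketch. Its definitions are assumptions made for the example: "structured" means a record that exactly matches a fixed, hypothetical compound schema (`COMPOUND_SCHEMA`), "semi-structured" means key-value data (a dict or a parsable JSON object) that does not fit that schema, and "unstructured" means anything else, such as free text. The schema and the `classify_record` function are hypothetical and do not come from any specific database or library.

```python
import json
from typing import Any

# Hypothetical fixed schema for a structured compound table (illustrative only).
COMPOUND_SCHEMA = {"compound_id": str, "smiles": str, "mol_weight": float}


def classify_record(record: Any) -> str:
    """Heuristically label a record as 'structured', 'semi-structured'
    or 'unstructured' under the assumptions stated above."""
    if isinstance(record, str):
        try:
            record = json.loads(record)  # text may still encode JSON
        except json.JSONDecodeError:
            return "unstructured"  # free text with no parsable organization
    if isinstance(record, dict):
        # Structured only if the keys and value types match the fixed schema.
        matches_schema = set(record) == set(COMPOUND_SCHEMA) and all(
            isinstance(record[key], typ) for key, typ in COMPOUND_SCHEMA.items()
        )
        return "structured" if matches_schema else "semi-structured"
    return "unstructured"


examples = [
    {"compound_id": "C001", "smiles": "CCO", "mol_weight": 46.07},  # ethanol row
    '{"compound_id": "C002", "name": "aspirin", "synonyms": ["acetylsalicylic acid"]}',
    "Aspirin irreversibly inhibits cyclooxygenase.",
]
for item in examples:
    print(classify_record(item))
```

Real systems distinguish these categories by how data are stored and queried rather than by inspecting individual records; the per-record check is used here only because it makes the distinction visible in a few lines.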