NGFN2 SMP bioinformatics data integration
The NGFN2 (Nationales Genomforschungsnetz, supported by the Bundesministerium für Bildung und Forschung) requires the conceptual design, implementation and integration of databases for the collection and management of molecular, clinical and phenotype data in order to establish a coherent view of NGFN2 data. The "Data Management" project within the SMP "Bioinformatics" will provide decentralized (e.g., iCHIP from the German Cancer Research Center, Deutsches Krebsforschungszentrum) and centralized (e.g., MIPSexpress, MEDAS, NAME-ML from GSF - Forschungszentrum für Umwelt und Gesundheit) database infrastructures as well as data integration tools (the BioRS Integration and Retrieval System from Biomax) for efficient management of multi-center research networks.
The Biomax BioRS system integrates heterogeneous databases (flat-file as well as relational) into a homogeneous environment. A single query can access multiple databases simultaneously, even though the integrated databases reside at different locations.
For example, iCHIP at the DKFZ in Heidelberg and MIPSexpress at the GSF in Munich can be searched in parallel without requiring data replication. These databases can also be used in the context of public biological databases (e.g., UniProt) for mining of expression data and biological and biomedical content. In addition, they will be linked with systems for the biostatistical analysis of the expression experiments.
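The parallel search described above can be illustrated with a minimal Python sketch. It is not the BioRS API: the in-memory record lists stand in for the remote databases, and the record fields, identifiers and function names (search_source, federated_search) are hypothetical, chosen only to show one query being sent to several sources at once and the hits being merged without copying the underlying data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two remotely hosted databases.
# All records and field names are invented for illustration.
SOURCES = {
    "iCHIP": [
        {"id": "exp-1", "gene": "TP53", "tissue": "liver"},
        {"id": "exp-2", "gene": "BRCA1", "tissue": "breast"},
    ],
    "MIPSexpress": [
        {"id": "mx-7", "gene": "TP53", "tissue": "kidney"},
    ],
}


def search_source(name, field, value):
    """Search one source in place and tag each hit with its origin."""
    return [
        dict(record, source=name)
        for record in SOURCES[name]
        if record.get(field) == value
    ]


def federated_search(field, value):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [
            pool.submit(search_source, name, field, value) for name in SOURCES
        ]
        hits = []
        # Collect in submission order so the merged result is deterministic.
        for future in futures:
            hits.extend(future.result())
    return hits


if __name__ == "__main__":
    for hit in federated_search("gene", "TP53"):
        print(hit["source"], hit["id"], hit["tissue"])
```

In a real deployment each call to search_source would be a network request to the site that administers the database, so the data itself never has to be replicated to a central location.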
The aim of data integration is to make databanks that are administered locally at multiple sites available to the entire network through a uniform interface. In this way, databank copies, and the data inconsistency problems inherent in them, can be avoided. The resulting NGFN-wide network of integrated databanks can be flexibly adapted to changing requirements. Correlations between integrated databanks can be managed and used as search criteria, as can integrated external information sources (e.g., hyperlinks to the Internet). In this way, an efficient and easily queried network of valuable data sources is realized.
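How a correlation between two databanks might serve as a search criterion can be sketched as follows. This is an assumption-laden illustration rather than a description of BioRS: the two databanks, the link table between them, all identifiers and the function search_by_linked_keyword are hypothetical.

```python
# Hypothetical expression databank (records invented for illustration).
EXPRESSION = {
    "exp-1": {"gene": "geneA", "level": 8.2},
    "exp-2": {"gene": "geneB", "level": 3.1},
}

# Hypothetical annotation databank with keyword sets per entry.
ANNOTATION = {
    "prot-A": {"keywords": {"apoptosis", "tumor suppressor"}},
    "prot-B": {"keywords": {"proto-oncogene"}},
}

# Managed correlations between the databanks:
# expression record id -> annotation entry id.
LINKS = {"exp-1": "prot-A", "exp-2": "prot-B"}


def search_by_linked_keyword(keyword):
    """Return ids of expression records whose linked annotation entry
    carries the given keyword."""
    return sorted(
        exp_id
        for exp_id, entry_id in LINKS.items()
        if exp_id in EXPRESSION
        and keyword in ANNOTATION.get(entry_id, {}).get("keywords", set())
    )


if __name__ == "__main__":
    print(search_by_linked_keyword("apoptosis"))
```

The query is phrased against the annotation databank, yet it returns records from the expression databank; only the link table connects the two, so each databank can continue to be administered at its own site.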