Molecular Medicine – Knowledge-Based IT
Digging Out the Data
Our knowledge of the human genome is stored in gigantic databases. New software is now capable of mining this vast mass of data to uncover the key connections, such as which genes play a role in cancer and how such genes interact. The resulting information makes it easier to develop new medications.
Decoding the human genome. A gel electrophoresis pattern reveals widely varying DNA fragments. This information can be used to establish, for example, the genetic basis of a disease.
John’s diagnosis is stomach cancer. At first he’s shocked. Will part of his stomach have to be removed? But there’s also some good news. His doctor is very experienced and was therefore able to diagnose the cancer while it was still in an early stage. In order to offer his patient the best therapy available, John’s doctor looks for additional information on the Internet and in medical literature. There are many public databases for molecular medicine all over the world that store the results of gene expression analyses and other processes—for example, the new ArrayExpress and PubMed portals. The National Center for Biotechnology Information in the U.S. alone maintains approximately 40 databases with various search and service functions. However, it’s extremely difficult to find the information that fits this particular patient in this mass of data.
But in the future, specialists like John’s doctor will have an assistant—the new GeneSim Internet-based knowledge portal. Equipped with all the available information on a patient, his case history, a description of his tumor and lab results, GeneSim dives into the whole world’s medical databases on the Internet. Just a mouse click later, the program presents its results: John has a suspicious gene known as PSMD11 that is particularly active.
Whether it’s a PET/CT image (bottom), tissue sections (2nd and 3rd from top) or microtiter plates (4th from top), the MIPortal (top) links all data from pre-clinical and clinical research
GeneSim also provides information on a related protein that is involved in the genesis of stomach cancer. But that’s not all. The program also tells John’s doctor that there’s already a medication on the market that blocks the suspicious protein and thus leads to the death of associated tumor cells. The program also refers him to a medical journal in which there’s an article about a therapy that has already been successful. Out of an ocean of data, GeneSim has extracted exactly the right information.
Revealing the Right Data. This story may sound like pie in the sky, and indeed the medication in question has not yet been officially approved, but GeneSim already exists as a prototype at the Siemens Corporate Technology (CT) research center in Munich. "What still remains to be accomplished, however," says GeneSim developer Dr. Martin Stetter, "is a system that effectively brings together the immense mass of data in the Internet and can assess it in a targeted way."
GeneSim’s mission is to bring order into the worldwide flood of data from the fields of genetics and molecular biology. In the future, it will help doctors in their search for the right therapy and researchers in their efforts to develop new medications. The GeneSim platform interfaces with syngo, Siemens’ uniform software for the operation of imaging processes such as MR and CT (Universal Language for Medical Systems in Pictures of the Future, Fall 2006), including Siemens’ PACS (Picture Archiving and Communication System).
The link to such systems takes place in GeneSim’s central knowledge base module. This is in effect the brain of the system, which creates connections between individual information pools. GeneSim performs several tasks. It collects knowledge, creates links between the data with the help of mathematical processes, and then determines which genes and proteins are directly connected with a certain disease. Finally, it provides an extract of the knowledge available on the Internet regarding a disease.
The way this works was recently demonstrated using stomach cancer as an example, by the development team led by Stetter and Dr. Mathäus Dejori. For this purpose, they took on the role of a researcher who is initially unaware of the significance of the PSMD11 gene. The GeneSim research process begins with a comparative examination of sick and healthy test subjects. Gene expression analysis is used to examine the activity of more than 7,000 selected genes—i.e. protein synthesis—in 30 patients.
Because in most cases different genes are active depending on whether a subject is sick or healthy, GeneSim compares the data from individual patients to find out which genes differ most in terms of their levels of activity. The program uses statistical tests and mathematical models for this purpose. "These tests and models look for conspicuous connections between individual genes—for example, whether certain genes are always particularly active in combination," says Stetter.
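The comparison described above can be sketched in a few lines of Python. The article does not say which statistical tests GeneSim uses, so this minimal example assumes Welch's t-statistic as a stand-in. The function names, the gene names other than PSMD11, and all expression values are invented for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(sick, healthy):
    """Welch's t-statistic for two groups of expression values (assumed test)."""
    diff = mean(sick) - mean(healthy)
    var_term = variance(sick) / len(sick) + variance(healthy) / len(healthy)
    if var_term == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(var_term)


def rank_genes(expression, top_n=100):
    """Order genes by how strongly their activity differs between the groups.

    `expression` maps a gene name to (values in sick subjects,
    values in healthy subjects).
    """
    scores = {gene: welch_t(sick, healthy)
              for gene, (sick, healthy) in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:top_n]


# Toy data, invented for illustration only.
toy = {
    "PSMD11": ([9.1, 8.7, 9.4, 9.0], [5.2, 5.0, 5.5, 4.9]),
    "GENE_A": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.8, 6.0, 6.2]),
    "GENE_B": ([3.0, 3.4, 2.9, 3.1], [4.0, 4.2, 3.9, 4.1]),
}
ranking = rank_genes(toy)
print(ranking)
```

With more than 7,000 genes and only 30 patients, a real analysis would also need to correct for multiple testing. That step is omitted here.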
The result is an image that appears if you click on the display: a network that depicts the 100 most conspicuous genes as spheres connected by lines. The more strongly a gene seems to be implicated in the genesis of the disease, the larger its sphere; the more important the relationship between two genes seems to be, the thicker the line that connects them. At this point, things get really exciting. If the viewer clicks on one of the genes, such as PSMD11, the search function of GeneSim swings into action and brings together the most important information contained in databases from all over the world.
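A rough sketch of how such a network could be derived from expression data follows. It assumes, purely for illustration, that sphere size is a gene's normalized disease score and that line thickness is the absolute Pearson correlation between two genes' activity profiles. The article does not name the mathematical model GeneSim actually uses, and all names and numbers below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long activity profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_network(profiles, scores, min_abs_corr=0.8):
    """Return (nodes, edges): node size in [0, 1], edge width in [0, 1]."""
    max_score = max(abs(s) for s in scores.values()) or 1.0
    nodes = {gene: abs(scores[gene]) / max_score for gene in profiles}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= min_abs_corr:  # keep only conspicuous relationships
            edges[(a, b)] = abs(r)
    return nodes, edges


# Toy activity profiles across four subjects (invented values).
profiles = {
    "PSMD11": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [3.0, 1.0, 4.0, 2.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
}
scores = {"PSMD11": 19.9, "GENE_A": 0.2, "GENE_B": -7.5}  # hypothetical disease scores
nodes, edges = build_network(profiles, scores)
print(nodes)
print(edges)
```

In this toy example only PSMD11 and GENE_B are joined by a line, because their profiles rise and fall together.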
The program is self-learning and remembers in which databases relevant information can be found. If GeneSim were only a search engine, it would probably display thousands of hits for the search term "PSMD11." However, it can also assess the information it finds. Its knowledge base compares items to find out which items of information or medical articles best match its own data, such as the patient’s age, the stage of his or her illness and other aspects. What finally appears on the display is a text window containing the most important information and the five to ten most relevant links to articles in scientific publications.
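The difference between listing every hit and ranking hits against patient data can be illustrated with a small sketch. It assumes, as a simplification, that each article and the patient record are described by sets of keywords and that the match is measured by Jaccard overlap. The article does not describe how GeneSim scores relevance, and the titles and keywords below are invented.

```python
def relevance(article_terms, patient_terms):
    """Jaccard overlap between an article's keywords and the patient record."""
    a, p = set(article_terms), set(patient_terms)
    union = a | p
    return len(a & p) / len(union) if union else 0.0


def top_hits(articles, patient_terms, limit=10):
    """Return the titles that best match the patient, best first."""
    ranked = sorted(articles,
                    key=lambda title: relevance(articles[title], patient_terms),
                    reverse=True)
    return ranked[:limit]


# Hypothetical patient record and search hits for "PSMD11".
patient = {"PSMD11", "stomach cancer", "early stage", "age 50-60"}
articles = {
    "Article A": {"PSMD11", "stomach cancer", "early stage"},
    "Article B": {"PSMD11", "yeast"},
    "Article C": {"PSMD11", "stomach cancer", "late stage"},
}
hits = top_hits(articles, patient, limit=2)
print(hits)
```

All three toy articles mention PSMD11, so a plain keyword search would return them all. The ranking keeps only those that also fit the patient.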
A mathematical model, BioSim, identifies genes such as those responsible for cancers and supplies decisive information for the development of perfectly matched medications.
Mathematics and Medications. Of course GeneSim addresses not only the needs of physicians but also, and especially, those of researchers who are developing new medications or markers for molecular imaging. Researchers also benefit from the fact that the gene network on the display can be actively altered. For example, a simple mouse click is all it takes to suppress or intensify the activity of individual genes. The mathematical models then once again review the relationships between individual genes in the network and automatically change the activity of the other genes that are influenced by the change. The crucial factor here is that the display also shows whether this changes the course of the illness, intensifies it or makes it disappear altogether. GeneSim now operates as a support system for decision-making. It provides crucial recommendations concerning the areas that should be addressed by new medications in order to stop a disease. The same applies to marker substances. If the gene or protein associated with a disease is found, it is possible to create marker substances that dock onto it, thus making it visible. Stetter and his colleagues have been offering this GeneSim function on the market for two years as a consulting service under the name BioSim, in particular to pharmaceutical companies (see Molecular Detectives in Pictures of the Future, Spring 2005). This way, manufacturers can considerably narrow their focus to a smaller number of potentially pathogenic genes and possible points of attack for medications and markers.
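The idea of suppressing one gene and letting the model update the others can be sketched with a deliberately simple linear influence network. This is an assumption made for illustration and not the BioSim model, whose details the article does not give. The gene names other than PSMD11, the weights and the activity values are all hypothetical.

```python
def simulate_intervention(baseline, influence, clamped, steps=50):
    """Propagate a forced change in gene activity through a linear network.

    baseline:  gene -> normal activity level
    influence: (source gene, target gene) -> weight (assumed linear effect)
    clamped:   gene -> activity forced by the researcher
    """
    activity = dict(baseline)
    activity.update(clamped)
    for _ in range(steps):
        updated = {}
        for gene in baseline:
            if gene in clamped:
                updated[gene] = clamped[gene]  # intervention stays fixed
                continue
            # Shift away from baseline in proportion to upstream deviations.
            shift = sum(weight * (activity[src] - baseline[src])
                        for (src, dst), weight in influence.items()
                        if dst == gene)
            updated[gene] = baseline[gene] + shift
        activity = updated
    return activity


# Toy network: PSMD11 represses GENE_B, which in turn activates GENE_C.
baseline = {"PSMD11": 9.0, "GENE_B": 3.0, "GENE_C": 5.0}
influence = {("PSMD11", "GENE_B"): -0.5, ("GENE_B", "GENE_C"): 0.4}
result = simulate_intervention(baseline, influence, clamped={"PSMD11": 5.0})
print(result)
```

Suppressing PSMD11 in this toy network raises GENE_B and, one step later, GENE_C. A real decision-support model would additionally map the resulting activity pattern to a predicted course of the illness.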
The complete GeneSim package is not yet available, but various parts of it are in operation at partner institutions. For example, the Molecular Imaging (MI) Portal at the Massachusetts General Hospital in Boston was developed on the basis of GeneSim in cooperation with the hospital’s Center for Molecular Imaging Research (CMIR). The MIPortal is currently used primarily in preclinical research, that is, on animal models. The Portal manages the data generated by gene expression analyses and various laboratory tests or imaging processes, organizes it according to projects, and links it together. All in all, the Portal processes information from a total of 15 sources within CMIR. The volume of the data processed quickly reaches hundreds of terabytes (1 terabyte equals 1,000 gigabytes). MIPortal is used for many purposes at CMIR, including the development of new markers for molecular imaging (see Interview).
Custom Cancer Therapy. Doctors at the Maastro-Klinik in Maastricht, Netherlands, which specializes in radiation therapy, want to be able to identify tumors more accurately in the future—especially in order to optimize radiation therapy planning. That’s why they are using the MIPortal to link images from PET, CT and MR scans with the genetic or molecular analysis of tumor tissues. One especially interesting application is the differentiation of hypoxic (oxygen-starved) tumors from non-hypoxic ones with the help of genetic testing. Hypoxic tumors are resistant to radiation and must be treated using more intensive methods. Thousands of results from preclinical and clinical research are accumulating in the clinic’s laboratories. But thanks to the MIPortal, all of the data can be systematically combined and organized according to individual projects. Cancer, neurodegenerative diseases such as Alzheimer’s, and diseases of the heart and the circulatory system are currently the most important potential areas of application for GeneSim. Stetter’s top objective for the coming years is to make this software the standard tool for therapy planning and the support of decision-making. Stetter has already taken the initial promising steps in this direction through GeneSim.
Tim Schröder
Health-e-Child is an EU-funded, four-year, €16.7 million project designed to develop a prototype integrated healthcare platform for European pediatrics, providing seamless integration of traditional sources of biomedical information, as well as emerging sources, such as genetic and proteomic data. "The idea is to gain a comprehensive view of a child’s health by integrating biomedical data from genetic to clinical to epidemiological information," says Project Coordinator Dr. Jörg Freund from Siemens Image and Knowledge Management, a division of Siemens Medical Solutions. Plans call for the resulting biomedical information platform to be supported by robust search and optimization techniques empowered by grid computing, integrated disease models, database-guided biomedical decision support systems, and data mining for biomedical knowledge discovery. Focusing on individualized disease prevention, screening, early diagnosis, therapy and follow-up of pediatric heart diseases, inflammatory diseases and brain tumors, the program, which is coordinated by Siemens, merges the information technology talents of a consortium of companies, universities and research centers with the clinical expertise and biomedical skills of three major children’s hospitals and collaborating research groups. Now in its first year, the project is already taking shape as researchers from Siemens Corporate Technology and Siemens Medical Solutions are researching analytical tools aimed at supporting clinicians in their decision making. Further information is available at: www.Health-e-Child.org
Arthur F. Pease