Knowledge Discovery and Data Mining

For many subjects, the amount of data at our fingertips is extraordinary. In other cases, we have only a limited data set from which to draw conclusions. IHMC researchers are developing tools and techniques for gleaning useful information in both situations.

The first step in prediction is understanding the past: the causes and effects we have already observed. Some causes are obvious, while others are subtle and emerge only after analyzing large volumes of data. Some causes act directly, while others act through a long chain of events. IHMC researchers are creating tools for analyzing causes. Their algorithms build causal maps in the form of Bayesian networks (Bayes nets), graphs that show how a variety of causes relate to one another and to the ultimate effect.
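To make the idea of a Bayes net concrete, here is a minimal sketch in Python. It is a toy illustration, not IHMC's software: the three-variable network (Rain and Sprinkler both causing WetGrass) and all of its probabilities are hypothetical. Each node stores the probability of being true given the values of its parents, the joint probability of a full assignment is the product of those local terms, and a query such as "how likely is rain, given wet grass?" is answered by summing over the unobserved variables. Note that this sketch only uses a given network; learning the network's structure from data, which is what causal discovery algorithms do, is not shown.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# Each node maps to (parents, table of P(node=True | parent values)).
# All numbers are made up for illustration.
NETWORK = {
    "Rain": ((), {(): 0.2}),
    "Sprinkler": ((), {(): 0.1}),
    "WetGrass": (("Rain", "Sprinkler"), {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.01,
    }),
}


def joint_probability(assignment):
    """P(assignment) as the product of each node's probability given its parents."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query=True | evidence), summing the joint over all unobserved variables."""
    hidden = [n for n in NETWORK if n not in evidence and n != query]
    totals = {True: 0.0, False: 0.0}
    for q_value in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query: q_value, **dict(zip(hidden, values))}
            totals[q_value] += joint_probability(assignment)
    return totals[True] / (totals[True] + totals[False])


if __name__ == "__main__":
    # Wet grass makes rain much more likely than its prior of 0.2.
    print(round(posterior("Rain", {"WetGrass": True}), 3))
    # Learning the sprinkler was on "explains away" much of that evidence.
    print(round(posterior("Rain", {"WetGrass": True, "Sprinkler": True}), 3))
```

With these made-up numbers, the first query gives about 0.719 and the second about 0.236: once another cause of the wet grass is known, rain becomes a less likely explanation.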

Unfortunately, some standard algorithms for discovering causal relationships are too simple for complex systems such as those found in biology. Bayes nets, for instance, allow influence to flow in only one direction (the graph may contain no cycles), but biology is full of feedback loops. The algorithms IHMC scientists are creating incorporate background knowledge to produce more accurate causal models.
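The restriction described above can be checked mechanically. The following sketch, which is only an illustration and not part of IHMC's methods, uses Kahn's topological-sort algorithm to test whether a set of cause-and-effect links forms a directed acyclic graph, the structure a standard Bayes net requires. The variable names are hypothetical labels; the insulin and glucose pair is a familiar example of a biological feedback loop.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Return True if the directed graph of (cause, effect) pairs has no cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    If every node gets removed, the graph is acyclic; any leftover nodes
    lie on (or downstream of) a cycle.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        successors[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


if __name__ == "__main__":
    # A one-way causal chain can be drawn as a Bayes net.
    print(is_acyclic([("Smoking", "Tar"), ("Tar", "Cancer")]))
    # A feedback loop cannot: insulin lowers glucose, glucose raises insulin.
    print(is_acyclic([("Insulin", "Glucose"), ("Glucose", "Insulin")]))
```

The first call prints `True` and the second `False`, which is why modeling feedback requires going beyond the standard acyclic formulation.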

Analyzing causes helps us build models for predicting future outcomes. These models rest on a set of premises, statements we accept as true. From these premises we frequently must infer conclusions that go beyond the information contained in the evidence, so our conclusions remain uncertain even when all of our premises are true. IHMC scientists are examining ways to formalize such inductive reasoning so that computers can use it to mine databases for general knowledge.

Understanding data is easiest when the data are impeccable. Sometimes, though, the data are flawed, riddled with human and machine errors. When results just don’t seem right, people often decide which data are correct based on the interdependencies among them, double-checking data entry or sensor output. IHMC scientists are developing a computer system that “polishes” data, examining these interdependencies to find and fix errors.
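As a rough illustration of using interdependencies to spot errors, the sketch below assumes one known dependency between fields (here, that a postal code determines the city name) and flags any record whose value disagrees with the strict majority of records sharing the same key. This is a simple majority-vote heuristic chosen for illustration, not a description of IHMC's polishing system; the function name `polish` and the records are hypothetical.

```python
from collections import Counter, defaultdict


def polish(records, key, dependent):
    """Suggest corrections where `dependent` disagrees with the majority for its `key`.

    Assumes `key` functionally determines `dependent` (e.g. a postal code
    determines a city), so a minority value within a group is treated as a
    probable error. Returns (record index, current value, suggested value).
    """
    groups = defaultdict(Counter)
    for rec in records:
        groups[rec[key]][rec[dependent]] += 1

    fixes = []
    for i, rec in enumerate(records):
        group = groups[rec[key]]
        majority, count = group.most_common(1)[0]
        # Only suggest a fix when a strict majority agrees, to avoid coin-flip corrections.
        if rec[dependent] != majority and count * 2 > sum(group.values()):
            fixes.append((i, rec[dependent], majority))
    return fixes


if __name__ == "__main__":
    # Hypothetical records; the third contains a data-entry typo.
    records = [
        {"zip": "32502", "city": "Pensacola"},
        {"zip": "32502", "city": "Pensacola"},
        {"zip": "32502", "city": "Pensacloa"},
        {"zip": "32514", "city": "Pensacola"},
    ]
    print(polish(records, "zip", "city"))  # [(2, 'Pensacloa', 'Pensacola')]
```

A real system would need to discover such dependencies rather than be told them, and to weigh how much evidence supports each correction; the majority rule here only conveys the basic intuition.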

Humans still play a central role in decision making. IHMC researchers study how people understand causal relations and how they best learn to interpret causality. In turn, they share these findings with people who must analyze systems with complex causal structure.