Next: Acknowledgments Up: PHENOMENAL DATA MINING: Previous: Ontology

Remarks.

Suppose a customer of type $i$ has a probability $P_{ij}$ of including item $j$ in a basket. We can infer an approximate number of types by looking at the approximate rank of the matrix $(P_{ij})$.
Classifying customers into discrete types may not give as good results as a more complex model that takes into account the age of the customer as a continuous variable.
A linear relation between phenomena and observations is the simplest case, and such relations can probably be discovered by methods akin to factor analysis.
We could infer that there were two subpopulations if we didn't already know about sex.
We might infer from the data from our stores in India that a substantial part of the population doesn't purchase meat products. We can distinguish this from a situation in which everyone buys meat, only less of it, because certain other purchase patterns are associated with not buying meat.
Tire mounting services are purchased in connection with the purchase of tires. The phenomenon is that tires are useless unless mounted. Does knowing this fact give more than just the correlation?
Suppose a new item, e.g. a hula hoop, is increasing its sales rapidly, and 5 percent of the customers have bought it. Suppose, however, that the customers who buy it rarely buy another, that these customers are only those with young girls in the family, and that almost all such customers have already bought one. Under these hypotheses, which identifying customers might verify, it is reasonable to conclude that the fad for hula hoops has reached its peak, and that if many more are ordered, the store is likely to be stuck with them.
Suppose we have the baskets grouped by customer--either because the data was given or because we have inferred it as described above. Can we determine how far the customers live from the store? The information might be useful in anticipating how much business might be lost to a newly opened competitor. No immediate idea occurred to me when I thought of the question. However, it is rash to conclude that it can't be done. Someone cleverer than I, or who knows more about customers of supermarkets, might figure a way. One just shouldn't jump to negative conclusions.
Grouping by customer might permit observing that no-one who buys item 531 ever buys anything from that store again. Such a fact would not show up as a direct correlation in the data unless item 531 were bought in quantities that significantly affected sales of some other items.
If a customer buys a certain product but doesn't buy a necessary complementary product, we can infer that he buys the complementary product from someone else.
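
The remark above about the approximate rank of the matrix of inclusion probabilities can be illustrated with a small sketch. It is not the paper's method, only one plausible reading. It assumes that we observe a customer-by-item matrix of purchase frequencies, and that each customer's row is close to the probability row of that customer's hidden type. The number of singular values that are not negligible then estimates the number of types. The names `approximate_rank`, `type_item_prob`, `customer_type`, and the tolerance `rel_tol` are hypothetical, and the numbers are made up.

```python
import numpy as np


def approximate_rank(matrix, rel_tol=0.1):
    """Count singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(np.asarray(matrix, dtype=float),
                                    compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


# Hypothetical inclusion probabilities: 2 customer types (rows) x 4 items.
type_item_prob = np.array([
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.2, 0.9, 0.7],
])

# Hypothetical hidden type of each of 6 customers (unknown to the analyst).
customer_type = np.array([0, 0, 0, 1, 1, 1])

# Observed purchase frequencies: the row of the customer's type plus a
# small amount of noise standing in for sampling error (an assumption).
rng = np.random.default_rng(0)
frequencies = type_item_prob[customer_type] + rng.normal(scale=0.01,
                                                         size=(6, 4))

estimated_types = approximate_rank(frequencies)
print(estimated_types)
```

The choice of `rel_tol` matters: with real baskets the singular values rarely show a clean gap, so the estimate is only as good as the threshold and the noise model behind it.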
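
The remark about item 531 can also be sketched once baskets are grouped by customer. The following is a minimal illustration, assuming each basket is a tuple of a customer identifier, a visit index that increases with time, and a set of item numbers. The function name `items_never_followed_by_return`, the parameter `min_buyers`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def items_never_followed_by_return(baskets, min_buyers=2):
    """Return items whose every purchase occurs in the buyer's final basket.

    baskets: iterable of (customer_id, visit_index, items) tuples, where a
    larger visit_index means a later visit by the same customer.
    """
    baskets = list(baskets)  # we need two passes over the data

    # First pass: find each customer's last recorded visit.
    last_visit = {}
    for customer, visit, _items in baskets:
        last_visit[customer] = max(visit, last_visit.get(customer, visit))

    # Second pass: record who bought each item, and who came back afterwards.
    buyers = defaultdict(set)
    returned = defaultdict(set)
    for customer, visit, items in baskets:
        for item in items:
            buyers[item].add(customer)
            if visit < last_visit[customer]:
                returned[item].add(customer)

    return sorted(
        item for item, who in buyers.items()
        if len(who) >= min_buyers and item not in returned
    )


# Made-up baskets: item 531 always appears in a customer's final visit.
example_baskets = [
    ("ann", 1, {101, 202}),
    ("ann", 2, {531}),
    ("bob", 1, {531, 101}),
    ("cat", 1, {101}),
    ("cat", 2, {202}),
    ("cat", 3, {101, 202}),
]

suspect_items = items_never_followed_by_return(example_baskets)
print(suspect_items)
```

A real analysis would have to allow for customers whose last recorded visit is simply recent, for instance by ignoring baskets from the final weeks of the data. The `min_buyers` threshold is only a crude guard against items bought by very few customers.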

The only experimental work with phenomenal data mining is reported in [LT98].


John McCarthy
Thu Apr 6 16:23:28 PDT 2000