Next: The Logic of Phenomenal
Up: PHENOMENAL DATA MINING:
Previous: Mail Order Bookstores
Grouping supermarket purchases by customer, as proposed in Section
3, can be tested with the aid of a supermarket database that
does contain customer identification. We discard the customer
identification, run our grouping algorithm and compare the results
with the genuine customer data.
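
One way to make that comparison concrete is to count, over all pairs of baskets, how often the grouping and the genuine data agree that two baskets belong to the same customer. The following is a minimal sketch under that assumption; the function name `pairwise_scores`, the sample customer names and the choice of pairwise precision and recall as the measure are illustrative and are not part of the proposal itself.

```python
from itertools import combinations

def pairwise_scores(true_ids, predicted_groups):
    """Compare a predicted grouping of baskets with the held-back customer ids.

    true_ids[i] and predicted_groups[i] both describe basket i. Group labels
    need not equal customer ids; only whether two baskets share a label matters.
    """
    same_true = same_pred = same_both = 0
    for i, j in combinations(range(len(true_ids)), 2):
        t = true_ids[i] == true_ids[j]                   # really the same customer
        p = predicted_groups[i] == predicted_groups[j]   # grouped together by us
        same_true += t
        same_pred += p
        same_both += t and p
    # Of the pairs we grouped together, what fraction really belong together?
    precision = same_both / same_pred if same_pred else 1.0
    # Of the pairs that belong together, what fraction did we find?
    recall = same_both / same_true if same_true else 1.0
    return precision, recall

# Hypothetical data: five baskets, genuine ids held back during grouping.
true_ids = ["ann", "bob", "ann", "bob", "cat"]
predicted = [0, 1, 0, 0, 2]
precision, recall = pairwise_scores(true_ids, predicted)
```

A pairwise measure has the convenience that the labels produced by the grouping algorithm never have to be matched up with real customer identifiers.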
My present opinion is that grouping baskets by customers is likely to
be a difficult but feasible task. As will be seen, it will involve
taking advantage of special features of the behavior of supermarket
customers. In this respect, it may resemble cryptanalysis, which
often takes advantage of special features of the behavior of senders
of messages. Moreover, the results cannot be perfect in terms of
identifying the purchasers, but the uncertainties about who bought
what may not affect the interesting statistics of customer behavior.
Here are some ideas about how to proceed.
- It may be best to start the experiments with a relatively
small store. That way there will be fewer assignments to try and
fewer similar signatures.
- Very likely we should start with a date in the middle of the
operation of the system and try to extend identifications both
forward and backward in time.
- At any time in the computation, there will be a certain
collection of putative customers and a set of possible assignments of
some of the baskets to customers. Maybe the computational resources
will be adequate to deal with hundreds to thousands of possible
assignments. Each of these assignments will have an anomaly
computed on the basis of what has been assigned so far.
- Since many people shop on a weekly basis, it may be worthwhile
to try to find some putative customers who buy on a particular day of
the week.
- It may be possible to find some signatures for some customers
that are repeated every week. For example, a shopper may buy both
whole milk and skim milk every time, because of the needs of
different family members.
- The algorithm may grow assignments forward and backward in
time. As it goes, it will eliminate certain assignments.
- When it cannot decide among the assignments over some lengthy
period, say two months, it should probably just pick one in order to
keep down the number of open choices.
- Perhaps there will be a compact way of keeping certain choices
open in order to use long-term aspects of the signature.
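
Several of these ideas can be sketched together as a search that keeps a bounded number of candidate assignments, each carrying an anomaly, and discards the worst as it goes. The sketch below rests on assumptions that are not in the text: the anomaly is taken to be the sum of Jaccard distances between each basket and the items already attributed to its putative customer, a fixed cost `new_customer_cost` is charged for introducing a new putative customer, and baskets are processed forward in time only. The names `group_baskets`, `jaccard_distance` and `beam_width` are hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two item sets (0.0 means identical)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def group_baskets(baskets, beam_width=100, new_customer_cost=0.6):
    """Assign each basket to a putative customer, keeping at most
    beam_width candidate assignments open at any time.

    Each candidate is (anomaly, labels, profiles): labels[i] is the customer
    index of basket i, profiles[k] is the set of items seen for customer k.
    """
    beam = [(0.0, (), ())]
    for basket in baskets:
        candidates = []
        for anomaly, labels, profiles in beam:
            # Try giving the basket to each existing putative customer.
            for k, profile in enumerate(profiles):
                cost = jaccard_distance(profile, basket)
                updated = profiles[:k] + (profile | basket,) + profiles[k + 1:]
                candidates.append((anomaly + cost, labels + (k,), updated))
            # Or open a new putative customer, at a fixed assumed cost.
            candidates.append((anomaly + new_customer_cost,
                               labels + (len(profiles),),
                               profiles + (basket,)))
        # Eliminate all but the least anomalous assignments.
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:beam_width]
    return beam[0][1]

# Hypothetical baskets in time order; one shopper always buys both milks.
baskets = [
    frozenset({"whole milk", "skim milk", "bread"}),
    frozenset({"beer", "chips"}),
    frozenset({"whole milk", "skim milk", "eggs"}),
    frozenset({"beer", "chips", "salsa"}),
]
labels = group_baskets(baskets)
```

Truncating the list of candidates plays the role of picking one assignment when the algorithm cannot decide; a smaller `beam_width` commits earlier, a larger one keeps more choices open at greater computational cost. Growing assignments backward in time and using the day of the week would need a richer anomaly than the one assumed here.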
John McCarthy
Thu Apr 6 16:23:28 PDT 2000