Learning Fixed-dimension Linear Thresholds From Fragmented Data
Paul W. Goldberg, Warwick University, UK
Abstract:
We investigate PAC-learning in a situation in which examples
(consisting of an input vector and 0/1 label) have some of the
components of the input vector concealed from the learner. This is a
special case of Restricted Focus of Attention (RFA) learning. Our
interest here is in 1-RFA learning, where only a single component of
each input vector (together with its label) is revealed. We argue that 1-RFA
learning merits special consideration within the wider field of RFA
learning. It is the most restrictive form of RFA learning (so that
positive results apply in general), and it models a typical "data
fusion" scenario, in which we have sets of observations from a number of
separate sensors, but the observations from different sensors cannot be
correlated with each other.
Within this setting we study the well-known class of linear threshold
functions, or Euclidean half-spaces. The sample complexity of this
learning problem depends on the input distribution. We identify
fairly general sufficient conditions for an input distribution to give
rise to sample complexity that is polynomial in the PAC parameters
epsilon and delta. We also exhibit a method for defining
"bad" input distributions for which the sample complexity can grow
arbitrarily fast as a function of 1/epsilon. We give an algorithm
(using empirical
epsilon-covers) that is polynomial in the PAC parameters
epsilon and delta, provided that the input
distribution is well-behaved and the dimension (the number of
components of the input vectors) is any fixed constant.
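The 1-RFA example oracle described above can be sketched as follows.
This is an illustrative sketch only: the Gaussian input distribution,
the particular weight vector and threshold, and the helper name
`draw_1rfa_example` are assumptions for the example, not taken from
the paper.

```python
import numpy as np

# Hidden target halfspace in fixed dimension d: label(x) = 1 iff w.x >= theta.
# In the 1-RFA model the learner never sees the whole input vector x;
# each example reveals the 0/1 label and a single chosen component of x.

rng = np.random.default_rng(0)

d = 3                              # fixed (constant) dimension
w = np.array([1.0, -2.0, 0.5])     # hidden weight vector (unknown to learner)
theta = 0.25                       # hidden threshold (unknown to learner)

def draw_1rfa_example(focus):
    """Draw one example; reveal only component `focus` and the label."""
    x = rng.normal(size=d)          # input drawn from some distribution D
    label = int(x @ w >= theta)     # 0/1 label of the halfspace
    return x[focus], label          # the other d-1 components stay concealed

# The learner can, for each coordinate i, collect a sample of
# (x_i, label) pairs -- but never a joint observation of two coordinates.
sample = {i: [draw_1rfa_example(i) for _ in range(5)] for i in range(d)}
```

Note that this already shows why the input distribution matters:
the learner only ever sees one-dimensional marginals paired with
labels, so how much those marginals reveal about the halfspace
depends on the distribution D.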