E xp l ai n i n g t h e S u c c e s s of N e ar e s t N e i gh b or M e t h od s i n P r e d i c t i on Full text available at: http://dx.doi.org/10.1561/2200000064 O t h e r t i t l e s i n F ou n d at i on s an d T r e n d s R⃝ i n M ac h i n e L e ar n i n g N o n - co n vex O p t i m i z a t i o n f o r M a ch i n e L ea r n i n gy P r at e e k J ai n an d P u r u s h ot t am K a I S B N: 978- 1- 68083- 368- 3 K er n el M ea n E m bed d i n g o f Di s t r i bu t i o n s : A R evi ew a n d B ey o n d K r i k am ol M u an d e t , K e n j i F u k u m i z u , B h ar at h S r i p e r u m b u d u r an d B e r n h ar d S c h ol k op f I S B N: 978- 1- 68083- 288- 4 T en s o r N et w o r ks f o r Di m en s i o n a l i t y R ed u ct i o n a n d L a r ge- s ca l e O p t i m i z a t i o n : P a r t 1 L o w - R a n k T en s o r Deco m p o s i t i o n s An d r z e j C i c h oc k i , An h - Hu y P h an , Q i b i n Z h ao, Nam gi l Le e , I v an O s e l e d e t s , M as as h i S u gi y am a an d D an i l o P . M an d i c I S B N: 978- 1- 68083- 222- 8 T en s o r N et w o r ks f o r Di m en s i o n a l i t y R ed u ct i o n a n d L a r ge- s ca l e O p t i m i z a t i o n : P a r t 2 A p p l i ca t i o n s a n d F u t u r e P er s p ect i ves An d r z e j C i c h oc k i , An h - Hu y P h an , Q i b i n Z h ao, Nam gi l Le e , I v an O s e l e d e t s , M as as h i S u gi y am a an d D an i l o P . M an d i c I S B N: 978- 1- 68083- 276- 1 P a t t er n s o f S ca l a bl e B a y es i a n I n f er en ce E l ai n e An ge l i n o, M at t h e w J am e s J oh n s on an d R y an P . Ad am s I S B N: 978- 1- 68083- 218- 1 G en er a l i z ed L o w R a n k M o d el s M ad e l e i n e Ud e l l , C or i n n e Hor n , R e z a Z ad e h an d S t e p h e n B oy d I S B N: 978- 1- 68083- 140- 5 Full text available at: http://dx.doi.org/10.1561/2200000064 E xp l ai n i n g t h e S u c c e s s of N e ar e s t N e i gh b or M e t h od s i n P r e d i c t i on G e or ge H . C h e n C a r n e g i e M e l l o n U n i v e r s i t y g e o r g e c h e n @ c m u .e d u D e vavr at S h ah M a s s a c h u s e t t s I n s t i t u t e o f T e c h n o l o g y d e v a v r a t @ m i t .e d u B os t on — D e l f t Full text available at: http://dx.doi.org/10.1561/2200000064 F ou n d at i on s an d T r e n d s R⃝ i n M ac h i n e L e ar n i n g P u b l i s h e d , s o l d a n d d i s t r i b u t e d b y : no w P ubl i s he rs Inc . P O B o x 1 0 2 4 H a no v e r, MA 0 2 3 3 9 U ni t e d St a t e s T e l . + 1 - 7 8 1 - 9 8 5 - 4 5 1 0 w w w . no w publ i s he rs . c o m s a l e s @ no w publ i s he rs . c o m O u t s i d e No r t h Am e r i c a : no w P ubl i s he rs Inc . P O B o x 1 7 9 2 6 0 0 A D De l f t T he N e t he rl a nds T e l . + 3 1 - 6 - 5 1 1 1 5 2 7 4 T he pre f e rre d c i t a t i o n f o r t hi s publ i c a t i o n i s G . H . C he n a nd D. Sha h. E x p l a i n i n g t h e S u c c e s s o f Ne a r e s t Ne i g h b o r M e t h o d s i n P r e d i c t i o n . F o unda t i o ns a nd T re nds R⃝ i n Ma c hi ne L e a rni ng , v o l . 1 0 , no . 5 - 6 , pp. 3 3 7 – 5 8 8 , 2 0 1 8 . ISB N : 9 7 8 - 1 - 6 8 0 8 3 - 4 5 5 - 0 c⃝ 2 0 1 8 G . H . C he n a nd D. Sha h A l l r i g h t s r e s e r v e d . N o p a r t o f t h i s p u b l i c a t i o n m a y b e r e p r o d u c e d , s t o r e d i n a r e t r i e v a l s y s t e m , o r t r a n s m i t t e d i n a n y f o r m o r b y a n y m e a n s , m e c h a n i c a l , p h o t o c o p y i n g , r e c o r d i n g o r o t h e r w i s e , w i t h o u t p r i o r w r i t t e n p e r m i s s i o n o f t h e p u b l i s h e r s . P h o t o c o p y i n g . I n t h e U S A : T h i s j o u r n a l i s r e g i s t e r e d a t t h e C o p y r i g h t C l e a r a n c e C e n t e r , I n c . , 2 2 2 R o s e w o o d D r i v e , D a n v e r s , M A 0 1 9 2 3 . A u t h o r i z a t i o n t o p h o t o c o p y i t e m s f o r i n t e r n a l o r p e r s o n a l u s e , o r t h e i n t e r n a l o r p e r s o n a l u s e o f s p e c i fi c c l i e n t s , i s g r a n t e d b y n o w P u b l i s h e r s I n c f o r u s e r s r e g i s t e r e d w i t h t h e C o p y r i g h t C l e a r a n c e C e n t e r ( C C C ) . T h e ‘ s e r v i c e s ’ f o r u s e r s c a n b e f o u n d o n t h e i n t e r n e t a t : w w w . c o p y r i g h t . c o m F o r t h o s e o r g a n i z a t i o n s t h a t h a v e b e e n g r a n t e d a p h o t o c o p y l i c e n s e , a s e p a r a t e s y s t e m o f p a y m e n t h a s b e e n a r r a n g e d . A u t h o r i z a t i o n d o e s n o t e x t e n d t o o t h e r k i n d s o f c o p y i n g , s u c h a s t h a t f o r g e n e r a l d i s t r i b u t i o n , f o r a d v e r t i s i n g o r p r o m o t i o n a l p u r p o s e s , f o r c r e a t i n g n e w c o l l e c t i v e w o r k s , o r f o r r e s a l e . I n t h e r e s t o f t h e w o r l d : P e r m i s s i o n t o p h o t o c o p y m u s t b e o b t a i n e d f r o m t h e c o p y r i g h t o w n e r . P l e a s e a p p l y t o n o w P u b l i s h e r s I n c . , P O B o x 1 0 2 4 , H a n o v e r , M A 0 2 3 3 9 , U S A ; T e l . +1 7 8 1 8 7 1 0 2 4 5 ; w w w . n o w p u b l i s h e r s . c o m ; s a l e s @n o w p u b l i s h e r s . c o m n o w P u b l i s h e r s I n c . h a s a n e x c l u s i v e l i c e n s e t o p u b l i s h t h i s m a t e r i a l w o r l d w i d e . P e r m i s s i o n t o u s e t h i s c o n t e n t m u s t b e o b t a i n e d f r o m t h e c o p y r i g h t l i c e n s e h o l d e r . P l e a s e a p p l y t o n o w P u b l i s h e r s , P O B o x 1 7 9 , 2 6 0 0 A D D e l f t , T h e N e t h e r l a n d s , w w w . n o w p u b l i s h e r s . c o m ; e - m a i l : s a l e s @n o w p u b l i s h e r s . c o m Full text available at: http://dx.doi.org/10.1561/2200000064 F ou n d at i on s an d T r e n d s R⃝ i n M ac h i n e L e ar n i n g Vo l u m e 1 0 , I s s u e 5 - 6 , 2 0 1 8 E d i t or i al B oar d E d i t or - i n - C h i e f M i c h a el J o r d a n U ni v e rs i t y o f C a l i f o rni a , B e rk e l e y U ni t e d St a t e s E d i t or s P e t e r B a rt l e t t UC B e r k e l e y Y o s hua B e ng i o Un i v e r s i t é d e M o n t r é a l A v ri m B l um C M U C ra i g B o ut i l i e r Un i v e r s i t y o f T o r o n t o St e phe n B o y d S t a n f o r d Un i v e r s i t y C a rl a B ro dl e y T u f t s Un i v e r s i t y Inde rj i t Dhi l l o n T e x a s a t Au s t i n J e ro m e F ri e dm a n S t a n f o r d Un i v e r s i t y K e nj i F uk um i z u I S M Z o ubi n G ha hra m a ni C a m b r i d g e Un i v e r s i t y Da v i d H e c k e rm a n M i c r o s o f t R e s e a r c h T o m H e s k e s R a d b o u d Un i v e r s i t y G e o ff re y H i nt o n Un i v e r s i t y o f T o r o n t o A a po H y v a ri ne n He l s i n k i I I T L e s l i e P a c k K a e l bl i ng M I T Mi c ha e l K e a rns UP e n n Da phne K o l l e r S t a n f o r d Un i v e r s i t y J o hn L a ff e rt y Un i v e r s i t y o f C h i c a g o Mi c ha e l L i t t m a n B r o w n Un i v e r s i t y G a bo r L ug o s i P o m p e u F a b r a Da v i d Ma di g a n C o l u m b i a Un i v e r s i t y P a s c a l Ma s s a rt Un i v e r s i t é d e P a r i s - S u d A ndre w Mc C a l l um Un i v e r s i t y o f M a s s a c h u s e t t s Am h e r s t Ma ri na Me i l a Un i v e r s i t y o f W a s h i n g t o n A ndre w Mo o re C M U J o hn P l a t t M i c r o s o f t R e s e a r c h L uc de R a e dt Al b e r t - L u d w i g s - Un i v e r s i t a e t F r e i b u r g C hri s t i a n R o be rt P a r i s - Da u p h i n e Suni t a Sa ra w a g i I I T B o m b a y R o be rt Sc ha pi re P r i n c e t o n Un i v e r s i t y B e rnha rd Sc ho e l k o pf M a x P l a n c k I n s t i t u t e R i c ha rd Sut t o n Un i v e r s i t y o f Al b e r t a L a rry W a s s e rm a n C M U B i n Y u UC B e r k e l e y Full text available at: http://dx.doi.org/10.1561/2200000064 E d i t or i al S c op e T op i c s F ou n d at i on s an d T r e n d s R⃝ i n M ac h i n e Le ar n i n g p u b l i s h e s s u r v e y an d t u t or i al ar t i c l e s i n t h e f ol l ow i n g t op i c s : • Ad ap t i v e c on t r ol an d s i gn al p r oc e s s i n g • Ap p l i c at i on s an d c as e s t u d i e s • B e h av i or al , c ogn i t i v e an d n e u r al l e ar n i n g • B ay e s i an l e ar n i n g • C l as s i fi c at i on an d p r e d i c t i on • C l u s t e r i n g • D at a m i n i n g • D i m e n s i on al i t y r e d u c t i on • E v al u at i on • G am e t h e or e t i c l e ar n i n g • G r ap h i c al m od e l s • I n d e p e n d e n t c om p on e n t an al y s i s • I n d u c t i v e l ogi c p r ogr am m i n g • K e r n e l m e t h od s • M ar k ov c h ai n M on t e C ar l o • M od e l c h oi c e • Non p ar am e t r i c m e t h od s • O n l i n e l e ar n i n g • O p t i m i z at i on • R e i n f or c e m e n t l e ar n i n g • R e l at i on al l e ar n i n g • R ob u s t n e s s • S p e c t r al m e t h od s • S t at i s t i c al l e ar n i n g t h e or y • Var i at i on al i n f e r e n c e • Vi s u al i z at i on I n f or m at i on f or L i b r ar i an s F ou n d at i on s an d T r e n d s R⃝ i n M ac h i n e Le ar n i n g, 2018, Vol u m e 10, 6 i s s u e s . I S S N p ap e r v e r s i on 1935- 8237. I S S N on l i n e v e r s i on 1935- 8245. Al s o av ai l ab l e as a c om b i n e d p ap e r an d on l i n e s u b s c r i p t i on . Full text available at: http://dx.doi.org/10.1561/2200000064 C on t e n t s 1 I n t r od u c t i on 3 1. 1 E x p l a i n i n g t h e P op u l a r i t y of N e a r e s t N e i gh b or Me t h od s . . 5 1. 2 N e a r e s t N e i gh b or Me t h od s i n T h e or y . . . . . . . . . . . . 8 1. 3 T h e S c op e of T h i s Mon ogr a p h . . . . . . . . . . . . . . . 9 2 B ac k gr ou n d 13 2. 1 R e gr e s s i on . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2. 2 C l a s s i fi c a t i on . . . . . . . . . . . . . . . . . . . . . . . . . 17 2. 3 N e a r e s t N e i gh b or a n d K e r n e l R e gr e s s i on . . . . . . . . . . 21 2. 4 N e a r e s t N e i gh b or a n d K e r n e l C l a s s i fi c a t i on . . . . . . . . . 24 3 T h e or y on R e gr e s s i on 26 3. 1 O v e r v i e w of R e s u l t s . . . . . . . . . . . . . . . . . . . . . 27 3. 2 K e y D e fi n i t i on s , T e c h n i c a l i t i e s , a n d A s s u m p t i on s . . . . . 40 3. 3 T h e or e t i c a l G u a r a n t e e s f or k - N N R e gr e s s i on . . . . . . . . 45 3. 4 T h e or e t i c a l G u a r a n t e e s f or F i x e d - R a d i u s N N R e gr e s s i on . . 61 3. 5 T h e or e t i c a l G u a r a n t e e s f or K e r n e l R e gr e s s i on . . . . . . . 66 3. 6 P r oof s . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3. 7 C h oi c e of N u m b e r of N e a r e s t N e i gh b or s a n d K e r n e l B a n d w i d t h 100 4 T h e or y on C l as s i fi c at i on 105 4. 1 C on v e r t i n g G u a r a n t e e s f r om R e gr e s s i on t o C l a s s i fi c a t i on . 108 Full text available at: http://dx.doi.org/10.1561/2200000064 4. 2 G u a r a n t e e s U n d e r W e a k e r A s s u m p t i on s . . . . . . . . . . 112 4. 3 G u a r a n t e e s U n d e r a Ma r gi n B ou n d C on d i t i on . . . . . . . 128 5 P r e d i c t i on G u ar an t e e s i n T h r e e C on t e m p or ar y A p p l i c at i on s 131 5. 1 T i m e S e r i e s F or e c a s t i n g . . . . . . . . . . . . . . . . . . . 139 5. 2 O n l i n e C ol l a b or a t i v e F i l t e r i n g . . . . . . . . . . . . . . . . 160 5. 3 P a t c h - B a s e d I m a ge S e gm e n t a t i on . . . . . . . . . . . . . 180 5. 4 Mor e T r a i n i n g D a t a , Mor e P r ob l e m s ? . . . . . . . . . . . 197 6 C om p u t at i on 200 6. 1 O v e r v i e w . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 6. 2 E x a c t N e a r e s t N e i gh b or s . . . . . . . . . . . . . . . . . . 204 6. 3 A p p r ox i m a t e N e a r e s t N e i gh b or s . . . . . . . . . . . . . . . 206 6. 4 O p e n S ou r c e S of t w a r e . . . . . . . . . . . . . . . . . . . . 219 7 A d ap t i ve N e ar e s t N e i gh b or s an d F ar A w ay N e i gh b or s 221 7. 1 A d a p t i v e N e a r e s t N e i gh b or Me t h od s . . . . . . . . . . . . 223 7. 2 F a r A w a y N e i gh b or s . . . . . . . . . . . . . . . . . . . . . 228 R e f e r e n c e s 237 Full text available at: http://dx.doi.org/10.1561/2200000064 E xp l ai n i n g t h e S u c c e s s of N e ar e s t N e i gh b or M e t h od s i n P r e d i c t i on G e or ge H. C h e n 1 an d D e v av r at S h ah 2 1 C a r n egi e M el l o n U n i ver s i t y ; geo r gech en @ cm u . ed u 2 M a s s a ch u s et t s I n s t i t u t e o f T ech n o l o gy ; d eva vr a t @ m i t . ed u AB S T R AC T M an y m od e r n m e t h od s f or p r e d i c t i on l e v e r age n e ar e s t n e i gh - b or s e ar c h t o fi n d p as t t r ai n i n g e x am p l e s m os t s i m i l ar t o a t e s t e x am p l e , an i d e a t h at d at e s b ac k i n t e x t t o at l e as t t h e 11t h c e n t u r y an d h as s t ood t h e t e s t of t i m e . T h i s m on o- gr ap h ai m s t o e x p l ai n t h e s u c c e s s of t h e s e m e t h od s , b ot h i n t h e or y , f or w h i c h w e c ov e r f ou n d at i on al n on as y m p t ot i c s t a- t i s t i c al gu ar an t e e s on n e ar e s t - n e i gh b or - b as e d r e gr e s s i on an d c l as s i fi c at i on , an d i n p r ac t i c e , f or w h i c h w e gat h e r p r om i - n e n t m e t h od s f or ap p r ox i m at e n e ar e s t n e i gh b or s e ar c h t h at h av e b e e n e s s e n t i al t o s c al i n g p r e d i c t i on s y s t e m s r e l i an t on n e ar e s t n e i gh b or an al y s i s t o h an d l e m as s i v e d at as e t s . F u r - t h e r m or e , w e d i s c u s s c on n e c t i on s t o l e ar n i n g d i s t an c e s f or u s e w i t h n e ar e s t n e i gh b or m e t h od s , i n c l u d i n g h ow r an d om d e c i s i on t r e e s an d e n s e m b l e m e t h od s l e ar n n e ar e s t n e i gh b or s t r u c t u r e , as w e l l as r e c e n t d e v e l op m e n t s i n c r ow d s ou r c i n g an d gr ap h on s . I n t e r m s of t h e or y , ou r f oc u s i s on n on as y m p t ot i c s t at i s t i c al gu ar an t e e s , w h i c h w e s t at e i n t h e f or m of h ow m an y t r ai n i n g d at a an d w h at al gor i t h m p ar am e t e r s e n s u r e t h at a n e ar e s t n e i gh b or p r e d i c t i on m e t h od ac h i e v e s a u s e r - s p e c i fi e d e r r or t ol e r an c e . W e b e gi n w i t h t h e m os t ge n e r al of s u c h r e s u l t s G e o rg e H . C he n a nd De v a v ra t Sha h ( 2 0 1 8 ) , “ E x pl a i ni ng t he Suc c e s s o f N e a re s t N e i g hbo r Me t ho ds i n P re di c t i o n” , F o unda t i o ns a nd T re nds R⃝ i n Ma c hi ne L e a rni ng : V o l . 1 0 , N o . 5 - 6 , pp 3 3 7 – 5 8 8 . DO I: 1 0 . 1 5 6 1 / 2 2 0 0 0 0 0 0 6 4 . Full text available at: http://dx.doi.org/10.1561/2200000064 2 f or n e ar e s t n e i gh b or an d r e l at e d k e r n e l r e gr e s s i on an d c l as - s i fi c at i on i n ge n e r al m e t r i c s p ac e s . I n s u c h s e t t i n gs i n w h i c h w e as s u m e v e r y l i t t l e s t r u c t u r e , w h at e n ab l e s s u c c e s s f u l p r e - d i c t i on i s s m oot h n e s s i n t h e f u n c t i on b e i n g e s t i m at e d f or r e gr e s s i on , an d a l ow p r ob ab i l i t y of l an d i n g n e ar t h e d e c i - s i on b ou n d ar y f or c l as s i fi c at i on . I n p r ac t i c e , t h e s e c on d i t i on s c ou l d b e d i ffi c u l t t o v e r i f y e m p i r i c al l y f or a r e al d at as e t . W e t h e n c ov e r r e c e n t t h e or e t i c al gu ar an t e e s on n e ar e s t n e i gh b or p r e d i c t i on i n t h e t h r e e c as e s t u d i e s of t i m e s e r i e s f or e c as t i n g, r e c om m e n d i n g p r od u c t s t o p e op l e ov e r t i m e , an d d e l i n e at - i n g h u m an or gan s i n m e d i c al i m age s b y l ook i n g at i m age p at c h e s . I n t h e s e c as e s t u d i e s , c l u s t e r i n g s t r u c t u r e , w h i c h i s e as i e r t o v e r i f y i n d at a an d m or e r e ad i l y i n t e r p r e t ab l e b y p r ac t i t i on e r s , e n ab l e s s u c c e s s f u l p r e d i c t i on . Full text available at: http://dx.doi.org/10.1561/2200000064 1 I n t r od u c t i on T h i n gs t h at ap p e ar s i m i l ar ar e l i k e l y s i m i l ar . F or e x am p l e , a b as e b al l p l ay e r ’ s f u t u r e p e r f or m an c e c an b e p r e d i c t e d b y c om p ar i n g t h e p l ay e r t o ot h e r s i m i l ar p l ay e r s ( S i l v e r , 2003) . W h e n f or e c as t i n g e l e c t i on r e s u l t s f or a U. S . s t at e , ac c ou n t i n g f or p ol l i n g t r e n d s at s i m i l ar s t at e s i m p r ov e s f or e c as t ac c u r ac y ( S i l v e r , 2008) . I n i m age e d i t i n g, w h e n r e m ov i n g an ob j e c t f r om an i m age , on e of t h e m os t s u c c e s s f u l w ay s t o fi l l i n t h e d e l e t e d p i x e l s i s b y c om p l e t i n g t h e m i s s i n g p i x e l s u s i n g i m age p at c h e s s i m i l ar t o t h e on e s n e ar t h e m i s s i n g p i x e l s ( C r i m i n i s i et a l . , 2004) . T h e s e ar e b u t a f e w e x am p l e s of h ow fi n d i n g s i m i l ar i n s t an c e s or n ea r es t n ei gh bo r s h e l p p r od u c e p r e d i c t i on s . O f c ou r s e , t h i s i d e a i s h ar d l y gr ou n d b r e ak i n g, w i t h n e ar e s t n e i gh b or c l as s i fi c at i on al r e ad y ap p e ar i n g as an e x p l an at i on f or v i s u al ob j e c t r e c ogn i t i on i n a m e d i e v al t e x t B o o k o f O p t i cs b y ac c l ai m e d s c h ol ar Al h az e n i n t h e e ar l y 11t h c e n t u r y . 1 D e s p i t e t h e i r s i m p l i c i t y an d 1 A bri e f hi s t o ry o f ne a re s t ne i g hbo r c l a s s i fic a t i o n a nd i t s a ppe a ra nc e i n A l ha z e n’ s B o o k o f O p t i c s i s g i v e n by P e l i l l o ( 2 0 1 4 ) . T he e x a c t c o m pl e t i o n da t e o f O p t i c s i s unk no w n. A l - K ha l i l i ( 2 0 1 5 ) da t e s t he w o rk t o be f ro m y e a rs 1 0 1 1 t o 1 0 2 1 , c o i nc i di ng w i t h m uc h o f A l ha z e n’ s de c a de o f i m pri s o nm e nt i n C a i ro , w hi l e Sm i t h ( 2 0 0 1 ) c l a i m s a c o m pl e t i o n t i m e be t w e e n 1 0 2 8 a nd 1 0 3 8 , c l o s e r t o A l ha z e n’ s de a t h c i rc a 1 0 4 0 . 3 Full text available at: http://dx.doi.org/10.1561/2200000064 4 I n t r od u c t i on age , n e ar e s t n e i gh b or m e t h od s r e m ai n e x t r e m e l y p op u l ar , 2 of t e n u s e d as a c r i t i c al c og i n a l ar ge r p r e d i c t i on m ac h i n e . I n f ac t , t h e m ac h i n e c an b e b i ol ogi c al , as t h e r e i s n ow e v i d e n c e t h at f r u i t fl i e s ’ n e u r al c i r c u i t s e x e c u t e ap p r ox i m at e n e ar e s t n e i gh b or i n s e n s i n g od or s as t o c om e u p w i t h an ap p r op r i at e b e h av i or al r e s p on s e ( D as gu p t a et a l . , 2017) . Al t h ou gh n e ar e s t n e i gh b or c l as s i fi c at i on d at e s b ac k a m i l l e n n i u m , an al y s i s f or w h e n an d w h y i t w or k s d i d n ot b e gi n u n t i l f ar m or e r e c e n t l y , s t ar t i n g w i t h a p ai r of u n p u b l i s h e d t e c h n i c al r e p or t s b y F i x an d Hod ge s ( 1951; 1952) on as y m p t ot i c c on v e r ge n c e p r op e r t i e s as w e l l as a s m al l d at as e t s t u d y , f ol l ow e d b y t h e l an d m ar k r e s u l t of C ov e r an d Har t ( 1967) t h at s h ow e d t h at k - n e ar e s t n e i gh b or s c l as s i fi c at i on ac h i e v e s an e r r or r at e t h at i s at m os t t w i c e t h e b e s t e r r or r at e ac h i e v ab l e . D e c ad e s l at e r , C ov e r r e c ol l e c t e d h ow h i s p ap e r w i t h Har t c am e ab ou t : E ar l y i n 1966 w h e n I fi r s t b e gan t e ac h i n g at S t an f or d , a s t u d e n t , P e t e r Har t , w al k e d i n t o m y offi c e w i t h an i n t e r e s t i n g p r ob l e m . He s ai d t h at C h ar l e s C ol e an d h e w e r e u s i n g a p at t e r n c l as s i fi c at i on s c h e m e w h i c h , f or l ac k of a b e t t e r w or d , t h e y d e s c r i b e d as t h e n e ar e s t n e i gh b or p r oc e d u r e . T h i s s c h e m e as s i gn e d t o an as y e t u n c l as s i fi e d ob s e r v at i on t h e c l as s i fi c at i on of t h e n e ar e s t n e i gh b or . W e r e t h e r e an y good t h e or e t i c al p r op e r t i e s of t h i s p r oc e d u r e ? ( C ov e r , 1982) I t w ou l d t ak e s om e t i m e f or t h e t e r m “n e ar e s t n e i gh b or ” t o e n t e r c om - m on p ar l an c e . How e v e r , t h e n e ar e s t n e i gh b or p r oc e d u r e s p r e ad q u i c k l y ac r os s ar e as i n c om p u t e r s c i e n c e . Not l on g af t e r C ov e r an d Har t ’ s 1967 p ap e r , D on al d K n u t h ’ s t h i r d v ol u m e of T h e A r t o f C o m p u t er P r o gr a m - m i n g i n t r od u c e d n e ar e s t n e i gh b or s e ar c h as t h e p o s t o ffi ce p r o bl em ( K n u t h , 1973) , p av i n g t h e b e gi n n i n gs of c om p u t at i on al ge om e t r y . I n v ar i ou s c od i n g t h e or y c on t e x t s , m a x i m u m l i kel i h o o d d eco d i n g t u r n s ou t t o m e an n e ar e s t n e i gh b or c l as s i fi c at i on ( Hi l l , 1986) . F as t f or w ar d i n g t o p r e s e n t t i m e , w i t h t h e e x p l os i on i n t h e av ai l ab i l i t y of d at a i n v i r t u al l y al l d i s c i p l i n e s , ar c h i t e c t i n g d at ab as e s y s t e m s t h at s c al e t o t h i s v ol u m e 2 N o t o nl y w a s t he k - ne a re s t ne i g hbo r m e t ho d na m e d a s o ne o f t he t o p 1 0 a l g o ri t hm s i n da t a m i ni ng ( W u e t a l . , 2 0 0 8 ) , t hre e o f t he o t he r t o p 1 0 m e t ho ds ( A da B o o s t , C 4 . 5 , a nd C A R T ) ha v e ne a re s t ne i g hbo r i nt e rpre t a t i o ns . Full text available at: http://dx.doi.org/10.1561/2200000064 1. 1. E x p l a i n i n g t h e P op u l a r i t y of N e a r e s t N e i gh b or Me t h od s 5 of d at a an d t h at c an e ffi c i e n t l y fi n d n e ar e s t n e i gh b or s h as b e c om e a f u n d am e n t al p r ob l e m ( P ap ad op ou l os an d M an ol op ou l os , 2005) . Un d e r - s t an d i n g w h e n , w h y , an d h ow w e l l n e ar e s t n e i gh b or p r e d i c t i on w or k s n ow d e m an d s ac c ou n t i n g f or c om p u t at i on al c os t s . 1. 1 E xp l ai n i n g t h e P op u l ar i t y of N e ar e s t N e i gh b or M e t h od s T h at n e ar e s t n e i gh b or m e t h od s r e m ai n p op u l ar i n p r ac t i c e l ar ge l y h as t o d o w i t h t h e i r e m p i r i c al s u c c e s s ov e r t h e y e ar s . How e v e r , t h i s e x p l a- n at i on i s p e r h ap s ov e r l y s i m p l i s t i c . W e h i gh l i gh t f ou r as p e c t s of n e ar e s t n e i gh b or m e t h od s t h at w e b e l i e v e h av e b e e n c r u c i al t o t h e i r c on t i n - u e d p op u l ar i t y . F i r s t , t h e fl e x i b i l i t y i n c h oos i n g w h at “n e ar ” m e an s i n n e ar e s t n e i gh b or p r e d i c t i on al l ow s u s t o r e ad i l y h an d l e a d - h o c d i s - t an c e s , or t o t ak e ad v an t age of e x i s t i n g r e p r e s e n t at i on an d d i s t an c e l e ar n i n g m ac h i n e r y s u c h as d e e p n e u r al n e t w or k s or d e c i s i on - t r e e - b as e d e n s e m b l e l e ar n i n g ap p r oac h e s . S e c on d , t h e c om p u t at i on al e ffi c i e n c y of n u m e r ou s ap p r ox i m at e n e ar e s t n e i gh b or s e ar c h p r oc e d u r e s e n ab l e s n e ar - e s t n e i gh b or p r e d i c t i on t o s c al e t o m as s i v e h i gh - d i m e n s i on al d at as e t s c om m on i n m od e r n ap p l i c at i on s . T h i r d , n e ar e s t n e i gh b or m e t h od s ar e n on p ar am e t r i c , m ak i n g f e w m od e l i n g as s u m p t i on s on d at a an d i n s t e ad l e t t i n g t h e d at a m or e d i r e c t l y d r i v e p r e d i c t i on s . Las t l y , n e ar e s t n e i gh b or m e t h od s ar e i n t e r p r e t ab l e : t h e y p r ov i d e e v i d e n c e f or t h e i r p r e d i c t i on s b y e x h i b i t i n g t h e n e ar e s t n e i gh b or s f ou n d . F l ex i bi l i t y i n d efi n i n g s i m i l a r i t y . S p e c i f y i n g w h at “n e ar ” m e an s f or a n e ar e s t n e i gh b or m e t h od am ou n t s t o c h oos i n g a “f e at u r e s p ac e ” i n w h i c h d at a ar e r e p r e s e n t e d ( as “f e at u r e v e c t or s ”) , an d a d i s t an c e f u n c t i on t o u s e w i t h i n t h e f e at u r e s p ac e . F or e x am p l e , a c om m on c h oi c e f or t h e f e at u r e s p ac e an d d i s t an c e f u n c t i on ar e E u c l i d e an s p ac e an d E u c l i d e an d i s t an c e , r e s p e c t i v e l y . O f c ou r s e , f ar m or e e l ab or at e c h oi c e s ar e p os s i b l e an d , i n p r ac t i c e , of t e n t h e s e ar e c h os e n i n an a d - h o c m an n e r d e p e n d i n g on t h e ap p l i c at i on . F or e x am p l e , w h e n w or k i n g w i t h t i m e s e r i e s , t h e d i s t an c e f u n c t i on c ou l d i n v ol v e a h i gh l y n on l i n e ar t i m e w ar p ( t o t r y t o al i gn t w o t i m e s e r i e s as w e l l as p os s i b l e b e f or e c om p u t i n g a s i m p l e r d i s t an c e l i k e E u c l i d e an d i s t an c e ) . I n c h oos i n g a “good ” f e at u r e s p ac e ( i . e. , a good w ay t o r ep r es en t d at a) , f e at u r e s c ou l d b e m an u al l y “h an d - e n gi n e e r e d ” d e p e n d i n g on t h e d at a m od al i t y ( e. g. , t e x t , i m age s , v i d e o, au d i o) or Full text available at: http://dx.doi.org/10.1561/2200000064 6 I n t r od u c t i on l e ar n e d , f or e x am p l e , u s i n g d e e p n e u r al n e t w or k s ( e. g. , G ood f e l l ow et a l . 2016, C h ap t e r 15) . M e an w h i l e , s e n s or f u s i on i s r e ad i l y p os s i b l e as f e at u r e s e x t r ac t e d f r om m u l t i p l e s e n s or s ( e. g. , d i ff e r e n t d at a m od al i t i e s ) c an b e c on c at e n at e d t o f or m a l ar ge f e at u r e v e c t or . S e p ar at e l y , t h e d i s t an c e f u n c t i on i t s e l f c an b e l e ar n e d , f or e x am p l e u s i n g M ah al an ob i s d i s t an c e l e ar n i n g m e t h od s ( K u l i s , 2013) or S i am e s e n e t w or k s ( B r om l e y et a l . , 1994; C h op r a et a l . , 2005) . I n f ac t , d e c i s i on t r e e s an d t h e i r u s e i n e n s e m b l e m e t h od s s u c h as r an d om f or e s t s , Ad aB oos t , an d gr ad i e n t b oos t i n g c an b e s h ow n t o b e w e i gh t e d n e ar e s t n e i gh b or m e t h od s t h at l e ar n a d i s t an c e f u n c t i on ( w e d i s c u s s t h i s r e l at i on s h i p t ow ar d t h e e n d of t h e m on ogr ap h i n S e c t i on 7. 1, b u i l d i n g on a p r e v i ou s ob s e r v at i on m ad e b y Li n an d J e on 2006) . T h u s , n e ar e s t n e i gh b or m e t h od s ac t u al l y m e s h w e l l w i t h a n u m b e r of e x i s t i n g r e p r e s e n t at i on an d d i s t an c e l e ar n i n g r e s u l t s . C o m p u t a t i o n a l effi ci en cy . P e r h ap s t h e as p e c t of n e ar e s t n e i gh b or m e t h od s t h at h as c on t r i b u t e d t h e m os t t o t h e i r p op u l ar i t y i s t h e i r c om p u t at i on al e ffi c i e n c y , w h i c h h as e n ab l e d t h e s e m e t h od s t o s c al e t o m as s i v e d at as e t s ( “b i g d at a”) . D e p e n d i n g on t h e f e at u r e s p ac e an d d i s t an c e f u n c t i on c h os e n or l e ar n e d b y t h e p r ac t i t i on e r , d i ff e r e n t f as t ap p r ox i m at e n e ar e s t n e i gh b or s e ar c h al gor i t h m s ar e av ai l ab l e . T h e s e s e ar c h al gor i t h m s , b ot h f or ge n e r al h i gh - d i m e n s i on al f e at u r e s p ac e s ( e. g. , G i on i s et a l . 1999; D at ar et a l . 2004; B aw a et a l . 2005; An d on i an d I n d y k 2008; Ai l on an d C h az e l l e 2009; M u j a an d Low e 2009; B oy t s ov an d Nai d an 2013; D as gu p t a an d S i n h a 2015; M at h y et a l . 2015; An d on i et a l . 2017) an d s p e c i al i z e d t o i m age p at c h e s ( e. g. , B ar n e s et a l . 2009; T a et a l . 2014) , c an r ap i d l y d e t e r m i n e w h i c h d at a p oi n t s ar e c l os e t o e ac h ot h e r w h i l e p ar al l e l i z i n g ac r os s s e ar c h q u e r i e s . T h e s e m e t h od s of t e n u s e l oc al i t y - s e n s i t i v e h as h i n g ( I n d y k an d M ot w an i , 1998) , w h i c h c om e s w i t h a t h e or e t i c al gu ar an t e e on ap p r ox i m at i on ac c u r ac y , or r an d om i z e d t r e e s ( e. g. , B aw a et a l . 2005; M u j a an d Low e 2009; D as gu p t a an d S i n h a 2015; M at h y et a l . 2015) , w h i c h q u i c k l y p r u n e s e ar c h s p ac e s w h e n t h e t r e e s ar e s u ffi c i e n t l y b al an c e d . T h e s e r an d om i z e d t r e e s c an e v e n b e e ffi c i e n t l y c on s t r u c t e d f or s t r e am i n g d at a u s i n g an ar b i t r ar y d i s t an c e f u n c t i on ( M at h y et a l . , 2015) . Full text available at: http://dx.doi.org/10.1561/2200000064 1. 1. E x p l a i n i n g t h e P op u l a r i t y of N e a r e s t N e i gh b or Me t h od s 7 N o n p a r a m et r i c. R ou gh l y s p e ak i n g, n e ar e s t n e i gh b or m e t h od s b e - i n g n on p ar am e t r i c m e an s t h at t h e y m ak e v e r y f e w as s u m p t i on s on t h e u n d e r l y i n g m od e l f or t h e d at a. T h i s i s a p ar t i c u l ar l y at t r ac t i v e p r op e r t y s i n c e i n a gr ow i n g n u m b e r of m od e r n ap p l i c at i on s s u c h as s oc i al n e t w or k s , r e c om m e n d at i on s y s t e m s , h e al t h c ar e d e c i s i on s u p p or t , an d on l i n e e d u c at i on , w e w i s h t o an al y z e b i g d at a t h at w e d o n ot a p r i o r i k n ow t h e s t r u c t u r e of . A n on p ar am e t r i c ap p r oac h s i d e s t e p s t h e q u e s t i on of e x p l i c i t l y p os i t i n g or l e ar n i n g t h e s t r u c t u r e u n d e r l y i n g t h e d at a. W h e n w e p os i t i n t r i c at e s t r u c t u r e f or d at a, t h e s t r u c t u r e m ay s t r ay f r om r e al i t y or ot h e r w i s e n ot ac c ou n t f or t h e f u l l p al e t t e of p os s i b i l i t i e s i n w h at t h e d at a l ook l i k e . W h e n w e l e ar n s t r u c t u r e , t h e c om p u t at i on al ov e r h e ad an d am ou n t of d at a n e e d e d m ay d w ar f w h at i s s u ffi c i e n t f or t ac k l i n g t h e p r e d i c t i on t as k at h an d . I n s t e ad of p os i t i n g or l e ar n i n g s t r u c t u r e , n on p ar am e t r i c m e t h od s l e t t h e d at a m or e d i r e c t l y d r i v e p r e d i c t i on s . How e v e r , b e i n g n on p ar am e t r i c d oe s n ’ t m e an t h at n e ar e s t n e i gh b or m e t h od s h av e n o p ar am e t e r s . W e s t i l l h av e t o c h oos e a f e at u r e s p ac e an d d i s t an c e , an d a p oor c h oi c e of t h e s e c ou l d m ak e p r e d i c t i on i m p os s i b l e . I n t er p r et a bi l i t y . Ne ar e s t n e i gh b or m e t h od s n at u r al l y p r ov i d e e v i - d e n c e f or t h e i r d e c i s i on s b y e x h i b i t i n g t h e n e ar e s t n e i gh b or s f ou n d i n t h e d at a. A p r ac t i t i on e r c an u s e t h e n e ar e s t n e i gh b or s f ou n d t o d i agn os e w h e t h e r t h e f e at u r e s p ac e an d d i s t an c e f u n c t i on c h os e n ar e ad e q u at e f or t h e ap p l i c at i on of i n t e r e s t . F or e x am p l e , i f on v al i d at i on d at a, a n e ar e s t n e i gh b or m e t h od i s m ak i n g i n c or r e c t p r e d i c t i on s , w e c an l ook at t h e n e ar e s t n e i gh b or s of e ac h v al i d at i on d at a p oi n t t o s e e w h y t h e y t e n d t o h av e i n c or r e c t l ab e l s . T h i s of t e n gi v e s c l u e s t o t h e p r ac t i t i on e r as t o h ow t o c h oos e a b e t t e r f e at u r e s p ac e or d i s t an c e f u n c t i on . Al t e r n at i v e l y , i f t h e n e ar e s t n e i gh b or m e t h od i s p r od u c i n g ac c u r at e p r e d i c t i on s , t h e n e ar e s t n e i gh b or s f ou n d t e l l u s w h i c h t r ai n i n g d at a p oi n t s ar e d r i v i n g t h e p r e d i c t i on f or an y p ar t i c u l ar v al i d at i on or t e s t p oi n t . T h i s i n t e r - p r e t ab i l i t y i s v i t al i n ap p l i c at i on s s u c h as h e al t h c ar e t h at d e m an d a h i gh b u r d e n of p r oof b e f or e l e t t i n g s of t w ar e i n fl u e n c e p ot e n t i al l y c os t l y d e c i s i on s t h at aff e c t p e op l e ’ s w e l l - b e i n g. Full text available at: http://dx.doi.org/10.1561/2200000064
2022 • 8 Pages • 624.04 KB
2022 • 4 Pages • 151.22 KB
2022 • 19 Pages • 294.74 KB
2022 • 22 Pages • 448.67 KB
2022 • 5 Pages • 229.93 KB
2022 • 8 Pages • 777.7 KB
2022 • 5 Pages • 229.2 KB
2022 • 26 Pages • 4.36 MB
2022 • 15 Pages • 2.17 MB
2022 • 4 Pages • 121.08 KB
2022 • 17 Pages • 849.14 KB
2022 • 2 Pages • 47.26 KB
2022 • 10 Pages • 269.5 KB
2022 • 67 Pages • 776.92 KB
2022 • 12 Pages • 2.95 MB
2022 • 19 Pages • 445.39 KB