Data fusion & Geo-psychographical database. Pascal van Hattum, Utrecht University
Differentiated marketing: target customers as individually as possible. Sell the same product or service, but change the marketing mix for (a group of) customers. Reaction to different marketing mix strategies. Customer information can be found everywhere. One single source: market research questionnaire.
Solution: data fusion; geo-matching.
Data fusion: to fuse (or combine) two data sets.
Marketing application: Dutch energy supplier. Data set A: customer data set. Data set B: motivational segmentation. Data set A+: enriched customer data set.
BSR clusters Energy
  Cluster 3: usage for own comfort; energy is uncomplicated; low price; low contact frequency. Adoption: laggards.
  Cluster 1: cozy and warm; balance own and other. Adoption: late majority.
  Cluster 5: smarter; superior rules and values; acknowledgement. Adoption: innovators, early adopters.
  Cluster 4: guilty towards nature; usage is well-considered; energy savings. Adoption: early adopters, early majority.
  Cluster 2: self-evident; usage oriented on society. Adoption: late majority, laggards.
Marketing goal: Dutch energy supplier. Get customer information by sending differentiated questionnaires; improve response; increase sales leads. One questionnaire versus cluster-specific questionnaires.
Data fusion techniques: nearest neighbor; polytomous logistic regression; fusion value specific probabilities; model-based clustering approach.
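To make the first technique concrete, nearest neighbor fusion can be sketched as follows: each record in the customer data set (the recipient) receives the cluster of the most similar record in the segmentation data set (the donor), with similarity measured on the variables both data sets share. The variable names, the toy records and the simple mismatch-count distance are assumptions for illustration, not the setup used in the application.

```python
def mismatch_distance(a, b, common_vars):
    """Count the common variables on which two records differ."""
    return sum(a[v] != b[v] for v in common_vars)


def nearest_neighbor_fusion(recipients, donors, common_vars, fusion_var):
    """Give each recipient the fusion value of its closest donor.

    Ties are broken by donor order (min() returns the first minimum).
    """
    fused = []
    for rec in recipients:
        donor = min(donors, key=lambda d: mismatch_distance(rec, d, common_vars))
        fused.append({**rec, fusion_var: donor[fusion_var]})
    return fused


# Hypothetical toy data: data set B (donors) carries the cluster,
# data set A (recipients) does not.
COMMON = ["gender", "age", "education"]
data_set_b = [
    {"gender": "male", "age": "high", "education": "high", "cluster": 5},
    {"gender": "female", "age": "low", "education": "high", "cluster": 2},
    {"gender": "female", "age": "high", "education": "low", "cluster": 1},
]
data_set_a = [
    {"id": 101, "gender": "male", "age": "high", "education": "high"},
    {"id": 102, "gender": "female", "age": "low", "education": "low"},
]

enriched = nearest_neighbor_fusion(data_set_a, data_set_b, COMMON, "cluster")
for row in enriched:
    print(row["id"], row["cluster"])
```

With only a few categorical variables many donors tie at the same distance, so the result depends on the tie-breaking rule; a real application would need to decide how to handle that.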
Fusion value specific probabilities
                     Fusion value
                     1     2     3     4     5     total
  P(fusion value)    0.30  0.20  0.16  0.18  0.17  1.00
  gender     male    0.40  0.20  0.44  0.52  0.52  0.42
             female  0.60  0.80  0.56  0.48  0.48  0.58
  age        low     0.43  0.80  0.46  0.74  0.32  0.46
             high    0.57  0.20  0.54  0.26  0.68  0.54
  education  low     0.73  0.28  0.56  0.45  0.13  0.42
             high    0.27  0.72  0.44  0.55  0.87  0.58
  Reading example: P(gender=male | fusion value=1) = 0.40.
  Fusion value for male, high age and high education?
    P(fusion value=1 | profile) = 0.17
    P(fusion value=2 | profile) = 0.05
    P(fusion value=3 | profile) = 0.16
    P(fusion value=4 | profile) = 0.13
    P(fusion value=5 | profile) = 0.49
  Fusion value = 5!
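The posterior probabilities for the profile male, high age, high education are consistent with a naive Bayes calculation: multiply the prior P(fusion value) by the three conditional probabilities from the table and normalise over the five fusion values. The sketch below assumes that the profile variables are conditionally independent given the fusion value; the slides do not state this explicitly, but under that assumption the computation reproduces the reported values after rounding to two decimals.

```python
# Values copied from the table above; fusion values 1..5 in order.
prior = [0.30, 0.20, 0.16, 0.18, 0.17]
cond = {
    ("gender", "male"): [0.40, 0.20, 0.44, 0.52, 0.52],
    ("age", "high"): [0.57, 0.20, 0.54, 0.26, 0.68],
    ("education", "high"): [0.27, 0.72, 0.44, 0.55, 0.87],
}


def posterior(profile):
    """Return P(fusion value = k | profile) for k = 1..5.

    Assumes the profile variables are conditionally independent
    given the fusion value (naive Bayes).
    """
    joint = list(prior)
    for item in profile:
        joint = [j * p for j, p in zip(joint, cond[item])]
    total = sum(joint)
    return [j / total for j in joint]


profile = [("gender", "male"), ("age", "high"), ("education", "high")]
post = posterior(profile)
best = max(range(5), key=lambda k: post[k]) + 1
print([round(p, 2) for p in post], "-> fusion value", best)
# prints [0.17, 0.05, 0.16, 0.13, 0.49] -> fusion value 5
```

The published priors sum to 1.01 because of rounding; normalising the joint probabilities makes the result insensitive to that.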
Model-based clustering approach: latent cluster 1, ..., latent cluster q, ..., latent cluster Q.
Validation: internal validation; external validation.
Internal validation: maximize the number of correct classifications; avoid model overfitting; divide data set B into train and test data sets.
  [Diagram: data set B (M records; variables 1 to J, plus the cluster c1 to c5 in column J+1) is divided into data set B train (M train records), data set B test1 (M test1 records) and data set B test2 (M test2 records). In each subset the predicted cluster in column J+1 is compared with the observed cluster; every difference counts as a misclassification.]
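The division into one train set and two test sets can be sketched as below. The subset sizes 875, 411 and 409 are the record counts reported in the TCCR table; random assignment with a fixed seed is an assumption, since the slides do not say how the records were divided, and `split_records` is a hypothetical helper name.

```python
import random


def split_records(records, n_train, n_test1, seed=0):
    """Shuffle records and split them into train, test1 and test2.

    Whatever remains after n_train + n_test1 records goes to test2.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test1 = shuffled[n_train:n_train + n_test1]
    test2 = shuffled[n_train + n_test1:]
    return train, test1, test2


# Hypothetical record ids standing in for the M records of data set B.
records = list(range(1695))
train, test1, test2 = split_records(records, n_train=875, n_test1=411)
print(len(train), len(test1), len(test2))
```

Fitting on the train set only and scoring on two separate test sets is what exposes overfitting: a method that does much better on train than on test1 and test2 has adapted to noise.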
Internal validation - TCCRs
  method                          data set  #recs  total   chance
  nearest neighbor                train     875    -       21.4%
                                  test1     411    30.4%   22.7%
                                  test2     409    29.6%   22.5%
  logistic regression             train     875    49.8%   21.4%
                                  test1     411    38.7%   22.7%
                                  test2     409    41.3%   22.5%
  fusion value specific           train     875    45.4%   21.4%
                                  test1     411    43.8%   22.7%
                                  test2     409    46.2%   22.5%
  model based clustering          train     875    54.1%   21.4%
                                  test1     411    39.2%   22.7%
                                  test2     409    35.5%   22.5%
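The two percentage columns can be computed along the following lines. This sketch reads TCCR as the total correct classification rate (the share of records whose predicted cluster equals the observed one) and the chance column as a proportional chance criterion (the sum of squared cluster shares). Both readings are assumptions, since the slide defines neither, and the ten records are hypothetical.

```python
from collections import Counter


def tccr(observed, predicted):
    """Share of records whose predicted cluster equals the observed one."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def proportional_chance(observed):
    """Expected hit rate when guessing clusters in proportion to their size."""
    n = len(observed)
    return sum((count / n) ** 2 for count in Counter(observed).values())


# Hypothetical observed and predicted clusters for ten test records.
observed = [1, 1, 1, 2, 2, 3, 4, 4, 5, 5]
predicted = [1, 1, 2, 2, 5, 3, 4, 1, 5, 5]

print(f"TCCR   = {tccr(observed, predicted):.1%}")
print(f"chance = {proportional_chance(observed):.1%}")
```

A method is only useful when its test-set rate clearly exceeds the chance rate, and the gap between its train and test rates indicates how much it overfits.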
Internal validation
  Cluster      Count (share)
  Cluster 1      334,083  (29.5%)
  Cluster 2      204,774  (18.1%)
  Cluster 3      165,416  (14.6%)
  Cluster 4      176,319  (15.6%)
  Cluster 5      131,970  (11.6%)
  No cluster     120,843  (10.7%)
  Total        1,133,405 (100.0%)
Differentiated questionnaires
External validation: Response improvement? From 19.9% to ?? Increase in sales leads? From 2.25 to ??
External validation: Response improvement? From 19.9% to 25.2%. Increase in sales leads? From 2.25 to 2.63.
  Cluster      Response  Sales leads
  Cluster 1    29.9%     2.69
  Cluster 2    26.9%     2.26
  Cluster 3    18.5%     2.75
  Cluster 4    19.4%     2.22
  Cluster 5    31.2%     3.14
  No cluster   20.1%     2.73
  Total        25.2%     2.63
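The size of the improvement can be expressed as a relative change. The short calculation below uses only the overall before and after figures from the slide; the unit of the sales leads figure is not stated there, so it is carried through unchanged.

```python
def relative_change(before, after):
    """Relative change from before to after, as a fraction of before."""
    return (after - before) / before


response_before, response_after = 19.9, 25.2  # response, in percent
leads_before, leads_after = 2.25, 2.63        # sales leads, unit as on the slide

print(f"response:    +{response_after - response_before:.1f} points, "
      f"{relative_change(response_before, response_after):+.1%} relative")
print(f"sales leads: {relative_change(leads_before, leads_after):+.1%} relative")
```

Under these figures the response rises by 5.3 percentage points, roughly a quarter more than before, and sales leads rise by about 17% relative to the single-questionnaire baseline.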
Marketing applications: several marketing applications; the key concept is a differentiated approach.
Conclusions data fusion: fusion value specific probabilities performs best; model lift; prediction of psychographics from socio-economic characteristics; be aware of model overfitting; response, sales and knowledge improvement; reduced market research costs; successfully used in several marketing applications.
  Van Hattum, P. & Hoijtink, H. (2009). The Proof of the Pudding is in the Eating. Data Fusion: Een Applicatie in Marketing. Jaarboek 2009 MarktOnderzoekAssociatie, 83-101.
Geo-matching: what if no customer data is available, but there is a need for psychographic customer insight? Matching via the 6-position postcode (6ppc) (+ house number + suffix) using a geo-psychographic database.
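Matching on the 6-position postcode amounts to a key lookup. A minimal sketch, assuming the geo-psychographic database can be represented as a mapping from 6ppc to a segment label; the postcodes, the colour labels and the helper names `normalize_postcode` and `geo_match` are hypothetical.

```python
def normalize_postcode(raw):
    """Turn '3512 jc' or '3512JC' into the 6-position form '3512JC'."""
    return raw.replace(" ", "").upper()


def geo_match(customers, geo_db):
    """Attach the segment stored for each customer's postcode.

    Customers whose postcode is not in the database get segment None.
    """
    matched = []
    for customer in customers:
        key = normalize_postcode(customer["postcode"])
        matched.append({**customer, "segment": geo_db.get(key)})
    return matched


# Hypothetical geo-psychographic database: 6ppc -> dominant segment.
geo_db = {"3512JC": "red", "3131AB": "yellow"}

# Hypothetical customer file with addresses only.
customers = [
    {"id": 1, "postcode": "3512 jc", "house_number": 12},
    {"id": 2, "postcode": "3131AB", "house_number": 4},
    {"id": 3, "postcode": "9999 ZZ", "house_number": 1},
]

for row in geo_match(customers, geo_db):
    print(row["id"], row["segment"])
```

The house number is carried along but not used here; a database at house-number level would extend the lookup key with house number and suffix.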
Geo-psychography: the data. De Onderzoek Groep; SmartAgent studies; database with data on 1.4 million households; GfK Panelservices; Cendris Reallife; reporting for 450,000 postcodes.
Geo-psychography: the technique. Based on the principle "birds of a feather flock together" (in Dutch: "soort zoekt soort"). Weighted average over 6ppc, neighbourhood, district, town, municipality and province.
  Van Hattum, P., Doffer, A., Hoijtink, H. & Bijmolt, T. (2011). Birds on a feather flock together: hoe benader je consumenten met geo-psychografie? Jaarboek 2011 MarktOnderzoekAssociatie, 99-112.
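The weighted average over geographic levels can be sketched as follows. The slides name the six levels but give no weights, so the weights below (finer levels count more), the segment shares and the rule of renormalising over the levels that have data are all assumptions for illustration.

```python
LEVELS = ["6ppc", "neighbourhood", "district", "town", "municipality", "province"]


def weighted_segment_distribution(distributions, weights):
    """Combine per-level segment distributions into one weighted average.

    distributions: level -> {segment: share}, only for levels with data.
    weights: level -> non-negative weight; renormalised over available levels.
    """
    available = [lvl for lvl in LEVELS if lvl in distributions]
    total_weight = sum(weights[lvl] for lvl in available)
    segments = {s for lvl in available for s in distributions[lvl]}
    return {
        s: sum(weights[lvl] * distributions[lvl].get(s, 0.0) for lvl in available)
        / total_weight
        for s in sorted(segments)
    }


# Hypothetical weights: finer geographic levels count more.
weights = {"6ppc": 6, "neighbourhood": 5, "district": 4,
           "town": 3, "municipality": 2, "province": 1}

# Hypothetical segment shares for one address; no data at 6ppc level here.
distributions = {
    "neighbourhood": {"red": 0.50, "blue": 0.30, "yellow": 0.20},
    "district": {"red": 0.40, "blue": 0.40, "yellow": 0.20},
    "province": {"red": 0.25, "blue": 0.50, "yellow": 0.25},
}

result = weighted_segment_distribution(distributions, weights)
print({s: round(share, 3) for s, share in result.items()})
```

Because each level's shares sum to one and the weights are renormalised, the combined shares also sum to one, so the result can be read as a segment distribution for the address.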
Geo-psychography: the possibilities. The possibility to link (the distribution of) psychographic segments to empty customer databases. But also:
Which target group do we aim the leaflet at? A stylish blue leaflet with more expensive items or temporary designer items. No target group: should we actually distribute leaflets here at all? A red leaflet full of inspiration and the latest fashion trends. A cosy yellow leaflet with contemporary offers. [Map labels: Loper, Holierhoek, Hoogstad, Westwijk, Hoofd centrum]
Where do I find my target group? Bjorn Borg: where do relatively many consumers from the red experience world live? The catchment area and the possibilities for placing outdoor advertising in the interesting districts.
Further research: the birds-of-a-feather principle in combination with GIS techniques; a geo-psychographic database at the level of 6ppc + house number + suffix.
Conclusion: marketing does not stop at identifying and describing market segments. From exploration to implementation. Do something with it! There are many possibilities.