Demo lessons, demo lessons, demo lessons. That is the rule. Advertising works well, and customers call in to book demo lessons for their little, beloved children. This is the typical situation a manager finds herself in at a local business that offers the community lessons in English as a foreign language for children. At this stage the manager engages prospective customers with due courtesy, hoping to convince them to enroll their children in courses that introduce English as a foreign language at an early age.
So far so good. After a careful study of proximity and location issues, that is, how far customers (moms, dads and their beloved children) commute each week to reach the school in Bologna, Italy, the manager looks at the following map and guesses that the demo lessons, which are meant to show how effective the courses are and ultimately win parents over, might have a ‘yes/no’ outcome that depends on the parents’ driving time to and from the school.
Based on data sampled from 85 parents who took part in demo lessons, each arranged after a phone call in which they asked for basic information about courses and prices, the manager wants to check whether there’s a meaningful relationship between driving time, child age and the decision to enroll.
From a statistical point of view, the manager wants to predict whether parents who show up at the demo lesson will decide to enroll their children or not. In the following figure, inspired by the technical reading in Hastie, Tibshirani and Friedman’s The Elements of Statistical Learning, the data sample is plotted with the two predictors on the x and y axes, and blue and orange mark the responses ‘yes’ and ‘no’ respectively. Circle sizes show parental education level (a third likely predictor, not included in the regression here) on a scale from 1 to 3: neither parent has a university degree, one of them has, or both have.
The plot shows a negative relationship between the response ‘yes’ and the two predictors, driving time and age: longer commutes and older children seem to keep parents from enrolling their beloved ones. Beyond the obvious conclusion a superficial reader could draw from this simple description, it’s interesting to see that from very simple data the manager can gather important information on several marketing and strategic decisions: whether to direct marketing at the local community rather than advertise citywide, whether spending time convincing far-away parents to enroll their children is sensible, and so on.
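The negative relationship described above is the kind of pattern a logistic regression of the ‘yes/no’ response on driving time and child age would pick up. The following is only a minimal sketch under stated assumptions: the manager’s 85 observations are not reproduced here, so the data are synthetic, and the value ranges, the "true" coefficients used to generate them, the function names and the choice of plain gradient ascent as the fitting method are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_sample(n=85, seed=0):
    """Hypothetical data: (driving minutes, child age in years, enrolled 0/1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        minutes = rng.uniform(5, 40)   # assumed range of driving times
        age = rng.uniform(3, 10)       # assumed range of child ages
        # Assumed generating process: odds of enrolling fall with both predictors.
        p = sigmoid(6.0 - 0.15 * minutes - 0.45 * age)
        rows.append((minutes, age, 1 if rng.random() < p else 0))
    return rows


def fit_logistic(rows, lr=0.5, steps=2000):
    """Fit intercept and two slopes by gradient ascent on the mean log-likelihood."""
    n = len(rows)
    # Standardize the predictors so a single learning rate suits both.
    means = [sum(r[j] for r in rows) / n for j in (0, 1)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in (0, 1)]
    xs = [((r[0] - means[0]) / sds[0], (r[1] - means[1]) / sds[1]) for r in rows]
    ys = [r[2] for r in rows]
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(xs, ys):
            err = y - sigmoid(w[0] + w[1] * x1 + w[2] * x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi + lr * gi / n for wi, gi in zip(w, grad)]
    # Convert the coefficients back to the original units (minutes, years).
    b1 = w[1] / sds[0]
    b2 = w[2] / sds[1]
    b0 = w[0] - b1 * means[0] - b2 * means[1]
    return b0, b1, b2


def predict_proba(coefs, minutes, age):
    # Estimated probability that a parent enrolls the child.
    b0, b1, b2 = coefs
    return sigmoid(b0 + b1 * minutes + b2 * age)


rows = make_synthetic_sample()
coefs = fit_logistic(rows)
print("intercept, driving-time slope, age slope:", coefs)
print("P(enroll | 10 min, age 4):", predict_proba(coefs, 10, 4))
print("P(enroll | 35 min, age 9):", predict_proba(coefs, 35, 9))
```

Negative slopes for both predictors would match what the plot suggests. With real data, a statistics package would also report standard errors, which the manager needs before calling the relationship meaningful.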
Ultimately, a store manager wants to gauge in advance how far her efforts in promoting and marketing her courses will go. In fact, based on the probability of enrollment, the black line classifies prospective customers into two categories: those who aren’t likely to enroll, either because of their driving time or their children’s age (on the right side of the black line), and those who are ready to demonstrate as a matter of fact how important English, as a foreign language, will be to their children’s future (on the left side of the black line).
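One way to read the black line, assuming it is the 50% probability contour of a logistic regression with driving time $t$ and child age $a$ as predictors (the text does not state the model or the threshold, and the symbols $\beta_0, \beta_1, \beta_2$ are introduced here only for illustration):

```latex
\[
p(t, a) = \frac{1}{1 + \exp\bigl(-(\beta_0 + \beta_1 t + \beta_2 a)\bigr)}
\]
\[
p(t, a) = \tfrac{1}{2}
\;\Longleftrightarrow\;
\beta_0 + \beta_1 t + \beta_2 a = 0
\;\Longleftrightarrow\;
a = -\frac{\beta_0 + \beta_1 t}{\beta_2}
\quad (\beta_2 \neq 0)
\]
```

Under this reading the boundary is a straight line in the plane of the two predictors. If both slopes are negative, parents with a longer drive or an older child fall on the side where the estimated probability of enrolling is below one half.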
The commitment to empower our little beloved ones early with English language skills seems to be affected by commute time and by the children’s age: after all, saving mom and dad 20 minutes’ driving time can easily be offset by a bigger effort at the public school in all the years to come. Right?
Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer, 2008.