I recently had an idea for an app that I would like to start developing for personal use and development

Mohammad Cannon



2022-06-07

I recently had an idea for an app that I would like to start developing for personal use and development. It would present you with recipe ideas for lunch, dinner, etc., and learn your preferences by recording your responses. I was thinking it would do this by recording very specific details of each recipe, such as carb count, calorie count, protein count, etc. (factors that might act as determinants of our preference). The program would then run an OLS regression with the probability of being chosen as the dependent variable (we will have data on this, since we know which recipes our user rejected, which he accepted, and how many times). We will then have various independent variables with which we will try to create an unbiased estimator. We can then run all recipes to be presented through this regression and rank them by probability of being chosen, from highest to lowest.
Would this be a viable thing to do? If no, why not and what could perhaps be better?


Belen Bentley


2022-06-08

You could use an OLS regression for this, or you could just use a machine learning algorithm. A decision tree will probably work just as well as any regression model (and possibly better).
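As a rough illustration of the decision-tree route, here is a minimal from-scratch sketch (depth-limited, splitting on Gini impurity) in Python. The recipe data, the two features, and all function names are hypothetical. In practice you would more likely use a library implementation such as scikit-learn's `DecisionTreeClassifier`, whose `predict_proba` likewise reports the share of each class in the leaf a recipe falls into.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class Node:
    prob: float                     # share of "chosen" labels among the rows in this node
    feature: Optional[int] = None   # feature index to split on; None means leaf
    threshold: float = 0.0          # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def gini(labels: List[int]) -> float:
    """Gini impurity of a non-empty binary label list."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build(rows, labels, depth=0, max_depth=3, min_size=2) -> Node:
    """Greedily grow a tree by picking the split that most reduces Gini impurity."""
    node = Node(prob=sum(labels) / len(labels))
    if depth >= max_depth or len(labels) < min_size or node.prob in (0.0, 1.0):
        return node
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= gini(labels):
        return node  # no split improves purity, so stay a leaf
    _, f, t = best
    go_left = [r[f] <= t for r in rows]
    node.feature, node.threshold = f, t
    node.left = build([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth, min_size)
    node.right = build([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth, min_size)
    return node

def predict_prob(node: Node, x: Sequence[float]) -> float:
    """Walk down to a leaf and return its share of chosen recipes."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prob

# Hypothetical history: (calories, protein in grams) per recipe; 1 = chosen, 0 = rejected.
rows = [(350, 30), (400, 28), (900, 10), (850, 12), (300, 25), (950, 8)]
labels = [1, 1, 0, 0, 1, 0]
tree = build(rows, labels)
print(tree.feature, tree.threshold)  # the root split the tree chose
print(predict_prob(tree, (380, 27)), predict_prob(tree, (800, 15)))
```

Unlike a linear model, the tree needs no assumption that preference changes smoothly or linearly with calories; it simply learns thresholds.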
If you definitely want to use a parametric regression model, there are three pretty standard models with probability as an outcome.
What you're describing is a linear probability model. Let Y be a binary dependent variable and X the vector of covariates. Then, the model has the form
$$P[Y = 1 \mid X = x] = x^\top \beta.$$
One nice thing about this type of model is that the coefficients are very easy to understand. For example, if $x_1$ represents calories, then $\hat{\beta}_1$ is the predicted change in $P[Y=1]$ (the probability that the recipe is chosen) associated with an increase of 1 calorie, holding the other covariates fixed. It's also very easy to compute the predicted values, since each one is just the inner product of two vectors, $x^\top \hat{\beta}$.
The major drawback is that nothing bounds the predicted outcome. Imagine plugging in a meal with an absurd number of calories. Then, depending on whether $\hat{\beta}_1$ is positive or negative, we could end up with $P[Y=1 \mid X=x] > 1$ or $P[Y=1 \mid X=x] < 0$, neither of which is possible for a probability.
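A minimal sketch of the linear probability model, assuming a single covariate (calories) and made-up choice data, so that OLS reduces to the closed-form simple-regression intercept and slope. It illustrates both points: the slope reads directly as a change in probability per calorie, and predictions for extreme meals fall outside [0, 1].

```python
# Hypothetical data: calories of recipes shown, and whether the user chose each one (1) or not (0).
calories = [300, 350, 400, 500, 600, 700, 800, 900]
chosen = [1, 1, 1, 1, 0, 1, 0, 0]

def ols_simple(x, y):
    """Return (intercept, slope) minimising squared error for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

a, b = ols_simple(calories, chosen)

def lpm_prob(cal):
    """Linear probability model: P[Y=1 | X=x] = x^T beta with x = (1, cal), beta = (a, b)."""
    return a + b * cal

print(f"slope: {b:.5f} change in P(chosen) per extra calorie")
print(lpm_prob(550))   # an ordinary meal: a sensible value inside [0, 1]
print(lpm_prob(3000))  # an absurd meal: the "probability" drops below 0
print(lpm_prob(0))     # a zero-calorie meal: the "probability" exceeds 1
```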
The way to combat this issue is to use a probit or logit model. A probit model has the following form:
$$P[Y = 1 \mid X = x] = \Phi(x^\top \beta),$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function.
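A small sketch of computing probit predictions in Python. Here $\Phi$ is evaluated with the standard library's `math.erfc` (using erfc rather than erf keeps the far tail from rounding to exactly 0), and the coefficient values are hypothetical, not estimates from any data.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_prob(x, beta):
    """P[Y=1 | X=x] = Phi(x^T beta); x and beta are equal-length sequences."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    return std_normal_cdf(z)

# Hypothetical coefficients: (intercept, effect per calorie).
beta = (2.0, -0.004)
print(probit_prob((1, 550), beta))   # an ordinary meal
print(probit_prob((1, 3000), beta))  # an absurd meal: very close to 0, but never negative
```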
A logit model has the form
$$P[Y = 1 \mid X = x] = \frac{1}{1 + \exp(-x^\top \beta)}.$$
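To connect this back to the ranking idea in the question, here is a hedged sketch that fits a logit model by maximum likelihood using plain batch gradient ascent, then ranks candidate recipes by predicted probability. The data, recipe names, learning rate, and step count are all assumptions chosen for illustration; a real project would normally use a statistics library's logistic-regression routine instead.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic CDF 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=0.1, steps=5000):
    """Maximum-likelihood logit fit by batch gradient ascent.
    Each row of X must already include a leading 1 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(xi, beta)))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]  # derivative of the log-likelihood w.r.t. beta_j
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta

# Hypothetical history: calories (in hundreds, to keep gradient steps well scaled); 1 = chosen.
cal_100 = [3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
chosen = [1, 1, 1, 1, 0, 1, 0, 0]
beta = fit_logit([[1.0, c] for c in cal_100], chosen)

# Rank hypothetical candidate recipes from most to least likely to be chosen.
candidates = {"salad": 3.2, "pasta": 6.5, "burger": 9.5}
ranked = sorted(candidates, key=lambda name: sigmoid(beta[0] + beta[1] * candidates[name]),
                reverse=True)
print(beta, ranked)
```

With real data you would include all the recipe features (carbs, protein, and so on) as extra columns of `X`; the fitting and ranking code stays the same.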
Both the probit and logit models restrict the predicted probability to the interval [0, 1], but on the flip side, the coefficients are more difficult to interpret: the change in probability associated with a one-unit change in $x_j$ is no longer simply $\beta_j$, but depends on the values of all the covariates.
The difference between probit and logit lies in the assumption you make about the error term of the underlying latent-variable model ($Y = 1$ when $x^\top\beta + \varepsilon > 0$): with probit, you assume the errors are standard normal, and with logit, you assume they follow a standard logistic distribution. In practice, the two are usually very similar and will give you very similar predicted probabilities, although the estimated coefficients differ in scale.
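To see why interpretation gets harder, differentiate each model with respect to one covariate $x_j$. Writing $\Lambda(z) = 1/(1+e^{-z})$ for the logistic CDF and $\phi$ for the standard normal density:

$$
\frac{\partial\, P[Y=1 \mid X=x]}{\partial x_j} =
\begin{cases}
\beta_j & \text{linear probability model} \\
\beta_j\,\phi(x^\top\beta) & \text{probit} \\
\beta_j\,\Lambda(x^\top\beta)\bigl(1-\Lambda(x^\top\beta)\bigr) & \text{logit}
\end{cases}
$$

Only the first is constant. For probit and logit, the effect of, say, one extra calorie shrinks toward zero as the predicted probability approaches 0 or 1, although the sign of $\beta_j$ still gives the direction of the effect.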
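One way to see how close the two models are: the logistic CDF, with its argument rescaled by a constant of about 1.7, tracks the standard normal CDF to within roughly 0.01 everywhere. This is also why logit coefficients tend to come out roughly 1.6 to 1.8 times larger than probit coefficients on the same data. The sketch below checks the gap numerically on a grid; the 1.702 constant is a well-known textbook approximation rather than a fitted value.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def logistic_cdf(z: float) -> float:
    """Lambda(z) = 1 / (1 + exp(-z)); safe here because |z| stays small on the grid."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.702  # classic constant so that logistic_cdf(SCALE * z) approximates Phi(z)
grid = [i / 100 for i in range(-600, 601)]  # z from -6 to 6 in steps of 0.01
max_gap = max(abs(std_normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(f"largest |Phi(z) - Lambda(1.702 z)| on [-6, 6]: {max_gap:.4f}")
```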
