How Is Logistic Regression Different From OLS Regression?
Besides the difference in the type of outcome variable being modeled (OLS predicts a continuous outcome variable whereas logistic regression predicts a dichotomous variable), there are several differences between OLS, or linear, regression and logistic regression.
First, one of the assumptions of OLS regression is that the errors (residuals) are normally distributed with constant variance. This is not the case with logistic regression. Second, statistical output typically includes a standardized solution for OLS but not for logistic regression, and the unstandardized logistic coefficients do not have the same straightforward interpretation as they do in OLS regression. Finally, logistic regression has no r-square that gauges the variance accounted for by the overall model. Instead, a chi-square test is used to indicate how well the logistic regression model fits the data.
Goal of Logistic Regression
The goal of logistic regression differs from that of OLS because the dependent variable is not continuous. In logistic regression, we are predicting the probability that Y is equal to 1 (rather than 0) given certain values of X. That is, if X and Y have a positive relationship, the probability that a person will have a score of Y=1 will increase as values of X increase. So, instead of predicting scores on the dependent variable as we do with OLS regression, we are predicting the probability that an event will occur.
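As a minimal sketch of this idea, the snippet below turns a linear combination of X into a probability using the logistic function. The intercept and slope (b0 = -1.0, b1 = 0.75) and the function name are illustrative assumptions, not values from a fitted model.

```python
import math

def predicted_probability(x, b0=-1.0, b1=0.75):
    """Return P(Y = 1 | x) under a logistic model with hypothetical coefficients."""
    log_odds = b0 + b1 * x  # linear predictor on the log-odds scale
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic function maps to (0, 1)

# With a positive slope, the predicted probability rises as x increases.
for x in [0, 1, 2, 3]:
    print(x, round(predicted_probability(x), 3))
```

Whatever the value of X, the predicted value stays between 0 and 1, which is why it can be read as a probability.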
Interpreting the Coefficients
The coefficients in a logistic regression equation are more difficult to interpret than those in OLS. The saying in OLS that b represents “the change in Y with one unit change in X” is no longer applicable. Instead, we have to translate the coefficient using the exponential function. When we do that, we get a useful number called the odds ratio.
The odds ratio is equal to exp(b), sometimes written e^b. For example, if your results printout indicates the regression slope is 0.75, the odds ratio is 2.12 (because exp(0.75) = 2.12). This means that the odds that Y is equal to 1 are multiplied by 2.12 (that is, they roughly double) each time the value of X increases by one unit. Note that this multiplier applies to the odds, p/(1 - p), not to the probability itself.
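The arithmetic can be checked with a short sketch. The slope of 0.75 comes from the example above; the intercept of -1.0 and the helper names are assumptions made only for illustration. The loop shows that a one-unit increase in X multiplies the odds, p/(1 - p), by the same factor regardless of the starting value of X.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def prob_from_log_odds(z):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

b = 0.75                      # slope from the example above
odds_ratio = math.exp(b)
print(round(odds_ratio, 2))   # 2.12

b0 = -1.0                     # hypothetical intercept, for illustration only
for x in [0, 1, 2]:
    before = odds(prob_from_log_odds(b0 + b * x))
    after = odds(prob_from_log_odds(b0 + b * (x + 1)))
    # The ratio of odds is the same at every starting value of x.
    print(x, round(after / before, 2))
```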
Using a specific example, let’s say we are looking at how gender predicts respondents’ answers to a yes/no survey question (with yes=1 and no=0). If the odds ratio for men is 2.12, this means that the odds of answering “yes” are 2.12 times as high for men as for women.
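To see what an odds ratio of 2.12 implies for probabilities, a baseline is needed. The sketch below assumes, purely for illustration, that 30% of women answer “yes”; that figure and the function name are hypothetical and do not come from any data.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

p_women = 0.30                            # hypothetical baseline probability
p_men = apply_odds_ratio(p_women, 2.12)   # odds ratio from the example above

print(round(p_men, 3))            # implied probability of "yes" for men
print(round(p_men / p_women, 2))  # ratio of probabilities, smaller than the odds ratio
```

Under this assumed baseline, the ratio of the two probabilities is noticeably smaller than 2.12, which is why an odds ratio should be described in terms of odds.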
Model Fit
Instead of using r-square as the statistic for overall model fit, logistic regression uses the deviance, which is evaluated with a chi-square test. Chi-square is a test that measures the fit of the observed values to the expected values. The bigger the difference, or deviance, of the observed from the expected values, the poorer the fit of the model. Therefore we want a small deviance. As we add more variables to the equation, the deviance should get smaller, indicating an improvement in fit.
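The sketch below illustrates the deviance under the common definition of -2 times the log-likelihood, which is an assumption about the exact statistic intended here. The outcomes and predicted probabilities are made-up numbers chosen for illustration, not the result of fitting a model.

```python
import math

def deviance(y, p):
    """Return -2 * log-likelihood for binary outcomes y and predicted probabilities p."""
    log_lik = sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )
    return -2.0 * log_lik

y = [0, 0, 0, 1, 0, 1, 1, 1]  # hypothetical observed outcomes

# Model with no predictors: every case gets the overall proportion of 1s.
null_p = [sum(y) / len(y)] * len(y)

# Hypothetical predicted probabilities from a model that includes X.
model_p = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

d_null = deviance(y, null_p)
d_model = deviance(y, model_p)

# The drop in deviance is the chi-square statistic for the added predictor(s).
chi_square = d_null - d_model
print(round(d_null, 2), round(d_model, 2), round(chi_square, 2))
```

In practice the drop in deviance is compared to a chi-square distribution with degrees of freedom equal to the number of predictors added.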
References
Newsom. (2010). Logistic Regression. http://www.upa.pdx.edu/IOA/newsom/da2/ho_logistic.pdf
