中午温慧博过来问我Logit回归的问题,提到关于因变量(分类变量)取多个值的情况,我一时还真没想起来这种情况用什么方法处理。晚上趁着等电话的时间看了看SPSS,把Logit模型的有关帮助贴在这儿(对于中午的问题没有什么帮助,只是粗略介绍Logit对数线性模型而已)。
The Logit Loglinear Analysis procedure analyzes the relationship between dependent (or response) variables and independent (or explanatory) variables. The dependent variables are always categorical, while the independent variables can be categorical (factors). Other independent variables, cell covariates, can be continuous, but they are not applied on a case-by-case basis. The weighted covariate mean for a cell is applied to that cell. The logarithm of the odds of the dependent variables is expressed as a linear combination of parameters. A multinomial distribution is automatically assumed; these models are sometimes called multinomial logit models. This procedure estimates parameters of logit loglinear models using the Newton-Raphson algorithm.
You can select from 1 to 10 dependent and factor variables combined. A cell structure variable allows you to define structural zeros for incomplete tables, include an offset term in the model, fit a log-rate model, or implement the method of adjustment of marginal tables. Contrast variables allow computation of generalized log-odds ratios (GLOR). The values of the contrast variable are the coefficients for the linear combination of the logs of the expected cell counts.
SPSS automatically displays model information and goodness-of-fit statistics. You can also display a variety of statistics and plots or save residuals and predicted values in the working data file.
Example. A study in Florida included 219 alligators. How does the alligators’ food type vary with their size and the four lakes in which they live? The study found that the odds of a smaller alligator preferring reptiles to fish is 0.70 times lower than for larger alligators; also, the odds of selecting primarily reptiles instead of fish were highest in lake 3.
Statistics. Observed and expected frequencies; raw, adjusted, and deviance residuals; design matrix; parameter estimates; generalized log-odds ratio; Wald statistic; and confidence intervals. Plots: adjusted residuals, deviance residuals, and normal probability plots.
多说几句:红色标记的前三个词是提醒注意统计专业术语的表达的(不要只认识因变量而不认识response variable,不要只认识概率而不认识odds);最后一个红色标记的是强调统计计算的地位(这个模型需要用Newton-Raphson算法求解);也可以看看老外们的例子,便会觉得好玩儿,他们研究的是佛罗里达州的鳄鱼吃虫子还是吃鱼与鳄鱼的体型大小和所生长的湖的关系。在中国估计不会有人研究这个吧,毕竟填饱肚子要紧(还是能看出经济文化差异的)。
赞赏
作为一名没有固定工作的自由职业者,我非常感谢您通过捐赠的方式来支持我的写作和开源软件开发。当然,捐赠纯属自愿。无论金额多少,都是一片诚挚的心意。支付方式如下:
| 微信 | ← 奋力支开它俩 → | 支付宝 |
|---|---|---|
![]() |
其它爱心通道 ↓ Venmo: @yihui_xie Zelle: xie@yihui.name PayPal: xie@yihui.name |
![]() |
若使用 Venmo/Zelle/Paypal,请添加备注“gift”或“donation”,以免捐赠被视为我的可税收入。若使用 Paypal,支付类型请选 Family and Friends,而不要选 Goods and Services。
在不影响生活的前提下,我会将收到的捐赠以尽量大的比例回馈给开源社区和慈善机构。作为参考,2024-25 年间我共收到约三万美元捐赠,完税后我转手捐出了一万五千美元。

