From a positive or negative disease diagnosis to choosing all items that
apply in a survey, outcomes are frequently organized into categories so
that people can understand them more easily. However, analyzing data
from categorical responses requires specialized techniques.
The goal of this text is to help students and researchers learn to
analyze categorical data properly, making extensive use of the R
language. We use R not only as a tool for data analysis but also as a
tool for learning. For example, we use data simulation to help readers
understand the assumptions underlying a procedure and then to evaluate
how well that procedure performs.
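One common way to use simulation like this is to check whether a nominal 95% Wald confidence interval for a binomial proportion actually contains the true proportion 95% of the time. The sketch below is a minimal illustration written in Python, not code from the book. The function names (`wald_interval`, `simulated_coverage`, `exact_coverage`), the true proportion `pi = 0.16`, the sample size `n = 40`, the number of replicates, and the seed are all hypothetical, illustrative choices. The exact coverage is computed by summing binomial probabilities so the Monte Carlo estimate has something to be compared against.

```python
import math
import random
from statistics import NormalDist

# Illustrative sketch: function names and parameter values are hypothetical.


def wald_interval(w, n, conf=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = w / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def simulated_coverage(pi, n, reps=10_000, conf=0.95, seed=8912):
    """Estimate coverage by simulating `reps` binomial samples of size n."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        # One binomial draw: count successes in n Bernoulli(pi) trials.
        w = sum(rng.random() < pi for _ in range(n))
        lower, upper = wald_interval(w, n, conf)
        covered += lower < pi < upper
    return covered / reps


def exact_coverage(pi, n, conf=0.95):
    """Exact coverage: total P(W = w) over every w whose interval contains pi."""
    total = 0.0
    for w in range(n + 1):
        lower, upper = wald_interval(w, n, conf)
        if lower < pi < upper:
            total += math.comb(n, w) * pi**w * (1 - pi) ** (n - w)
    return total


if __name__ == "__main__":
    pi, n = 0.16, 40
    print(f"simulated coverage: {simulated_coverage(pi, n):.4f}")
    print(f"exact coverage:     {exact_coverage(pi, n):.4f}")
```

For these settings both numbers come out near 0.89, noticeably below the nominal 0.95. This reveals in a few lines that the normal approximation behind the Wald interval can fail for moderate sample sizes, which is the kind of assumption check the paragraph above describes.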
The focus of this book is on data analysis rather than mathematical
analysis. We offer many examples from a wide range of disciplines
(medicine, psychology, sports, ecology, and others) and provide
extensive R code and output as we work through the examples. Where
calculus is used, the emphasis is mainly conceptual rather than
mathematical.
6.2.4 Additional exact inference procedures
6.3 Categorical data analysis in complex survey designs
6.3.1 The survey sampling paradigm
6.3.2 Overview of analysis approaches
6.3.3 Weighted cell counts
6.3.4 Inference on population proportions
6.3.5 Contingency tables and loglinear models
6.3.6 Logistic regression
6.4 “Choose all that apply” data
6.4.1 Item response table
6.4.2 Testing for marginal independence
6.4.3 Regression modeling
6.5 Mixed models and estimating equations for correlated data
6.5.1 Random effects
6.5.2 Mixed-effects models
6.5.3 Model fitting
6.5.4 Inference
6.5.5 Marginal modeling using generalized estimating equations
6.6 Bayesian methods for categorical data
6.6.1 Estimating a probability of success
6.6.2 Regression models
6.6.3 Alternative computational tools
6.7 Exercises