Categorical Data Analysis

From a positive or negative disease diagnosis to choosing all the items that apply on a survey, outcomes are frequently organized into categories so that people can understand them more easily. However, analyzing categorical response data requires specialized techniques.

The goal of this text is to help students and researchers learn to analyze categorical data properly, making extensive use of the R language. We use R not only as a tool for data analysis but also as a learning tool. For example, we use data simulation to help readers understand the assumptions underlying a procedure and then to evaluate how well that procedure performs.
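As a minimal sketch of this use of simulation (written in Python for illustration rather than in the book's R, and not taken from the book's own code), the block below estimates the true confidence level of the Wald interval for a binomial success probability \(\pi\) by Monte Carlo simulation, then checks the estimate against the exact value obtained by summing binomial probabilities over every outcome whose interval contains \(\pi\). The values \(\pi = 0.157\) and \(n = 40\), the seed, and the function names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def wald_interval(w, n, alpha=0.05):
    """Wald interval: pi_hat +/- z * sqrt(pi_hat * (1 - pi_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    pi_hat = w / n
    half_width = z * math.sqrt(pi_hat * (1 - pi_hat) / n)
    return pi_hat - half_width, pi_hat + half_width


def simulated_coverage(pi, n, num_sims=10_000, alpha=0.05, seed=2023):
    """Monte Carlo estimate of the true confidence level of the Wald interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_sims):
        # Draw w ~ Binomial(n, pi) as the sum of n Bernoulli(pi) trials.
        w = sum(rng.random() < pi for _ in range(n))
        lower, upper = wald_interval(w, n, alpha)
        hits += lower <= pi <= upper  # True counts as 1
    return hits / num_sims


def exact_coverage(pi, n, alpha=0.05):
    """Exact true confidence level: sum of P(W = w) over w whose interval contains pi."""
    total = 0.0
    for w in range(n + 1):
        lower, upper = wald_interval(w, n, alpha)
        if lower <= pi <= upper:
            total += math.comb(n, w) * pi**w * (1 - pi) ** (n - w)
    return total


if __name__ == "__main__":
    pi, n = 0.157, 40  # illustrative values (assumption)
    print("simulated:", simulated_coverage(pi, n))
    print("exact:    ", exact_coverage(pi, n))
```

With these illustrative values the exact coverage falls below 0.90, well short of the nominal 95%, and the simulated estimate lands close to it. This is the kind of gap between stated and actual performance that such simulations are meant to reveal.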

The focus of this book is on data analysis rather than mathematical analysis. We offer many examples from a wide range of disciplines (medicine, psychology, sports, ecology, and others) and provide extensive R code and output as we work through the examples. Calculus is used mainly from a conceptual rather than a mathematical standpoint.

Contents

Chapter 1 - Categorical Data: Introduction
1.1 Analyzing a binary response
1.1.1 The Bernoulli and binomial probability distributions
1.1.2 Inference for the probability of success
1.1.3 True confidence levels for confidence intervals
1.2 Two binary variables
1.2.1 Notation and model
1.2.2 Confidence intervals for the difference of two probabilities
1.2.3 Test for the difference of two probabilities
1.2.4 Relative risks
1.2.5 Odds ratios
1.2.6 Matched pairs data
1.2.7 Larger contingency tables
1.3 Exercises

Chapter 2 - Regression with a binary response
2.1 Linear regression models
2.2 Logistic regression models
2.2.1 Parameter estimation
2.2.2 Hypothesis tests for regression parameters
2.2.3 Odds ratios
2.2.4 Probability of success
2.2.5 Interactions and transformations for explanatory variables
2.2.6 Categorical explanatory variables
2.2.7 Convergence of parameter estimates
2.2.8 Monte Carlo simulations
2.3 Generalized linear models
2.3.1 Generalized linear models with a parametric link (in preparation)
2.3.2 Generalized linear models with a nonparametric link (in preparation)
2.3.3 \(R^2\) for generalized linear models (in preparation)
2.4 Exercises

Chapter 3 - Analyzing a multicategory response
3.1 Multinomial probability distribution
3.2 \(I\times J\) contingency tables and inference procedures
3.2.1 One multinomial distribution
3.2.2 \(I\) multinomial distributions
3.2.3 Test for independence
3.3 Nominal response regression models
3.3.1 Odds ratios
3.3.2 Contingency tables
3.4 Ordinal response regression models
3.4.1 Odds ratios
3.4.2 Contingency tables
3.4.3 Non-proportional odds model
3.5 Additional regression models
3.6 Exercises

Chapter 4 - Analyzing a count response
4.1 Poisson model for count data
4.1.1 Poisson distribution
4.1.2 Poisson likelihood and inference
4.2 Poisson regression models for count responses
4.2.1 Model for the mean: log link
4.2.2 Parameter estimation and inference
4.2.3 Categorical explanatory variables
4.2.4 Poisson regression for contingency tables: loglinear models
4.2.5 Large loglinear models
4.2.6 Ordinal categorical variables
4.3 Poisson rate regression
4.4 Zero inflation
4.5 Exercises

Chapter 5 - Model selection and evaluation
5.1 Variable selection
5.1.1 Overview of variable selection
5.1.2 Model comparison criteria
5.1.3 All-subsets regression
5.1.4 Stepwise variable selection
5.1.5 Modern variable selection methods
5.1.6 Model averaging
5.2 Tools to assess model fit
5.2.1 Residuals
5.2.2 Goodness of fit
5.2.3 Influence
5.2.4 Diagnostics for multicategory response models
5.3 Overdispersion
5.3.1 Causes and implications
5.3.2 Detection
5.3.3 Solutions
5.4 Examples
5.4.1 Logistic regression: placekicking data set
5.4.2 Poisson regression: alcohol consumption data set
5.5 Exercises

Chapter 6 - Additional topics (in preparation)
6.1 Binary responses and testing error
6.1.1 Estimating the probability of success
6.1.2 Binary regression models
6.1.3 Other methods
6.2 Exact inference
6.2.1 Fisher's exact test for independence
6.2.2 Permutation test for independence
6.2.3 Exact logistic regression
6.2.4 Additional exact inference procedures
6.3 Categorical data analysis in complex survey designs
6.3.1 The survey sampling paradigm
6.3.2 Overview of analysis approaches
6.3.3 Weighted cell counts
6.3.4 Inference on population proportions
6.3.5 Contingency tables and loglinear models
6.3.6 Logistic regression
6.4 "Choose all that apply" data
6.4.1 Item response table
6.4.2 Testing for marginal independence
6.4.3 Regression modeling
6.5 Mixed models and estimating equations for correlated data
6.5.1 Random effects
6.5.2 Mixed-effects models
6.5.3 Model fitting
6.5.4 Inference
6.5.5 Marginal modeling using generalized estimating equations
6.6 Bayesian methods for categorical data
6.6.1 Estimating the probability of success
6.6.2 Regression models
6.6.3 Alternative computational tools
6.7 Exercises


2023-11-04