Instructor: Xuewen Lu
Course Web: http://www.math.ucalgary.ca/~lux
Classes:
Office Hours: Wednesdays:
Blackboard
Note: For other course information not on this page, such as lecture notes, sample programs, and assignment solutions, please visit Blackboard. You will need to log on using your University of Calgary IT username and password. A student who does not have a valid IT username can obtain one through the online registration system at http://www.ucalgary.ca/it/register.
Course Description
This course introduces statistical modeling tools developed for situations where least squares regression and standard ANOVA techniques may not naturally apply. The statistical methods studied are the general linear models for quantitative responses (including multiple regression, ANOVA and ANCOVA), binomial regression models for binary data (including logistic regression and probit models), and Poisson regression models for count data (including log-linear models for contingency tables and hazard models for survival data). All of these techniques are covered as special cases of the generalized linear models (GLM) for regression (or ANOVA or ANCOVA) with Gaussian or non-Gaussian responses. Much recent statistical research has focused on generalized additive models (GAM) and on generalized linear mixed models (GLMM) and their applications in longitudinal studies. We will survey some of this more recent methodology. As the course develops, we will make extensive use of the statistical modeling and analysis packages Splus/R and SAS. Data examples will be used throughout the course to illustrate the methodologies and the related software tools.
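As a rough sketch of how these models fit into one framework, the standard GLM structure can be written as follows. The notation (Y_i for the response, \mu_i for its mean, x_i for the covariate vector, \beta for the coefficients, g for the link function) is assumed here for illustration and may differ from the notation used in the lectures or the textbook. For the binomial cases, \mu denotes the success probability.

```latex
% Sketch of the GLM structure; all symbols are illustrative assumptions,
% not notation taken from the course materials.
\begin{align*}
Y_i &\sim \text{an exponential-family distribution with mean } \mu_i, \\
g(\mu_i) &= x_i^{\top}\beta, \\[4pt]
\text{Gaussian (linear model):}\quad & g(\mu) = \mu, \\
\text{Binomial (logistic regression):}\quad & g(\mu) = \log\frac{\mu}{1-\mu}, \\
\text{Binomial (probit model):}\quad & g(\mu) = \Phi^{-1}(\mu), \\
\text{Poisson (log-linear model):}\quad & g(\mu) = \log\mu .
\end{align*}
```

Each model family in the description then corresponds to one choice of response distribution and link function.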
Prerequisite
Working knowledge of basic statistical inference and modeling, such as the theory of point estimation and statistical hypothesis testing, ANOVA, ANCOVA and the standard linear models.
Textbook
· An Introduction to Generalized Linear Models (3rd ed., 2008), by Dobson and Barnett (DB).
· This website contains some SAS textbook examples hosted at UCLA.
References
· Introduction to Statistical Modelling in R, by P.M.E. Altham.
· Categorical Data Analysis (2002), by Alan Agresti.
· Modern Applied Statistics with S-Plus, by Venables & Ripley.
· Generalized Linear Models, by McCullagh & Nelder, 1989.
· Survival Analysis: Techniques for Censored and Truncated Data, by Klein and Moeschberger, 1997.
· Modelling Binary Data, by D. Collett, 1991.
· Multivariate Statistical Modelling Based on Generalized Linear Models, by Fahrmeir and Tutz, 1994.
· Categorical Data Analysis Using the SAS System, by Stokes, Davis & Koch, 1995, SAS Institute Inc., Cary, NC, USA.
Software
We will be using R, an open-source clone of S/Splus, or SAS for computation, programming, data analysis, and graphics. R resources can be found at CRAN, the Comprehensive R Archive Network. The S/Splus Archive at Statlib contains contributed code for S/Splus, which may or may not work under R.
Note: SAS is available only in room MS 571.
The following tutorial documents should be helpful to you, especially if you have had little previous exposure to R/S/Splus and SAS.
· An Introduction to R, by Venables, Smith, and the R Development Core Team.
· Using R for Data Analysis and Graphics: An Introduction, by John Maindonald.
· Introduction to Categorical Data Analysis Procedures, SAS/STAT User's Guide.
· SAS online documentation: User's Guides for SAS/IML, SAS/GLM, SAS/REG, SAS/GENMOD, SAS/MIXED, etc.
· SAS online samples for Categorical Data Analysis Using the SAS System, by Stokes, Davis & Koch, 1995, SAS Institute Inc., Cary, NC, USA.
Course Work
There will be four homework assignments, a midterm, and a project with an oral presentation. The assignments will contribute about 40% to the course grade, the midterm 30%, and the project and oral presentation 30%. Some worksheets designed by Altham (see the reference above) will be assigned as non-credit homework for practising R and GLM.
You are encouraged to discuss the homework assignments with each other, but you are expected to do your own independent work. You also need to submit your computer programs electronically for me to test. For the project, you need to find a data set from a real application and analyze it using the methods you learned in this course. You should then write a report of 8-10 double-spaced pages and present your findings to the class in 20-30 minutes. Evaluation of your work will be based on the novelty of the approaches, the correct interpretation of the results, and the oral presentation of the findings.
Homework Assignment and Project Due Dates and Midterm Time
| Assignment | A1 | A2 | A3 | A4 | Midterm | Project |
| Chapters covered and road map | 1->2->6 | 3->4->5 | 7->8 | 9->11->10 | | |