User:Georges
From Wiki1
Revision as of 21:41, 12 March 2019
Notes on Mixed Models
- varPower(form = ~ fitted(.), fixed = 1)
- The covariate given in 'form' is on the SD scale: the SD of the response is taken to be proportional to |form| raised to a power. By default that power is estimated; 'fixed' holds it at the supplied value (here 1).
- the default level for 'fitted' is the finest level.
- The variance function must be expressed in terms of the expected value of the re-expressed response.
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf
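A minimal sketch of how the varPower notes above play out in nlme. The BodyWeight data (shipped with nlme) and the model are illustrative assumptions, not part of the original notes. The same model is fitted with the power estimated and with the power held at 1 through 'fixed'.

```r
library(nlme)

# Illustrative model on the BodyWeight data shipped with nlme (an assumption;
# the notes above do not name a data set).
fit0 <- lme(weight ~ Time * Diet, data = BodyWeight, random = ~ Time | Rat)

# Power estimated from the data; the covariate is the fitted value, which in
# lme is taken at the finest (innermost) grouping level by default.
fit_est <- update(fit0, weights = varPower(form = ~ fitted(.)))

# Power held at 1: SD of the response proportional to |fitted value|.
fit_fix <- update(fit0, weights = varPower(form = ~ fitted(.), fixed = 1))

# Extract the power from each fit (allCoef = TRUE also reports fixed values).
power_est <- coef(fit_est$modelStruct$varStruct,
                  unconstrained = FALSE, allCoef = TRUE)
power_fix <- coef(fit_fix$modelStruct$varStruct,
                  unconstrained = FALSE, allCoef = TRUE)
```

Comparing the two fits with anova(fit_est, fit_fix) is one way to judge whether fixing the power at 1 is reasonable for a given data set.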
Paradoxes, Fallacies and Other Surprises
Bayes
- Two consecutive issues of Statistical Science in 2011 have many interesting articles that are related to Bayesian inference:
- Experimenting with files:
- jpg file
- Using the wiki link to the uploaded name:
- Using the wiki link to the uploaded name as media: Media:2013-12-29 18.20.34.jpg
- .R file
- File wiki link File:Tcells.R
- Media wiki link Media:Tcells.R
- jpg file
- Useful formulas
- SEM with STAN
- Interaction fallacy in a presentation:
- If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').
- Gelman and Robert on Bayes
- MARS: Multivariate Adaptive Regression Splines
- Teaching with R using MOSAIC by ... and D. Kaplan
- Causality: interactive app illustrating Simpson's Paradox
- metafor: Tutorial using mixed models for meta analysis
- Andrew Gelman on survey weights with multilevel models: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification.
- Intro to R and Rstudio in an intro course at Duke
- Intro to R in RStudio
- Multilevel Modeling Using R
- Multilevel Modeling in R by Paul Bliese
- Losing ground to CS?
- Interview with David Smith at UseR 2014
- On colour
- 3d plotting packages
- Stigler's Seven Pillars of Statistics
- FAQ on GLMMs
- Virtual Labs in Probability and Statistics
- R code school with O'Reilly
- pastecs package for time series
- Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer
- /Multiple Testing -- a comment
- /MOOCs for Data Science
- Arts Squared
- /Using R Markdown
- /Statistics Links for Courses
- /Lee Lorch
- Baumer et al. (2014) Using R Markdown in Intro Stats
- /Job ratings
- Using WinBUGS on the Netherlands data
- /Climate Change
- /Standardize or Not
- /Mixed Models -- papers
- /MCMCglmm
- /Wiki tests
Cause, correlation, or ...
Notepad
- R resources for GLMs
- On Mosaic plots
- /Academic and Administrative Program Review
- /Statistics programs
- Careers Expo at York with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.
- Schervish's p-value paradox
- /Data
- Course evaluations: the good, the bad and the ugly
- Larry Wasserman on
- /HOA
- /Big data
- /Student satisfaction
- Rob Tibshirani's list of 9 great statistics papers
- Cassidy on the Reinhart-Rogoff controversy.
- Clinical Trials registry in the US
- The Cochrane Collection
- 2004 ICMJE: policy of registration:
Recommended sources on statistics:
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):
- Confidence Intervals:
- Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).
- Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426
- Effect Size Estimates:
- Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.
- Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.
- Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.
- Meta-analysis:
- Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).
- Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.
- Bayesian Data Analysis:
- Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)
- Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).
- Power Analysis:
- Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)
Blogs
Pythagoras Diagram
Recent changes
Links
/Recent Changes /Contributions
/DO
Topics
- /R packages
- /Curriculum
- /HLM links
- /Education links
- Death of Evidence
- /Mixed effects for multinomial responses
- /Ellipse paper comments
- On Tobacco (from Matt)
- /fda.R
- /FSE Scholars evening
- /MATH 6627 student contributions
Data scraping
RStudio: Shiny
Notes for 6643
- Assignment: Can we produce an estimate of AIC based just on the Wald test?
On Pedagogy
- On the importance of quantitative skills in social science
- http://www.matstat.com/teach/
- Common Misteaks in Statistics
Advice for students
Questions (e.g. for survey papers)
- Implement more diagnostics in R for lme models
- Explore duality of the whole data matrix
- Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci
- Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to (X'X)^{-1}? How does the result wander outside the convex combination? When does it happen and what does it mean?
- Refine Lform and related tools
Read
- [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]
R course
Day 2 - add
- final recap of 'lm' interface: subset, na.action, etc., etc.
- discuss formula syntax
- final recap of methods for 'lm'
- note easy extension to 'glm', 'lme', etc.
- note that many 'new' functions do not use this interface, only more 'mature' functions
- lm.formula
- discuss OO showing methods and dispatching
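A short sketch of the Day 2 recap items, using the mtcars data that ships with R (the data set and model are illustrative choices): the common 'lm' interface, its extension to 'glm', and how methods are found and dispatched.

```r
# Formula interface with 'subset' and 'na.action' (mtcars ships with R).
fit <- lm(mpg ~ wt + hp, data = mtcars,
          subset = cyl != 8, na.action = na.exclude)

# The same interface carries over to glm (and, with nlme loaded, to lme).
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Methods available for the class, and the class vector that drives dispatch.
lm_methods <- methods(class = "lm")
class(fit_glm)   # "glm" "lm": glm objects fall back on lm methods

# summary() is generic: it dispatches to summary.lm or summary.glm.
s_lm  <- summary(fit)
s_glm <- summary(fit_glm)
```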
Day 3
- the most useful tools:
- seq
- rep
- replacement functions
- data input
- more programming
- object oriented programming
- using a function in C
- using attributes
- systematic treatment of graphics, including
- par
- xyplot
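A small sketch of some of the Day 3 tools listed above: seq, rep, a replacement function and attributes. The function name 'second<-' is made up for illustration.

```r
s <- seq(2, 10, by = 2)
r <- rep(c("a", "b"), times = c(2, 3))

# A replacement function: the name ends in '<-' and the last argument must be
# called 'value'. 'second<-' is a hypothetical name used only here.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- c(1, 2, 3)
second(v) <- 10      # equivalent to v <- `second<-`(v, value = 10)

# Attributes attach metadata to an object without changing its values.
attr(v, "units") <- "cm"
attributes(v)
```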
/Day2 Guided Tour of Linear Models.R
SCS Reads 2011 Links
Capstone courses
Links to recent courses
Links to add somewhere
- Battling bad science
- D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." American Journal of Public Health, Vol. 81, Issue 12 1630-1635
- Quick R for SPSS, SAS and Stata users.
Graphics
- ET Modern
- Striking graphics:
Matrices
Simpson's Paradox
In the 1979 Canadian federal election an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won neither seat.
Lee Lorch
- Marybeth Gasman (1999) "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" American Educational Research Journal
- A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.
- [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010), "A New Light on a Fight to Integrate Stuyvesant Town", New York Times.]
Multilevel Models
Expository
Missing Data
Evaluation
- Green, M.J., Medley G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf
Software for multilevel models
Package | Function | Notes |
---|---|---|
R: clmm {ordinal} | Ordinal response: fits cumulative link mixed models, i.e. cumulative link models with random effects, via the Laplace approximation or standard and adaptive Gauss-Hermite quadrature. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location part of the model. | |
R: {lme4a} | Development version of lme4. Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4 | |
R: {MCMCglmm} | MCMC Methods for Multi-response Generalized Linear Mixed Models | |
R: {plm} | Econometric Analysis of Panel Survey Data | Vignette. See p. 3 for comments on first-differencing. |
See Snijders and Bosker (2012) for a longer list | | |
R: {lme4:nlmm} | Non-linear models with lme4 | Presentation by Doug Bates |
Clones
Check for changes and reconcile
- Lab 1
Read
On the age-period-cohort problem:
- see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html
Do
Read
- Links to recent papers by David Freedman
- Links to material by Chris Wild:
- Kai Ng's converse
Notes
R notes
Items to cover
- Wrap up language:
- Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'
- Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface
- contrast sophisticated program: lm with unsophisticated lowess
- Using variables in data frames:
- formula oriented functions: xyplot( y ~ x, data = dd )
- explicit: plot( dd$x, dd$y )
- with: with( dd, plot(x,y)); with(dd, xyplot( y ~ x, dd))
- attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)
- attach(dd)
- plot(x, y)
- detach(dd)
- Problem with 'attach':
- names in data frame may be masked by names in workspace
- assignments in workspace not saved in data frame
- Overview of graphics
- Programming structures
- Add to graphics:
- Colours: pal(grepv('red',colors())); pals() # for all
- modified tablemissing
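A sketch of two points from the list above, using small made-up data frames: the two problems with 'attach', and dropping rows with NAs in selected variables only before calling a function with a simple interface such as lowess.

```r
## Problems with attach -------------------------------------------------
dd <- data.frame(x = 1:3, y = c(2, 4, 6))
x <- c(10, 20, 30)        # a workspace variable with the same name as dd$x

attach(dd)
masked_x <- x             # finds the workspace 'x', not dd$x
y <- y * 100              # creates a workspace 'y'; dd$y is unchanged
detach(dd)

with_x <- with(dd, x)     # 'with' looks in the data frame first

## Dropping NAs from selected variables ----------------------------------
d2 <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                 y = c(2.1, NA, 3.2, 4.8, 5.1, 6.3),
                 z = c(NA, 1, 1, 1, 1, 1))

# Keep rows complete in x and y only; NAs in z are ignored.
keep  <- complete.cases(d2[, c("x", "y")])
d2_xy <- d2[keep, ]

fit_lm <- lm(y ~ x, data = d2)       # lm handles NAs itself via na.action
sm     <- lowess(d2_xy$x, d2_xy$y)   # lowess is given the cleaned data
```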
debugging in R
Links
Importing files
From Excel
- Easy: save file in Excel as .csv, then read into R with read.csv
- If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:
- The winner: package gdata:
- First install perl.
- read.xls in gdata handles both .xls and .xlsx files
- works on both 32-bit and 64-bit machines
- package XLConnect seems to work only on xlsx files
- the smaller xlsx package also works only on xlsx files
- Package xlsReadWrite works on xls files but only on 32-bit systems
- Use xls2csv, a Perl script to convert files to csv first.
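A minimal sketch of the 'easy' route above. To keep it self-contained, a small CSV file with made-up contents is written to a temporary location and read back with read.csv; in practice the file would be the one saved from Excel.

```r
# Stand-in for a file saved from Excel as .csv (contents are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,70", "2,85"), csv_path)

dat <- read.csv(csv_path)
str(dat)

# For .xls/.xlsx files read directly, the gdata route mentioned above would
# look roughly like this (needs the gdata package and Perl; not run here,
# and "myfile.xls" is a hypothetical file name):
# dat2 <- gdata::read.xls("myfile.xls", sheet = 1)
```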
Getting lines vs points for different groups in xyplot
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:
    library(spida.beta)   # also loads 'car'
    dd <- Prestige
    dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)
    td( lty = c(1,0), pch = c(32, 16), lwd = 2)
    # lty = 0 produces an invisible line
    # and pch = 32 seems to be an invisible point
    xyplot( income.pred + income ~ education|type, dd[order(dd$education),],
            type = 'b',
            auto.key = list( columns = 2, lines = T, points = T))
Also show example using panel.superpose.2
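A possible alternative, sketched with simulated data rather than the Prestige example: lattice's panel.superpose takes a 'distribute.type' argument (the behaviour that panel.superpose.2 turns on by default), which assigns the elements of 'type' to the groups in order instead of applying all of them to every group. The data and variable names are made up.

```r
library(lattice)

set.seed(1)
dd <- data.frame(x = rep(1:10, 2),
                 g = factor(rep(c("a", "b"), each = 10)))
dd$y      <- 2 + 0.5 * dd$x + rnorm(20)
dd$y.pred <- predict(lm(y ~ x * g, dd))

# With distribute.type = TRUE the first group (y.pred) is drawn with type 'l'
# and the second group (y) with type 'p'.
p <- xyplot(y.pred + y ~ x | g, dd,
            type = c('l', 'p'), distribute.type = TRUE,
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))
```

Printing p draws the plot. The key still shows both a line and a point for each group, so the td() settings used above may still be wanted to tidy the legend.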
Bugs
    grade <- function(x,
                      cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,
                      grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {
        factor(cut(x, cos, grade, right = FALSE), levels = grade)
    }
    dg$Grade <- grade( dg$Final )
    tab(dg, ~ Grade)   # gets indexing of levels wrong

    # the following seems to work correctly
    grade <- function(x,
                      cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,
                      grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {
        ret <- cut(x, cos, grade, right = FALSE)
        factor(ret, levels = grade)
    }
Getting the G matrix in nlme
    fit <- lme( y ~ x, dd, random = ~1+x |id)
    # the reStruct stores G relative to the residual variance, so rescale by sigma^2
    G <- fit$sigma^2 * pdMatrix( fit$modelStruct$reStruct)$id
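One point to watch: the matrices that pdMatrix() returns from the reStruct are expressed relative to the residual variance, so they need to be multiplied by sigma^2 to be on the scale of G. A sketch checking this against getVarCov(), using the Orthodont data shipped with nlme as a stand-in for 'dd':

```r
library(nlme)

# Orthodont stands in for 'dd'; Subject plays the role of 'id'.
fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

# Relative (scaled) matrix as stored in the model structure.
G_rel <- pdMatrix(fit$modelStruct$reStruct)$Subject

# Rescale by the residual variance to get G itself.
G <- fit$sigma^2 * G_rel

# getVarCov() returns the random-effects variance matrix directly.
G_check <- getVarCov(fit)
```

VarCorr(fit) reports the same information as standard deviations and correlations.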
Building R packages in 2.14
- Install R
- Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/
Notes
- IPSUR: Introduction to Probability and Statistics using R
- /test of slash
- /schedule
- Addicted to R Graph Gallery
- R Wiki
- Stata demo in R
Thumbnail test
Here is a graphic file in raw form:
And here is the same file with a thumbnail:
Math check
Please click on the 'discussion' tab above
- Test how math renders:
glmmPQL etc
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html
Combining unbiased estimators
This is an example:
- bullet 1
- bullet 2
- again
- indented
- again
- bullet3
numbered bullets:
- one
- two
- dkjkdj
- djkdj
- djfkd
- dkjkdj
subheading
new stuff
sub sub
more stuff
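Let $\hat{\phi}_1$ and $\hat{\phi}_2$ be unbiased estimators of $\phi$ with non-singular variances $V_{1}$ and $V_{2}$ respectively, and assume that the two estimators are uncorrelated.
Then the minimum variance linear unbiased estimator of $\phi$ is obtained by combining $\hat{\phi}_1$ and $\hat{\phi}_2$ using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:
$$\hat{\phi} = (V_1^{-1} + V_2^{-1})^{-1}(V_1^{-1}\hat{\phi}_1 + V_2^{-1}\hat{\phi}_2) = V_2 (V_1 + V_2)^{-1}\hat{\phi}_1 + V_1 (V_1 + V_2)^{-1}\hat{\phi}_2 = \hat{\phi}_1 + V_1 (V_1 + V_2)^{-1}(\hat{\phi}_2 - \hat{\phi}_1)$$
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:
$$\begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix} = \begin{pmatrix} I \\ I \end{pmatrix} \phi + \varepsilon \quad \text{with} \quad \mathrm{Var}(\varepsilon) = \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}$$
Applying the GLS formula yields:
$$\hat{\phi} = \left[ \begin{pmatrix} I & I \end{pmatrix} \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}^{-1} \begin{pmatrix} I \\ I \end{pmatrix} \right]^{-1} \begin{pmatrix} I & I \end{pmatrix} \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}^{-1} \begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix} = (V_1^{-1} + V_2^{-1})^{-1}(V_1^{-1}\hat{\phi}_1 + V_2^{-1}\hat{\phi}_2), \qquad \mathrm{Var}(\hat{\phi}) = (V_1^{-1} + V_2^{-1})^{-1}$$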
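The inverse-variance weighting described above can be checked numerically. This sketch uses made-up estimates and variance matrices and assumes the two estimators are uncorrelated; it computes the combined estimate by direct inverse-variance weighting, by the stacked GLS formulation, and in the equivalent 'adjustment' form.

```r
# Made-up estimates and variance matrices (assumed uncorrelated estimators).
phi1 <- c(1.0, 2.0); V1 <- matrix(c(2, 0.5, 0.5, 1), 2)
phi2 <- c(1.4, 1.6); V2 <- matrix(c(1, -0.2, -0.2, 3), 2)

# 1. Weights proportional to the inverse variances.
W1 <- solve(V1); W2 <- solve(V2)
V_comb   <- solve(W1 + W2)
phi_comb <- V_comb %*% (W1 %*% phi1 + W2 %*% phi2)

# 2. Stacked GLS problem: y = X phi + e, Var(e) = blockdiag(V1, V2).
X    <- rbind(diag(2), diag(2))
y    <- c(phi1, phi2)
zero <- matrix(0, 2, 2)
V    <- rbind(cbind(V1, zero), cbind(zero, V2))
Vinv <- solve(V)
phi_gls <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)

# 3. Equivalent form: start from phi1 and adjust towards phi2.
phi_alt <- phi1 + V1 %*% solve(V1 + V2, phi2 - phi1)
```

The variance of the combined estimator, V_comb, is smaller (in the positive semi-definite ordering) than either V1 or V2.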
From Nassif Ghoussoub
Beware the “useful idiocy” of Mr. Morgan
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.
Back in his Globe article of April 29, 2009, “Not all research deserves public funding”, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia
Cell Phones
Date: Sun, 10 Oct 2010 20:02:29 -0400
From: Stuart Newman <newman@NYMC.EDU>
Reply-To: Science for the People Discussion List <SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU>
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU
Subject: "Disconnect": Why cellphones may be killing us
Though I haven't yet read it, this book is presumably not based on anecdotal evidence. The author, Devra Davis, is the founding director of the toxicology and environmental studies board at the U.S. National Academy of Sciences.
http://tinyurl.com/2fvycxc [Salon.com]
"Disconnect": Why cellphones may be killing us A new book probes the connection between mobile devices and a host of health problems -- with frightening results By Thomas Rogers
Links
Notes on mediation
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.
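The contrast between the two approaches can be illustrated with a small simulation. Everything here is made up for illustration: a single confounder z, a binary X whose assignment depends on z, and a true causal effect of 2. Inverse probability weighting is used as one simple way of applying propensity scores; the notes above do not commit to a particular method.

```r
set.seed(1)
n <- 5000
z <- rnorm(n)                          # confounder
x <- rbinom(n, 1, plogis(0.8 * z))     # assignment of X depends on z
y <- 1 + 2 * x + 1.5 * z + rnorm(n)    # true causal effect of X is 2

# Ignoring the confounder gives a biased estimate.
naive <- coef(lm(y ~ x))["x"]

# Approach 1: a model for Y that includes the confounder.
adj <- coef(lm(y ~ x + z))["x"]

# Approach 2: a model for the assignment of X (the propensity score),
# followed by inverse probability weighting.
ps  <- fitted(glm(x ~ z, family = binomial))
w   <- ifelse(x == 1, 1 / ps, 1 / (1 - ps))
ipw <- coef(lm(y ~ x, weights = w))["x"]

round(c(naive = unname(naive), adjusted = unname(adj), ipw = unname(ipw)), 2)
```

In repeated runs the adjusted estimate tends to be the more precise of the two, while the weighted estimate does not rely on the form of the model for Y, which is the trade-off described above.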
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", Biometrics, 58, 21--29].
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," Annals of Internal Medicine, 127, 757--763.
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) Bias and Causation: Models and Judgment for Valid Comparisons, Wiley.
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.
Shock of the New
York AODA
Notes for NATS 1500
- Topics
- single-sex schools?
Notes for MATH 6627
- Collection of misleading graphs
- ASA consulting page
- set up student home page
- first assignment. Find and explore a dataset using
- Ernest Kwan's correlagram
- Lattice (use panels and groups)
- p3d
- gapminder
- should have included candisc
- give a 15-minute (crucial) presentation on the data set and on the method
- prepare a wiki page with links and materials
- Address a few questions:
- What are strengths and weaknesses
- For what kind of dataset is it well suited and what kind not?
- Can you find a dataset that illustrates well the features of this approach?
- Can you compare your approach with other approaches?
- develop checklists:
- initial exploration of data
- missing data (explicit and implicit)
- do simulation of parallel methods: check estimation of variance parameters
- use nlme to estimate knot placement in gsp
Links
- Why bad math can ruin your health
- TED talk: David McCandless, The Beauty of Data Visualization
- Hans Rosling
- Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm
Notes for R course
- Start: It had to be U ... on the SVD [1]
- Use SPSS dates both ways to illustrate
- sub using regular expressions
- import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?
- export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read
- Variable references
- deal with plethora of ways used differently in different places:
- formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)
- interpreted in data: (id), good for single var but can use list : (list(x1,x2))
- Examples:
- fully reference: dd$id
- Beware:
- aggregate with a formula drops rows with NAs even though the FUN might be able to handle them
- multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/
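A sketch of three items from the list above, with made-up values: rearranging a date string with sub and a regular expression, converting between character strings and 'Date' objects with explicit formats, and the NA behaviour of aggregate with a formula. Only numeric date formats are used so the result does not depend on the locale.

```r
## Dates ------------------------------------------------------------------
# Rearrange dd/mm/yyyy into yyyy-mm-dd with a regular expression.
iso <- sub("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", "\\3-\\2-\\1", "27/04/2007")

# Import: character string to Date, stating the input format.
d <- as.Date("27/04/2007", format = "%d/%m/%Y")

# Export: Date to a character string in a format another program can read.
out <- format(d, "%d-%m-%Y")

## aggregate with a formula -------------------------------------------------
dn <- data.frame(g = c("a", "a", "b", "b"), v = c(1, NA, 3, 5))

# Default na.action = na.omit: the row with NA is dropped before FUN sees it.
agg_default <- aggregate(v ~ g, data = dn, FUN = length)

# na.action = na.pass keeps the row, leaving NA handling to FUN.
agg_pass <- aggregate(v ~ g, data = dn, FUN = length, na.action = na.pass)
```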
Add
- Discussion of memory issues: what happens when you work on two computers
Links
- R blog
- John Fox: ICPSR: 2010: Overview including slides on building R packages
- Introduction to the R computing environment
- ICPSR 2011:
Notes for High School Talks
Climate change
Excel techniques
- Regular expressions and string substitution
Ellipse Seminar
Setting up mathstat email in Thunderbird
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS
Outgoing: mathstat.yorku.ca Port: 587 Security: none?
Statistical amusement
- Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs
- It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I
- It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY
On Careers in Statistics and Mathematics
On Teaching Science
A few videos
- Why Teach Science by James Randi
- Teaching Introductory Physics
- Brian Goldman on learning from mistakes