Chi Square Tests

Mala Mahadevan discusses how to perform a Chi Square test:

For any dataset to lend itself to the Chi Square test it has to fit the following conditions  –

1 Both  variables are categorical (in this case – exposure to smoking – yes/no, and health condition – sick/not sick are both categorical).
2 Researchers used a random sample to collect data.
3 Researchers had an adequate sample size.Generally the sample size should be at least 100.
4 The number of respondents in each cell should be at least 5.

This is an easy case for using R over T-SQL—the Chi Square test is built in, whereas you have to roll your own T-SQL code.  Mala does show you how to do this from within SQL Server R Services as well.

Related Posts

Building an Image Classifier with PyTorch

Rogier van der Geer shows how you can use PyTorch to build out a Convolutional Neural Network for image classification: The tool that we are going to use to make a classifier is called a convolutional neural network, or CNN. You can find a great explanation of what these are right here on wikipedia. But we […]

Read More

xgboost and Small Numbers of Subtrees

John Mount covers an interesting issue you can run into when using xgboost: While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation).In doing that I ran into one more […]

Read More


September 2016
« Aug Oct »