In this post we will explore a common statistical term – Relative Risk, otherwise called Risk Factor. Relative Risk is a term that is important to understand when you are doing comparative studies of two groups that are different in some specific way. The most common usage of this is in drug testing – with one group that has been exposed to medication and one group that has not. Or , in comparison of two different medications with two groups with each exposed to a different one.
Read on for an example of a statistical formula calculation which might actually be easier in T-SQL than R.
Mala Mahadevan explains the Cochran-Mantel-Haenszel test, with two parts up so far. First, her data set:
Below is the script to create the table and dataset I used. This is just test data and not copied from anywhere.
This test is an extension of the Chi Square test I blogged of earlier. This is applied when we have to compare two groups over several levels and comparison may involve a third variable.
Let us consider a cohort study as an example – we have two medications A and B to treat asthma. We test them on a randomly selected batch of 200 people. Half of them receive drug A and half of them receive drug B. Some of them in either half develop asthma and some have it under control. The data set I have used can be found here. The summarized results are as below.
This series is not yet complete, so stay tuned.
Recently I had a conversation with a colleague that wasn’t aware of the REVERSE function in SQL Server. I had sent them a code snippet and they inquired as to what the purpose of the function was. Essentially, this function that reverses any string value.
For example, the phrase:
The brown dog jumped over the lazy fox
reversed looks like this
xof yzal eht revo depmuj god nworb ehT
Read on to see a good example of when you might use REVERSE.
People will often tell you to clearly alias your tables, and they’re right. It will make them more readable and understandable to whomever has to read your code next, puzzling over the 52 self joins and WHERE clause that starts off with 1 = 2. It can also help solve odd performance problems.
Take this query, for instance.
This isn’t just for subqueries; even simple joins can go haywire when you accidentally use the wrong alias and both tables happen to have the same column name.
GOis a special little guy. It’s not exactly T-SQL. It’s a way of telling the SQL Server Management Studio (SSMS) to send everything before it, from the beginning of the script or the preceding
GO, to the SQL Server instance.
If you read the documents, the main point to take away is…
A Transact-SQL statement cannot occupy the same line as a GO command. However, the line can contain comments.
The problem turns out to be a little trickier than you’d first imagine. Also, after reading this post, I think it’s lunchtime…
The decision rule in two sample tests of hypothesis depends on three factors :
1 Whether the test is upper, lower or two tailed (meaning the comparison is greater, lesser or both sides of gender and speaker count)
2 The level of significance or degree of accuracy needed,
3 The form of test statistic.
Our test here is to just find out if gender and speaker count are related so it is a two tailed test. The level of significance we can use is the most commonly used 95% which is also the default in R for Fischer’s Test. The form of the test statistic is P value. So our decision rule would be that gender and speaker category are related if P value is less than 0.05.
Click through for the R code followed by a code sample which should explain why you don’t want to do it in T-SQL.
Commas are all the rage nowadays:
There are a fair number of questions nowadays about returning data from a database in a comma separated string. Sure the application should probably do that but hey, database servers are expensive, why not get some bang for your bucks!
Protip: I use the colon as a separator because sometimes commas will get stuck in the Ethernet cable.
Finally, this week I had a breakthrough. I was working on updating a bunch of data but it was breaking on a small subset of that data. In this case, I was attempting to
JOINtwo tables on fields that should have been
INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past and didn’t encounter the problem I was having (so I knew that it was something i was doing “wrong”), but given that it was only a handful of broken updates she was OK with manually doing the updates (we were scripting it because we were updating potentially tens of thousands of records).
I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found
CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.
The APPLY operator is extremely powerful in the right set of circumstances. Andy shows the “classic” use case, but there are a number of other uses for the operator.
Now normally FizzBuzz is done with a loop, but as Russ said, we are using T-SQL so batch code is always the goal. That said, what table should I query to get the numbers 1-100? Well, I decided I’d just do something simple here and use a system view that’s more than 100 rows, the ROW_NUMBER function, and restrict it using TOP.
Read on to see several answers to this problem, some better than others.
Have you every executed a query in SQL Server Management Studio, looked at the execution plan, and noticed that it was a different plan than what was generated on the server?
A potential reason for this could be a different option settings. The options represent the SET values of the current session. SET options can affect how the query is execute thus having a different execution plan. You can find these options in two places within SSMS under Tools -> Options -> Query Execution -> SQL Server -> Advanced.
Click through for lots of information including a script John provided to see which options are currently on.