It’s taken a little bit longer than usual, but Microsoft R Open 3.5.2 (MRO) is now available for download for Windows and Linux. This update is based on R 3.5.2, and accordingly fixes a few minor bugs compared to MRO 3.5.1. The main change you will note is that new CRAN packages released since R 3.5.1 can now be used with this version of MRO.
David also lets us know that they’re working on 3.6.0’s release.
To understand relationship or dependencies among categorical variables, we take advantage of various types of tables and graphical methods. Also stratifying variables can be encompassed in order to highlight if the relationship between two primary variables is the same or different for all levels of the stratifying variable under consideration.
The contingency table are said to be of one-way flavor when involving just one categorical variable. They are said two-way when involving two categorical variables, and so on (N-way).
Read on for various techniques for data analysis against categorical variables.
Time limits: ending the neverending
Here’s one to ponder: how long can a long-running action go on before the customer (even a very patient, very digital customer) loses all interest in the outcome?
Pull up a chair. With no upper bound, we could be here a while.
In my last post, I showed how you can use the SSIS PowerShell Provider to execute an SSIS package with PowerShell. Of course, in order to execute that SSIS package, it has to get deployed first. In Part 5 of Andy Leonard’s “SSIS, Docker, and Windows Containers” series he used some PowerShell code from Matt Masson’s blog post to deploy an .ISPAC file to the SSIS catalog.
Click through for the code.
When planning for memory needs, it is critical to understand how SQL Server Integration Services uses memory. SSIS will allocate memory from the unallocated system memory for each package executed, and surrenders that memory shortly after the package completes its execution. The memory allocated for SSIS package executions runs in the SSIS execution runtime process (ISServerExec.exe, if you are executing the package from the SSIS catalog).
Here’s where the package design has a significant impact on memory use. If a package uses an SSIS data flow, all of the data passing through that data flow is written to memory used by SSIS. For example, consider a package that loads 10 million rows from a flat file to a table. In this case, all 10 million rows will pass through the SSIS memory space during package execution.
Read on as Tim goes into good detail on the topic.
What is good code style? You probably have some opinions about this. In fact, I’m willing to bet you might even have some very strong opinions about this; I know I do. Whether consciously or not, we tend to frame good coding practices as a moral issue. Following good coding practices makes us feel virtuous; ignoring them makes us feel guilty. I can guess that this is why Yom said “I don’t think I could bring myself to be satisfied with partial functions” [emphasis added]. And this is why we say “good code style”, not “optimal” or “rational” or “best practice” code style.
It’s an interesting post. There are some bits on competitive programming which don’t apply in general, but there’s a lot to unpack there.
There is a lot of confusion when it comes to designing tables in SQL Server around whether to pluralize names or not. How do you choose whether to pluralize or not?
If we want to store a list of people and their details do we use “Person”, “Persons”, “People” or “Peoples”? Some people will use “People” and some will use “Person”, other persons or people would go for “Peoples” or “Persons”.
My preference is singular. In the event that I do pluralize a table, I use a grammatically correct pluralization. None of this “childs” nonsense.
In that demo, the AdventureWorks sample database was initially set to compatibility level of 140 (SQL Server 2017 default compatibility) to execute a scalar UDF. At this point, the estimated execution plan showed that the UDF was given a cost of 0%, and performance was terrible (the expected behaviour). Then the database compatibility level was switched to 150 (which is all that’s required to enable this new optimization feature), the query was executed again, the UDF was inlined, and performance improved dramatically.
This is where it got interesting. As a test, the compatibility level of the database was set back to 140, but the query plan continued to inline the UDF. Curious. Flushing the plan cache didn’t change the outcome (even though we knew it wasn’t necessary). Had we discovered a bug in a preview version of SQL Server 2019? It was CTP 2.2 after all, and since then (at the time of this writing) CTP 2.5 is already available.
Read on for the answer.
Clean the data and optionally aggregate it as it sits in source system. The tool used for this would depend on the source system that stores the data (i.e. if SQL Server, you would use stored procedures). The only benefit with this option is if you aggregate the data, you will move less data from the source system to Azure, which can be helpful if you have a small pipe to Azure and don’t need the row-level details. The disadvantages are: the raw source data is not available in the data lake, so you would always need to go back to source system if you needed to get it again, and it may not even still exist in the source system; you would put extra stress on the source system when doing the cleaning which could affect end users using the system; it could take a long time to clean the data as the source system may not have fast performance; and you would not be able to use other tools (i.e. Hadoop, Databricks) to clean it. Strongly advise against this option
Read on for additional options and James’s recommendations.