I recently read something that said using the RESTORE WITH REPLACE command could be faster than dropping a database and then performing a RESTORE, because the shell of the file could be reused, which would skip file initialization. I did not think that was the case, but Books Online wasn’t clear about the situation, so I went ahead and built a quick test case using ProcMon from Sysinternals. If you aren’t familiar with the Sysinternals tools, you should be; they are a good way to get under the hood of your Windows Server to see what’s going on, and if you’re old like me, you probably used PSEXEC to “telnet” into a Windows server to restart a service before RDP was a thing.
Read on to see how the processes compare.
If your user is a database owner (i.e., is a member of the db_owner role or has CONTROL permission on the database), the default schema will always be dbo. This is something you can’t change.
So if your legacy application needs quasi-administrative privileges in the database, you can’t make it a database owner, but you can grant those permissions on the schema instead (which is actually a better idea anyway).
What Daniel is doing is akin to the pre-2005 concept of user spaces, where Bob had a schema and Mary had a schema and Jill had a schema and so forth.
This post is another in the continuing theme of “making things consistent.” We were voluntold to help another team get their staging environment set up. Piece of cake, SQL Compare made it trivial to snap the tables over.
Oh, we don’t want these tables in Custom schema, we want them in dbo. No problem, SQL Compare again and change owner mappings and bam, out come all the tables.
Oh, can we get this in near real-time? Say every 15 minutes. … Transactional replication to the rescue!
Oh, we don’t know what data we need yet so could you keep it all, forever? … Temporal tables to the rescue?
Yes, temporal tables are perfect. But don’t put the history table in the same schema as the table, put it in this one. And put all of that in its own filegroup.
Click through for a helpful script, and tune in next time, when the other team has Bill move their furniture around. Maybe move the couch just a hair to the right…no, a little more, oops, too much…
Doug Kline has a new series on window functions. First, he looks at differences between RANK, DENSE_RANK, and ROW_NUMBER:
-- Quick! What’s the difference between RANK, DENSE_RANK, and ROW_NUMBER?
-- in short, they are only different when there are ties…
-- here’s a table that will help show the difference
-- between the ranking functions
-- note the [Score] column,
-- it will be the basis of the ranking
-- here’s a simple SELECT statement from the Products table
ORDER BY UnitPrice DESC
-- this shows that the highest priced product is Cote de Blaye, productID 38
-- but sometimes the *relative* price is more important than the actual price
-- in other words, we want to know how products *rank*, based on price
Doug’s entire posts are T-SQL scripts along with embedded videos.
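To make the tie behavior concrete, here is a minimal sketch rather than Doug’s script. It assumes Python’s built-in sqlite3 module (with SQLite 3.25 or later for window functions) in place of SQL Server, and the Scores table and its rows are hypothetical sample data. The three functions have the same tie semantics in both engines.

```python
import sqlite3

# Hypothetical sample data: Bob and Cy tie on Score.
rows = [("Ann", 95), ("Bob", 90), ("Cy", 90), ("Dee", 85)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Name TEXT, Score INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?)", rows)

query = """
SELECT Name,
       Score,
       RANK()       OVER (ORDER BY Score DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY Score DESC)       AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY Score DESC, Name) AS row_num
FROM Scores
ORDER BY Score DESC, Name
"""
ranked = conn.execute(query).fetchall()
conn.close()

for row in ranked:
    print(row)
# ('Ann', 95, 1, 1, 1)
# ('Bob', 90, 2, 2, 2)
# ('Cy', 90, 2, 2, 3)
# ('Dee', 85, 4, 3, 4)
```

The tied rows share a value under RANK and DENSE_RANK; RANK then skips to 4, DENSE_RANK continues with 3, and ROW_NUMBER never repeats. Name is added to the ROW_NUMBER ordering only to make the tie-break deterministic.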
This month’s T-SQL Tuesday was all about Big Data. See what the community has to say about Big Data with this collection of articles ranging from deep technical walk-throughs to musings about Big Data’s impact on our industry and the data professional.
Click through to see the participants.
In this post, I am going to share my experiment with file management in ADLS using RStudio.
To do this, you need the following items:
1. An Azure subscription
2. An Azure Data Lake Store account
3. An Azure Active Directory application (for service-to-service authentication)
4. An authorization token from the Azure Active Directory application
It’s pretty easy to do, as Leila shows.
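As a rough illustration of how those prerequisites fit together, the sketch below builds (but does not send) the two HTTP requests involved: a client-credentials token request to Azure Active Directory and a WebHDFS-style directory listing against the store. It is written in Python rather than Leila’s R. The endpoint shapes are an assumption based on the Data Lake Store (Gen1) REST interface as I understand it, and the tenant, client, and account values are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def token_request(tenant_id, client_id, client_secret):
    """Build the URL and form body for a service-to-service token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed resource identifier for Data Lake Store.
        "resource": "https://datalake.azure.net/",
    })
    return url, body


def list_status_url(account, path):
    """Build a WebHDFS LISTSTATUS URL for a folder in the store."""
    clean_path = quote(path.lstrip("/"))
    return (f"https://{account}.azuredatalakestore.net"
            f"/webhdfs/v1/{clean_path}?op=LISTSTATUS")


# Hypothetical placeholder values, not real credentials.
url, body = token_request("my-tenant-id", "my-client-id", "my-secret")
listing = list_status_url("mydatalake", "/raw/sales")
print(url)
print(listing)
```

The token returned by the first request would be passed as a `Authorization: Bearer` header on the second; sending the requests is left out so the sketch runs without an Azure subscription.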
The Azure Data Lake store is an Apache Hadoop file system compatible with HDFS, hosted and managed in the Azure Cloud. You can store and access the data within directly via the API, by connecting the filesystem directly to Azure HDInsight services, or via HDFS-compatible open-source applications. And for data science applications, you can also access the data directly from R, as this tutorial explains.
To interface with Azure Data Lake, you’ll use U-SQL, a SQL-like language extensible using C#. The R Extensions for U-SQL allow you to reference an R script from a U-SQL statement, and pass data from Data Lake into the R script. There’s a 500MB limit for the data passed to R, but the basic idea is that you perform the main data munging tasks in U-SQL, and then pass the prepared data to R for analysis. With this data you can use any function from base R or any R package. (Several common R packages are provided in the environment, or you can upload and install other packages directly, or use the checkpoint package to install everything you need.) The R engine used is R 3.2.2.
Click through for the details.
Selecting the variables in the Deducer GUI:
Outcome variable: Y, or the dependent variable, should be put on this list.
As numeric: Independent variables that should be treated as covariates should be put in this section. Deducer automatically converts a factor into a numeric variable, so make sure that the order of the factor levels is correct.
As factor: Categorical independent variables (language, ethnicity, etc.).
Weights: This option allows the users to apply sampling weights to the regression model.
Subset: Helps to define if the analysis needs to be done within a subset of the whole dataset.
Deducer is open source and looks like a pretty decent way of seeing what’s available to you in R.
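The warning about factor level order is easier to see with numbers. The sketch below is plain Python, not Deducer or R code. It assumes the conversion behaves like R’s as.numeric() on a factor, which returns each value’s 1-based position in the level list, and the survey responses are hypothetical.

```python
def factor_to_numeric(values, levels):
    """Return each value's 1-based position in the level list."""
    index = {level: i + 1 for i, level in enumerate(levels)}
    return [index[v] for v in values]


# Hypothetical ordinal responses.
responses = ["low", "high", "medium", "low"]

# R orders levels alphabetically by default, which is wrong for this scale.
alphabetical = sorted(set(responses))   # ['high', 'low', 'medium']
intended = ["low", "medium", "high"]

default_codes = factor_to_numeric(responses, alphabetical)
ordered_codes = factor_to_numeric(responses, intended)
print(default_codes)   # [2, 1, 3, 2]
print(ordered_codes)   # [1, 3, 2, 1]
```

With the alphabetical levels, "high" is coded below "low", so a regression coefficient on that covariate would be fit to a scrambled scale.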
Now that we know the read and write throughput characteristics of a single Data Node, we would like to see how per-node performance scales when the number of Data Nodes in a cluster is increased.
The tool we use for scale testing is the Tera* suite that comes packaged with Hadoop. This is a benchmark that combines performance testing of the HDFS and MapReduce layers of a Hadoop cluster. The suite is comprised of three tools that are typically executed in sequence:
TeraGen, the tool that generates the input data. We use it to test the write performance of HDFS and ADLS.
TeraSort, which sorts the input data in a distributed fashion. This test is CPU bound and we don’t really use it to characterize the I/O performance of HDFS and ADLS, but it is included for completeness.
TeraValidate, the test that reads and validates the sorted data from the previous stage. We use it to test the read performance of HDFS and ADLS.
It’s an interesting look at how well ADLS scales. In general, my reading of this is fairly positive for Azure Data Lake Store.
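For readers who have not run the Tera* suite, the three-stage sequence described above can be sketched as a small Python helper that builds the command lines without executing them. The jar file name and the /benchmarks directory layout are assumptions for illustration; the examples jar path and version vary by distribution. TeraGen takes a row count rather than a byte count, and each row is 100 bytes.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def tera_commands(data_bytes, base_dir, jar="hadoop-mapreduce-examples.jar"):
    """Build the TeraGen, TeraSort, and TeraValidate commands in run order."""
    rows = data_bytes // ROW_BYTES
    gen_dir = f"{base_dir}/teragen"          # hypothetical directory names
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    prefix = ["hadoop", "jar", jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],         # write test
        prefix + ["terasort", gen_dir, sort_dir],         # CPU-bound sort
        prefix + ["teravalidate", sort_dir, report_dir],  # read test
    ]


# 1 GB of input data, purely as an example size.
commands = tera_commands(10**9, "/benchmarks")
for cmd in commands:
    print(" ".join(cmd))
```

Each stage reads the previous stage’s output directory, which is why the tools are normally run in this order.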
For this simple test, it worked pretty well, and it should work well for most of the requirements that you have. Time will tell how reliably this new feature works.
The Import Flat File wizard is available when connecting to SQL Server version 2005 or higher. I haven’t tried this on a lower version, but I don’t see any reason why it wouldn’t work there either. You can read more about this feature in Microsoft’s documentation.
You can definitely break the Import Flat File feature, but I appreciate it being smoother than the SSIS-based wizard of yore. Wayne also shares his thoughts on the Extended Events Profiler.