Press "Enter" to skip to content

Month: September 2019

strace and SQL Server Containers

Anthony Nocentino tries using strace to diagnose SQL Server process activity in a container:

We’re attaching to an already running docker container running SQL. But what we get is an idle SQL Server process this is great if we have a running workload we want to analyze but my goal for all of this is to see how SQL Server starts up and this isn’t going to cut it.
 
My next attempt was to stop the sql19 container and quickly start the strace container but the strace container still missed events at the startup of the sql19 container. So I needed a better way.

Don’t worry—Anthony finds a better way.

Comments closed

dbatools: the Book

Chrissy LeMaire has an exciting announcement:

After nearly 10 months of work, early access to Learn dbatools in a Month of Lunches is now available from our favorite publisher, Manning Publications!

For years, people have asked if any dbatools books are available and the answer now can finally be yes, mostly. Learn dbatools in a Month of Lunches, written by me and Rob Sewell (the DBA with the beard), is now available for purchase, even as we’re still writing it. And as of today, you can even use the code bldbatools50 to get a whopping 50% off.

They’re in active book development, so buy a copy now and watch as the book evolves.

Comments closed

Backup to URL in Azure

Joey D’Antoni recommends backing up database to Blob Storage via URL in Azure:

Unlike in your on-premises environment, where you might have up to a 32 Gbps fibre channel connection to your storage array and then a separate 10 Gbps connection to the file share where you write your SQL Server backups, in the cloud you have a single connection to both storage and the rest of the network. That single connection is metered and correlates to the size (and $$$) of your VM. So bandwidth is somewhat sacred, since backups and normal storage traffic go over the same limited tunnel. This doesn’t mean you can’t have good storage performance, it just means you have to think about things. In the case of the customer I mentioned, they were saturating their network pipe, by writing backups to the file system, and then having the Azure backup service backup their VM, they were saturating their pipe and making regular SQL Server I/Os wait.

Read on to see what the alternative is.

Comments closed

Exploratory Data Analysis with ExPanDaR

Joachim Gassen walks us through the ExPanDaR package in R:

The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). You can read more about it here. The ‘ExPanD’ shiny app allows you to customize your analysis to some extent but often you might want to continue and extend your analysis with additional models and visualizations that are not part of the ‘ExPanDaR’ package.

Thus, I am currently developing an option to export the ‘ExPanD’ data and analysis to an R Notebook. While it is not ready for CRAN yet, it seems to work reasonably well and I would love to see some people trying it and letting me know about any bugs or other issues that they encounter. Hence, this blog post.

Looks like an interesting package. H/T R-bloggers

Comments closed

Paired RDDs in Spark

Ramandeep Kaur explains how Paired Resilient Distributed Datasets (PairRDDs) differ from regular RDDs:

So, assuming that you have a fair idea about what Spark is and the basics of RDDs. Paired RDD is one of the kinds of RDDs. These RDDs contain the key/value pairs of data. Pair RDDs are a useful building block in many programs, as they expose operations that allow you to act on each key in parallel or regroup data across the network. For example, pair RDDs have a reduceByKey() method that can aggregate data separately for each key, and a join() method that can merge two RDDs together by grouping elements with the same key.

When datasets are described in terms of key/value pairs, it is common to want to aggregate statistics across all elements with the same key.

Paired RDDs bring us back to that key-value pair paradigm which Hadoop’s version of MapReduce brought to the forefront.

Comments closed

Saving Data in Docker Containers

Anthony Nocentino has a three-part series on persisting SQL Server data in Docker containers. Part 1 takes us through volumes:

Let’s talk about how we can use Docker Volumes and SQL Server to persist data. If we want to run SQL Server in a container we will want to decouple our data from the container itself. Doing so will enable us to delete the container, replace it and start up a new one pointing at our existing data. When running SQL Server in a container will store data in /var/opt/mssql by default. When the container starts up for the first time it will put the system databases in that location and any user databases created will also be placed at this location by default. 

Part 2 looks at how volumes differ between the Linux and Mac/Windows versions of Docker:

So in my previous post, we discussed Docker Volumes and how they have a lifecycle independent of the container enabling us to service the container image independent of the data inside the container. Now let’s dig into Volumes a little bit more and learn where Docker actually stores that data on the underlying operating system.  

Part 3 ties it in with SQL Server:

Makes sense…we changed where SQL Server is reading/writing data. macOS doesn’t support a file mode called O_DIRECT which allows for unbuffered read/write access to the file opened using the open system call.  O_DIRECT is used by systems that manage their own file caching, like relational database management systems (RDBMS). So as SQL starts up and tries to open the master database with O_DIRECT the files can’t be opened because the macOS kernel doesn’t support this mode. And this is the reason why we have to have that Linux VM around. That Linux VM will support O_DIRECT option on the file opened. See more about this at the GitHub issue here.

Definitely worth getting a handle on this if you’re interested in containers.

Comments closed

Azure Data Studio Server Groups

Dave Bland explains how we can get different tab colors in Azure Data Studio:

Have you ever been in an environment where you worked with both DEV\QA servers and Production servers?  Of course you have, we all have.  However, being in such an environment brings a certain level of risk.  We all try to be very careful to run any code we are asked to run on the correct server and in the correct environment.  Despite all of our checks and our Due diligence, we are still human and do make a mistake from time to time.  Azure Data Studio has a nice feature that will allows us to change the color of tabs based on the environment the server is in.  The image below is an example of what it would look like.

I really like this functionality because it gives you one additional warning that you’re about to run a dev script against a production server on accident.

Comments closed

When Indexes Collide

Andy Mallon gives us a case where it makes sense to have a non-clustered index which shares the same columns as your clustered index columns:

First off, let’s remember the difference between clustered & nonclustered indexes

The clustered index is organized by the key columns. It also includes every other column as part of the row structure (ie, it has the entire row).

The nonclustered index is also organized by the key columns. It implicitly includes the clustering key columns (if the table is clustered), or a pointer to the row (if the table’s a heap). If any INCLUDE columns are explicitly specified, they will also be included in the index structure (but these included columns don’t affect order).

I’ve seen other cases where it made sense on sufficiently large and wide tables even for seeks (where the page density difference is large enough that you have a 4-level clustered index but a 3-level non-clustered index), so I think there’s more than Andy’s one corner case. But I do agree that it generally doesn’t help.

Comments closed

Multi-Column Slicers in Power BI

Marco Russo shows us how we can filter on multiple columns in a single slicer in Power BI:

Power BI provides slicers for a single column, but there are scenarios where it could be useful to consolidate alternative filters for multiple columns in a single slicer. Technically, this is not possible in Power BI through the standard visualizations, but you can use a particular data modeling technique to obtain the desired result.

Consider the case of a Customer table with a geographical hierarchy with ContinentCountry, and State. The requirement is to enable a filter over California (State), France (Country), or Asia (Continent) using a single slicer

Marco takes us through the process and offers up a clever solution.

Comments closed