
Month: August 2017

Pagination In Oracle Versus SQL Server

Daniel Janik is currently running an Oracle versus SQL Server series, looking at how the two database systems expose common functionality.  His latest topic is pagination:

Today’s topic is Pagination. Paging is a really important feature for web pages and applications. Without it you’d be passing large amounts of data to the application and expecting the application code to figure out which rows it needed to display.

Thankfully, someone smart came up with a way to do this on the database so you’re not returning gigs and gigs of data to the webserver to sort through.

Read on to see how the two platforms do this.
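As a rough illustration of database-side paging, here is a minimal sketch that builds an `OFFSET ... FETCH` clause, a syntax accepted by both SQL Server 2012+ and Oracle 12c+. The helper names and the `Orders` table are hypothetical, and the linked post may well cover other techniques.

```python
def page_clause(page_number, page_size):
    """Return an OFFSET/FETCH clause for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    # Rows to skip: every row on the pages before the requested one.
    offset = (page_number - 1) * page_size
    return f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"


def paged_query(base_query, order_by, page_number, page_size):
    """Append ORDER BY and the paging clause to a query (hypothetical helper)."""
    # SQL Server requires ORDER BY for OFFSET/FETCH; on any platform,
    # a deterministic sort is what keeps pages stable between calls.
    return f"{base_query} ORDER BY {order_by} {page_clause(page_number, page_size)}"


# Hypothetical table and columns, for illustration only.
print(paged_query("SELECT OrderID, OrderDate FROM Orders", "OrderID", 3, 25))
```

In real application code the offset and page size would normally be passed as bind parameters rather than formatted into the SQL text.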


DevOps And The DBA

Kellyn Pot’Vin-Gorman gives a bit of DevOps 101 to database administrators:

Monitoring changes a bit with DevOps. It’s less about a simple tier and moves to the entire infrastructure. A need to monitor application, host, database and availability between each is essential. As these different tiers rarely come from one vendor and many may even be proprietary, there are requirements to monitor using multiple tools, scripts and interfaces.

Two of the main monitoring products recognized in the DevOps community are New Relic and AppDynamics. Monitoring can be as simple as a suite of scripts that report the health and status of processes and orchestration, notifying if there is any failure. This choice normally has a scaling limit and at some point, a more robust solution is required or gaps are felt in the monitoring process or failure at certain tiers. More enterprise solutions, such as New Relic and AppDynamics, are enhanced by logging suites like Splunk and Sumo Logic.

Read the whole thing.


Index Changes And Recompilation

Kendra Little explains how index modifications affect the plan cache:

Creating an index doesn’t remove a plan from cache

We run our Demo Query five times. Querying sys.dm_exec_query_stats and related DMVs, we see five executions, plan_generation_num = one.

Then we create an index on a table referenced by the query.

Querying the DMVs we still see five executions, plan_generation_num = one, and the query plan in cache. It hasn’t been removed!

Read on for more, including what happens when you run the query again, what happens when you rebuild indexes, and what happens when you drop an index.


Deletes And Indexes

Jeff Schwartz looks at the performance cost of indexes when it comes to deleting rows:

Many articles concerning SQL Server discuss how record insertion overhead increases with each additional index. They discuss b-tree manipulations and page splits in addition to leaf and non-leaf levels. However, few discuss the fact that deletion overhead increases as well, especially when large numbers of records are deleted by individual queries. Recently, I was working with a client who regularly purged large numbers of records from tables that ranged in size from large to gigantic. For example, one table contained over 6.5 billion records. I added an index (4th overall) to one table expressly for the purpose of expediting the large deletion process, and the deletion run ran longer, despite using the new index! To determine how the numbers of indices and records to be deleted interact, I conducted an experiment to test several combinations. The specifics of the tests and their corresponding results are summarized below.

Check it out.  There’s certainly more to the story than “add indexes to improve performance.”


Cloudera Director And AWS Spot Instances

David Han shows off some new features in Cloudera Director 2.5 to help when building Hadoop clusters on AWS spot price instances:

You can configure Spot instances in Cloudera Director’s instance templates. These instance templates contain a flag indicating whether Spot instances should be used, as well as a field specifying the bid price for those instances.

Each instance group in the cluster template includes a field that indicates the minimum number of instances required in that group for the cluster to be considered successful. Cloudera Director will continue with bootstrapping or growing a cluster if the minimum count for each instance group is satisfied. Spot instances should not be used for instance groups that are required for the normal operation of the cluster, such as HDFS DataNodes. Instance groups configured to use Spot instances should set their minimum number to zero with the expectation that the instances may not be provisioned due to the Spot bid price being lower than the Spot price.

If you’re able to take advantage of spot instances, you can end up saving a pretty good amount of money.
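The minimum-count rule described above can be sketched as a simple check. This is only an illustration of the logic, not Cloudera Director's API or configuration format; the function name, group names, and counts are all hypothetical.

```python
def cluster_can_proceed(groups):
    """Return True when every instance group meets its minimum count.

    `groups` maps a group name to (provisioned_count, minimum_count).
    """
    return all(provisioned >= minimum for provisioned, minimum in groups.values())


# Hypothetical cluster: the spot group has a minimum of zero, so the
# cluster can still come up when no spot instances are provisioned.
groups = {
    "masters": (3, 3),
    "datanodes": (5, 5),
    "spot-workers": (0, 0),
}
print(cluster_can_proceed(groups))
```

Setting the spot group's minimum to zero is what lets the check pass even when the bid price falls below the current spot price.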


Big Performance Tuning Mistakes

Erik Darling enumerates common performance tuning mistakes:

Coming in at number five and looking alive! Did you know that query plans can be different on busy servers? I bet not! And aside from that, your performance problem might not even be the query itself; it may be blocking, or a poison wait. This stuff may not show up in Dev, unless you spend a lot of time and money engineering load tests.

This is what safety groups call “Situational Awareness”, and this is the kind of stuff that you really want a monitoring tool in place for. Sure, that query ran slowly, but if that’s all you know, and you can’t reproduce it, then you need to start digging deeper.

There are a number of tips here, and that number is five.


Searching In Execution Plans

Pedro Lopes gives a reason to upgrade to SSMS 17.2:

But what if I could search for node id 30 while looking at the graphical showplan?

Starting with SSMS 17.2, just use CTRL+F to start a search in graphical showplan (or right-click on a blank area of the plan and choose the Find Node option in the context menu), and you can quickly see exactly where node id 30 is.

I will have to see whether that also lets you quickly find the origin of Expr#### columns.


Understanding K-Means Clustering

Chaitanya Sagar has a good explanation of the assumptions k-means clustering makes:

Why do we assume in the first place? The answer is that making assumptions helps simplify problems, and simplified problems can then be solved accurately. To divide your dataset into clusters, one must define the criteria of a cluster, and those criteria make up the assumptions of the technique. The K-Means clustering method makes two assumptions about the clusters: first, that the clusters are spherical, and second, that the clusters are of similar size. The spherical assumption helps in separating the clusters when the algorithm works on the data and forms clusters. If this assumption is violated, the clusters formed may not be what one expects. On the other hand, the assumption about the size of clusters helps in deciding the boundaries of the cluster. This assumption helps in calculating the number of data points each cluster should have. This assumption also gives an advantage. Clusters in K-means are defined by taking the mean of all the data points in the cluster. With this assumption, one can start with the centers of clusters anywhere. Keeping the starting points of the clusters anywhere will still make the algorithm converge with the same final clusters as keeping the centers as far apart as possible.

Read on as Chaitanya shows several examples; the polar coordinate transformation was quite interesting.  H/T R-Bloggers
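To make the "mean of all the data points in the cluster" idea concrete, here is a minimal pure-Python sketch of the standard assign-then-average loop (Lloyd's algorithm). It is not the code from the linked post; the function names and the toy data are assumptions, and the data is deliberately two compact, equal-sized blobs so that the assumptions discussed above hold.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and mean-update steps until centroids stop moving."""
    centroids = list(initial_centroids)
    k = len(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Toy data (hypothetical): two compact, equal-sized blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(points, initial_centroids=[(0, 0), (0, 1)])
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
```

Here both starting centers sit in the same blob and the loop still recovers the two groups. That is a property of this well-separated toy data and this particular start; other data or other starting points can settle on different clusters.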


Spheres In SQL Server

Slava Murygin continues his quest to build a graphics engine with spatial data:

A couple of years ago I came up with an algorithm for drawing an ellipse using SQL Server spatial geometry: http://slavasql.blogspot.com/2015/02/drawing-ellipse-in-ssms.html

I’ve used that algorithm to make a sphere, and as in my previous post on drawing a 3D cube, I use an external procedure to simplify the process.
This time, instead of a temporary stored procedure, I’m using a function to generate the geometrical content.

This has been an enjoyable series so far, showing how to build different shapes using spatial queries.
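For a feel of how shapes like these can be expressed as spatial data, here is a small sketch that approximates an ellipse as a closed WKT polygon. It is not Slava's algorithm; the function name and parameters are hypothetical, and the approach is a plain parametric approximation.

```python
import math


def ellipse_wkt(cx, cy, rx, ry, segments=36):
    """Approximate an ellipse centered at (cx, cy) as a WKT POLYGON string."""
    pts = []
    for i in range(segments):
        t = 2 * math.pi * i / segments
        pts.append((cx + rx * math.cos(t), cy + ry * math.sin(t)))
    pts.append(pts[0])  # a polygon ring must end on its starting point

    def fmt(v):
        # Round, then add 0.0 so a tiny negative value prints as 0.0000
        # rather than -0.0000.
        return f"{round(v, 4) + 0.0:.4f}"

    coords = ", ".join(f"{fmt(x)} {fmt(y)}" for x, y in pts)
    return f"POLYGON(({coords}))"


wkt = ellipse_wkt(0, 0, 2, 1, segments=4)
print(wkt)
```

A string like this could then be handed to `geometry::STGeomFromText` in SQL Server and viewed in the SSMS Spatial results tab; more segments give a smoother outline.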
