
Author: Kevin Feasel

Figuring Out Azure Analysis Services Costs

Chris Webb explains that Azure Analysis Services might not be quite as expensive as you’d first think:

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this.

There are some good questions in the comments section, so check those out as well.
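As a rough illustration of the arithmetic behind that kind of estimate, here is a minimal Python sketch. It is not the workbook from the linked post: the function name, the hourly rates, and the usage pattern are all hypothetical, and it assumes paused hours cost nothing for compute.

```python
HOURS_PER_DAY = 24


def monthly_cost(base_rate, scaled_rate, scale_up_hours_per_day,
                 paused_hours_per_day, days=30):
    """Estimate a monthly bill from hourly rates and a daily usage pattern.

    base_rate and scaled_rate are hypothetical hourly prices for the normal
    tier and the scaled-up tier. Paused hours are assumed to cost nothing.
    """
    if scale_up_hours_per_day + paused_hours_per_day > HOURS_PER_DAY:
        raise ValueError("scaled-up and paused hours exceed 24 per day")
    # Whatever is left of the day runs at the normal tier.
    base_hours = HOURS_PER_DAY - scale_up_hours_per_day - paused_hours_per_day
    daily = base_hours * base_rate + scale_up_hours_per_day * scaled_rate
    return daily * days


# Made-up rates: 1.00 per hour normally, 4.00 per hour while scaled up.
always_on = monthly_cost(1.0, 4.0, scale_up_hours_per_day=0, paused_hours_per_day=0)
tuned = monthly_cost(1.0, 4.0, scale_up_hours_per_day=1, paused_hours_per_day=8)
print(always_on, tuned)
```

Swap in the real hourly prices for your region and tiers to compare an always-on instance against a scale-and-pause schedule.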

Comments closed

BCP And Multiple SQL Server Instances

Manoj Pandey investigates an interesting issue with BCP:

I observed one thing here with BCP (Bulk Copy Program) when you have two versions of SQL Server installed on your PC or server. I had SQL Server 2014 and 2016 installed on one of my DEV servers.
So if you execute the query from the SQL 2016 instance, it inserts records into the SQL 2014 instance:

exec master..xp_cmdshell 'BCP AdventureWorks2014.Person.Address2 IN d:\PersonAddressByQuery.txt -T -c'

But even if you use the BCP 2016 version, it still inserts into the SQL 2014 instance:

Read on for the reason as well as how to specify which instance you want to use.
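The linked post has the actual explanation. As a hedged sketch of one common way to be explicit about the target, bcp accepts a -S switch naming the server and instance. The Python helper below only builds the argument list; the helper name and the instance name MYSERVER\SQL2016 are hypothetical, while the table, file, and the -T and -c switches come from the excerpt above.

```python
def build_bcp_import(table, data_file, server_instance):
    """Build a bcp import command that names its target instance explicitly.

    server_instance is passed to bcp's -S switch, for example
    r"MYSERVER\SQL2016" (a hypothetical name used for illustration).
    """
    return [
        "bcp", table, "in", data_file,
        "-S", server_instance,  # target server\instance
        "-T",                   # trusted (Windows) authentication
        "-c",                   # character data format
    ]


cmd = build_bcp_import(
    "AdventureWorks2014.Person.Address2",
    r"d:\PersonAddressByQuery.txt",
    r"MYSERVER\SQL2016",
)
print(" ".join(cmd))
```

The list form can be handed to subprocess.run on a machine where bcp is installed.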


Apache Pulsar 2.0 Released

George Leopold reports on a new version of Apache Pulsar:

The startup’s Apache Pulsar 2.0 released on Wednesday (June 6) adds new functionality designed to move data users “beyond batch” processing. Among them is a “stream-native” processing capability called Pulsar Functions designed to apply analytics to data as it flows through the Pulsar platform. Processing functions can be written in either Java or Python, the company said.

Streamlio announced general availability of Functions, which debuted earlier this year as a preview feature, this week as part of its 2.0 release.

Another is a Pulsar enhancement developed in conjunction with Apache Bookkeeper, a scalable storage system. Streamlio said the new feature, called Topic Compaction, delivers streaming data storage designed to improve the performance of applications consuming data from Pulsar. It serves as a “broker” that builds a snapshot of the latest value for each topic key, the startup said.

Read the whole thing.
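The "latest value for each topic key" idea is easy to picture in a few lines of Python. This is only a sketch of the concept, not Pulsar's implementation or API; the function name and the sample messages are made up.

```python
def compact(messages):
    """Return the latest value seen for each key.

    messages is an iterable of (key, value) pairs in publish order, so a
    later pair for the same key overwrites the earlier one.
    """
    latest = {}
    for key, value in messages:
        latest[key] = value
    return latest


# Hypothetical stream of stock ticks keyed by symbol.
stream = [("MSFT", 101.2), ("AAPL", 190.5), ("MSFT", 101.9)]
print(compact(stream))
```

A consumer that only cares about current state can read the snapshot instead of replaying every message in the topic.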


Removing Time From A DateTime

Wayne Sheffield compares the performance of four methods for removing time from a DateTime data type:

Today, we’ll compare 3 other methods to the DATEADD/DATEDIFF method:

  1. Taking advantage of the fact that a datetime datatype is stored as a float, with the decimal being fractions of a day and the whole numbers being days, we will convert the datetime to float, taking the floor (just the whole numbers), and converting back to datetime.
  2. Using the DATEADD/DATEDIFF routine.
  3. Converting the datetime to DATE and back to datetime.
  4. Converting the datetime to varbinary (which returns just the time), and subtracting that from the datetime value.

While there are other ways of stripping the time (DATETIMEFROMPARTS, string manipulation), those ways are already known as poorly performing. Let’s just concentrate on these four.

Click through for the methods, as well as a performance test to see which is fastest.
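To make the logic of the first two methods concrete, here is a Python analogy. It is not the T-SQL from the linked post and says nothing about SQL Server performance; the function names are hypothetical, and the only SQL Server fact it relies on is that day zero for datetime is 1900-01-01.

```python
import math
from datetime import datetime, timedelta

# SQL Server treats 1900-01-01 as day zero for the datetime type.
EPOCH = datetime(1900, 1, 1)


def strip_time_floor(value):
    """Analogue of method 1: express the value as fractional days,
    floor it to whole days, and convert back."""
    days_as_float = (value - EPOCH) / timedelta(days=1)
    return EPOCH + timedelta(days=math.floor(days_as_float))


def strip_time_datediff(value):
    """Analogue of method 2: count whole days since day zero, then add
    that many days back onto day zero."""
    whole_days = (value - EPOCH).days
    return EPOCH + timedelta(days=whole_days)


sample = datetime(2018, 6, 8, 14, 30, 15)
print(strip_time_floor(sample), strip_time_datediff(sample))
```

Both functions return midnight of the same day, which is the result all four methods in the list are after.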


Scatterplot Matrices

The Plotly folks show off scatterplot matrices in Python:

The scatterplot matrix, known acronymically as SPLOM, is a relatively uncommon graphical tool that uses multiple scatterplots to determine the correlation (if any) between a series of variables.

These scatterplots are then organized into a matrix, making it easy to look at all the potential correlations in one place.

SPLOMs, invented by John Hartigan in 1975, allow data aficionados to quickly realize any interesting correlations between parameters in the data set.

In this post, we’ll go over how to make SPLOMs in Plotly with Python. For extra insights, check out our SPLOM tutorial in Python and R.
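The Plotly post covers the actual charting. As a dependency-free sketch of the structure behind a SPLOM, the Python below lays out one panel per ordered pair of variables and uses the Pearson correlation as a numeric stand-in for what each scatterplot would show. The function names and sample data are made up, and nothing here draws a chart.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def splom_panels(columns):
    """Map each (row variable, column variable) panel to its correlation.

    columns is a dict of variable name -> list of numbers. An n-variable
    data set yields an n-by-n grid of panels.
    """
    names = list(columns)
    return {
        (row, col): pearson(columns[row], columns[col])
        for row, col in product(names, repeat=2)
    }


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
for panel, r in splom_panels(data).items():
    print(panel, round(r, 3))
```

Each key in the result corresponds to one cell of the matrix, which is why every pairwise relationship can be checked in one place.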


Missing @@SERVERNAME On Linux

Steve Jones fixes a naming issue on his SQL on Linux installation:

I set up a new instance of SQL Server on Linux some time ago. At the time, the Linux machine didn’t have any Samba running, and no real “name” on the network. As a result, after installing SQL Server I got a NULL when running SELECT @@SERVERNAME.

The fix is easy. It’s what you’d do if you had the wrong name.

Read on for the command, and don’t forget to restart the database engine afterward.


Restoring Point-In-Time To Another Azure SQL Managed Instance

Jovan Popovic announces an improvement to Azure SQL Database Managed Instances:

Azure SQL Database Managed Instance enables you to create a database as a copy of another database at some point in time in the past. This is known as the point-in-time restore feature, and until now you could perform point-in-time restore only within the same instance.

The latest release of Azure SQL Database Managed Instance enables you to perform point-in-time restore of a database from one instance to another. This might be useful if you need to be sure that you can easily restore a database to another instance when there is an issue on the original instance, or if you need a database for testing or auditing purposes on a test instance and want to use a copy of an existing database on another server.

Click through for the current requirements and limitations, as well as a sample.


Polybase Rejected Row Location

Casey Karst announces a nice improvement to Polybase on Azure SQL Data Warehouse:

Every row of your data is an insight waiting to be found. That is why it is critical you can get every row loaded into your data warehouse. When the data is clean, loading data into Azure SQL Data Warehouse is easy using PolyBase. It is elastic, globally available, and leverages Massively Parallel Processing (MPP). In reality clean data is a luxury that is not always available. In those cases you need to know which rows failed to load and why.

In Azure SQL Data Warehouse the Create External Table definition has been extended to include a Rejected_Row_Location parameter. This value represents the location in the External Data Source where the Error File(s) and Rejected Row(s) will be written.

This is a big improvement, one that I hope to see on the on-prem product.


Updating Hive Tables

Carter Shanklin gives us a few patterns for updating tables in Hive:

Historically, keeping data up-to-date in Apache Hive required custom application development that is complex, non-performant, and difficult to maintain. HDP 2.6 radically simplifies data maintenance with the introduction of SQL MERGE in Hive, complementing existing INSERT, UPDATE, and DELETE capabilities.

This article shows how to solve common data management problems, including:

  • Hive upserts, to synchronize Hive data with a source RDBMS.

  • Update the partition where data lives in Hive.

  • Selectively mask or purge data in Hive.

This isn’t the Hive of 2013; it’s much closer to a real-time warehouse.
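For readers who have not used MERGE before, the upsert pattern boils down to "update the rows that match on a key, insert the ones that don't." The Python below is a sketch of those semantics only, not HiveQL and not the article's code; the function name, the id key, and the sample rows are hypothetical.

```python
def merge_upsert(target, source, key="id"):
    """Emulate an upsert: rows in source that match a target row on `key`
    update it, and rows with no match are inserted.

    target and source are lists of dicts. The inputs are left unchanged
    and a new list is returned.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched: overlay the new column values. Not matched: insert the row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


warehouse = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
changes = [{"id": 2, "name": "robert"}, {"id": 3, "name": "carol"}]
print(merge_upsert(warehouse, changes))
```

In Hive the same intent is expressed declaratively in a single MERGE statement rather than row by row.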
