Kevin Feasel – Page 829

In-Memory OLTP and HammerDB Setup Error

Published 2020-11-06 by Kevin Feasel

This a short post about a sort of quirky error message I got while trying to run the TPC-C load test against SQL Server’s in-memory tables.
Error in Virtual User1: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The integer value XXXXXXXXXXXX is out of range.

Click through for the solution. Also, read down to the bottom of the post for a huge discount on Erik’s recorded training content. Erik is extremely knowledgeable and has a great way of explaining things, so take advantage of that knowledge.

Comments closed

GIS Capabilities in R

Published 2020-11-05 by Kevin Feasel

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic, st_operation(A, B) checks for each combinations of the geometries in A and B whether A operation B is true or false. For instance st_within(A, B) checks whether the geometries in A are within B, this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each elements contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE the functions returns matrices, like st_within(A, B, sparse=FALSE) returns a n x m matrix, st_within(B, A, sparse=FALSE) returns a m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, so returning a n x n matrix.

Click through for part 1 of the series.

Comments closed

A Myth Surrounding Confidence Intervals

Published 2020-11-05 by Kevin Feasel

Sebastian Sauer takes us through confidence intervals:

Some one claims:
“A CI is the interval where 95% of all possible sample means fall into”.
Let’s check that.

Read the whole thing. Confidence intervals have their value, but this is one case where the frequentist model of statistics can trip you up between hypothetical and real, changing an accurate statement into a myth.

Comments closed

A Primer on Multi-Server SQL Agent Administration

Published 2020-11-05 by Kevin Feasel

Mikey Bronowski gives us a glimpse at the power of MSX/TSX:

The MSX (master server) can be used to define SQL Agent jobs that then will be pushed over to one or more TSX (target servers). It is possible to see the server is either MSX or TSX when looking at SQL Agent in SQL Server Management Studio Object Explorer. The master server can be identified by (MSX) next to the SQL Server Agent. It will also have two subfolders under the Job folder: Local Jobs (for regular jobs) and Multi-Server Jobs.

This is something woefully few database administrators take advantage of. As the number of servers you have to mange increases, ensuring consistency in SQL Agent jobs becomes more and more difficult. MSX/TSX admittedly has some rough edges, but the tool does a lot.

Comments closed

Statistics Aren’t Guarantees

Published 2020-11-05 by Kevin Feasel

Brent Ozar lays out an important point:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:

Statistics tell the engine what they learned at the time they were ran. If you need guarantees, that’s what constraints are for.

Comments closed

Caching and Statistics in Synapse Dedicated SQL Pools

Published 2020-11-05 by Kevin Feasel

Tsuyoshi Matsuzaki takes us through statistics and caching in Azure Synapse Analytics Dedicated SQL Pools:

In Synapse Analytics, several database objects (such as, compiled procedure, plan, …) will be cached in some conditions.
For instance, CCI tables (see my previous post “Azure Synapse Analytics : Choose Right Index and Partition” for CCI) will locally cache the recently-used columnstore segments on distributed compute nodes, which is called columnar cache. The local disk-based cache is used on Gen2 caching.
You cannot manually control these caching activities. (These are automatically applied to improve performance in Synapse Analytics.) See team blog “Adaptive caching powers Azure SQL Data Warehouse performance gains” for underlying architecture which improves caching in Gen2.

Dedicated SQL Pool behavior is close enough to on-premises SQL Server that it’s easy to expect everything to be the same, but there are some nuances.

Comments closed

Triggering a Refresh with Power BI’s API

Published 2020-11-05 by Kevin Feasel

Martin Schoombee continues a series on automating Power BI deployments:

At times you may want to refresh a Power BI dataset from outside the portal, either on-demand or as part of another process (think DevOps for instance). In those cases the API provides the ideal mechanism to do so. Just remember that you are still limited to 8 refreshes a day if you don’t have a Premium workspace, and using this method will not work beyond the number of allowed refreshes. You also cannot count on the API to return a useful error message in that case.

Read on to see how.

Comments closed

Testing Format String Expressions

Published 2020-11-05 by Kevin Feasel

Gilbert Quevauvilliers shows us a neat way to check that the Format String Expression for a calculation group works:

Currently the only way to create and or edit the Format String Expression is in Tabular Editor. Whilst I love Tabular Editor it does not have any DAX Intellisense.
I kept on editing the Format String Expression and I honestly must have had to edit it 4 times and it still was wrong.
Below is a quick tip on what I did to make sure my Format String Expression was valid and would indeed work!

Read on to see how.

Comments closed

Listing Functions and More in Power Query

Published 2020-11-05 by Kevin Feasel

Nicky van Vroenhoven shows off #shared:

In addition to what the title suggests, you can not only show all the functions, but all tables, records, (custom) functions and query (results) inside Power Query.

Click through to see how.

Comments closed

Real-Time Data Warehousing in Cloudera

Published 2020-11-04 by Kevin Feasel

Justin Hayes gives us an overview of using Cloudera Data Platform for real-time data warehousing:

The simplest way to describe a RTDW is that it looks and feels like a normal data warehouse, but everything is faster even while massive scale is maintained. It is a type of data warehouse modernization that lets you have “small data” semantics and performance at “big data” scale.
– the data arrives into the warehouse faster – think streams of many millions of events per second constantly arriving
– the time it takes for the data to be optimally queryable is faster – query immediately upon arrival with no need for processing or aggregation or compaction
– the speed at which queries run is faster – small, selective queries are measured in 10s or 100s of milliseconds; large, scan- or compute-heavy queries are processed at very high bandwidth
– mutations of the data, when needed, are fast – if data needs to be corrected or updated for whatever reason, this can be done in place without large rewrites

Read on for more.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Author: Kevin Feasel