Press "Enter" to skip to content

Month: March 2020

Using Pre-Trained Sentiment Models with Power BI

Ryan Wade shows us how to use a pre-built sentiment analysis model with Power BI:

As of this writing, there are two pre-trained models available: one for sentiment analysis and another for image classification. This example focuses on sentiment analysis.

Both of these are freely available for on-premises installations of SQL Server 2017 and later. For more information on how to install these on your instance, reference this article for SQL Server Machine Learning Services and this article for pre-trained models.
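To give a sense of the mechanics, here is a minimal sketch (not Ryan's exact code) of scoring sentiment through SQL Server Machine Learning Services with the MicrosoftML pre-trained model; the dbo.Reviews table and its ReviewText column are invented for illustration:

```sql
-- Minimal sketch: score sentiment with the pre-trained model via ML Services.
-- Assumes ML Services (R) and the pre-trained models are installed;
-- dbo.Reviews (ReviewText NVARCHAR(MAX)) is a hypothetical table.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        library(MicrosoftML)
        # rxFeaturize applies the pre-trained sentiment model;
        # scores range from 0 (most negative) to 1 (most positive)
        OutputDataSet <- rxFeaturize(
            data = InputDataSet,
            mlTransforms = list(getSentiment(vars = c(Score = "ReviewText")))
        )',
    @input_data_1 = N'SELECT ReviewText FROM dbo.Reviews'
WITH RESULT SETS ((ReviewText NVARCHAR(MAX), Score FLOAT));
```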

Click through for step-by-step instructions.


Supporting Multiple NULL Values with a Unique Constraint

Itzik Ben-Gan walks us through a workaround in T-SQL:

Suppose that you have a database in a platform that supports the standard unique constraint and you need to migrate that database to SQL Server. You may face issues with the enforcement of unique constraints in SQL Server if the unique columns support NULLs. Data that was considered valid in the source system may be considered invalid in SQL Server. In the following sections I’ll explore a number of possible workarounds in SQL Server.

I use a simplified version of this as an interview question, so it’s nice to see an entire article from Itzik on the topic, including a couple of solutions way outside the box.
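The most familiar workaround (one of several Itzik covers) is a filtered unique index, which enforces uniqueness only across non-NULL values; the table below is invented for illustration:

```sql
-- A filtered unique index allows any number of NULLs while still
-- rejecting duplicate non-NULL values, mimicking the ANSI behavior.
CREATE TABLE dbo.Customers
(
    CustomerId INT IDENTITY(1, 1) PRIMARY KEY,
    TaxId VARCHAR(20) NULL
);

CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_TaxId
    ON dbo.Customers (TaxId)
    WHERE TaxId IS NOT NULL;

INSERT INTO dbo.Customers (TaxId) VALUES (NULL), (NULL);          -- succeeds
INSERT INTO dbo.Customers (TaxId) VALUES ('12-345'), ('12-345');  -- fails
```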


Tracking Performance of Queries That Use RECOMPILE Hints

Brent Ozar has some tips if you use RECOMPILE hints frequently:

The first query’s plan stuck around in memory, so it now shows 2 executions, and 2 total rows returned. Its row metrics are correct through the life of the stored procedure’s time in cache.

However, the second query – the one with the recompile hint – has a brand new plan in the cache, but also new metrics. You’re not just recompiling the execution plan, but you’re also not getting query plan metrics here. (That’s fine, and that part I was also kinda aware of.)

But the part that I keep forgetting is that when I’m looking at the stored procedure’s totals in sp_BlitzCache, the total, min, and max values are useless:

If the plan cache isn’t going to help, what will? Brent tells you exactly what.
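One candidate, hedging so as not to spoil Brent's answer: Query Store, available in SQL Server 2016 and later, records runtime statistics even for statements that recompile on every execution. A sketch of pulling those numbers, with the text filter purely illustrative:

```sql
-- Query Store keeps per-plan runtime stats even when the plan cache
-- can't; the LIKE filter below is just an illustrative way to find
-- statements carrying a RECOMPILE hint.
SELECT
    qt.query_sql_text,
    rs.count_executions,
    rs.avg_duration,            -- microseconds
    rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%RECOMPILE%'
ORDER BY rs.avg_duration DESC;
```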


Selecting From a Stored Procedure

Madhivanan shows us how to query the results of a stored procedure:

One of my friends asked me, "Is it possible to query a stored procedure's result set like a table, i.e. select * from (EXEC Stored_procedure)?"

Well, querying the result set from the stored procedure like a table can be done using the OPENROWSET function.

This is a fairly novel approach to the problem. In the past, I’ve inserted the results of a stored procedure into a temp table, but you can only do that if the procedure itself doesn’t call INSERT INTO ... EXEC ....
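For reference, a sketch of both patterns; the provider, connection string, and procedure name are illustrative, and OPENROWSET pass-through queries require the 'Ad Hoc Distributed Queries' setting:

```sql
-- Enable ad hoc distributed queries (needed for the OPENROWSET call).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

-- OPENROWSET loops back to the local server and treats the procedure's
-- result set as a table. Server name and procedure are hypothetical.
SELECT *
FROM OPENROWSET(
         'SQLNCLI',
         'Server=(local);Trusted_Connection=yes;',
         'EXEC dbo.usp_MyProcedure'
     ) AS results;

-- The temp-table alternative mentioned above: it fails if the procedure
-- itself contains INSERT ... EXEC, because those calls cannot nest.
CREATE TABLE #results (Col1 INT, Col2 VARCHAR(50));  -- must match the result set
INSERT INTO #results
EXEC dbo.usp_MyProcedure;
SELECT * FROM #results;
```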


Data Lakes and the Power of Data Catalogs

Ashish Kumar and Jorge Villamariona take us through data lakes and data catalogs:

Any data lake design should incorporate a metadata storage strategy to enable business users to search, locate, and learn about the datasets that are available in the lake. While traditional data warehousing stores a fixed and static set of meaningful data definitions and characteristics within the relational storage layer, data lake storage is intended to support the application of schema at read time with flexibility. However, this means that a separate storage layer is required to house cataloging metadata that represents technical and business meaning. While organizations sometimes simply accumulate content in a data lake without a metadata layer, this is a recipe for an unmanageable data swamp instead of a useful data lake. There is a wide range of approaches and solutions to ensure that appropriate metadata is created and maintained. Here are some important principles and patterns to keep in mind. A single data set can have multiple metadata layers depending on the use case (e.g., Hive Metastore, AWS Glue, etc.), and the same data can be exported to a NoSQL database with a different schema.

Having a bunch of data isn’t helpful if you don’t know where it is, how it’s formatted, or anything else about the data.
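To make the schema-on-read idea concrete, here is a hypothetical example of a metadata layer at work: a Hive external table that registers a schema over raw files already sitting in the lake, so catalog users can discover and query them (path and columns invented):

```sql
-- Schema-on-read: the files in the lake are untouched; the Hive Metastore
-- (or a compatible catalog such as AWS Glue) stores the schema, which is
-- applied only when the data is queried.
CREATE EXTERNAL TABLE sales_events (
    event_id   STRING,
    amount     DECIMAL(10, 2),
    event_time TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://my-data-lake/raw/sales_events/';
```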


Python Cross-Validation

John Mount has some advice if you’re doing cross-validation in Python:

Here is a quick, simple, and important tip for doing machine learning, data science, or statistics in Python: don’t use the default cross-validation settings. The default can be a deterministic, even ordered, split, which is not in general what one wants or expects from a statistical point of view. From a software engineering point of view the defaults may be sensible: since they don’t touch the pseudo-random number generator, they are repeatable, deterministic, and side-effect free.

This issue falls under “read the manual”, but it is always frustrating when the defaults are not sufficiently generous.

Click through to see the problem and how you can fix it.
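For the impatient, a minimal sketch of the problem and the fix, assuming scikit-learn (the library John discusses): the default KFold split is contiguous and unshuffled, which goes badly wrong on ordered data.

```python
# Default KFold() does not shuffle: each fold is a contiguous slice, so on
# data ordered by class (like iris) the estimate degrades badly.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

unshuffled = KFold(n_splits=5)                               # the risky default
shuffled = KFold(n_splits=5, shuffle=True, random_state=42)  # explicit, repeatable

print(cross_val_score(model, X, y, cv=unshuffled).mean())  # pessimistic on ordered data
print(cross_val_score(model, X, y, cv=shuffled).mean())    # sensible estimate
```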


Extended Events and Query Store

Jason Brimhall takes us through some of the internals of Query Store as exposed by Extended Events:

One of my favorite questions to ask during some of my presentations on XE is "What was the first version of SQL Server to have Query Store?" You can imagine the wide array of answers, but what is interesting is how often the correct answer is missed. I hear lots of answers for 2012, some answers for 2017, and somewhere in between for 2016. But never does the correct answer pop up.

Right now, I hope you are scratching your head at that last statement. You see, the question is somewhat of a trick question. The first version of SQL Server that has QDS is SQL Server 2014. However, the first version where you can actually use it is SQL Server 2016. This fun fact is visible when we start exploring SQL Server from the realm of XE. Let’s take a look.
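A sketch of the flavor of exploration Jason is describing: asking the XE catalog views which Query Store-related objects an instance knows about, which is how the 2014 surprise shows up.

```sql
-- List Extended Events objects tied to Query Store. On SQL Server 2014
-- these already exist, even though Query Store itself only becomes
-- usable in SQL Server 2016.
SELECT object_type, name, description
FROM sys.dm_xe_objects
WHERE name LIKE N'query_store%'
ORDER BY object_type, name;
```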

Read the whole thing.


Getting Started with Azure Cognitive Search

Matt How introduces us to Azure Cognitive Search:

Intuitive and powerful search technologies are becoming more and more important as businesses look to get more value from their unstructured data. Having the ability to run full-text searches across an entire organisation’s worth of files can present huge opportunities for efficiency and understanding. Modern search tools now offer Artificial Intelligence (AI) capabilities that allow value-driven enrichment of the raw content using Machine Learning and Data Science techniques. Microsoft’s Azure Cognitive Search product is a leader in this space and offers an excellent search experience with many out-of-the-box AI competencies.

Click through for an overview and a demo.


Oracle’s Automatic Workload Repository Explained

Kellyn Pot’vin-Gorman explains to us what the Automatic Workload Repository is:

The Automatic Workload Repository (AWR) has been around since Oracle 10g and requires the Diagnostics and Tuning management pack licensing to use all of its features in Oracle’s Enterprise Edition database. Versions before 10.2.0.4 had limited collections vs. the modern reporting schema, and every subsequent release of Oracle has added to its content, which explains the increase in the size and number of objects stored in the SYSAUX tablespace.

By default, and since version 11.2.0.4, the AWR retention is 8 days, with an automatic snapshot taken once per hour. It’s common for DBAs to raise this retention to at least 31 days to capture a month of workload information, and these snapshot identifiers can then be used to identify workload intervals for querying and reporting. Oracle can also be configured to shorten the interval between snapshots to change the granularity of AWR reports, or (my preference) the DBA or a privileged user can take manual snapshots to mark the beginning or end of an important period.
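The knobs Kellyn mentions live in Oracle's DBMS_WORKLOAD_REPOSITORY package; as a hedged sketch (values illustrative), raising retention to 31 days and taking a manual snapshot looks like this:

```sql
-- Retention and interval are specified in minutes.
BEGIN
    DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
        retention => 44640,   -- 31 days * 24 hours * 60 minutes
        interval  => 60       -- keep the hourly automatic snapshots
    );
    -- Manual snapshot to mark the start or end of an important period.
    DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
END;
/
```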

Kellyn goes into a good amount of detail in this post and, based on the title, promises at least a part 2. Though this could be a History of the World: Part I trick Kellyn is playing on us.
