September 20, 2017

Azure Data Lake Analytics Pipelines

Yan Li notes that Azure Data Lake Analytics now offers the ability to manage pipelines:

To make it easier to manage and understand jobs, ADLA now captures the pipeline and recurrence information for each job. This information can be used to connect and organize jobs belonging to the same pipeline or recurring instances. As shown in Fig 2, jobs are now organized by pipeline and recurring instance, which enables you to:

  • Quickly identify jobs in pipelines which may have failed or taken longer than expected.

  • Get the aggregated statistics (e.g. job counts, successful and failed AU hours etc.) for a pipeline or a recurring instance

This is an interesting improvement.

Comments closed

Integrating Azure Data Catalog With Power BI

Gaston Cruz shows how to view Azure Data Catalog data in Power BI:

A self-service culture allows analysts to generate their own reports, lists, and dashboards without depending on the schedule and availability of IT staff. In these cases, reports are generated by combining different sources of information, many of which may not have been used historically in the company, and this in turn means that in a large number of cases you do not know which source was used to implement certain reports.

Azure Data Catalog is an option for breaking that cycle of discovery, which is usually done manually. This means that after the first cycle, where the business analyst discovers the optimal data sources for generating certain reports, they can register those sources and add information (metadata) to make them easier for future analysts to discover when they need the same data for similar reports. Discovering the sources and adding the metadata are procedures that do not have to happen at the same time; Data Catalog lets analysts treat annotation as continuous work over time, with more information added to the repository on each pass.

Click through for a demo.

Comments closed

Automatically Fix Those VLFs

Tracy Boggiano has a script which will fix log files with high virtual log file counts:

The first part of the process is to capture the info from DBCC LOGINFO, or, if you are already on 2017, the new DMV sys.dm_db_log_stats, into a table you can read later to know how many VLFs currently exist in your database. So we are going to create a table called VLFInfo and use the procedure VLF_UpdateInfo to populate that data.  The procedure would be called in step one of a SQL Agent job to automate the fixing of VLF counts during appropriate maintenance windows on your server.  But as you will see in step two, my solution tries to account for not doing this too close to when the file just grew and acquired those new lovely extra VLFs.
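To make the shape of that first step concrete, here is a minimal sketch (not Tracy's actual code) that captures VLF counts into a table using sys.dm_db_log_stats from SQL Server 2017; the table name and the threshold of 100 VLFs are illustrative assumptions:

    -- Minimal sketch: capture current VLF counts per database into a table.
    -- Assumes SQL Server 2017 for sys.dm_db_log_stats; the table name and
    -- threshold are illustrative, not Tracy's actual values.
    CREATE TABLE dbo.VLFInfo
    (
        DatabaseName sysname      NOT NULL,
        VLFCount     int          NOT NULL,
        CaptureDate  datetime2(0) NOT NULL DEFAULT SYSDATETIME()
    );
    GO

    INSERT INTO dbo.VLFInfo (DatabaseName, VLFCount)
    SELECT d.name, ls.total_vlf_count
    FROM sys.databases AS d
    CROSS APPLY sys.dm_db_log_stats(d.database_id) AS ls;

    -- Step two of the Agent job would read this table and, for databases
    -- over the threshold, shrink the log file and regrow it in a few large
    -- increments to bring the VLF count back down.
    SELECT DatabaseName, VLFCount
    FROM dbo.VLFInfo
    WHERE VLFCount > 100;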

Read on for the code.

Comments closed

Explaining Confidence Intervals

Mala Mahadevan explains what confidence intervals are:

Suppose I look at a sampling of 100 Americans who are asked if they approve of the job the Supreme Court is doing. Let us say for simplicity’s sake that the only two answers possible are yes or no. Out of 100, say 40% say yes. As an ordinary person, you would think 40% of people just approve. But a deeper answer would be – the true proportion of Americans who approve of the job the Supreme Court is doing is between x% and y%.

How confident am I that it is?  About z%. (The common value used is 95%.)  That is an answer more reflective of the uncertainty involved in questioning people and taking their answers to be truly reflective of opinion. The x and y values make up what is called a ‘confidence interval’.
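To put rough numbers on that: the usual normal-approximation interval for a proportion is p ± z · √(p(1 − p)/n), so with 40% of 100 respondents saying yes and a 95% confidence level, the interval runs from about 30% to 50%. Here is a quick sketch of the arithmetic in T-SQL (my illustration, not from Mala's post):

    -- 95% confidence interval for a sample proportion, normal approximation.
    -- 40 "yes" answers out of 100 gives roughly 30.4% to 49.6%.
    DECLARE @p float = 0.40,  -- sample proportion
            @n float = 100,   -- sample size
            @z float = 1.96;  -- z-value for 95% confidence

    SELECT @p - @z * SQRT(@p * (1 - @p) / @n) AS lower_bound,  -- ~0.304
           @p + @z * SQRT(@p * (1 - @p) / @n) AS upper_bound;  -- ~0.496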

Read the whole thing.

Comments closed

An Analysis Of The Utility Of Power BI Report Server

Meagan Longoria has an after-action report of a proof of concept using Power BI Report Server:

In addition to managing two versions of Power BI Desktop, I also found myself mentally managing two sets of features. I was constantly asking myself “Can I do that in Power BI Report Server?”. Some of that is because PBI Desktop for Report Server is on a quarterly release cycle rather than monthly, so I had to remember if a feature I wanted to use was new (or in preview) and therefore not available in this version. The other part is trying to remember what you can and cannot do with a Live Connection. For example, you can make report measures, but you can’t use ad hoc grouping and binning.

We had several scenarios where users wanted to be able to group fields in multiple ways that changed somewhat frequently. Since we couldn’t use grouping and binning in Power BI Desktop to accomplish this, we set up an Excel data source in the SSAS Tabular model, and allowed users to change the groups there and refresh the Tabular model when finished. This could get rather unwieldy if you had lots of users who needed this kind of flexibility.

Ultimately, the customer considered it a success.  Read on for more details.

Comments closed

Active Directory Integration On Ubuntu

Drew Furgiuele shows how to configure SQL Server on Linux to use Windows authentication:

In the following post, we’ll walk through joining a Linux SQL Server on Ubuntu to an Active Directory domain, and here are the steps we’re going to take:

  1. Installing the required software and services to enable a Linux host to talk to and join an Active Directory Domain,
  2. Configuring the Linux host’s network configuration to talk to the Domain Controller(s),
  3. Setting up Samba, Kerberos, Winbind, and the System Security Services Daemon (SSSD) to properly talk to and digest authentication tokens from Active Directory, and
  4. Creating a Kerberos Keytab file for the SQL Server service to run as a domain service account.

Seems like a lot, doesn’t it? If you’re new to Linux, a lot of this configuration can seem a little daunting and a lot tedious, but as we walk through it, I’ll stop and talk a little bit about each step and what it does.

Active Directory integration was a critical piece of functionality for SQL Server on Linux.  There are still some odd edge cases (like weirdness when going cross-domain) but for the normal scenario, it works fine once you’ve configured Linux correctly.
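As a small illustration of the payoff (my sketch, not part of Drew's post): once the host is domain-joined and SQL Server is running under the domain account from the keytab, Windows-authenticated logins work the same way they do on Windows. The domain and account names below are placeholders.

    -- Create Windows-authenticated logins once AD integration is working;
    -- CONTOSO and the account names are placeholders.
    CREATE LOGIN [CONTOSO\dbas] FROM WINDOWS;
    ALTER SERVER ROLE sysadmin ADD MEMBER [CONTOSO\dbas];

    CREATE LOGIN [CONTOSO\drew] FROM WINDOWS;

From there, clients can connect with integrated authentication (for example, sqlcmd -E) rather than SQL logins.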

Comments closed

Temporal Table Time Zones

Louis Davidson talks about how to translate temporal table start and end dates to your local time zone:

In my first 5 blogs on Temporal, I failed to note something pretty important about their usage. The start and end times for the row (and it follows, the historical rows) are likely not in your local time zone. They are stored in UTC time zone. This is obviously good because of that pesky daylight saving time issue where 1:30 AM occurs twice each year (here in the US, the time occurs on the first Sunday of November).

Unless you live in London, England or someplace where the offset from UTC is 0, it can make working with these tables confusing, because most people rarely work in UTC time, and even rarer is to think in UTC time when most of your data is likely in your local time zone. So you write your query and use local time…and then, most likely, data is returned…but not necessarily the data you actually desired.
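One common way to do that translation, shown here as a minimal sketch with illustrative table, column, and time zone names (and not necessarily the approach Louis takes), is SQL Server 2016's AT TIME ZONE:

    -- Interpret the UTC period columns as UTC, then convert to a local zone.
    -- Table, column, and time zone names are illustrative.
    SELECT a.AccountId,
           a.ValidFrom AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' AS ValidFromLocal,
           a.ValidTo   AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' AS ValidToLocal
    FROM dbo.Account FOR SYSTEM_TIME ALL AS a;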

Click through to see ways of translating those values.

Comments closed