Press "Enter" to skip to content

Month: January 2018

The Benefits And Risks Of Sharing In Power BI

Steve Hughes explains the implications of sharing a Power BI report:

The primary reason to use share is to distribute content outside the context of a Power BI App. Power BI Apps should be your first mechanism for sharing content within your organization. It requires more thought and planning which is typically a good idea with your companies data. However, there are times when sharing makes sense. With the ability to share reports, you can limit sharing to specific areas. Also, you may want to create a “one-off” report for use in decision making but not something to be deployed in the long term.

Sharing is very different from deploying Apps. App deployment is not that difficult to do, but prevents sharing and is much easier to manage access.

Read on for Steve’s thoughts, including his hesitancy toward sharing en masse.

Comments closed

Data Migration And Visualization With Data Factory And Data Lake

Matt Basile has a video which shows him taking raw data in S3, moving it to Azure Data Lake Storage using Azure Data Factory, and then visualizing it with Power BI:

While this seems like a lot of parts just to copy a few files, it’s important to note I only scratched the surface of what ADF can do.  Think of ADF as an airline company that manages and enables cargo (data) movement between cities (data sources).  A pipeline represents the overall goal of moving certain cargo from one city to another. The linked service is the airport, which provides a landing point and access control for the cities. The dataset is the list of cargo to move, the activity is the flight itself, while the integration runtime is the airport infrastructure that makes the cargo movement possible.  A single pipeline requires all these objects to run successfully; however, many pipelines can use these same objects to complete different tasks.  Once you’ve created these data factory objects, it is straightforward to layer on additional functionality or more pipelines. ADF also has visual tools that make building these objects a breeze – to build my pipeline, all I had to do was click on “Copy data” in the visual tools start menu and follow the steps provided.

Matt has a video demonstrating the process as well.

Comments closed

Quick Date Formatting With CONVERT

Dave Mason lists the common date formats available with the CONVERT fucntion:

Displaying dates and times with different formats in TSQL is a task I run into quite a bit. I used to visit this page so many times, I’m surprised it doesn’t have a “Welcome back, Dave!” banner on it at the top.  After umpteen million times, I decided it was time to be more efficient. I created this query that’s come in handy numerous times. I considered dumping it into a view, but I’ve found it’s nice to copy/paste the CONVERT statement (directly from a script) and replace CURRENT_TIMESTAMP with whatever column I want to have formatted.

Click through for the script and sample output.

Comments closed

Using The Cosmos DB Emulator

Cedric Charlier shows off how to use the Azure Cosmos DB emulator to create and query a graph database:

When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment the Web Interface of the data explorer is oriented for the SQL API and it sounds impossible to create a graph query on the emulator. This is false, the UI is not aware of these features but the engine supporting the Graph API.

You can access the emulator engine by the means of the .Net librairies. Create a new console project in Visual Studio and add the package Microsoft.Azure.Graphs. This package is in preview, you’ll need to specify to the package manager that you’re accepting the previews.

Read on to learn more.

Comments closed

Inlining Scalar Functions

Brent Ozar covers a Microsoft paper which looks exciting:

Froid replaces the scalar UDF operator in the calling query with the newly constructed relational expression as a scalar sub-query.

That one statement comes with a lot of interesting gotchas that they discuss throughout the paper:

  • You won’t see the scalar function in the plan (just like we don’t currently see single-statement inline table-valued functions by name in the plan – we just see their effects, kinda like views)

  • The function and the underlying tables might have different permissions (you could have permissions on the function but not the tables, or vice versa, which makes compilation & execution a little trickier)

  • Code that doesn’t get used can just get removed outright (just like SQL Server can do join elimination)

  • Costs and row estimates are now useful inside the plan

To the extent that this works (and I hope it does), it helps fulfill the promise of SQL Server 2000, with encapsulation of code.  Today, one of the easiest big performance gains I can give is to strip something out of a user-defined function and inline it.

Comments closed

Cross-Server Partition Elimination

Derik Hammer shows an example of “old-style” partitioning across servers:

SQL Server has a feature for partitioning tables and indexes. Partitioning can be implemented at many levels, however. You can create multiple tables for one logical data set, you can split the set into multiple databases, and you can even split it among different servers. Sharding is another term. It refers to partitioning data to horizontally scale out compute resources and storage.

There are different methods of handling sharding. Each of them need a central point of connection to handle querying the data on the other shards. This is typically called the control node. The method I am to discuss today is one where linked servers are used to connect the various shards.

This is useful for something like offloading old invoices which you rarely need to a separate server.  Derik also shows that the optimizer can, if it knows your partitioning rules, avoid unnecessary cross-server hits.

Comments closed

Subtle Changes In Application Behavior Across Kafka Streams Versions

Aleksandar Pejakovic shows some subtle but important changes to an application running Kafka Streams 0.11 versus 1.0:

One would expect that by changing the version, the previous behavior would remain the same. Well, it hasn’t. What has changed?

After each process method, a punctuate method is called. After punctuateInterval is scheduled, punctuate also occurs. This means the following:

  • In the first test scenario, each “Arrived: message_<offset>” message in the console is accompanied with “Punctuate call”. Unsurprisingly, we have one: “Processed: 1” message in output topic. After ten messages, we have another: “Punctuate call” and “Processed: 0” pair.
  • In the second scenario, we have nine: “Arrived: message_<offset>” and “Punctuate call” pairs on the console, followed with 9: “Processed: 1” in the output topic. After the pause and tenth message we have: “Arrived: message_<offset>” and 3 “Punctuate call”. In the output topic, we see “Processed: 1”, “Processed: 0”, and “Processed 0”.

Read the whole thing.  This sort of behavioral change can be hard to suss out when testing a streaming application.

Comments closed

Fun With ML Services And VARBINARY

I wrap up my ML Services mini-series by building out a process to predict sales for multiple products using different models:

I have my model as an input and want to spit it out at the end as well. But when I try that, I get an error:

Msg 39017, Level 16, State 3, Line 239
Input data query returns column #1 of type ‘varbinary(max)’ which is not supported by the runtime for ‘R’ script. Unsupported types are binary, varbinary, timestamp, datetime2, datetimeoffset, time, text, ntext, image, hierarchyid, xml, sql_variant and user-defined type.

So there goes that plan—I can output a VARBINARY(MAX) model, but I cannot input one.

Click through to see my workaround.

Comments closed

Goodbye, SQLPS

Max Trinidad notes that SSMS 17.4 gets rid of the SQLPS module:

After SSMS Version 17.4 was release back in December, SQLPS module is no longer available. So, if you try to use the “Start PowerShell” from any of the database object, you’ll get the message “No SQL Server cmdlets found…” popup message.

And good riddance.  Even in 2008, the SQLPS method of dealing with Powershell was obsolete, as Powershell modules were supposed to be snap-ins rather than independent shells.  The SQL Server Powershell module is a major improvement in that regard.

Comments closed

R Or Python

Tomaz Kastrun shares his thoughts on the topic of R versus Python:

Imag[in]e I ask you, would you prefer Apple iPhone over Samsung Galaxy, respectively? Or if I would ask you, would you prefer BMW over Audi, respectively? In all the cases, both phones or both cars will get the job done. So will Python or R, R or Python. So instead of asking which one I prefer, ask your self, which one suits my environment better? If your background is more statistics and less programming, take R, if you are more into programming and less into statistics, take Python; in both cases you will have faster time to accomplish results with your preferred language. If you ask me, can I do gradient boosting or ANOVA or MDS in Python or in R, the answer will be yes, you can do both in any of the languages.

This graf hits the crux of my opinion on the topic, but as I’ve gone deeper into the topic over the past year, I think the correct answer is probably “both” for a mature organization and “pick the one which suits you better” for beginners.

Comments closed