Submitting A Spark Job On HDInsight

Bharath Venkatesh shows different ways to run a Spark job on HDInsight:

From HDI 3.5 onwards, our clusters come preinstalled with Zeppelin Notebooks. Much like Jupyter notebooks, Zeppelin is a web-based notebook that enables interactive data analytics. It provides built-in Spark intergration that allows for:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository. Learn more about dependency loader.
  • Canceling job and displaying its progress

This MSDN article provides a quick easy-to-use onboarding guide to help get acclimatized to Zeppelin. You can also try several applications that come pre-installed on your cluster to get hands on experience of Zeppelin.

Zeppelin is probably my favorite method, but there are good reasons to use all of these.

Understanding Data Integration Lifecycle Management

Andy Leonard explains DILM:

Data Integration Lifecycle Management (DILM) is not about data integration development.

DILM is about everything else:

  • Configurations Management
  • Version Management
  • Deployment
  • Execution

Although DILM is not about development, implementing DILM will impact the design of SSIS solutions.

This is the first part in a multi-part series, and covers some of the conceptual basicsbehind DILM.

Reporting Services Mobile Reports

Kathi Kellenberger shows off mobile reports in SQL Server Reporting Services:

Mobile Reports are dashboards that will run on most modern mobile devices as well as within the web portal. They are supported on IOS 9 and later, Android 4.4 or better, and Windows 10. To run Mobile Reports on these devices, the mobile Power BI application must be installed.

At first glance, they are simple to create. There is a new tool to use, the SQL Server Mobile Report Publisher. The tool will look familiar to you if you have worked with Datazen in the past. Microsoft purchased Datazen in 2015.

This is the first major Reporting Services update since 2008 (unless you consider sparkline support in R2 a major update), and could be a good business justification for upgrading to SQL Server 2016.

Columnstore Query Patterns

Niko Neugebauer gives a couple hints on how to make columnstore queries as fast as possible:

On my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while it burns almost 48 seconds of the CPU Time.
The relevant part of the execution plan can be found below, showing so many performance problems that this query is suffering, such as INNER LOOP JOIN, INDEX SPOOL, besides even worse part that is actually hidden and is identifiable only once you open the properties of any of the lower tree (left side of the LOOP JOIN), seeing that it all runs with the Row Execution Mode actually.

To show you the problem, on the left side you will find the properties of the sort iterator that is to be found in the lower (left) part of the LOOP Join that was executed around 770.000 times in the Row Execution Mode, effectively taking any chances away from this query to be executed in a fast way. One might argue that it might that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 Million Rows there – for me there is no doubt that it would be faster to do it within a Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it is running with the help of the Batch Execution Mode, even though it is processing around 770.000 rows.

There’s some valuable information here.

Changing TDE Certificates

Tom Norman has a script to change the certificate used for TDE:

When you create TDE certificate by default, the certificate is just created for a year. Depending upon your auditor, you may be asked to change this TDE certificate yearly.  In this blog post, you can watch when your TDE certificate expires using policy-based management, https://www.mssqltips.com/sqlservertip/2388/internal-sql-server-certificates-have-expired-is-this-an-issue/.

Click through for the script.

Uses For Binary Data Types

Daniel Hutmacher explains what binary data types are and one use case:

Because binary values are essentially strings, they easily convert to and from character strings, using CAST or CONVERT. To convert the binary value of 0x41 to a plain-text character value, try

SELECT CAST(0x41 AS char(1)); --- 'A'

The binary value 0x41 is equivalent to decimal 65, and CHAR(65) is the letter “A”. Note that I haven’t placed any quotes around 0x41 – that’s because it’s a numeric value (albeit in hex notation) and not a string.

A couple use cases I’ve seen are creating hashes (SHA1 or MD5) for change detection, storing password hashes, and encrypted columns—Always Encrypted uses varbinary data types to store encrypted information, for example.

Getting A Handle On Isolation Levels

Robert Bishop explains some of the basics behind transaction isolation levels:

Lost Updates

It is possible for two or more transactions to modify the same row. While the subsequent transactions are reading the data, the 1st transaction commits its changes. Then the subsequent transaction makes its changes and commits, thereby possibly losing the changes committed by the 1st transaction. In this scenario, the last transaction always wins.

Dirty Read

A dirty read is when a SELECT statement will read against a table that is currently being modified and not yet committed to the database. By default, SQL Server will protect against this on all isolation levels, except for Read Uncommitted or by using NOLOCK in your queries.

This is the first in a series.

Password Expiration

Artemakis Artemiou hits a controversial topic in security today:

As you can see from the above screenshot, the maximum password age is set to 90 days. This means that if I have a login that uses SQL Server Authentication or a contained database user with a password, if the login’s or user’s password was last changed 90 days (or more) ago, then the login/user will be prompted by SQL Server to change her password.

Even though this practice was used for many years, not only in SQL Server but similarly, on Windows-level, as well as in other systems and applications, recent studies argue that it should not be a recommended practice anymore. On the contrary, these studies suggest that user passwords should not be regularly changed but rather change only when there is a specific and justified reason to do so. Furthermore, new security standards are being formulated that contain new recommendations on password change.

Security is all about trade-offs.  If you make people change their passwords every 90 days, they’ll write the password down somewhere.

Reading Extended Events

Dave Mason writes a custom app to read extended events data:

It took me a while to make the transition from SQL Profiler to Extended Events. Eventually I got comfortable enough with it to use it 100% of the time. As I read more about the XEvents architecture (as opposed to just “using” XEvents), I gained a deeper appreciation of just how great the feature is. My only gripe is that there isn’t a way to handle the related events from within SQL Server using T-SQL. DDL triggers can’t be created for XEvents. And they can’t be targeted to Service Broker for Event Notifications (not yet, anyway). For now, the one way I know of to handle an XEvent is by accessing the event_stream target via the .NET framework. I’ll demonstrate with C#.

9/10, would have preferred F# but would read again.

Pipelearner

Simon Jackson introduces pipelearner, a tool to help with creating machine learning pipelines:

This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Click through for some very interesting examples.

Categories

February 2019
MTWTFSS
« Jan  
 123
45678910
11121314151617
18192021222324
25262728