Press "Enter" to skip to content

Month: May 2016

Setting Up A Linked Server To Oracle

Jon Morisi steps in to show how to set up a linked server connection to an Oracle database:

In this dialog box, the “TNS Service Name” drop down box should display your entries from the tnsnames.ora file.  Next, enter your Oracle User ID and click “Test Connection”, at which point you’ll be prompted for your password.  Everything should test successfully at this point.

Now would be a good time to restart.  Unfortunately, yes you need to restart…

You can do an additional test via sqlplus.  Open a windows command prompt and enter the following:

sqlplus user/pass@[addressname]

(Where addressname is one of your connections from tnsnames.ora)

I readily admit that I’m glad I don’t need to work with Oracle.  Nonetheless, if you do need to integrate the two, this step-by-step guide will show you how.

Comments closed

Spark Accumulators

Prithviraj Bose explains accumulators in Spark:

However, the logs can be corrupted. For example, the second line is a blank line, the fourth line reports some network issues and finally the last line shows a sales value of zero (which cannot happen!).

We can use accumulators to analyse the transaction log to find out the number of blank logs (blank lines), number of times the network failed, any product that does not have a category or even number of times zero sales were recorded. The full sample log can be found here.
Accumulators are applicable to any operation which are,
1. Commutative -> f(x, y) = f(y, x), and
2. Associative -> f(f(x, y), z) = f(f(x, z), y) = f(f(y, z), x)
For example, sum and max functions satisfy the above conditions whereas average does not.

Accumulators are an important way of measuring just how messy your semi-structured data is.

Comments closed

Database Detachments And File Permissions

Daniel Hutmacher looks at what happens when you detach a database:

On most database servers, the SQL Server service account is granted full control of the directories that host the database files. It goes without saying that the service account that SQL Server runs on should be able to create, read, write and delete database files. Looking at a sample database on my local server, the .mdf and .ldf files don’t actually inherit permissions from their folder, although the permissions are very similar to that of the folder.

This all makes sense once you read the explanation, but it’s not intuitive behavior.  Read Daniel’s gotcha near the end of the post.

Comments closed

Graphing With Microsoft R Open

David Smith points out a free e-book on creating effective graphs with Microsoft R Open:

The examples were done using Microsoft R Open, but since it’s 100% compatible with R the code works with any relatively recent R version.

Naomi and Joyce presented several examples from their e-book in a recent webinar (presented by Microsoft), and fielded lots of interesting questions from the audience. If you’d like to see the recorded webinar and also receive a copy of the slides and the e-book, follow the link below to register to receive the materials via email.

The book is free, the code is available on GitHub.  What more could you ask for?

Comments closed

Power BI Desktop Or Power Pivot

Bill Anton discusses when to use Power BI Desktop and when to use Power Pivot:

In the whitepaper, Strategic Prototyping is defined as the process of leveraging Power BI to explicitly seek out feedback from users during a requirements discovery session. The general idea is to use a prototyping tool to quickly slap together a model and mock up some reports while working closely with 1 or more business users. This helps ensure all reporting capabilities are flushed out and accounted for. After several iterations, the Power BI model becomes the blueprint upon which an enterprise solution can be based.

Prior to the emergence of Power BI, the tool of choice for strategic prototyping (at least in Microsoft shops) was Power Pivot. And even though the reporting side of Power Pivot is nowhere near as sexy as Power BI, there is one really awesome feature that does not (yet?) exist with Power BI… and that’s the “Import from PowerPivot” option in visual studio…

Bill does a good job of explaining the alternatives and, importantly, explaining that whichever you pick, there will be follow-up work.

Comments closed

Installing Apache Falcon

Awanish at Edureka shows how to install Apache Falcon on your Hadoop cluster:

Apache Falcon is a framework for managing data life cycle in Hadoop clusters. It establishes relationship between various data and processing elements on a Hadoop environment, and also provides feed management services such as feed retention, replications across clusters, archival etc.

Let us first discuss how to setup Apache Falcon. Run the below given command to download git repository of Falcon:

Command: git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon

Falcon comes as part of the Hortonworks Data Platform; Cloudera has its own alternative.

Comments closed

Restoring Azure SQL Databases With Powershell

Mike Fal shows us how to restore an Azure SQL Database database using Powershell:

The most fundamental form of disaster recovery is database backups and restores. Typically setting up backups is a lot of work. DBAs need to make sure there’s enough storage available for backups, create schedules that accommodate business operations and support RTOs and RPOs, and implement jobs that execute backups according to those schedules. On top of that, there is all the work that has to be done when backups fail and making sure disk capacity is always large enough. There is a huge investment that must be made, but it is a necessary one, as losing a database can spell death for a company.

This is one of the HUGE strengths of Azure SQL Database. Since it a service offering, Microsoft has already built out the backup infrastructure for you. All that stuff we talked about in the previous paragraph? If you use Azure SQL Database, you do not have to do any of it. At all.

What DBAs still need to manage is being able to restore databases if something happens. This is where Powershell comes into play. While we can definitely perform these actions using the portal, it involves a lot of clicking and navigation. I would much rather run a single command to run my restore.

The Powershell cmdlets are easy to use, so spin up an instance and give it a try.

Comments closed

Power BI Takeover

Devin Knight has a Q&A on Power BI:

Q: what is the difference between the Query editor and Data Modeler? What can and can’t do in each case ?

To summarize the Query Editor is mainly for Data Extraction actions.  So providing source information, applying rules to the incoming data, etc… The Data Modeling areas are focused on creating relationships between tables you’ve important and creating calculations you might need in your report.  This of this as the last step to prepare you data for reports.

Check out Devin’s webinar as well.  It’s a lot longer than a coffee break, but worth your time.

Comments closed

RID Lookup Or Key Lookup?

Aaron Bertrand asks which is faster, RID lookups or key lookups?

I’ve seen multiple people state that a heap can be better than a clustered index for certain scenarios. I cannot disagree with that. One of the interesting reasons I’ve seen stated, though, is that a RID Lookup is faster than a Key Lookup. I’m a big fan of clustered indexes and not a huge fan of heaps, so I felt this needed some testing.

So, let’s test it!

I thought it would be good to create a database with two tables, identical except that one had a clustered primary key, and the other had a non-clustered primary key. I would time loading some rows into the table, updating a bunch of rows in a loop, and selecting from an index (forcing either a Key or RID Lookup).

It looks like RID lookups are slightly faster than key lookups.  But check out the comments:  this is a best-case scenario.

Comments closed

Connecting To SQL Data Warehouse

Robert Sheldon looks at ways to connect to Azure SQL Data Warehouse:

Unlike SSMS, Microsoft does support connecting to SQL Data Warehouse from Visual Studio, via the database engine features in SSDT. When you get into the Visual Studio/SSDT environment, open SQL Server Object Explorer, which is similar to Object Explorer in SSMS. From there, click the Add SQL Server button.

When the Connect dialog box appears, provide the server name, select SQL Server Authentication, and then specify the login name and password, as shown in the following figure.

It is a bit surprising that you can’t easily connect via SSMS 2014.  Maybe that’s changed with SSMS 2016?

Comments closed