Press "Enter" to skip to content

Month: April 2017

Embedded Solr With Scala

Anurag Srivastava shows how to use Embedded Solr using an example written in Scala:

Embedded Solr has the same interface as Solr without requiring an HTTP connection. When we “embed” Solr into a Java an application, it provides the exact same API that you would use if you were connecting to a remote Solr instance. We can use embedded Solr for in-memory testing because when we implement test cases, it should not depend on any external resources.

Read on for the code sample.

Comments closed

Using h2o.ai On HDInsight

Xiaoyong Zhu shows how to set up h2o.ai on Azure HDInsight:

H2O Flow is an interactive web-based computational user interface where you can combine code execution, text, mathematics, plots and rich media into a single document, much like Jupyter Notebooks. With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow allows you to use H2O interactively to import files, build models, and iteratively improve them. Based on your models, you can make predictions and add rich text to create vignettes of your work – all within Flow’s browser-based environment. In this blog, we will only focus on its visualization part.

H2O FLOW web service lives in the Spark driver and is routed through the HDInsight gateway, so it can only be accessed when the spark application/Notebook is running

You can click the available link in the Jupyter Notebook, or you can directly access this URL:

https://yourclustername-h2o.apps.azurehdinsight.net/flow/index.html

Setup is pretty easy.

Comments closed

Optimizing For Ad Hoc Workloads

Kendra Little has soured a bit on the Optimize for Adhoc Workloads setting:

Once upon a time, I was really excited about getting this configuration item in SQL Server 2008. Early versions of SQL Server 2005 weren’t all that great at managing the size of the execution plan cache: it could really balloon up and eat away at the buffer pool. But the SQL Server team did a good job at tuning those algorithms in later service packs for 2005 and future versions, and it became much less of an issue.

Personally, I’ve never had a case where enabling ‘Optimize for Adhoc Workloads’ improved performance in a way that I could measure. It may save you a small amount of memory, it may not.

I don’t mean this as a big insult. Trying to save a penny every time you go to the grocery store could add up, if you grocery shop very frequently. But hopefully that’s not one of your major revenue sources over time.

It’s an interesting counter-argument and worth reading.

Comments closed

SQL Server Backup To Azure Tool Causing Restore Errors

Jack Li diagnoses an issue in which the Microsoft SQL Server Backup to Microsoft Azure Tool causes errors when trying to restore a database on an Azure VM with SQL Server 2008 R2:

I worked on an interesting issue today where a user couldn’t restore a backup.   Here is what this customer did:

  1. backed up a database from an on-premises server (2008 R2)
  2. copied the file to an Azure VM
  3. tried to restore the backup on the Azure VM (2008 R2 with exact same build#)

But he got the following error:

Msg 3241, Level 16, State 0, Line 4
The media family on device ‘c:\temp\test.bak’ is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 4
RESTORE HEADERONLY is terminating abnormally.

We verified that he could restore the same backup on the local machine (on-premises).  Initially I thought the file must have been corrupt during transferring.   We used different method to transfer file and zipped the file.  The behavior is the same.   When we backed up a database from the same Azure VM and tried to restore, it was successful.

Click through for Jack’s findings as well as a couple workarounds.

Comments closed

On-Prem Power BI Gateway

Steve Hughes shows how to set up a data gateway for Power BI:

First, I will not be discussing the personal gateway in this post. If you have chosen to use the personal gateway, you have limited functionality and should consider using the on-premises data gateway for corporate use.

The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Please refer to references at the end of this post for details about data sources supported within the gateway.

Click through for more information.

Comments closed

Measuring Correlation In SQL

Phil Factor shows how to calculate Kendall’s Tau and Spearman’s Rho in SQL:

Kendall’s Tau rank correlation is a handy way of determining how correlated two variables are, and whether this is more than chance. If you just want a measure of the correlation then you don’t have to assume very much about the distribution of the variables. Kendall’s Tau is popular with calculating correlations with non-parametric data. Spearman’s Rho is possibly more popular for the purpose, but Kendall’s tau has a distribution with better statistical properties (the sample estimate is close to a population variance) so confidence levels are more reliable, but in general, Kendall’s tau and Spearman’s rank correlation coefficient are very similar. The obvious difference between them is that, for the standard method of calculation,  Spearman’s Rank correlation required ranked data as input, whereas the algorithm to calculate Kendall’s Tau does this for you.  Kendall’s Tau consumes any non-parametric data with equal relish.

Kendall’s Tau is easy to calculate on paper, and makes intuitive sense. It deals with the probabilities of observing the agreeable (concordant) and non-agreeable (discordant) pairs of rankings. All observations are paired with each of the others, A concordant pair is one whose members of one observation are both larger than their respective members of the other paired observation, whereas discordant pairs have numbers that differ in opposite directions. Kendall’s Tau-b takes tied rankings into account.

I appreciate Phil putting this series together.  I’d probably stick with R, but it’s good to have options.

Comments closed

Rethrowing Exceptions

Vladimir Oselsky shows how to use THROW and RAISERROR for rethrowing exceptions:

Upon executing the first procedure, we get the error message back to the front end, but after checking balance, we find that money withdrawn from the account, but in the case of the second procedure, the same error returned to the front end but money still there.

Now we begin to scratch our head trying to figure out why we lost the money even though we got errors in both cases. The truth behind is the fact that RAISERROR does not stop the execution of code if it is outside of TRY CATCH block. To get same behavior out of RAISERROR, we would need to rewrite procedure to look something like following example.

There are some nuanced differences between THROW and RAISERROR, so it’s valuable to know how both work.

Comments closed

Basics Of R Plotting

Aman Tsegai shows some basic ways to customize R’s plot function:

We’re going to be using the cars dataset that is built in R. To follow along with real code, here’s an interactive R Notebook. Feel free to copy it and play around with the code as you read along.

So if we were to simply plot the dataset using just the data as the only parameter, it’d look like this:

plot(dataset)

The plot function is great for cases where you don’t much care how the visual looks, and the simplicity is great for throwaway visuals.

Comments closed

Thinking About Automation

Chrissy LeMaire has a series of thoughts on this month’s T-SQL Tuesday, and it was worth separating out from the rest of today’s batch:

Y’all know what I’m gonna say here! I love automation and PowerShell. I know for a fact that PowerShell and T-SQL together are the future of SQL Server administration. As someone who often presents about dbatools, the popular SQL PowerShell community project, I’ve seen the excitement and relief that PowerShell automation brings to SQL Server Database Pros.

From making it way easier to migrate entire instances to automating backup testing and verification, PowerShell makes it straight up more enjoyable to be a DBA.

There’s a lot of well-deserved plugging of dbatools.  Hint, hint.

Comments closed

Updating Large Tables In SQL Server And Oracle

Jana Sattainathan has a post on how he was able to move and update billions of rows, using both Oracle and SQL Server as examples:

The key thing to remember with SQL Server is to convert to a non-integer value by using a “decimal” as shown in the above example with “10.”. This is the same as saying “10.0”. Without the “.”, it will result in uneven splits from rounding errors of integers. It is not the result that you intend to have it you want accurate results.

To show you the difference, I have included the SQL and results of a query that uses “.” and the other that does not, with “.” being the only difference:

It’s a good article, and definitely an important thing to think about when you have large tables.

Comments closed