Curated SQL Posts

Getting Started With Apache Mesos

Mahesh Chand Kandpal shows how to install Apache Mesos:

Follow these instructions to install the required packages and other Mesos dependencies.

# Update the packages.
$ sudo apt-get update

# Install a few utility tools.
$ sudo apt-get install -y tar wget git

# Install the latest OpenJDK.
$ sudo apt-get install -y openjdk-8-jdk

# Install other Mesos dependencies.
$ sudo apt-get -y install build-essential python-dev python-six python-virtualenv libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev zlib1g-dev

Then, Anubhav Tarar shows how to install Spark on top of Mesos:

7. Now go to $SPARK_HOME/conf.

Inside your spark-env.sh, add the following parameters:

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=/path/to/spark-2.2.0-bin-hadoop2.7.tgz

8. Start the Spark shell with Mesos as the master:

./bin/spark-shell --master mesos://127.0.0.1:5050

Mesos is a rather interesting platform, and if you’re getting interested in Hadoop and Spark, it’s worth learning about this.

Comments closed

Comparing Tree Graphs In SQL

Dmitriy Vlasov shows how to compare two trees in PL/SQL:

During the day, various changes are received by the accounting system from the design system. Production planning is based on the data from the accounting system. Conditions allow you to accept all the changes for the day and recalculate the product specification at night. However, as I wrote above, it is unclear how the yesterday state of the product differs from the today one.

I would like to see what was removed from the tree and what was added to it, as well as which part or assembly replaced another one. For example, if an intermediate node was added to a tree branch, it would be wrong to assume that all the downstream elements were removed from their old places and added to new ones. They remained where they were; an intermediate node was simply inserted above them. In addition, an element can ‘travel’ up and down only within one branch of the tree due to the specifics of the manufacturing process.

This is Oracle-specific; migrating it to another platform like SQL Server would take a bit of doing.
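
To make the idea in the excerpt concrete, here is a minimal Python sketch (not the author's PL/SQL, and not a port of it). It assumes each tree is stored as a hypothetical child-to-parent mapping and treats a node as moved only when its nearest ancestor that exists in both trees has changed, so inserting an intermediate node does not flag everything beneath it. It does not attempt to detect replacements.

```python
from typing import Dict, Optional, Set

# Assumption: a tree is a mapping of node -> parent, with None for the root.
Tree = Dict[str, Optional[str]]


def nearest_shared_ancestor(node: str, tree: Tree, shared: Set[str]) -> Optional[str]:
    """Walk up from node until reaching an ancestor that exists in both trees."""
    parent = tree[node]
    while parent is not None and parent not in shared:
        parent = tree[parent]
    return parent


def diff_trees(old: Tree, new: Tree) -> Dict[str, Set[str]]:
    """Report nodes that were added, removed, or moved between two trees."""
    shared = set(old) & set(new)
    moved = {
        node
        for node in shared
        if nearest_shared_ancestor(node, old, shared)
        != nearest_shared_ancestor(node, new, shared)
    }
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        "moved": moved,
    }


# Yesterday's and today's product structure (made-up part names).
yesterday = {"product": None, "assembly": "product", "bolt": "assembly", "gasket": "assembly"}
today = {"product": None, "assembly": "product", "bracket": "assembly", "bolt": "bracket"}

changes = diff_trees(yesterday, today)
print(changes)
```

Here `bracket` is reported as added and `gasket` as removed, while `bolt` is not reported as moved, because its nearest ancestor present in both trees is still `assembly`.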

Extended Events Profiler

Marek Masko shows off the new Extended Events Profiler in SQL Server Management Studio 17.3:

XE Profiler looks promising and could be a really great feature. We can use it with no issues on any version of SQL Server that supports Extended Events – not only the newest SQL Server 2017. I tested it with SQL Server 2014 and it worked well. Currently, the inability to configure new templates and the logic based on hard-coded names are the biggest concerns and sources of discomfort for the user. However, Microsoft has not yet officially released this version of SQL Server Management Studio, so it’s hard to say what the final feature functionality will be.

I’m hoping that when the final version appears, it will be good enough to get people finally to kick the Profiler habit.

Configuring Visual Studio To Execute Python Code

Dave Mason shows us how to install Python support in Visual Studio 2015 and hook it up to the SQL Server 2017 Machine Learning Services installation of Python:

I’m starting to experiment with Python scripts in SQL Server 2017 using Machine Learning Services (In-Database). The problem is, I don’t know Python. If I run into a Python error, the output I get from SSMS is not looking too helpful. My instincts tell me I’ll be much better off developing and debugging Python code from a development tool. What I settled on was to use Visual Studio along with the Python interpreter that comes with SQL Server 2017 Machine Learning Services. I ran into a few issues that I’ll review here.

The first thing I did was Install Python support in Visual Studio on Windows. This article from Microsoft was simple enough. It worked for me with Visual Studio Community 2015. I quickly created a “PythonApplication1” project and tried Hello World. But I got an error telling me Visual Studio couldn’t find any interpreters.

Click through to read more.  With Visual Studio 2017, it’s a bit easier to get started:  select the Data Science pack on installation and you’ll get both Python and R support out of the box.

Scheduled U-SQL Jobs With Azure Data Factory

Melissa Coates shows how to schedule Azure Data Factory workflows to run U-SQL:

This post is a continuation of the blog where I discussed using U-SQL to standardize JSON input files which vary in format from file to file, into a consistent standardized CSV format that’s easier to work with downstream. Now let’s talk about how to make this happen on a schedule with Azure Data Factory (ADF).

This was all done with Version 1 of ADF. I have not tested this yet with the ADF V2 Preview which was just released.

It’s a bit lengthy, but Melissa lays it out step-by-step, making it straightforward to follow.

Sentiment Analysis In R

Rachel Tatman has a great tutorial introducing sentiment analysis in R:

By the end of this tutorial you will:

  • Understand what sentiment analysis is and how it works
  • Read text from a dataset & tokenize it
  • Use a sentiment lexicon to analyze the sentiment of texts
  • Visualize the sentiment of text

If you’re the hands-on type, you might want to head directly to the notebook for this tutorial. You can fork it and have your very own version of the code to run, modify and experiment with as we go along.

Check it out.  There’s a lot more to sentiment analysis—cleaning and tokenizing words, getting context right, etc.—but this is a very nice introduction.
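
As a rough illustration of the lexicon approach the tutorial describes, here is a small sketch in Python rather than the tutorial's R. The six-word lexicon and the function names are made up for this example; real sentiment lexicons contain thousands of scored words.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"good": 1, "great": 1, "happy": 1, "bad": -1, "terrible": -1, "sad": -1}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str, lexicon: dict = LEXICON) -> int:
    """Sum the lexicon scores of every token; unknown words count as zero."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


print(sentiment_score("A great day, a happy day!"))       # 2
print(sentiment_score("Terrible service and bad food."))  # -2
print(sentiment_score("Not good"))                        # 1
```

The last call shows why context matters: a plain word-by-word lookup scores "Not good" as positive because it ignores the negation.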

Sparklines In R

Robert Sheldon shows how to use SQL Server R Services to display sparklines for categories:

In this article, we continue our discussion on visualizations, but switch the focus to sparklines and other spark graphs. As with many aspects of the R language, there are multiple options for generating spark graphs. For this article, we’ll focus on using the sparkTable package, which allows us to create spark graphs and build tables that incorporate those graphs directly, a common use case when working with spark images.

In the examples to follow, we’ll import the sparkTable package and generate several graphs, based on data retrieved from the AdventureWorks2014 sample database. We’ll also build a table that incorporates the SQL Server data along with the spark graphs. Note, however, that this article focuses specifically on working with the sparkTable package. If you are not familiar with how to build R scripts that incorporate SQL Server data, refer to the previous articles in this series. You should understand how to use the sp_execute_external_script stored procedure to retrieve SQL Server data and run R scripts before diving into this article.

Sparklines and associated visuals have their place in the world.  Read on to see how you can build a report displaying them.

Index Rebuilds Reset DMV Counters

Clive Strong notes that an index rebuild will reset certain DMV counters:

As it transpires, an index rebuild will reset the counters for this index within the sys.dm_db_index_usage_stats DMV, and this is potentially very dangerous unless you are aware of it. Normally, we determine whether or not an index is in use by looking at this information, but if you perform regular maintenance on this table, you could be resetting the data which you rely on for an accurate decision.

Read the whole thing.

Unboxing ISPACs

It’s an early Christmas for Richie Lee:

The first file that we’re going to look at is the [Content_Types].xml file, and this is the file that confirms that the ZipPackage class is used. There’s an article here that is ten years old but is still valid (scroll down to the “System.IO.Packaging Includes Zip Support” section to read up on this). This is because we know that the content_types file is part of the output when using the ZipPackage class to zip up a bunch of files into a .zip. The content_types file contains both the extension and content type of the three other files that are included in the ispac:

  • dtsx
  • params
  • manifest

Note that the content_types file does not specify the files, either in quantity or in content, other than the fact that they will contain xml.

Read on for a good amount of detail on what’s included in an Integration Services package.
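
Since the excerpt describes an ispac as a zip package with a [Content_Types].xml entry, you can peek inside one with nothing more than a zip reader. The sketch below uses Python's standard library; the function name is made up for this example, and it assumes the content types file uses the standard Open Packaging Conventions namespace with `Default` elements carrying `Extension` and `ContentType` attributes.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the standard Open Packaging Conventions content-types namespace.
CONTENT_TYPES_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def describe_package(source):
    """Return (entry names, {extension: content type}) for a zip-based package.

    `source` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
        root = ET.fromstring(package.read("[Content_Types].xml"))
    defaults = {
        element.get("Extension"): element.get("ContentType")
        for element in root.iter(CONTENT_TYPES_NS + "Default")
    }
    return names, defaults
```

Calling `describe_package("MyProject.ispac")` (a hypothetical file name) would return the list of files in the package along with the extension-to-content-type mapping declared for them.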

No Hierarchies In SSAS DirectQuery

Thomas LeBlanc points out that Analysis Services Tabular’s DirectQuery functionality does not include hierarchies on dimensions:

You can go to this link from Microsoft to read about more limitations.

There was also a difference in labeling between designing a Tabular Model in Visual Studio (SQL Server Data Tools – SSDT) and making a change to the Model from SQL Server Management Studio (SSMS).

In SSDT, the property for the Model is DirectQuery and the values are On and Off.

It’s certainly not the end of the world, but I can see it being a bit annoying to deal with.