Author: Kevin Feasel

S3 Versus HDFS For Spark Data Storage

Reynold Xin, Josh Rosen, and Kyle Pistor argue that you should use blob storage (S3, Azure Blob Storage, etc.) instead of HDFS when building a cloud-based Spark cluster:

Based on our experience, S3’s availability has been fantastic. Only twice in the last six years have we experienced S3 downtime and we have never experienced data loss from S3.

Amazon claims 99.999999999% durability and 99.99% availability. Note that this is higher than the vast majority of organizations’ in-house services. The official SLA from Amazon can be found here: Service Level Agreement – Amazon Simple Storage Service (S3).

For HDFS, in contrast, it is difficult to estimate availability and durability. One could theoretically compute the two SLA attributes based on EC2’s mean time between failures (MTBF), plus upgrade and maintenance downtimes. In reality, those are difficult to quantify. Our understanding working with customers is that the majority of Hadoop clusters have availability lower than 99.9%, i.e., at least 9 hours of downtime per year.

It’s interesting how opinion has shifted; even a year ago, the recommendation would have been different.
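
Operationally, switching Spark from HDFS to blob storage is mostly a matter of URIs and connector configuration. Here is a minimal PySpark sketch, assuming the hadoop-aws (S3A) connector is on the classpath and AWS credentials come from the environment; the bucket and paths are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-instead-of-hdfs").getOrCreate()

# Read raw data from S3 rather than HDFS -- only the URI scheme differs.
events = spark.read.json("s3a://my-bucket/raw/events/")

# Write curated output back to S3 as Parquet.
events.write.mode("overwrite").parquet("s3a://my-bucket/curated/events/")

# The HDFS equivalent would be a path like "hdfs://namenode:8020/raw/events/".
```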

Understanding Random Forests

Manish Kumar Barnwal explains how random forest algorithms work:

Say our dataset has 1,000 rows and 30 columns. There are two levels of randomness in this algorithm:

  • At row level: Each of these decision trees gets a random sample of the training data (say 10%), i.e., each of these trees will be trained independently on 100 randomly chosen rows out of the 1,000 rows of data. Because each tree is trained on a different random subset of rows, the trees differ from each other in their predictions.
  • At column level: The second level of randomness is introduced at the column level. Not all of the columns are passed into training each of the decision trees. Say we want only 10% of columns to be sent to each tree; this means 3 randomly selected columns will be sent to each tree. So for the first decision tree, maybe columns C1, C2, and C4 were chosen. The next DT will have C4, C5, and C10 as its chosen columns, and so on.
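
Mapped onto scikit-learn (one concrete implementation, not the article’s own code), these two levels of randomness correspond to the max_samples and max_features parameters, with the caveat that scikit-learn re-samples the columns at each split rather than once per tree, and that max_samples requires scikit-learn 0.22 or later. A minimal sketch with synthetic data matching the numbers above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 1,000-row, 30-column dataset in the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = rng.integers(0, 2, size=1000)

forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,    # row-level randomness: sample rows for each tree...
    max_samples=0.1,   # ...10% of them, i.e. ~100 rows per tree
    max_features=0.1,  # column-level randomness: 3 of 30 columns per split
    random_state=0,
).fit(X, y)

print(forest.score(X, y))
```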

This is a nice article and includes cases when not to use random forests.

Making Entity Framework Writes A Little Less Slow

Ilya Chumakov has some tips for making Entity Framework inserts and updates a lot faster:

When adding or modifying a large number of records (10³ and more), Entity Framework performance is far from perfect. The reasons are architectural peculiarities of the framework and the non-optimal SQL it generates. Leaping ahead, I can reveal that saving data while bypassing the context significantly reduces execution time.

There’s some good advice in here, though it doesn’t include my favorite piece of advice: don’t use Entity Framework.
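
Entity Framework is .NET-specific, but the bottleneck described here, per-entity change tracking during bulk writes, exists in most ORMs. As a rough analogue of the “bypass the context” advice (not Chumakov’s code), here is a SQLAlchemy 1.4+ sketch with a hypothetical table, handing rows straight to the Core instead of tracking them in the session:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Person(Base):
    __tablename__ = "people"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

rows = [{"name": f"person {i}"} for i in range(10_000)]

with Session(engine) as session:
    # Slow path: 10,000 tracked objects flushed through the unit of work.
    #   session.add_all(Person(**r) for r in rows)

    # Faster path: one Core executemany that skips change tracking entirely.
    session.execute(Person.__table__.insert(), rows)
    session.commit()
```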

Check Those Aliases

Erik Darling warns you about accidentally using the wrong alias in a query:

People will often tell you to clearly alias your tables, and they’re right. Doing so will make your queries more readable and understandable to whoever has to read your code next, puzzling over the 52 self joins and a WHERE clause that starts off with 1 = 2. It can also help solve odd performance problems.

Take this query, for instance.

This isn’t just for subqueries; even simple joins can go haywire when you accidentally use the wrong alias and both tables happen to have the same column name.
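
Erik’s examples are in T-SQL, but the failure mode is portable. In this self-contained sqlite3 sketch (the tables are hypothetical), the subquery names a column that exists only in the outer table; rather than raising an error, it becomes a correlated predicate that matches every row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE banned (user_id INTEGER);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO banned VALUES (99);
""")

# Intended: users whose id appears in banned (none of them). But banned has
# no "id" column, so "id" binds to the outer users.id and everyone matches.
rows = con.execute(
    "SELECT name FROM users WHERE id IN (SELECT id FROM banned)"
).fetchall()
print(rows)  # [('alice',), ('bob',)]; with an alias and a qualified column
             # (SELECT b.id FROM banned AS b), this would error out instead
```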

Restoring A BACPAC File

Steve Jones shows how to restore a database saved in .bacpac format:

I needed to get the WideWorldImporters sample database for a project and noticed that there was a BACPAC available. I downloaded it and needed to restore this as a database. At least, that’s what many people would think.

However, if you go to the restore dialog, and select Device and then pick your location, there’s no filter for a .bacpac. In fact, if you choose one, it won’t restore. You’ll get the “no backupset selected” error.

Read on for a step-by-step guide showing how to do this.
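
For the impatient: a .bacpac is imported, not restored. In SSMS that means right-clicking Databases and choosing Import Data-tier Application rather than Restore; from a script, the SqlPackage utility does the same job. A sketch of the scripted route, assuming SqlPackage is on the PATH and using hypothetical file and server names:

```python
import subprocess

# Import the .bacpac as a new database via SqlPackage's Import action.
subprocess.run(
    [
        "SqlPackage",
        "/Action:Import",
        "/SourceFile:WideWorldImporters.bacpac",    # hypothetical local path
        "/TargetServerName:localhost",
        "/TargetDatabaseName:WideWorldImporters",
    ],
    check=True,
)
```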

Automating Azure SQL DB Maintenance

Tim Radney shows several methods for performing automated Azure SQL Database maintenance, including runbooks:

Once you create your account, you can then start creating runbooks. You can do just about anything with them. There are numerous existing runbooks that you can browse through and modify for your own use, including ones for provisioning, monitoring, life cycle management, and more.

You can create the runbooks offline or in the Azure Portal, and they’re built using PowerShell. In this example, we will reuse the code from the PowerShell demo and also demonstrate how we can use the built-in Azure scheduler to run our existing PowerShell code, without having to rely on an on-premises scheduler, task scheduler, or Azure VM to schedule a job.

Read the whole thing if you have Azure SQL Database instances in your environment.
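
Tim’s runbooks are PowerShell, but Azure Automation also supports Python runbooks, so the same scheduler can drive Python-based maintenance. A hypothetical sketch using pyodbc; the server, database, and credentials are placeholders, and a real runbook would pull them from Automation assets instead of hard-coding them:

```python
import pyodbc

# Placeholder connection details; store real ones as Automation credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=maint_user;PWD=<password>",
    autocommit=True,
)

# A simple stand-in for real maintenance work: refresh statistics.
conn.cursor().execute("EXEC sp_updatestats;")
```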

Dimensional Design Tips

Koen Verbeeck provides some helpful hints when designing dimensions in SQL Server Analysis Services Multidimensional models:

Although traditional dimensional modeling – as explained by Ralph Kimball – tries to avoid snowflaking, it might help the processing of larger dimensions. For example, suppose you have a large customer dimension with over 10 million members. One attribute is the customer country. Realistically, there should only be a bit over 200 countries, maximum. When SSAS processes the dimension, it sends SELECT DISTINCT commands to SQL Server. Such a query on top of a large dimension might take some time. However, if you snowflake (a.k.a. normalize) the country attribute into another dimension, the SELECT DISTINCT will run much faster. Here, you need to trade off performance against the simplicity of your design.

There are several good tips here.
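
To make the snowflaking trade-off concrete: processing issues something like SELECT DISTINCT Country against the dimension table, so the win comes from pointing that query at a roughly 200-row table instead of a 10-million-row one. A hypothetical before-and-after, sketched as T-SQL strings:

```python
# Star-style: Country lives on the big dimension table, so the processing
# query "SELECT DISTINCT Country FROM dbo.DimCustomer" scans ~10M rows.
star_schema = """
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY PRIMARY KEY,
    CustomerName NVARCHAR(200),
    Country      NVARCHAR(100)
);
"""

# Snowflaked: Country moves to its own ~200-row table, and the DISTINCT
# issued during processing now touches only dbo.DimCountry.
snowflake_schema = """
CREATE TABLE dbo.DimCountry (
    CountryKey  INT IDENTITY PRIMARY KEY,
    CountryName NVARCHAR(100)
);
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY PRIMARY KEY,
    CustomerName NVARCHAR(200),
    CountryKey   INT REFERENCES dbo.DimCountry (CountryKey)
);
"""
```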

Smarter Differential Backups

Dennes Torres shows us how we can use a new column in an old DMV to make our full vs differential backup processes smarter:

What are the possibilities with this new field? We are now able to check how many extents have changed since the last full backup and decide whether a full backup is really needed or whether we can live with a differential backup, achieving smarter backup plans.

Changing our full backup jobs to first check this field and decide whether the backup will be full or differential can save space and maintenance time for databases that aren’t updated very often.

Read on to learn more about this new column, which will be available in SQL Server 2017.
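
The column in question is modified_extent_page_count in sys.dm_db_file_space_usage, new in SQL Server 2017. A sketch of the decision logic from Python; the connection string is a placeholder and the 60% threshold is an arbitrary example, not a recommendation:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Compare extents changed since the last full backup to total allocation.
cur.execute("""
    SELECT SUM(modified_extent_page_count) AS changed_pages,
           SUM(allocated_extent_page_count) AS total_pages
    FROM sys.dm_db_file_space_usage;
""")
changed, total = cur.fetchone()

# If most extents have changed, a differential would be nearly as large as
# a full backup anyway, so just take the full.
backup_type = "FULL" if changed / total > 0.6 else "DIFFERENTIAL"
print(backup_type)
```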

Cross-Database Queries With Azure SQL DB

Dustin Ryan shows how to set up cross-database queries within Azure SQL Database:

2. Vertical queries (in preview): A vertical elastic query is a query that is executed across databases that contain different schemas and different data sets. An elastic query can be executed across any two Azure SQL Database instances. This is actually really easy to set up, and that’s what this blog post is about! The diagram below represents a query being issued against tables that exist in separate Azure SQL Database instances that contain different schemas.

Read on to learn how to implement vertical elastic queries today.
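
The moving parts of a vertical elastic query are a database scoped credential, an external data source of type RDBMS, and an external table, all created in the database that issues the query. A sketch run from Python; every name and secret below is a placeholder, and the column list must match the remote table:

```python
import pyodbc

# Connect to the "head" database that will issue the cross-database query.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=HeadDb;UID=admin_user;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

for stmt in [
    "CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>'",
    """CREATE DATABASE SCOPED CREDENTIAL RemoteCred
           WITH IDENTITY = 'remote_user', SECRET = '<password>'""",
    """CREATE EXTERNAL DATA SOURCE RemoteOrdersDb WITH (
           TYPE = RDBMS,
           LOCATION = 'myserver.database.windows.net',
           DATABASE_NAME = 'OrdersDb',
           CREDENTIAL = RemoteCred)""",
    """CREATE EXTERNAL TABLE dbo.Orders (OrderID INT, CustomerID INT)
           WITH (DATA_SOURCE = RemoteOrdersDb)""",
]:
    cur.execute(stmt)

# A plain SELECT in HeadDb now transparently reaches the remote database.
for row in cur.execute("SELECT TOP 5 OrderID, CustomerID FROM dbo.Orders"):
    print(row)
```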

Using Common Table Expressions To Drive Queries

Lukas Eder wants one result set which returns records using predicate B if and only if there were no records using predicate A:

We’ve seen that we can easily solve the original problem with SQL only: Select some data from a table using predicate A, and if we don’t find any data for predicate A, then try finding data using predicate B from the same table.

Oracle and PostgreSQL can both optimise away the unnecessary query 2 by inserting a “probe” in their execution plans that knows whether the query 2 needs to be executed or not. In Oracle, we’ve even seen a situation where the combined query outperforms two individual queries. SQL Server 2014 surprisingly does not have such an optimisation.

Interesting totally-not-a-comparison between the three database products. There are some things I’d ideally like the SQL Server optimizer to do with common table expressions, but as Lukas notes, it doesn’t, so user beware.
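
The fallback pattern itself ports anywhere CTEs work. A runnable illustration with sqlite3 (the table and predicates are made up): take the rows matching predicate A, and append predicate B’s rows only when A found nothing:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE films (id INTEGER, title TEXT, language TEXT);
    INSERT INTO films VALUES (1, 'Metropolis', 'de'), (2, 'Amelie', 'fr');
""")

query = """
WITH a AS (SELECT * FROM films WHERE language = ?),
     b AS (SELECT * FROM films WHERE language = ?)
SELECT * FROM a
UNION ALL
SELECT * FROM b WHERE NOT EXISTS (SELECT 1 FROM a);
"""

print(con.execute(query, ("en", "fr")).fetchall())  # A empty -> fall back to B
print(con.execute(query, ("de", "fr")).fetchall())  # A matches -> B suppressed
```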
