I wanted to import the million song dataset in SQL Server on Linux. There’s a github repo that has the SQL to allow you to use this with the graph database features. However, it’s built for Windows.
Linux is a slightly different beast. Once I started down this path, I had memories of working on SunOS in college, messing with permissions and moving files.
I run Ubuntu in VMWare, so I first downloaded the files to my Documents folder. That’s pretty easy. However, once there, the mssql user can’t read them. Rather than mess with permissions for my home, I decided to move these to a location where the mssql user could read them.
Much of the post is about file permissions. This is because SQL on Linux is SQL on Windows, and that’s a glorious thing.
Something isn’t right…as DBAs we think of things in rows and columns. So we’re going to count across the top and think the 7th column is going to yield the 7th column and it’s data for each row, right? Well, it will but data processed by awk is whitespace delimited by default and is processed row by row. So the 7th column in the second line isn’t the same as the output in the first line. This can be really frustrating if your row data has spaces in it…like you know…dates.So let’s fix that…the output from the DMVs via dbfs is tab delimited. We can define our delimiter for awk with -F which will allow for whitespaces in our data. Breaking the data only on the tabs. Let’s hope there isn’t any tabs in our data!
I’m a little surprised that these metrics don’t end up in /proc, but I imagine there’s a reason for that.
Nothing special here, simple syntax, but the seasoned PowerShell remoting pro will notice that we’re using a new parameter here -HostName. Normally on Windows PowerShell you have the -ComputerName parameter. Now, I don’t know exactly why this is different, but perhaps the PowerShell team needed a way to differentiate between OpenSSH and WinRM based remoting. Further, Enter-PSSession now has a new parameter -SSHTransport which at the moment doesn’t seem to do much since remoting cmdlets currently use OpenSSH by default. But if you read the code comments here, it looks like WinRM will be the default and we can use this switch parameter to specify SSH as the transport.
Once we execute this command, you’ll have a command prompt to the system that passed as a parameter to -HostName. The prompt below indicates you’re on a remote system by putting the server name you’re connected to in square brackets then your normal PowerShell prompt. That’s it, you now have a remote shell. Time to get some work done on that server, eh? Want to get out of the session, just type exit.
It’s interesting to see how well Microsoft is integrating Linux support into Powershell (and vice versa, but that’s a different post).
In my case, I got various systems setup: Windows and Ubuntu 16.04. So, I make sure I download correct *zip or *tar.gz file
As, pre-requisite, you will needed to have already installed *”.NET Core 2.0 Preview 1” for the SQL Service Tools to work and remember this need to be installed in all systems.
Just in case, here’s the link to download “.NET Core 2.0 Preview 1“: https://www.microsoft.com/net/core/preview#windowscmd
Now, because we are working with PowerShell Core, don’t forget to install the latest build found at:
Read the whole thing.
David Alcock has a couple posts covering installation of SQL Server on a brand new Ubuntu VM. First, David installs Ubuntu:
The system requirements for running SQL Server on Ubuntu 16.04.2 contains the following
NoteYou need at least 3.25GB of memory to run SQL Server on Linux. For other system requirements, see System requirements for SQL Server on Linux.On the create VM window the Memory is currently set to 1024 MB so by clicking the Customize Hardware button I can change the allocated memory to 4GB (4096 MB) as in the screenshot below:
Let’s break it down a little bit. First sudo, which is giving root permissions to a particular command this is as opposed to sudo su which I had to do later on in the install to switch to superuser mode for the session.
Next is apt. Apt is a command line tool which works with the Advanced Packaging Tool and enables to perform installs, updates and removals of software packages. In this case we’re installing curl so we use the install command.
I think Microsoft did a good job of simplifying the installation process on Linux and making it “Linux-y,” with an easy installation and then post-installation configuration.
We start with a 16.04 image, we run some upgrades, install python, upgrade pip, install our requirements and expose port 8888 (jupyter’s default port).
Here is our requirements.txt file
Notice how Jupyter is in there, I also added a few other things that I very commonly use including numpy, pandas, plotly, scikit-learn and some azure stuff.
The big benefit to doing this is that your installation of Jupyter can exist independently from your notebooks, so if you accidentally mess up Jupyter, you kill and reload from the image in a couple commands.
You’ll want to learn the syntax for one of the console-based text editors. My personal favorite is ‘vi‘. It’s quick, streamlined, but does have a significant learning curve. Emacs is another editor that works great. Many others are out there, and your options open even more if you’re using a GUI. You’ll need an editor to edit configuration files.
The folder structure of Linux is one of the biggest changes. Whereas Windows is based off of an arbitrary drive-letter assignment system that dates back to the DOS era, Linux is is based off of a tree structure. All folders and files are based on a single point, ‘/’ or the root folder, and everything is based off of folders from this point. Certain folders from Windows, such as C:\Windows, C:\Users\username, or %WINDOWSTEMP%, are mapped to certain folders within the Linux operating system.
This is really high-level stuff; if you’re looking at administering a Linux box in a production environment, I’d highly recommend taking the time to learn Linux in detail.
As of CTP 2.1 for SQL Server 2017 a set of new environment variables are available. These variables allow us to configure each SQL Server container as we spin them up. The first version of SQL Server on Linux came with:
These had to be set for the container to start. The SA_PASSWORD has be a complex password or the container will not start. CTP 2.1 introduced:
Read on for the new variables and an example on how to use them.
First and foremost (IMHO), Microsoft wants to appeal to developers. They want their development stack to run anywhere there are developers. Notably, Microsoft just released Visual Studio 2017 for Mac on May 10th! Many developers out there run on non-Microsoft workstations, notably Apple computers. Apple’s OSX operating system is originally derived from the FreeBSD operating system. FreeBSD and other *BSD operating systems share much in common with Linux. So, if you can make SQL Server work on the Apple, you’ve probably made it work on Linux. Arguably, covering these two platforms nails just about every widely adopted development platform out there.
Microsoft also wants to appeal to a broader customer base, which means exploring the other environments that software runs on. An exceptionally high number of the world’s servers are powered by Linux. It’s lean, mean, stable, and powerful. Lots of shops refuse to run a Windows-based server because of a number of reasons, including that their in-house IT staff only have Linux knowledge. These same shops are most likely pressured to run a SQL Server for various applications. I know a number of third-party vended application that require a SQL Server, and previously if an organization dictated no Windows-based servers, that meant that this application would never be adopted in the organization, no matter how well it would function.
David provides a good explanation and sets up the context behind his upcoming SQL Server on Linux series.
So Joey comes up and says “Chrissy, Aaron Nelson has pretty much required me to talk to you. The SQL Community has the #1 PowerShell UserVoice request. We see that – we’ve heard you, The People want Out-DataTable and we agree. Would you be happy if we added it to the PowerShell Gallery first?”
“Uh, no! I want Out-DataTable to be a first class citizen like Out-GridView.”
“But where we’re going with PowerShell — we’re going smaller – to just core files, then you add on from the Gallery as desired.”
“Oh dang, like Linux! I’m liking it, keep talking.”
“To be clear, this is post 6.0. In the 6.0 timeframe, but we want to decouple as many release trains as possible, like PowerShellGet and PSReadline. But we’ll still very well package the ‘uber-complete, awesome devops tool edition’ of PowerShell. In the meantime, you could setup a metapackage for just your database stuff.”
“So it is like Linux patterns! PowerShell Gallery does that? I’m sold.”
Chrissy goes on to explain what a Powershell Gallery metapackage module is, how to create one, and even how to publish one yourself.