Published on

November 17, 2020

Exploring SQL Server Machine Learning with R Scripts and External Packages

In this article, we will delve into the world of SQL Server Machine Learning using R scripts and the utilization of external packages. SQL Server has evolved beyond being just a traditional database system and now offers the capability to leverage machine learning using languages such as R, Python, and Java.

Python

Python is an interactive, high-level, and object-oriented programming language. It allows you to run complex algorithms and versatile workflows using user-friendly commands. Python is widely used by data scientists for statistical data computing and can be utilized on UNIX, Windows, and macOS platforms.

Installing R for Machine Learning in SQL Server 2019

In order to install R for machine learning in SQL Server, you need to follow a few steps. First, refer to the article “Machine Learning Services – Configuring R Services in SQL Server” for detailed instructions on R installation. During the SQL Server installation process, make sure to select the SQL Machine Learning Service and Language Extensions for R Scripts. You can also choose to select Python and Java as your preferred SQL Machine Learning languages.

Once the installation is complete, you will need to create an environment variable for SQL Machine Learning. In the Windows Server, navigate to Control Panel -> System and Security -> System -> Advanced System Settings -> Environment Variables. Create an environment variable with the following details:

Variable Name: MJK_CBWR

Variable Value: Auto

Click “OK” and restart the Windows server. After restarting, you will need to enable script execution using the sp_configure command. Connect to the SQL instance using SQL Server Management Studio (SSMS) and execute the following command:

EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE

Restart the SQL Services to ensure that both the SQL Server service and SQL Server Launchpad service are in a running state.

Default R Library

By default, R services are installed in the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES directory, where “C” is the root directory for your SQL instance and “INST1” is the SQL Server instance name. Inside this path, you will find the library folder, which contains the installed instance library. SQL Server loads the respective functions from this library when running R scripts.

You can verify the default R package library by running the following SQL script, which uses the sp_execute_external_script stored procedure and specifies R as the language:

EXECUTE sp_execute_external_script
  @language = N'R',
  @script = N'OutputDataSet <- data.frame(.libPaths());'
WITH RESULT SETS (([DefaultLibraryName] VARCHAR(MAX) NOT NULL));

This script will return the default R package library.

Installed Packages in R Services

SQL Server installs several useful packages for R services. To get a list of the installed packages, you can run the following T-SQL script:

EXECUTE sp_execute_external_script
  @language = N'R',
  @script = N'str(OutputDataSet);
             packagematrix <- installed.packages();
             NameOnly <- packagematrix[,1];
             OutputDataSet <- as.data.frame(NameOnly);',
  @input_data_1 = N'SELECT 1 as col'
WITH RESULT SETS ((PackageName nvarchar(250)));

This script will provide you with a list of 58 packages that are preconfigured with the SQL Server R installations. These packages can be used for various data analysis and statistical computing tasks.

Useful External R Libraries

There are several external R libraries that can be utilized for different purposes. Here are a few examples:

  • Dplyr: Useful for data manipulations, such as filtering, sorting, and grouping data.
  • Esquisse: Enables data visualization using graphs, charts, and histograms.
  • Lubridate: Used for working with date and time data.
  • RODBC: Provides an interface to various data sources using ODBC.
  • Leaflet: Contains a JavaScript library for working with interactive maps.
  • ggplot2: Widely used for data visualizations.
  • Janitor: Offers functions for data cleaning and manipulation.
  • XLConnect, xlsx2dfs, xlutils3, xlsx: Useful for interacting with Microsoft Excel files.
  • Readr: Enables reading of various file formats, such as CSV and TSV.
  • mlr: Popular for data classifications and regressions.
  • Magick: Used for image processing.

These are just a few examples of the many packages available for R scripts in SQL Server. You can check the package availability in your SQL R instance using the stored procedure sp_execute_external_script.

Installing External Packages

If you need to install a specific external package in R service, you can launch R from the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin directory. Right-click on R and launch it as an administrator. This will open the Microsoft R Client application.

To install a package, run the following command in R:

install.packages("package_name")

Replace “package_name” with the name of the package you want to install. Once the package is downloaded, extracted, and installed, you can verify its installation by running the appropriate query.

Conclusion

In this article, we explored the configuration of R services in SQL Server 2019 and discussed useful libraries and installation methods. By installing the required packages, you can leverage the power of R for data analysis and statistical computing within SQL Server. In the next article, we will delve into data processing using R scripts in SQL Server.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.