JSON (JavaScript Object Notation) is a popular data interchange format that is widely used in modern applications. In SQL Server, there are different approaches to import JSON data, and in this article, we will compare the performance of two methods: using the Json.NET library and using the new JSON functionality introduced in SQL Server 2016.
Why Json.NET, not JavaScriptSerializer?
Before we dive into the performance comparison, let’s address the question of why we would choose Json.NET over the JavaScriptSerializer class, which is commonly used for parsing JSON in SSIS (SQL Server Integration Services).
Firstly, Json.NET is much faster than JavaScriptSerializer. When it comes to ETL (Extract, Transform, Load) processes, speed is crucial. Additionally, JavaScriptSerializer requires additional configuration for handling large datasets, which can be easily overlooked if packages are only tested on small datasets. On the other hand, using a third-party library like Json.NET in SSIS requires adding it to the GAC (Global Assembly Cache), which may require some extra work for a build engineer, but it pays off in terms of performance.
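The extra configuration mentioned above is worth spelling out: by default, JavaScriptSerializer rejects JSON longer than 2,097,152 characters via its MaxJsonLength property, and that limit only surfaces once real data volumes arrive. A minimal sketch of how the limit would be raised (the Person type and file path are illustrative placeholders):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Web.Script.Serialization; // System.Web.Extensions.dll

public static List<Person> DeserializeWithJavaScriptSerializer(string path)
{
    var serializer = new JavaScriptSerializer
    {
        // Default is 2,097,152 characters; larger payloads throw
        // unless the limit is raised explicitly.
        MaxJsonLength = int.MaxValue
    };
    return serializer.Deserialize<List<Person>>(File.ReadAllText(path));
}
```

If a package is only ever tested against small sample files, this configuration is easy to miss until production data exceeds the default limit.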
Test Machine Specifications
Before we proceed with the performance comparison, let’s take a look at the specifications of the test machine:
- 4 Intel Core i7 cores
- 12GB RAM
- HDD
- Windows 10 (64-bit)
- SQL Server 2017
Note that the tests were performed on the same machine where SQL Server was installed, so network transfer overhead did not affect the results.
Approach 1: Importing Files
The first approach we’ll consider is reading JSON data from files. We’ll compare the performance of using SQL Server’s native functionality and using the Json.NET library.
In order to perform the comparison, we have prepared two files: PeopleLarge.json and PeopleLarge_Indented.json. Both files contain a JSON array of 500,000 records of mock personal data. The difference is that the first file contains minified JSON, while the second stores the same data as indented JSON.
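For context, each element of the array follows the shape implied by the column definitions used in the queries below (the field values here are illustrative, not taken from the actual test files):

```json
[
  { "Id": 1, "FirstName": "John", "LastName": "Doe", "BirthDate": "1985-03-14T00:00:00" },
  { "Id": 2, "FirstName": "Jane", "LastName": "Smith", "BirthDate": "1990-07-02T00:00:00" }
]
```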
Using SQL Functionality:
SELECT
    p.Id,
    p.FirstName,
    p.LastName,
    p.BirthDate
FROM OPENROWSET(BULK 'C:\temp\PeopleLarge.json', SINGLE_CLOB) AS j
CROSS APPLY OPENJSON(BulkColumn)
WITH (
    Id INT,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATETIME
) AS p
Using Json.NET:
public override void CreateNewOutputRows()
{
    List&lt;Person&gt; records = null;
    using (var fs = new FileStream(@"C:\temp\PeopleLarge.json",
        FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        records = DeserializeFromStream(fs);
    }

    foreach (var r in records)
    {
        Output0Buffer.AddRow();
        Output0Buffer.Id = r.Id;
        Output0Buffer.FirstName = r.FirstName;
        Output0Buffer.LastName = r.LastName;
        Output0Buffer.BirthDate = r.BirthDate;
    }
}
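The DeserializeFromStream helper is not shown above; a minimal version, together with the assumed Person class, could look like the following. Reading through JsonTextReader streams tokens instead of materializing the whole file as a single string, which keeps memory usage flat for large files:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

public static List<Person> DeserializeFromStream(Stream stream)
{
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonReader = new JsonTextReader(sr))
    {
        // Deserialize directly from the reader rather than from
        // an in-memory string of the entire file.
        return serializer.Deserialize<List<Person>>(jsonReader);
    }
}
```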
After running the packages, we observed the following execution times:
| File Name | SQL.dtsx | NET.dtsx |
|---|---|---|
| PeopleLarge.json | 3:28.734 | 2:47.562 |
| PeopleLarge_Indented.json | 3:19.547 | 3:01.109 |
Based on these results, Json.NET was faster in both cases: by about 41 seconds on the minified file and about 18 seconds on the indented one. Indentation narrowed the gap but did not reverse it. Since the difference is not dramatic, the choice depends on the specific requirements and constraints of the project.
Approach 2: JSON from Database
In this approach, we will consider cases where JSON data is already stored in the database. We will compare the performance of using SQL Server’s native functionality and using Json.NET.
Using SQL Functionality:
DECLARE @json NVARCHAR(MAX) = (SELECT Data FROM PersonJson WHERE Id = 1);

SELECT
    p.Id,
    p.FirstName,
    p.LastName,
    p.BirthDate
FROM OPENJSON(@json)
WITH (
    Id INT,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATETIME
) AS p
Using Json.NET:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    var jsonConverted = System.Text.Encoding.Unicode.GetString(
        Row.Data.GetBlobData(0, Convert.ToInt32(Row.Data.Length)));
    var records = JsonConvert.DeserializeObject&lt;List&lt;Person&gt;&gt;(jsonConverted);

    foreach (var r in records)
    {
        Output0Buffer.AddRow();
        Output0Buffer.Id = r.Id;
        Output0Buffer.FirstName = r.FirstName;
        Output0Buffer.LastName = r.LastName;
        Output0Buffer.BirthDate = r.BirthDate;
    }
}
After running the packages, we observed the following execution times:
| JSON Taken From (Id in PersonJson) | SQL.dtsx | NET.dtsx |
|---|---|---|
| PeopleLarge.json (Id = 1) | 3:22.734 | 3:07.063 |
| PeopleLarge_Indented.json (Id = 2) | 2:42.015 | 3:22.468 |
Here the picture is mixed: Json.NET was still slightly faster on the minified JSON (Id = 1), but SQL Server's native functionality won clearly on the indented document (Id = 2), by roughly 40 seconds. Given that the T-SQL version is also simpler to implement and maintain, the SQL method is a reasonable default for processing JSON already stored in the database.
Conclusion
In this article, we compared the performance of using SQL Server’s native functionality and the Json.NET library for importing JSON data in SSIS. Based on the tests we performed, we can draw the following conclusions:
- Json.NET was faster when reading JSON files, whether minified or indented, and slightly faster for minified JSON stored in the database.
- SQL Server's native functionality was clearly faster for indented JSON already stored in the database, with a simpler implementation.
- The choice between the two methods depends on the specific requirements and preferences of the project.
It’s important to note that the performance may vary depending on the specific circumstances, so it’s always recommended to test and benchmark different approaches before making a final decision.
For the complete source code and more details, you can refer to the GitHub repository.