Published on March 5, 2022

Working with Microsoft Excel Files in SQL Server

Microsoft Excel is a widely used spreadsheet application that lets users store, organize, and analyze data. While SQL Server is a powerful relational database management system, certain scenarios still require working with Excel files. In this article, we will explore how to manage Microsoft Excel files in SQL Server solutions using Apache Spark on Azure Databricks.

Sample Data Sets

Before we dive into the details, let's first gather the necessary sample data sets. For this demonstration, we will use the AdventureWorks files, which are commonly used to build a sample database. These files are pipe-delimited and can be found on GitHub. We will also use a Microsoft Excel worksheet called the "Sunshine List," which names public sector workers in Ontario, Canada who earn more than $100K per year. This file can be downloaded from a specific website.

Configure Spark Cluster

Before we can read Excel files with Spark, we need to configure the cluster. The openpyxl library, which reads and writes Excel files, must be installed on the Azure Databricks cluster. This can be done by opening the running cluster and adding the library under the cluster's Libraries tab.
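As an alternative to the cluster UI, the library can be installed for a single notebook session. This sketch assumes a Databricks notebook, where the %pip magic installs the package into the notebook's environment:

```shell
# install openpyxl for the current notebook session (Databricks %pip magic)
%pip install openpyxl
```

A cluster-level install survives notebook restarts; the %pip approach is handy for quick experiments.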

Read Excel Files

Once the cluster is configured, we can start reading Excel files. Two Python libraries support this: pyspark.pandas, the pandas API on Spark that ships with PySpark, and pandas itself, the popular data manipulation library. Both provide methods to read Excel files into dataframes.

Here is an example of reading an Excel file using the pandas API on Spark (pyspark.pandas):

# libraries
import pyspark.pandas as ps

# file path
path = '/mnt/datalake/bronze/sunshine/stage/sunshine-list-2021.xlsx'

# read the file into a pandas-on-Spark dataframe
pdf1 = ps.read_excel(io=path, sheet_name=0, header=0)

# show top 10 rows
pdf1.head(10)

And here is an example of reading an Excel file using pandas together with openpyxl:

# libraries
import pandas as pd
from openpyxl import load_workbook

# path to file (the /dbfs prefix is needed because openpyxl uses the local file system API)
path = '/dbfs/mnt/datalake/bronze/sunshine/stage/sunshine-list-2021.xlsx'

# open work book
wb = load_workbook(filename=path, read_only=True)

# choose worksheet
ws = wb['tbs-pssd-compendium-salary-disc']

# convert to data frame
pdf1 = pd.DataFrame(ws.values)

# close workbook
wb.close()

# show top 5 rows
pdf1.head(5)
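One caveat with the openpyxl route: ws.values yields raw tuples, so the header row lands in the dataframe as data row 0. A minimal sketch of promoting that row to column names, using hypothetical Sunshine List column names for illustration:

```python
import pandas as pd

# sample rows as openpyxl's ws.values would yield them;
# the first tuple holds the (hypothetical) column names
rows = [
    ("Employer", "Surname", "Salary"),
    ("City of Toronto", "Smith", 125000.00),
    ("City of Ottawa", "Jones", 110500.00),
]

# build the frame, then promote row 0 to the header
pdf1 = pd.DataFrame(rows)
pdf1.columns = pdf1.iloc[0]
pdf1 = pdf1.drop(index=0).reset_index(drop=True)

print(pdf1.columns.tolist())
```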

Analyze Excel Files

Once we have read the Excel files into dataframes, we can perform various analysis tasks on the data. This can include aggregations, filtering, and transformations. We can use SQL queries or dataframe operations to analyze the data.
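As a sketch of such analysis, the following filters and aggregates a small hand-made frame with hypothetical Sunshine List columns; the same calls also work on a pyspark.pandas dataframe:

```python
import pandas as pd

# hypothetical slice of the Sunshine List; real column names may differ
pdf1 = pd.DataFrame({
    "Employer": ["City of Toronto", "City of Toronto", "City of Ottawa"],
    "Salary":   [125000.00, 140000.00, 110500.00],
})

# filter: employees earning more than $120K
high = pdf1[pdf1["Salary"] > 120000]

# aggregate: average salary per employer
avg = pdf1.groupby("Employer")["Salary"].mean()

print(len(high))                  # 2
print(avg["City of Toronto"])     # 132500.0
```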

Write Excel Files

In addition to reading Excel files, we can also write data from SQL Server to Excel files. Both PySpark and Pandas libraries provide methods to write dataframes to Excel files.

Here is an example of writing a dataframe out with PySpark. Spark always writes a directory of part files, so the dataframe is first repartitioned to a single partition and saved as Parquet; a custom helper (defined elsewhere in the notebook) then moves the lone part file to its final name:

# output path
path = '/mnt/datalake/bronze/sunshine/raw/temp'

# write as a single part file (Parquet)
pdf1.to_spark().repartition(1).write.parquet(path)

# move the part file to its final name (custom helper)
unwanted_file_cleanup("/mnt/datalake/bronze/sunshine/raw/temp", "/mnt/datalake/bronze/sunshine/raw/sunshine-list-v001.parquet", "parquet")

And here is an example of writing multiple dataframes to one Excel workbook using pandas and its ExcelWriter:

# file path
file_path = "advwrks.xlsx"

# use excel writer for multiple sheets
# df_list (Spark dataframes) and ctrl_list (per-sheet metadata) are built earlier in the notebook
with pd.ExcelWriter(file_path, engine='openpyxl') as writer:
  for i in range(len(df_list)):
    df_list[i].toPandas().to_excel(writer, sheet_name=ctrl_list[i]["worksheet_name"], index=False)

Conclusion

In this article, we have explored how to work with Microsoft Excel files in SQL Server solutions using Apache Spark. We have learned how to read Excel files into dataframes, analyze the data, and write dataframes back to Excel files. By leveraging the power of Spark and the flexibility of Python libraries like PySpark and pandas, we can easily manage Excel files alongside SQL Server data.

