Welcome back to our series on SQL Server concepts and ideas in the Big Data Story. In our previous articles, we discussed the importance of Relational Databases, NoSQL Databases, Key-Value Pair Databases, and Document Databases. Today, we will dive into the world of Columnar, Graph, and Spatial Databases.
Columnar Databases
A Relational Database is typically row-oriented, meaning that data is stored and retrieved by rows. A Columnar Database, on the other hand, is column-oriented: the values of each column are stored and retrieved together. This makes it easier to add new columns and to read only the columns a query needs, which helps when dealing with the different kinds of data found in the Big Data landscape.
One popular columnar database is HBase, which stores its data on the Hadoop Distributed File System (HDFS) and can be used with MapReduce for processing. However, columnar databases are not suitable for every application. They are particularly effective when high-volume, incremental data needs to be gathered and processed.
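The row-versus-column distinction described above can be illustrated with a small sketch. This is only an in-memory illustration, not how HBase or any particular product lays out its data; the sample records and the helper name `to_columns` are invented for the example.

```python
# Hypothetical sample data: three sales records (names and values are invented).
rows = [
    {"id": 1, "region": "East", "amount": 120.0},
    {"id": 2, "region": "West", "amount": 75.5},
    {"id": 3, "region": "East", "amount": 210.0},
]


def to_columns(records):
    """Pivot a list of row dictionaries into a dictionary of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented layout: every full record is visited to sum a single field.
total_from_rows = sum(record["amount"] for record in rows)

# Column-oriented layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

# Adding a new column creates one new list instead of rewriting each row.
columns["currency"] = ["USD"] * len(rows)
```

Both totals are the same; the difference is how much data has to be touched to compute them, which is where a column-oriented layout tends to pay off for aggregates over a few columns.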
Graph Databases
For highly interconnected data, a Graph Database is a natural choice. This type of database organizes data in a node-and-relationship structure, where both nodes and relationships can carry key-value pairs. The major advantage of a graph database is that it can navigate from one relationship to the next quickly, without the repeated joins a relational design would need.
For example, social media platforms like Facebook use graph databases to model and display the various relationships between users. Neo4j is one of the most popular open-source graph databases available.
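To make the node-and-relationship idea concrete, here is a minimal sketch of a tiny social graph held in plain Python structures, with a breadth-first traversal that finds people within a given number of hops. It does not use Neo4j or any graph database API; the user names, properties, and the functions `build_adjacency` and `within_hops` are all hypothetical.

```python
from collections import deque

# Hypothetical property graph: nodes and relationships both carry key-value pairs.
nodes = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}
relationships = [
    ("alice", "bob", {"type": "FRIEND", "since": 2010}),
    ("bob", "carol", {"type": "FRIEND", "since": 2012}),
    ("carol", "dave", {"type": "FRIEND", "since": 2013}),
]


def build_adjacency(rels):
    """Index relationships by node so neighbors can be followed directly."""
    adjacency = {}
    for source, target, props in rels:
        # Friendship is treated as undirected, so store both directions.
        adjacency.setdefault(source, []).append((target, props))
        adjacency.setdefault(target, []).append((source, props))
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in max_hops or fewer."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, _props in adjacency.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


adjacency = build_adjacency(relationships)
friends_of_friends = within_hops(adjacency, "alice", 2)
```

Here `friends_of_friends` holds Bob at one hop and Carol at two, while Dave is left out because he is three hops away. A graph database runs this kind of traversal natively over stored relationships rather than over an in-memory dictionary.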
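However, graph databases have limitations of their own. The relational notion of a self-join does not carry over directly, since relationships are traversed rather than joined, and support for self-referencing relationships (a node linked to itself) varies from product to product, which can matter in certain real-world scenarios.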
Spatial Databases
Have you ever checked in to a location using Foursquare, Google+, or Facebook? These applications determine the position of your phone using the Global Positioning System (GPS). To handle the vast amount of location data and provide meaningful results, they rely on Spatial Databases.
Spatial data is standardized by the Open Geospatial Consortium (OGC) and helps answer interesting questions such as the distance between two locations or the area of a place of interest. One popular spatial database is PostGIS, which implements the OGC (OpenGIS) specifications as an extension to the PostgreSQL RDBMS. This combination offers the best of both worlds.
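As an illustration of the "distance between two locations" question, the following sketch computes a great-circle distance with the haversine formula. This is a simplification that assumes a spherical Earth with a mean radius of 6371 km; it is not how PostGIS is implemented, and the function name `haversine_km` is chosen only for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the real Earth is not a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine formula: a is the square of half the chord length between the points.
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, roughly 111 km on this model.
one_degree_km = haversine_km(0.0, 0.0, 1.0, 0.0)
```

A spatial database adds a good deal on top of a formula like this, such as geometry types, coordinate reference systems, and spatial indexes, so that these questions can be answered efficiently over large sets of locations.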
Conclusion
Understanding the different types of databases available is crucial in the Big Data landscape. Columnar databases are well suited to storing and processing high-volume incremental data, while graph databases excel at handling highly interconnected data. Spatial databases are essential for location-aware applications that require accurate and meaningful spatial information.
In our next blog post, we will explore another important component of the Big Data Ecosystem – Hive. Stay tuned!