In a blog about learning SQL Server, it can be tempting to start with the basics of installing and using SQL Server. But the “why” should always be covered before the “how”. Why use a database? What problems does it solve? When can it help you? If you already have your own answers to these questions, then feel free to skip this post and move onto other posts on this blog that more practically dive into SQL Server concepts and skills.
The following video attempts to address these questions:
If you skipped the video, I’ll try not to have my feelings hurt. Instead, I’ll also give you a quick written overview. I’m going to keep my answer non-academic, because you can find enough of that in books and on the web (for example the Wikipedia article on relational databases).
At it’s most basic level, a database is just another place to store data. That data could be your music collection, a company’s employee information, or an inventory of products for an online store. There’s a good chance you’ve already encountered the need to store data. In the past, you may have stored your data in a spreadsheet, a Word document, or even in a simple text file. Depending on the situation, this can meet all of your needs. So what are some common problems that you have with this strategy that databases solve?
Let’s look at an example similar to the one in the video. Assume you’re storing data from a series of experiment runs. Here is a sampling of the data:
The first potential issue involves the amount of redundant data. For example, the experiment name and description are stored for each run. If you want to change either, you have to do so in all places. This is easy for such a small example, but much harder as the data increases in size or complexity.
The second potential issue is the potential for inconsistent data. What happens when the user accidentally mispells “Exp1” as “Expp1”? It appears that there is a new experiment. Also, what keeps a user from putting text in the RunDate column or a negative value in the RunNumber column?
Although there are other problems that relational databases solve, these are two of the most obvious ones. First databases contain tables with rows of data. This is very similar to the table above or data in a spreadsheet. But in a relational database, you attempt to separate the data into related entities (a concept called normalization). This can get complex quickly, so let’s keep it simple with our example. Instead of a single table in our database, let’s create two tables:
Now each experiment has a single row in the ExperimentsTable. So there is one place to change the description of name of an experiment. Note that an ExperimentId column is also added. You’ll later learn that this is a “primary key”. But for now, just note that it allows this experiment to be referred to in another table, such as the RunsTable here. This is the “relational” aspect of relational databases. This addresses both the problem of redundant data and inconsistent data.
You’ll find that relational databases improve consistency in more ways than just through normalization. Columns have assigned data types, which means that you can’t enter an string in a column that takes a number, such as RunValue. Also, you specify that the ExperimentId in the RunsTable must come from a current value in the ExperimentsTable, which also means that you can’t delete an experiment in the ExperimentsTable when any rows in the RunValues table references it.
Hopefully, this basic example helps give you a better framework for understanding relational databases and normalization. But databases provide many other benefits that can’t be adequately explored in a single blog post. Here are a few:
- Custom queries can quickly answer specific questions about the data.
- Development tools like Visual Studio provide tools and technologies to work with database data in web and client applications.
- Security controls access to the data.
- Backup and restore technologies protect against failure.
If you are learning about SQL Server from a developer background, you may be thinking that you could code these features into your application without a dependency on a database engine. This is true. But why spend the time and energy to implement these features when they are already available, tested, and performance-tuned?
After reading this, maybe you realize that your requirements don’t call for a database. And that’s fine! But if you could use a database now or in the future, I hope this post provides a background that will help.
More information on this topic:
- Introduction to Relational Databases (DatabaseJournal.com)
- What is a Database (asp.net Video)
- SQL Server 2008 Essential Training — SQL Server Core Concepts (Lynda.com)