This series of articles is based on material from a university course entitled Database Management Systems (DBMS). The course provides an introduction to the principles and techniques of database management and design, as well as their practical application in the real world.
The course material follows the book Database Systems: The Complete Book, which is highly recommended to anyone interested in relational database management systems.
The course covers topics such as data modelling, data retrieval, data manipulation, and data security. It also explores the challenges of managing databases in the context of an organization.
The course focuses on the use of SQL, as well as tools and techniques for managing and querying data. It also covers topics such as database performance, optimization, scalability, and data protection. Additionally, the course covers topics such as data warehousing, data mining, and analytics, as well as the development of data-driven applications.
Through this series of blog posts, written as summarized lectures and exercises, readers will gain a better understanding of the concepts and principles of database management systems and how to apply them to real-world problems.
Introduction – What are Database Management Systems
A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, commonly known as the database, holds information pertinent to an enterprise.
The primary objective of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. Database systems are designed to manage large amounts of information.
Management of data involves both defining structures for storing information and providing mechanisms for manipulating it. Moreover, the database system must keep the stored information safe despite system crashes or attempts at unauthorized access. If data are to be shared among multiple users, the system must also prevent anomalous results.
As information is so vital to most organizations, computer scientists have developed a comprehensive set of concepts and techniques for managing data. These concepts and techniques form the main focus of this series of articles. This blog post offers a brief introduction to the principles of database systems.
The Need for Database Systems
The advent of database management systems (DBMSs) was a response to the inefficiencies of file-processing systems, which were the primary method of computerized data management in organizations prior to DBMSs.
Using a file-processing system to store organizational information has many drawbacks, including:
- Data redundancy and inconsistency. Because different programmers create files and application programs over a long period, the files are likely to have different structures and the programs may be written in several programming languages. The same information may also be stored redundantly in multiple files, and the copies can drift apart when one is updated and the others are not.
- Difficulty in accessing data. File-processing systems do not allow needed data to be retrieved conveniently; each new kind of request typically requires writing a new application program.
- Data isolation. Because data are scattered across multiple files, often in different formats, it is difficult to write new applications that retrieve the necessary data.
- Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints, for example that an account balance never falls below zero. In a file-processing system these constraints are buried in application code, which makes them hard to add or change.
- Atomicity problems. A computer system, like any other device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure.
- Concurrent-access anomalies. To improve overall performance and response time, many systems allow multiple users to update the data simultaneously. The largest Internet retailers may receive millions of accesses per day from shoppers, and unless concurrent updates are supervised, they can interleave in ways that leave the data inconsistent.
- Security problems. Not every user of the database system should be able to access all the data.
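To make the integrity and atomicity points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `account` table, the `transfer` helper, and the balances are hypothetical and only serve the illustration: a CHECK constraint expresses the consistency rule, and a transaction ensures that a transfer that fails halfway leaves no trace.

```python
import sqlite3

# Hypothetical schema: the CHECK clause is the consistency constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so that a failing debit leaves work to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "A", "B", 500)  # would overdraw account A
print(ok, balances(conn))
```

The failed transfer leaves both balances exactly as they were, which is the behaviour a plain file-processing program would have to implement by hand.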
View of Data
A database system is a collection of interrelated data and a set of programs that let users access and modify that data. A major purpose of the system is to give users an abstract view of the data: it hides certain details of how the data are stored and maintained.
Data Abstraction
Data abstraction is an essential part of database management: it hides the complexity of the underlying data structures from users. To simplify user interaction with the system, designers use three levels of abstraction:
- Physical level: the lowest level, which describes how the data are actually stored.
- Logical level: the next level up, which describes what data are stored and what relationships exist among those data.
- View level: the highest level, which describes only part of the entire database and suits users who need access to only a specific portion of the data.
By utilizing all three levels of abstraction, users can effectively interact with the database system without having to understand the complexity of the underlying data.
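As a rough illustration of the logical and view levels, the sketch below uses Python's sqlite3 module. The `instructor` table, its columns, and the `instructor_directory` view are made up for the example. The physical level never appears in the code at all: how SQLite lays out rows on disk is hidden from us, which is precisely the point of abstraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored (a hypothetical instructor table).
conn.execute(
    "CREATE TABLE instructor ("
    " id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Einstein", "Physics", 95000), (2, "Katz", "Comp. Sci.", 75000)],
)

# View level: a directory that exposes only part of the database (no salary).
conn.execute(
    "CREATE VIEW instructor_directory AS "
    "SELECT id, name, dept_name FROM instructor"
)

cur = conn.execute("SELECT * FROM instructor_directory ORDER BY id")
view_columns = [col[0] for col in cur.description]
view_rows = cur.fetchall()
print(view_columns)
print(view_rows)
```

A user who is given only `instructor_directory` can look people up without ever seeing, or needing to know about, the salary column.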
Instances and Schemas
Instances and schemas are important concepts when it comes to database management. A database instance is the collection of information stored in the database at a particular point in time. On the other hand, a database schema is the overall design of the database, which is changed infrequently or not at all.
The concept of database schemas and instances can be understood by analogy to a program written in a programming language. The variable declarations in a program are like the database schemas, and the values of the variables at a point in time correspond to an instance of a database schema.
Furthermore, database systems typically have several schemas, which are divided according to the levels of abstraction. The physical schema describes the database design on the physical level, while the logical schema describes the database design on the logical level. Additionally, there may be several view-level schemas, also known as subschemas, that represent different views of the database.
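The difference between schema and instance can be observed directly. In this small sketch (the `department` table is hypothetical, and `schema` and `instance` are helper names invented here), the schema is read from SQLite's catalog table `sqlite_master` and the instance is simply the rows present at that moment. Inserting a row changes the instance but not the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)"
)


def schema(conn):
    """The overall design: the table definitions recorded in the catalog."""
    query = "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


def instance(conn):
    """The data stored in the database at this particular moment."""
    query = "SELECT dept_name, building FROM department ORDER BY dept_name"
    return conn.execute(query).fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO department VALUES ('Physics', 'Watson')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # the design did not change
print(instance_before, instance_after)  # the stored data did
```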
Database languages
A database system provides a data-definition language (DDL) to specify the database schema and a data-manipulation language (DML) to express queries and updates. In practice these are not two separate languages; they are two parts of a single database language such as SQL.
The data-definition language allows users to define the structure of a database by creating, altering and deleting database objects such as tables and views.
The data-manipulation language enables users to add, change, delete, and retrieve data stored in the database (CRUD).
Commonly used database languages include SQL and its vendor-specific procedural extensions, such as PL/SQL (Oracle) and Transact-SQL (Microsoft SQL Server). They give users a way to interact with and manage the data stored in a database system.
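The split between the two parts of the language is easy to see in a short session. The sketch below runs SQL through Python's sqlite3 module; the `student` table and its rows are invented for the example. The CREATE, ALTER, and DROP statements are DDL, while INSERT, SELECT, UPDATE, and DELETE are the DML (CRUD) operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure of the database.
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL, tot_cred INTEGER DEFAULT 0)"
)
conn.execute("ALTER TABLE student ADD COLUMN dept_name TEXT")

# DML: create, update, delete, and read the data itself.
conn.execute(
    "INSERT INTO student (id, name, dept_name) VALUES (1, 'Ada', 'Comp. Sci.')"
)
conn.execute(
    "INSERT INTO student (id, name, dept_name) VALUES (2, 'Bob', 'Physics')"
)
conn.execute("UPDATE student SET tot_cred = 32 WHERE id = 1")
conn.execute("DELETE FROM student WHERE id = 2")
rows = conn.execute(
    "SELECT id, name, tot_cred, dept_name FROM student ORDER BY id"
).fetchall()
print(rows)

# DDL again: remove the object altogether.
conn.execute("DROP TABLE student")
```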
Design Process
The process of designing a database can be broken down into the following steps:
- High-level design: Develop a conceptual framework to specify the data requirements of the database users and their respective structures.
- Data Model: Translate the requirements into a conceptual schema of the database.
- Model Verification: Review the schema to ensure all data requirements are satisfied and that no conflicts exist. Remove any redundant features.
- Entity-Relationship Model: Use a collection of basic objects (entities) and relationships among them to represent the data.
- Normalization: Generate a set of relation schemas that store information without unnecessary redundancy while still allowing easy retrieval.
- Functional Specification: Use additional information about the real-world enterprise, typically expressed as functional dependencies, to determine whether a relation schema is in one of the desirable normal forms.
- Implementation: Create the necessary tables, fields, and other objects, and write the code to populate and query the database.
- Testing & Review: Test the design against the data requirements and ensure the performance of the database is within acceptable limits. Review the design to ensure it is optimized and efficient.
In conclusion, designing a database can be a complex process that requires several steps to ensure the database is optimized and efficient. Following the steps outlined above will help to ensure that the data requirements are met and the database meets the performance requirements.
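The normalization step can be sketched in a few lines of plain Python. Everything here is illustrative: the `inst_dept` relation and the helpers `holds` and `project` are invented for the example. The relation repeats the building of a department for every instructor in it; once we know that the department name determines the building, splitting the relation in two removes that repetition.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Keep only `attrs`, dropping duplicate tuples (order preserved)."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


# Hypothetical unnormalized relation: the building is repeated per instructor.
inst_dept = [
    {"id": 1, "name": "Einstein", "dept_name": "Physics", "building": "Watson"},
    {"id": 2, "name": "Gold", "dept_name": "Physics", "building": "Watson"},
    {"id": 3, "name": "Katz", "dept_name": "Comp. Sci.", "building": "Taylor"},
]

fd_holds = holds(inst_dept, ["dept_name"], ["building"])
instructor = project(inst_dept, ["id", "name", "dept_name"])
department = project(inst_dept, ["dept_name", "building"])
print(fd_holds, len(instructor), len(department))
```

One caveat: checking a dependency against sample rows can only refute it, never prove it. Whether it really holds is a fact about the enterprise, which is why the design process relies on additional real-world information.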
Data Storage and Querying
A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components can be broadly divided into the storage manager and the query processor.
The storage manager is important because databases require a large amount of storage space – from hundreds of gigabytes to terabytes of data. Since the main memory of computers cannot store this much information, the information is stored on disks.
Data are moved between disk storage and main memory as needed, so the database system must structure the data to minimize this movement.
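One common way to reduce that movement is to keep recently used pages in a main-memory buffer. The toy class below is only a sketch of the idea: the `BufferPool` name, the dictionary standing in for the disk, and the least-recently-used eviction rule are all assumptions made for the example, not a description of how any particular database system works.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer: holds at most `capacity` pages, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk             # dict page_id -> bytes, a stand-in for disk
        self.capacity = capacity
        self.frames = OrderedDict()  # pages currently held in main memory
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: no disk access, just mark the page as recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Miss: read from "disk", evicting the oldest page if the pool is full.
        self.disk_reads += 1
        page = self.disk[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)
        self.frames[page_id] = page
        return page


disk = {n: f"page-{n}".encode() for n in range(5)}
pool = BufferPool(disk, capacity=2)
for page_id in [0, 1, 0, 2, 0, 1]:
    pool.get_page(page_id)
print(pool.disk_reads)  # six requests were served with four disk reads
```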
The query processor is important because it simplifies and facilitates access to data. It translates updates and queries written at the logical level into efficient operations at the physical level.
Storage Manager
The storage manager is the component of a database system that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It is responsible for interacting with the file manager and for translating DML statements into low-level file-system commands. The raw data are stored on disk using the file system provided by the operating system.
Ah, the storage manager: the unsung hero of the database. It’s responsible for storing, retrieving, and updating data in the database, making sure that everything runs smoothly and efficiently. Without it, nothing would work properly! It’s composed of various components, like the
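- Authorization and integrity manager – tests whether integrity constraints are satisfied and checks users' authority to access data.
- Transaction manager – keeps the database in a consistent state despite system failures and ensures that concurrent transactions do not conflict.
- File manager – allocates space on disk storage and manages the data structures used to represent stored information.
- Buffer manager – fetches data from disk storage into main memory and decides what data to cache there.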
Plus, the storage manager implements several data structures as part of the physical system implementation, including:
- Data files – store the database itself.
- Data dictionary – stores metadata about the structure of the database, in particular its schema.
- Indices – provide fast access to data items. A database index provides pointers to the data items that hold a particular value. For example, we could use an index to find the instructor record with a particular ID, or all instructor records with a particular name. Hashing is an alternative to indexing that is faster in some but not all cases.
In short, the storage manager is here to save the day!
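To see what an index buys us, we can ask SQLite how it intends to run a lookup by name before and after creating one. This is a sketch under a few assumptions: the `instructor` table, its generated rows, and the `plan` helper are invented here, and the exact wording of SQLite's plan text varies between versions, so the code inspects it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO instructor (name, dept_name) VALUES (?, ?)",
    [(f"name{i}", "Physics") for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail


query = "SELECT id FROM instructor WHERE name = ?"
plan_without_index = plan(conn, query, ("name42",))

conn.execute("CREATE INDEX idx_instructor_name ON instructor (name)")
plan_with_index = plan(conn, query, ("name42",))

print(plan_without_index)  # a scan over the whole table
print(plan_with_index)     # a search that uses idx_instructor_name
```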
The Query Processor
The Query Processor! It’s like an orchestra conductor, but with a much cooler hat. It’s comprised of three main components:
- DDL interpreter – interprets DDL statements and records the definitions in the data dictionary.
- DML compiler – translates DML statements into an evaluation plan and performs query optimization, that is, it picks the lowest-cost plan among the alternatives.
- Query evaluation engine – executes the low-level instructions generated by the DML compiler.
All of them together make the Query Processor the maestro of data.
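Query optimization is easier to appreciate with a toy example. The sketch below is plain Python, not a real optimizer, and the data and function names are invented. It evaluates the same query ("instructors of one department, with their building") using two equivalent plans. Both return the same answer, but one builds a much larger intermediate result, and an optimizer would prefer the cheaper plan.

```python
# Hypothetical data: 100 instructors spread over 10 departments.
instructors = [(i, f"dept{i % 10}") for i in range(100)]         # (id, dept_name)
departments = [(f"dept{d}", f"building{d}") for d in range(10)]  # (dept_name, building)


def join_then_filter(target):
    """Plan A: join everything first, then keep the rows we want."""
    joined = [
        (i, d, b) for (i, d) in instructors for (d2, b) in departments if d == d2
    ]
    result = [row for row in joined if row[1] == target]
    return result, len(joined)  # answer, size of the intermediate result


def filter_then_join(target):
    """Plan B: filter both inputs first, then join the small remainders."""
    inst = [(i, d) for (i, d) in instructors if d == target]
    dept = [(d, b) for (d, b) in departments if d == target]
    joined = [(i, d, b) for (i, d) in inst for (d2, b) in dept if d == d2]
    return joined, len(joined)


result_a, size_a = join_then_filter("dept3")
result_b, size_b = filter_then_join("dept3")
print(result_a == result_b, size_a, size_b)
```

A real DML compiler estimates such costs from statistics about the data rather than by running every plan, but the goal is the same: equal answers for less work.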
Summarizing
Database management systems are powerful tools for managing and organizing data. Relational systems in particular provide a standardized way of organizing data into tables and columns that can be easily accessed and manipulated. They also offer a reliable means of protecting data from corruption and unauthorized access. With the basics covered here (data abstraction, schemas and instances, database languages, the design process, and the storage manager and query processor), readers are equipped to store, manipulate, and find data effectively.
Further Reading
Introduction to the SQL Query Language – A follow-up course material.
Database Systems: The Complete Book – This book by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom is the “complete” solution for anyone who wants to dive into the world of databases. It offers an easy-to-understand introduction to the fundamentals of databases, as well as detailed explanations of more advanced topics such as database system implementation, parallel and distributed databases, data mining, and the architecture of search engines.
It’s a great resource for both novice and experienced database users. While the book does a great job of covering the basics, it is not an easy read. It is also a bit dated in some areas and would benefit from an update covering more recent developments in the field. Overall, however, it is a great resource for anyone who wants a comprehensive overview of database systems.