[DB] Introduction to Database System
Database is a collection of data that contains information about an enterprise. Since information is so crucial in most applications, there is a large body of concepts and techniques for managing data. And database system, a collection of interrelated data and a set of programs to access those data, provides a way to managing database conveniently and efficiently.
Purpose of Database
In the early days, database applications were built directly on top of file-processing systems that stores permanent records in various files, which leads to:
-
Data redundancy and inconsistency
Data is stored in multiple file formats. This leads to resulting in duplication of information in different files (redundancy) and that the various copies of the same data may no longer agree (inconsistency).
-
Difficulty in accessing data
Conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. A new program needs to be implemented to carry out each new retrieval task.
-
Data isolation
Because data are scattered in various files, and files may be in different formats, writing a program to retrieve the appropriate data is difficult.
-
Integrity problems
Integrity constraints (e.g.,
account balance > 0
when the university maintains an account for each department and records the balance amount in each account) become "buried" in program code rather than being stated explicitly. Hence, it is hard to add new constraints or change existing ones. -
Atomicity of updates
Failures may leave database in an inconsistent state with partial updates carried out, e.g. Transfer of funds from one account to another must be atomic; either complete or not happen at all
-
Concurrent access by multiple users
Concurrent access is necessitated for high performance. Uncontrolled concurrent accesses can lead to inconsistencies, e.g. Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time.
-
Security problems
Not every user of the database system should be able to access all the data. But it is hard to provide user access to some, but not all, data in conventional file-processing systems.
Database systems offer solutions to all the above problems.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.
Data Models
The data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. The data models can be classified into 4 different categories:
-
Relational model
All the data is stored in various tables. Each table has multiple columns, and each column has a unique name. Tables are also known as relations.
-
Entity-Relationship (E-R) model
Uses a collection of basic objects, called entities, and relationships among these objects.
-
Semi-structured data model
Permits the specification of data where individual data items of the same type may have different sets of attributes.
-
Object-based data model
Allow procedures to be stored in the database system and executed by the database system.
A large portion of future posts will be focused on the relational model because it serves as the foundation for most database applications.
Data Abstraction
The complexity of the data structures to represent data in the database is hidden from users through several levels of data abstraction to simplify users’ interactions with the system:
-
Physical level
Describes how the data are actually stored. It describes complex low-level data structures in detail.
-
Logical level
Describes what data are stored in the database, and what relationships exist among those data.
-
View level
Describes only part of the entire database. The view level of abstraction exists to simplify users’ interaction with the system. The system may provide many views for the same database as many users of the database system do not need all this information.
Instances and Schemas
Databases change over time as information is inserted and deleted.
-
The collection of information stored in the database at a particular moment is called an instance of the database.
-
The overall design of the database is called the database schema. Database systems have several schemas, partitioned according to the levels of abstraction.
-
Physical schema
Describes the overall physical structure of the database.
-
Logical schema
Describes the overall logical structure of the database.
-
Subschema
Describes different views of the database.
-
Physical schema
-
Analogy: A database schema corresponds to the variable declarations in a program, and the values of the variables in a program at a point in time correspond to an instance of a database schema.
The ability to modify the physical schema without changing the logical schema is called Physical data independence. In general, the interfaces between the various levels and components should be well defined, so that changes in some parts do not seriously influence others.
Reference
[1] Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan. “Database system concepts.” (2011).
Leave a comment