What is Database Normalization?
In RDBMS we do not want to have the same information multiple times (redundancy), and we want to make sure, that integrity holds
We also need a way to uniquely identify data
Normalization is a process, in which the model of the real world (domain) is transfered into a concrete data model for the database, fulfilling various requirements (depending on the normal form) ===> if you have to design a database learn it.
Database schema
Formal described structure of the database (e.g., what tables are there, columns, index, etc.)
schema diagram displays that information – often used to talk about a database
Vendors implemented schema also for security reasons
- Schema is a defined group of database objects (like tables).
- A database can contain multiple schemas, one schema always belongs to one
database
- The same name could be used for the same type of object in different
schemas on the same database
TV Word - Schema Requirement
TV-Series
needs a name as string
same name could happen
will have a parental rating
has some actors (or not)
could be shown on streaming services (but has not to be)
Streaming services
needs a name as string
name has to be unique
Actors
have a name as string
have a firstname as string
both are not really unique, even together
names are mandatory
Create Actors
Key
In the database world a key uniquely identifies a “thing”
A key
- could be a unique attribute (e.g., TV-series name),
- or consist of multiple attributes (e.g., number-plate and car identity#) =>
compound key
- or just an not necessary (because not present in real world), but added attribute
=> surrogate key (most common)
Primary Key => uniquely identifies EVERY row in a table
Foreign Key => forms a relationship to another table
Key-Relationship Explained
Terminology – not always easy
Terminology often depends on the model we are talking about, therefore try to understand / recognize the model you are communicating (e.g., ER model, logical data model, etc.)
Sample:
- In ER-Model we find the entity type (formed of entities with the same attributes)
- In relational model, this is called relation
- In dbms / SQL this is called table (almost always, please use the following SQL in pgAdmin: select * from something and look for the result / message)
Database languages
DDL – Data Definition Language
This is the language a database schema is created in
In RDBMS with SQL statements like “create table …” are used
Constraints are created with that language
DML – Data Manipulation Language
This is the language to insert, retrieve, change and delete data from a database
In RDBMS with SQL statements like “insert into…”, “select x from ...”, “update y …” and “delete from …”
Now we go with practical Example
Query 1- Select Statement
Query 2- Select Statement
Selecting all attributes with shorting
Query 3 and 4/5- Creating Alias
Query 4- Schema Diagram
Query - 6 and 7
Query- 8
Write a query for the table “tvseries”. The result should contain the name of the series and the imdb rating. The top rated show should be the first displayed, if two or more shows share the same rating use their name to sort.
Data type DBMS
A datatype limits the possible data to a certain type of data
Look back to integrity, where we defined that integrity holds e.g., by using a specific data type. Otherwise, we could store everything as text (like in our notepad)
- Number (integer, real, double precision)
- String (character varying, char, text)
- Date and time
- Binary
- XML
- And much more
Data type - SQL
Expressions depends on data type
Sorting depends on data type
- String sorting is even more complex, because of different languages there are also different sorting rules (want to know more about that, look up for “database collation”)
Possible data ranges / input of data depends on data type
Example for expression with string – single quotation:
name = 'Netflix’
Vendor specific datatypes, and handling in expressions
Query and filter
Until now, we always returned all rows, or limited the number of results with the LIMIT expression
We need a way to filter the results to those which match our criteria
Comparison operators
Used to create the search criteria
=
<
>
<=
>=
<>
IN
BETWEEN
LIKE
IS NULL
NOT
Logical operators
If we have more than one search criterion
Combine them with AND and OR
Use Parentheses if necessary
Result of a logical expression is a truth value – either true or false
Know and get familiar with the truth table
Query - 9
output:
Query - 10/11
Write a query for the table “products”.
First, to get some clue about the data select 20 rows of data.
Second, we need the records of the actor CHEVY FOSTER, but only those where the price is higher than 12, sorted by the price.
Transaction
A transaction is an executing program, that forms a logical unit of database processing (Elmasri, et.al.)
Problem of concurrency (user, process, CPU, disk …)
Simple Problem: transfer money from one account to another account
- Account A => insert withdrawal record, update bank balance
- Account B => insert deposit record, update bank balance
- At least 4 DB-Operations have to be done
- Suppose there is an error after 2 DB-Operations, where is the money?
Transaction properties - ACID
Atomicity - perform everything of the unit or nothing
Consistency preservation – at the end of a succesful transaction the database is again in a consistent state
Isolation – the execution of a transaction should not be interferred by other transactions at the same time
Durability – the changes the execution of the transaction made have to be persistent and should not be lost due to failure
Query 12
Please have a look at the data of table actors. Do you find something in the data which is unusual or suspicious?
This was easy, what did you find?
What is NULL ?
Special value denoting “no data / information”
What would be the alternative (if there is no NULL):
- Using a magic value (e.g., store 0 for “we do not know the year born”)
- However, what if this magic value becomes a normal value (e.g., we store a
person born in year 0)
In SQL NULL often needs special treatment, e.g., queries use:
- IS NULL for finding records where an attribute does not have a value
- IS NOT NULL for finding records where an attribute has a value
Query - 13
Select the firstname, name, and age of those actors who are older than 40, sorted by their age.
Change it to having those actors who are older than 0. Think about the result compared to the discovery of the previous query (NULL-value)
What happens if you use this query next year?
The probability that the age of the one person,
where the value of year born is null, is higher than 0?
Query 14/15
Select the data of those actors, who do not have a year born stored.
Challenge: if you really are comfortable already, try to list the names, and the year born, and for the missing year born data display “no value” in the result
SQL Join
If we normalize our data model (which is the typical process), we will end up by having multiple tables which may be in relation
Cardinality of relationship
Defines the relationship of data between two tables
Types of cardinality (always read from left to right)
- 1:1 – each record on one table (left side) has exactly zero or one record in the
other table (right side)
- 1:n – each record on one side will have zero to n records on the other side
(right). Could be also read in the opposite way (therefore, sometimes you will
also read n:1)
- n:n – each record on one side will have zero to n records on the other side,
and, importantly, vice versa.
When creating the data base model, foreign keys are used to relate to facts (rows) of other tables
Cardinality of relationship – this class
1:1 – student and desk today
1:n – student and contribution
- one student can make zero to multiple contributions.
- each contribution can only be from one student
n:n – student and courses
- each student can take multiple courses
- each course is taken by multiple studens
Two tables, foreign key
Two tables, foreign key
Three tables: student, course, student_in_course (called junction table) with two foreign keys
Cardinality of relationship – tv domain
1:1 – not there
1:n – parental rating and TV-series
- one rating can be used on zero to multiple TV-series
- Each TV-series can only have one rating
n:n – actor and TV-series
- each actor can be in the cast of multiple TV-series
- each TV-series can have a cast of multiple actors
parentalrating.id = tvseries.ratingId
actoroftvseries.tvseriesId = tvseries.id
actoroftvseries.actorid = actors.id
JOIN - continued
In SQL we use a “join” to denormalize the relationship and get data of other tables into the result set
We have to think about the result set, and we have to be carefully when using columns to join where the column allows NULL
Types:
- inner join – could be replaced by FROM and WHERE extensions
- right, left and full outer join – like in cardinality, the side matters
Query - 16 INNER JOIN
Venn diagram
We can look at our data with venn diagrams, because our data are also sets
Query 17 Left Outer Join
JOIN – continued further
inner join => only rows, where the join condition is met will be in the result set
left outer join => all left table rows will be in the result set, and if the join condition is met, the matching rows of the right table will be joined
right outer join => just switch the side; difference is if the statement gets “longer”, more joins etc.
Look for the cardinality in the model, and look for NULL
Add new information to our domain
We recognized, that some TV-series are new editions of already produced series. Samples are e.g., “Knight Rider”, “Hawaii Five-O”. We changed our table to reflect that new information.
Query 18 - Self join – not another join!
Use double quotes for aliases if they contain special characters (like blank)
Query 19 – how to write longer stmts
Question: What TV-series has what cast? Or, in other words, what actors are playing in what TV-series?
Query 19 – how to write longer stmts
1) Start by the first table and introduce table-alias already!
2) Add the next table, which has a relation to the first table and connect them
3) Add the next table, which has a relation to the second table and connect them
Query 19 – another Question
Are all TV-Series in the result?
No, not really. However, why?
Query 20 (change 19)
Question: What TV-series has what cast? Or, in other words, what actors are playing in what TV-series? Show all TV-series, even they have no cast at all.
Query 21 – full outer join
Query 22
We want a list of all tvseries (the showname and the imdb-rating) not playing on a streaming service at the moment.
Sort the list by the showname.
Aggregation
Until now, we always created resultsets with / from records of tables.
In SQL we often aggregate data, to find new information based on the data stored, e.g., find the highest value, create a sum, or count the number of records
- min
- max
- avg
- sum
- count
- Vendor specific
Query 23 and 24
You are on / Query 25
There is a table called orders having information about orders from customers at the DVD store. Get familiar with the table (columns, data).
We need for the customer with the id 6704 the date of the first order, the amount of money (net) she spend on all orders, the number of orders she placed, and the average netamout she spend per order
Query 25 - orders
You are on / Query 26
In our public DVD-store we have to renew the description of our F-shelf. In the F-Shelf are all products from categories starting with F.
We need now the following information:
How many products will be in the shelf, what is the price of the cheapest and what of the most expensive product.
Query 26 – products and categories
Query 26 – alternative solution
To get any other help related to SQL, Database or other database management system you can send your requirement details here:
realcode4you@gmail.com
Kommentit