What is a Transaction?
A transaction is a unit of program execution that accesses and possibly updates various data items.
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions
The Basic Architecture
A typical transaction server consists of multiple processes accessing data in shared memory.
Server processes
These receive user queries (transactions), execute them and send results back
Processes may be multithreaded, allowing a single process to execute several user queries concurrently
Typically multiple multithreaded server processes
Lock manager process
More on this later
Database writer process
Output modified buffer blocks to disks continually
Transaction-Related Processes
1. Shared memory contains shared data
Buffer pool
Lock table
Log buffer
Cached query plans (reused if same query submitted again)
2. All database processes can access shared memory
3. To ensure that no two processes are accessing the same data structure at the same time, databases systems implement mutual exclusion using either.
Operating system semaphores
Atomic instructions such as test-and-set
4. To avoid overhead of interprocess communication for lock request/grant, each database process operates directly on the lock table
instead of sending requests to lock manager process
5. Lock manager process still used for deadlock detection
Mutex and Semaphore
1. Semaphore or mutex is a multi-threaded flow of control construct that blocks execution of threads under certain conditions.
When a mutex or semaphore is available, a thread can acquire the mutex or semaphore and continue execution.
When a mutex or semaphore is not available, a thread is blocked from further execution when it attempts to acquire the mutex or semaphore. T
Threads that have successfully acquired a mutex or semaphore must eventually release it to make it available for another thread.
2. A mutex is used for mutual exclusion
A region of code that begins with a call to acquire a mutex and ends with a call to release the same mutex, can only have one thread in the code at a time.
3. A semaphore is used for flow control, to restrict the number of threads executing a block of code that begins with a call to acquire the semaphore, and ends with a call to release the semaphore.
4. The difference between a mutex and a semaphore is that only one thread at a time can acquire a mutex, but some preset number of threads can concurrently acquire a semaphore.
A mutex is sometimes called a binary semaphore.
Transactions have their own Work Areas
Transaction States
Active – the initial state; the transaction stays in this state while it is executing
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted.
- restart the transaction
- can be done only if no internal logical error
- kill the transaction
Committed – after successful completion
Requirements for Transactions
1. Atomicity requirement
if the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state
- Failure could be due to software or hardware
the system should ensure that updates of a partially executed transaction are not reflected in the database
2. Durability requirement
Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
3. Consistency requirement
the sum of A and B is unchanged by the execution of the transaction
4. In general, consistency requirements include
Explicitly specified integrity constraints such as primary keys and foreign keys
Implicit integrity constraints
- e.g., sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand
5. A transaction must see a consistent database.
6. During transaction execution the database may be temporarily inconsistent.
7. When the transaction completes successfully the database must be consistent
Erroneous transaction logic can lead to inconsistency
8. Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
9. Isolation can be ensured trivially by running transactions serially
that is, one after the other
10. However, executing multiple transactions concurrently has significant benefits
ACID Properties
1. A Transaction is a unit of program execution that accesses and possibly updates various data items.
2. To preserve the integrity of data the database system must ensure:
Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Consistency. Execution of a transaction in isolation preserves the consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.
- That is, for every pair of transactions Ti and Tj , it appears to Ti that either Tj , finished execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Concurrent Executions
1. Multiple transactions are allowed to run concurrently in the system. Advantages are:
increased processor and disk utilization, leading to better transaction throughput
- E.g. one transaction can be using the CPU while another is reading from or writing to the disk
reduced average response time for transactions: short transactions need not wait behind long ones.
2. Concurrency control schemes – mechanisms to achieve isolation
that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
Serializability
Serializability is a concurrency scheme where the concurrent transaction is equivalent to one that executes the transactions serially.
A schedule is a list of transactions.
Serial schedule defines each transaction is executed consecutively without any interference from other transactions.
Non-serial schedule defines the operations from a group of concurrent transactions that are interleaved.
- In non-serial schedule, if the schedule is not proper, then the problems can arise like multiple update, uncommitted dependency and incorrect analysis.
The main objective of serializability is to find non-serial schedules that allow transactions to execute concurrently without interference and produce a database state that could be produced by a serial execution.
Serializable vs. non-Serializable Schedules
Conflict Serializability
Conflict serializability defines two instructions of two different transactions accessing the same data item to perform a read/write operation.
It deals with detecting the instructions that are conflicting in any way and specifying the order in which the instructions should execute in case there is any conflict.
A conflict serializability arises when one of the instruction is a write operation.
The following rules are important in Conflict Serializability,
If two transactions are both read operations, then they are not in conflict.
If one transaction wants to perform a read operation and other transaction wants to perform a write operation, then they are in conflict and cannot be swapped.
If both the transactions are for write operation, then they are in conflict, but can be allowed to take place in any order, because the transactions do not read the value updated by each other.
Conflict serializability is a sufficient condition for serializability
Concurrency Control
Concurrency control manages the transactions simultaneously without letting them interfere with each another.
The main objective of concurrency control is to allow many users perform different operations at the same time.
Using more than one transaction concurrently improves the performance of system.
If we are not able to perform the operations concurrently, then there can be serious problems such as loss of data integrity and consistency.
Concurrency control increases the throughput because of handling multiple transactions simultaneously.
It reduces waiting time of transaction.
Lock-Based Protocol for Concurrency Control
Lock is a mechanism which is important in a concurrent control.
It controls concurrent access to a data item.
It assures that one process should not retrieve or update a record which another process is updating.
Two lock modes
Shared lock – In this mode, the locked data items can only read without performing modification to it from the database
Exclusive lock – In this mode, the locked data items can be read as well as written
Yorumlar