A distributed database is a collection of multiple interconnected databases. These databases are spread across several sites connected by a network, yet they appear to users as a single database. Distributed databases use multiple nodes and scale horizontally to form a distributed system. More nodes in the system provide more computing power, offer greater availability, and remove the single point of failure.
Distributed Database Features
Location independence - Data is physically stored at multiple sites and managed by an independent distributed database management system (DDBMS).
Seamless integration - Databases in a collection usually represent a single logical database, and they are interconnected.
Network linking - All databases in a collection are linked by a network and communicate with each other.
Transaction processing and management - Distributed databases incorporate transaction processing. A transaction is a program made up of one or more database operations, and it is atomic: it is either executed entirely or not at all.
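To make the "entirely executed or not at all" property concrete, here is a minimal sketch using Python's built-in sqlite3 module. It runs on a single local database, so it only illustrates atomicity itself; a real distributed database would additionally need a commit protocol across sites (for example two-phase commit). The accounts table and the transfer function are hypothetical names chosen for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                # Abort: both updates above are undone together.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw the source account leaves both balances exactly as they were, because the two updates are rolled back as a unit.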
For a write operation, the system writes the data to an appropriate node in the distributed database.
The user does not need to be aware of where the data is written.
Similarly, when a user wants to retrieve data, the request goes to the nearest node in the system, which retrieves the data on the user's behalf without the user knowing where it was stored.
This way, a user simply interacts with the system as if interacting with a single database.
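The read and write behaviour described above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the Node and Cluster classes, the one-dimensional position used to decide which node is "nearest", and the hash-based rule for choosing the node that stores a key. Real systems use their own placement and routing strategies.

```python
import zlib

class Node:
    def __init__(self, name, position):
        self.name = name
        self.position = position  # hypothetical 1-D coordinate used as "distance"
        self.data = {}

class Cluster:
    """Hypothetical facade: callers see one key-value store, not the nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _owner(self, key):
        # Deterministic placement: a stable hash of the key picks the storing node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def _nearest(self, client_position):
        # The node the client actually connects to.
        return min(self.nodes, key=lambda n: abs(n.position - client_position))

    def write(self, client_position, key, value):
        entry = self._nearest(client_position)
        self._owner(key).data[key] = value  # entry node forwards to the owner
        return entry.name                   # returned only so the demo can inspect it

    def read(self, client_position, key):
        self._nearest(client_position)      # entry node; it forwards to the owner
        return self._owner(key).data.get(key)

cluster = Cluster([Node("a", 0), Node("b", 50), Node("c", 100)])
```

A client near one site can write a value and a client near another site can read it back, and neither needs to know which node holds the data.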
As user traffic increases, we can scale the database by adding more nodes to the system.
Since these nodes are commodity hardware, adding them is relatively cheaper than adding more resources to each of the existing nodes individually.
In other words, horizontal scaling is generally cheaper than vertical scaling.
Horizontal scaling also makes replication of data cheaper and easier. The system can then handle more user traffic by distributing it appropriately among the replicated nodes.
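As a rough illustration of distributing traffic among replicated nodes, the sketch below assigns requests to replicas in round-robin order. Round-robin is just one simple policy picked for the example, and the distribute function and node names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def distribute(requests, replicas):
    """Assign each request to a replica in round-robin order.

    Returns a dict mapping replica name to the number of requests it received.
    """
    assignment = Counter()
    for _, replica in zip(requests, cycle(replicas)):
        assignment[replica] += 1
    return dict(assignment)

# Twelve requests over three replicas, then over four after adding a node.
three_nodes = distribute(range(12), ["n1", "n2", "n3"])
four_nodes = distribute(range(12), ["n1", "n2", "n3", "n4"])
```

With three replicas each node serves four of the twelve requests; adding a fourth replica lowers that to three per node, which is the effect horizontal scaling aims for.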
Types of Distributed Database
There are two types of distributed database:
1. Homogeneous
A network of identical databases stored on multiple sites with the same operating system, DDBMS, and data structures, making them easily manageable.
It allows users to access data from each of the databases seamlessly.
2. Heterogeneous
Different sites have different operating systems, DBMS products and data models.
Diverse sites use dissimilar schemas and software.
The system may be composed of a variety of DBMSs, such as relational, network, hierarchical, or object-oriented.
Query processing is complex due to different schemas.
Transaction processing is complex due to dissimilar software.
Distributed database storage management
Distributed database storage is managed in two ways:
Replication
Fragmentation
Replication
The system stores copies of data on different sites.
If an entire database is available on multiple sites, it is a fully redundant database.
The advantage of database replication is that it increases data availability across sites and allows query requests to be processed in parallel.
However, replication means that data requires constant updates and synchronization with other sites to maintain an exact database copy.
Any changes made on one site must be recorded on other sites, or else inconsistencies occur.
Constant updates cause a lot of server overhead and complicate concurrency control.
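The consistency problem described above can be shown with a toy model in which every site holds a full copy of the data. The ReplicatedStore class, the site names, and the skip parameter (used to simulate a site that misses an update) are all hypothetical and exist only for illustration.

```python
class ReplicatedStore:
    """Toy model of full replication: every site keeps its own copy."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}

    def write(self, key, value, skip=()):
        """Apply a write to every site except those in `skip` (simulated failure)."""
        for name, copy in self.sites.items():
            if name not in skip:
                copy[key] = value

    def read(self, site, key):
        """Read from one specific site's copy."""
        return self.sites[site].get(key)

    def is_consistent(self, key):
        """True when every site holds the same value for `key`."""
        return len({copy.get(key) for copy in self.sites.values()}) == 1

store = ReplicatedStore(["paris", "tokyo", "lima"])
store.write("stock", 10)                 # reaches all three sites
consistent_before = store.is_consistent("stock")
store.write("stock", 9, skip={"lima"})   # lima misses this update
consistent_after = store.is_consistent("stock")
```

After the second write, a read at paris returns the new value while a read at lima still returns the old one, which is exactly the inconsistency that synchronization has to prevent.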
Fragmentation
The relations are split into smaller parts called fragments.
Each fragment is stored on a different site, where it is required.
A prerequisite for fragmentation is that the fragments can later be reconstructed into the original relation without losing data.
The advantage of fragmentation is that there are no data copies, which prevents data inconsistency.
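A minimal sketch of the reconstruction requirement, assuming the relation is split by rows (often called horizontal fragmentation) on a region column. The customers relation, its columns, and the helper functions are hypothetical; the point is only that the union of the fragments gives back every original row exactly once.

```python
def fragment_by(rows, column):
    """Split a relation (list of dict rows) into fragments keyed by `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments

def reconstruct(fragments):
    """Rebuild the relation as the union of all fragments."""
    return [row for part in fragments.values() for row in part]

# Hypothetical relation; each region's fragment would live at that region's site.
customers = [
    {"id": 1, "name": "Ada", "region": "EU"},
    {"id": 2, "name": "Lin", "region": "ASIA"},
    {"id": 3, "name": "Bob", "region": "EU"},
]

fragments = fragment_by(customers, "region")
restored = sorted(reconstruct(fragments), key=lambda r: r["id"])
```

Because each row lands in exactly one fragment, no data is duplicated and the restored relation matches the original.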
Distributed Database Advantages
Modular Development: A system can be expanded to new locations by adding new servers and data to the existing setup and connecting them to the distributed system, without interrupting current operations.
Reliability: Distributed databases offer greater reliability than centralized databases. The system keeps functioning when failures occur, delivering only reduced performance until the issue is resolved.
Lower Communication Cost: Locally storing data reduces communication costs for data manipulation in distributed databases.
Better Response: Efficient data distribution in a distributed database system provides a faster response when user requests are met locally.