Distributed database (DDB)



Definition:

A database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data.


Description

The following has been taken from Google Answers.

A big advantage of distributed DBMSs over centralized ones is scalability. Growth can be sustained more gracefully in a distributed system.

Local autonomy is another reason for a business to implement a Distributed Database Management System (DDBMS). Because today's applications often require data in geographic areas that are decentralized, it often makes sense to implement a distributed system. In this way, data can physically reside nearest to where it is most often accessed, which gives users local control of the data that they interact with.

Another reason to consider a parallel architecture is to improve the reliability and availability of the data in a scalable system. In a distributed system with sufficient data replication, some or possibly all of the data can still be accessed when part of the system has failed.
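
The availability point can be illustrated with a small Python sketch. It is not taken from the quoted source: the Site class, the read_with_failover function and the sample key "show-1" are all hypothetical names, and a real DDBMS handles replica selection internally. The sketch only shows the idea that a read can still succeed while one site is down, as long as another site holds a copy.

```python
class SiteDown(Exception):
    """Raised when a (hypothetical) site cannot be reached."""


class Site:
    """One site holding its own copy of the data."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)  # this site's copy of the data
        self.up = True          # set to False to simulate a failure

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.rows[key]


def read_with_failover(replicas, key):
    """Return (site name, value) from the first replica that answers."""
    for site in replicas:
        try:
            return site.name, site.read(key)
        except SiteDown:
            continue  # this copy is unreachable, try the next one
    raise SiteDown("no replica available for key %r" % (key,))


seats = {"show-1": 120}      # invented sample data
site_a = Site("A", seats)
site_b = Site("B", seats)    # full replica of site A

site_a.up = False            # simulate a failure at site A
result = read_with_failover([site_a, site_b], "show-1")
print(result)  # ('B', 120)
```

If every replica is down, the function raises SiteDown, which matches the condition in the text: the data is only reachable in a failure mode if enough copies exist.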


The major features of a DDB are:
  1. Data stored at a number of sites
  2. Sites are interconnected by a network
  3. DDB is logically a single database (although each site is a database site)
  4. DDBMS has full functionality of a DBMS
  5. To the user, the distributed database system should appear exactly like a non-distributed database system.
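
Features 3 and 5 (one logical database, and a user who cannot tell that it is distributed) can be sketched in Python as follows. This is only an illustration under assumptions: the LogicalDatabase class, the customer table, the regions and the names are invented, and two in-memory SQLite databases stand in for two network sites. A real DDBMS does this routing inside the DBMS, not in application code.

```python
import sqlite3


def make_site(rows):
    """Create one in-memory SQLite database standing in for a remote site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    return conn


class LogicalDatabase:
    """Presents several sites as one database (hypothetical facade)."""

    def __init__(self, sites):
        # region -> connection; this mapping plays the role of a catalog
        self.sites = sites

    def customers_in(self, region):
        # Route the query to the single site that stores this region's rows.
        cur = self.sites[region].execute(
            "SELECT id, name FROM customer WHERE region = ? ORDER BY id",
            (region,))
        return cur.fetchall()

    def all_customers(self):
        # Gather the rows from every site and merge them into one result.
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute("SELECT id, name FROM customer").fetchall())
        return sorted(rows)


db = LogicalDatabase({
    "north": make_site([(1, "north", "Ada")]),
    "south": make_site([(2, "south", "Grace"), (3, "south", "Edgar")]),
})
print(db.customers_in("south"))  # [(2, 'Grace'), (3, 'Edgar')]
print(db.all_customers())        # [(1, 'Ada'), (2, 'Grace'), (3, 'Edgar')]
```

The caller only ever talks to db and never names a site, which is the behaviour feature 5 asks for.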

Advantages of distributed database systems are:
  1. local autonomy (in businesses/organisations that are already distributed). For example, in the case study we have two theatres (CT and MTG) in two different locations. In one model of distributed data, the data for each theatre is stored at that theatre's site. Response times are then faster for the theatre and for users who are geographically closer to it. A further advantage is that users can also access the data from MTG without having to change to another database. Accessing MTG data may be slower (e.g. a user searching for seats at a performance at MTG), but it is all the same system. Also, if CT-related data is stored at CT, CT has more control over that data (e.g. physical security, system maintenance and so on).
  2. improved performance (since data is stored close to where it is needed, and a query may be split over several sites and executed in parallel). If a person wants to enquire about seats available at both MTG and CT, the SQL search can run at both sites at the same time, so the response may be quicker.
  3. improved reliability/availability (should one site go down). If the MTG site fails, the customer can still access information about MTG, provided copies of the data are held at another site.
  4. economics. A single centralised database would be cheaper. However, if each theatre has its own system (which is the current situation), then the servers, the DBMS and the rest of the system have to be replicated at each site. With a DDBMS this may also be the case (one system per theatre), so the hardware cost is the same, with extra configuration work to distribute the data (and to decide how best to do that).
  5. expandability. When another theatre joins the consortium, its data can be stored and accessed at that theatre by adding another server, OR the theatre's data can be stored at one of the existing locations (for example at CT) without installing another server.
  6. shareability. Data for all theatres can be accessed, and the user does not need to know that the data is being read at different locations.
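
The parallel search in advantage 2 can be sketched in Python. Everything here is an assumption made for illustration: the seat table, its columns, the performance titles and the function names are invented, and two in-memory SQLite databases stand in for the CT and MTG sites. The point is only that the same query is sent to both sites at once and the answers are combined.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_theatre_site(rows):
    """Create an in-memory database standing in for one theatre's site."""
    # check_same_thread=False lets a worker thread use a connection created here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE seat (performance TEXT, seat_no TEXT, sold INTEGER)")
    conn.executemany("INSERT INTO seat VALUES (?, ?, ?)", rows)
    return conn


def available_seats(item):
    """Run the seat query at one site and return (theatre, unsold seat count)."""
    theatre, conn = item
    (count,) = conn.execute("SELECT COUNT(*) FROM seat WHERE sold = 0").fetchone()
    return theatre, count


sites = {
    "CT": make_theatre_site([("Hamlet", "A1", 0), ("Hamlet", "A2", 1)]),
    "MTG": make_theatre_site([("Macbeth", "B1", 0), ("Macbeth", "B2", 0)]),
}

# One worker per site, so both queries are in flight at the same time.
with ThreadPoolExecutor(max_workers=len(sites)) as pool:
    availability = dict(pool.map(available_seats, sites.items()))

print(availability)  # {'CT': 1, 'MTG': 2}
```

With real sites on a network, the total wait is roughly that of the slowest site and not the sum of both, which is where the possible speed-up comes from.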

Disadvantages of distributed database systems are:
  1. complexity: greater potential for bugs in software. A DDB is much more complex to set up and needs a high level of technical knowledge. Also, if the system fails, it may be difficult to get it up and running again in a limited time, and sales may be lost.
  2. cost: software development can be much more complex and therefore costly. Also, the exchange of messages and the additional computation increase overheads. PTC would need to decide whether the initial cost of setting up and maintaining the system, PLUS the possible delays in the retrieval of data for customers, is worthwhile. The distribution of the data would need to be optimised, with lots of testing to find the best model for distributing it.
  3. distribution of control (no single database administrator controls the DDB). Who will control the system, and where will that person (or persons) be located? Will they be able to control it through remote access, or will travel be involved? How will this person be funded?
  4. security (since the system is distributed the chances of security lapses are greater)
  5. difficult to change (since each site has control of its own system)
  6. lack of experience (experience in developing distributed systems is in short supply)


References and Resources:

References:

Google Answers, "Distributed vs Centralized Database", http://answers.google.com/answers/threadview/id/65945.html, 11 January 2012
Webopedia, "Distributed Databases", http://www.webopedia.com/TERM/D/distributed_database.html, 11 January 2012

Resources:

http://en.wikipedia.org/wiki/Distributed_database
http://www.mahipalreddy.com/dbdesign/dbarticle1.htm