SQLite3, together with other database products like MySQL, Oracle, PostgreSQL, etc. belongs to the family of databases called Relational Databases (RDBMS). In this type of database, data units and their internal relationships are organised into tables, each table representing an entity (e.g. customers, orders or relationship). The structure and organisation of these tables need to be established beforehand as the database schema. In Topic 1, we saw how the schema is set up in SQLite3 when we first create the necessary tables.
In contrast, NoSQL databases are schema-less. The data are organised in terms of documents. Some of these databases save data in JSON like forms, so the information could be mapped directly to dictionaries in our programming language. Because there is no need to predefine schema, the flexibility allows these types of databases for rapid prototyping of small projects.
At high level, the differences between these two types of databases are:
SQL database are known as Relational Databases; whereas NoSQL (Not Only SQL) databases are known as non-relational or distributed database.
SQL databases are table-based, each table consisting of rows of data; whereas NoSQL databases are document-based, key-value pairs, graph databases or widecolumn stores - there are no standard schema definitions.
SQL databases can be scaled vertically by increasing the power of hardware (e.g. CPU, RAM SSD) to achieve faster calculation when the load increases; NoSQL databases are scaled by increasing the number of servers in the resource pool to reduce the load.
SQL databases use SQL statements for defining and manipulating data; In NoSQL databases, the syntax to retrieve the collection of documents varies from database to database, depending on vendor’s implementation.
SQL databases are good fit for complex query intensive environment; NoSQL databases do not have standard interfaces to perform complex queries, and the queries themselves in NoSQL are not as powerful as SQL query language.
NoSQL databases are better for hierarchical data storage as they generally follow the key-value pair way of storing data similar to JSON data. They are also preferred for large data sets (big data) due to the distributed computing capabilities.
For high transactional based application (e.g. stock exchanges), SQL databases are still the best fit, as it is more stable and promises atomicity (i.e. complete update of databases, instead of incomplete or partial changes) and integrity of data. NoSQL databases are less comparable in high load or complex transactional applications, although vendors are working towards achieving that.
SQL databases emphasises on ACID properties (Atomicity, Consistency, Isolation and Durability), which are intended to guarantee validity even in the event of errors, power failures, etc. whereas the NoSQL databases follows the Brewer CAP theorem (Consistency, Availability and Partition tolerance).
NoSQL Database - MongoDB
The best way to see how a NoSQL Database operates is to test it out. For this purpose, we use MongoDB which is a popular document-centric NoSQL database.
https://www.mongodb.com/
You can see a NoSQL database in action by setting up a demo database at mLab (https:// mlab.com/), which is a cloud-based MongoDB provider. There is a free service where you can create a database instance to experiment and pilot trial projects.
Mongodb saves data in JSON like document. Here in our example, we have a demo database known as demo. In it, we have a collection (like a table in SQL) called user. There are three documents in user, which are organised like this:
output:
Each MongoDB document has a “_id” field, which, you will see later, is automatically created when data is added. It works like the Primary Key in SQL tables, which is used by the database for identification and access purpose.
To connect to our MongoDB database on mLab, we need to use the pymongo library, which needs to be installed together in the same environment like the other Python packages. Then we set up a database connection to the URI using the format specified by mLab.
The codes are as follows:
from pymongo import MongoClient
URI = 'mongodb://user:password1@ds016118.mlab.com:16118/demo'
conn = MongoClient(URI)
db = conn['demo']
collection = db['user']
data = collection.find()
data.count()
output:
out:
3
The output should return the number of documents in the database. Instead of data.count(), if we use a loop to print each row of the data, we should get output like this:
for d in data:
print d
output:
out:
{u'_id': ObjectId('5b1f3531f9f31504cd8fbf0c'), u'name': u'john', u'email': u'john@gmail.com'}
{u'_id': ObjectId('5b1f35aaf9f31504cd8fbf0f'), u'name': u'mary', u'email': u'mary@yahoo.com'}
To find a particular item in a collection, we use the find method taking a dictionary query:
result = collection.find({'name': 'john'})
list(result)
To add new data to a collection, the syntax is:
collection.insert_one({'name': 'roger', 'email': 'roger@gmail.com'})
For example, we can use the code below to add a series of data into a new collection, demo_collection. Notice that, unlike SQL, in NoSQL database like Mongodb, there is no need to define the collection structure in advance.
URI = 'mongodb://user:password1@ds016118.mlab.com:16118/demo'
conn = MongoClient(URI)
db = conn['demo']
collection = db['demo_collection']
song_list = [
{'song':'rainbow', 'plays':'3'},
{'song':'sea side', 'plays':'4'}
]
collection.insert_many(song_list)
The “schema-less” nature of NoSQL database can be clearly seen in the example below. We added new attribute artist to one of the existing records, and inserted another new record with another new feature publisher. The database added the new data with no questions asked.
from pymongo import MongoClient
URI = 'mongodb://user:password1@ds016118.mlab.com:16118/demo'
conn = MongoClient(URI)
db = conn['demo']
collection = db['demo_collection']
collection.find_one_and_update(\
{'plays': '3'}, {'$set': {'plays': '10', 'artist': 'peter'}})
collection.insert_one(\
{'song':'new song', 'publisher':'record company'})
data = collection.find()
for d in data:
print d
Handling Complex Data
The ability to deal with complex data allows us to process and analyse different kinds of data. One of the common scenarios is to analyse information collected over a period of time. In order to do that, we will need to be comfortable with the retrieval and use of complex data like dates and time.
Date or date-time is usually stored in a database as a string and there are many ways to represent them, with each one requiring a separate way to encode (i.e. store) or decode (read) them. The recommendation is to use the International Standard Organisation time (ISO) format as the string representation for dates and times and using the Coordinated Universal Time (UTC) form. The formats of the date and date-time strings are as follows:
2018-08-05 A date (Python code %Y-%m-%d)
2018-08-05T12:22:53Z A UTC (Z after time) date and time (T%H:%M:%S)
ISO 8601 is the standard used by JavaScript and is easy to work with in Python. JavaScript is a popular programming language used in HTML/Web page programming. What we want to be able to do is to create string representation that can be passed between Python and other languages such as JavaScript, and be easily encoded and decoded at both ends.
Let’s take a date and time, in Python datetime format, convert it into a string, and see how JavaScript can process it. First, we use Python datetime library to produce the string:
from datetime import datetime
d = datetime.now()
print d
print d.isoformat()
Out:
2018-06-14 15:03:02.282621
2018-06-14T15:03:02.282621
On your Google Chrome browser, open up the javascript console (refer to Figure 2.3). This will open up on the right side of the browser the javascript console panel, where you can key in javascript directly. If you type:
date_string = '2018-06-14T15:03:02.282621'
> "2018-06-14T15:03:02.282621"
date = new Date(date_string)
> Thu Jun 14 2018 15:03:02 GMT+0800 (Singapore Standard Time)
you see that Javascript reads in the ISO formatted string to create its Date object.
We can get back the date-time to ISO 8601 string form with the toISOString() method:
Thu Jun 14 2018 15:03:02 GMT+0800 (Singapore Standard Time)
date.toISOString()
"2018-06-14T07:03:02.282Z"
In Python, we can read back this string as time date. You can use the dateutil module to check to see if the data is properly processed.
date_string = '2018-06-14T07:03:02.282Z'
from dateutil.parser import parse
parse(date_string)
Out:
datetime.datetime(2018, 6, 14, 7, 3, 2, 282000, tzinfo=tzutc())
Comments