An overview of the NoSQL databases  

a. NoSQL business use cases 

3. The architecture, data storage and access 

b. Comparison of SQL RDBMSs and MongoDB in their approach to data storage and access, including both strengths and weaknesses. 

4. How MongoDB achieves performance optimization 

a. A description of the performance optimization approach 

b. A comparison of MongoDB and RDBMS’ approach to optimization, including both strengths and weaknesses.

5. Other considerations 

a. An overview of the technical aspects of the data management system (i.e., transaction management, concurrent control, backup and recovery, security).

b. A description of MongoDB’s approach to the technical considerations reviewed in 5.a. 

c. A comparison of MongoDB’s and RDBMS’ approach to the technical considerations reviewed in 5.a, including both strengths and weaknesses. 

6. Conclusion 

7. List of references/bibliography 

Overview of NoSQL databases

NoSQL databases are interchangeably referred to as “non-relational,” “NoSQL DBs,” or “NoSQL” to highlight the fact that they can handle huge volumes of rapidly changing, unstructured data in different ways than a relational (SQL) database with rows and tables. The term NoSQL was first used in 1998 by Carlo Strozzi to name his lightweight, open-source “relational” database that did not use SQL.[1] The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to describe non-relational databases. Relational databases are often referred to as SQL systems, and NoSQL can be read either as “no SQL systems” or, in the more commonly accepted translation, as “not only SQL,” to emphasize that some systems might support SQL-like query languages.

NoSQL developed, at least initially, as a response to web data: the need to process unstructured data, and to process it quickly. The NoSQL model uses a distributed database system, meaning a system with multiple computers. A non-relational system is quicker, uses an ad-hoc approach for organizing data, and can process large amounts of differing kinds of data. For large, unstructured data sets, NoSQL databases are generally the better choice compared with relational databases due to their speed and flexibility. Not only can NoSQL systems handle both structured and unstructured data, they can also process unstructured Big Data quickly.[2]

Four core features of NoSQL, shown in the following list, apply to most NoSQL databases. The list compares NoSQL to traditional relational DBMS:

  • Schema agnostic: A database schema is the description of all possible data and data structures in a relational database. With a NoSQL database, a schema isn’t required, giving you the freedom to store information without doing up-front schema design.
  • Nonrelational: Relations in a database establish connections between tables of data. For example, a list of transaction details can be connected to a separate list of delivery details. With a NoSQL database, this information is stored as an aggregate — a single record with everything about the transaction, including the delivery address.
  • Commodity hardware: Some databases are designed to operate best (or only) with specialized storage and processing hardware. With a NoSQL database, cheap off-the-shelf servers can be used. Adding more of these cheap servers allows NoSQL databases to scale to handle more data.
  • Highly distributable: Distributed databases can store and process a set of information on more than one device. With a NoSQL database, a cluster of servers can be used to hold a single large database.
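The “aggregate” idea from the list above can be illustrated with a small Python sketch; the field names and values here are made up for illustration, with plain dicts standing in for database records:

```python
# Relational design: an order is split across two tables (rows keyed by IDs).
transactions = [{"txn_id": 1, "customer": "Ada", "total": 42.50}]
deliveries = [{"txn_id": 1, "address": "221B Baker St", "status": "shipped"}]

# Reassembling the order requires a join on txn_id at query time.
joined = {**transactions[0], **deliveries[0]}

# NoSQL aggregate: one record holds everything about the transaction,
# including the delivery details, so no join is needed when reading it back.
order_document = {
    "txn_id": 1,
    "customer": "Ada",
    "total": 42.50,
    "delivery": {"address": "221B Baker St", "status": "shipped"},
}
```

Reading the aggregate is a single lookup, whereas the relational layout pays the cost of the join on every read.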

There are more than 250 NoSQL databases; the following are among the leading ones:

NoSQL Database Type | Leading Databases
Columnar | DataStax, Apache Cassandra, HBase, Apache Accumulo
Key-value | Basho Riak, Redis, Voldemort, Aerospike, Oracle
Graph | Neo4j, Ontotext’s GraphDB (formerly OWLIM), MarkLogic, OrientDB
Document | MongoDB, MarkLogic, CouchDB, FoundationDB, IBM
Search engine | Apache Solr, Elasticsearch, MarkLogic
Hybrid | OrientDB, MarkLogic, ArangoDB [3]

Business applications or use cases of  NoSQL Databases 

Advertising 

Content and Metadata Store: Many companies, such as publishing houses, need large stores for digital content such as e-books and want to merge various learning tools into a single platform. Amazon, for instance, uses ElastiCache for Redis, a key-value NoSQL store. In content-based applications, metadata is very frequently accessed and needs low response times. NoSQL provides the flexibility to store different types of content and fast access to that data, which makes it well suited to building content-based applications.

Ad Targeting: Displaying ads or offers on the current web page is a decision with direct revenue impact. To determine which group of users to target, and where on a web page to display ads, the platform gathers behavioral and demographic characteristics of users. A NoSQL database enables ad companies to track user details and place ads very quickly, increasing the probability of clicks. AOL, Mediamind and PayPal are some of the companies that use NoSQL for ad targeting.[5]

Session/User Profile Store

Managing session data with a relational database can be cumbersome, especially once an application has grown substantially. In such cases the right approach is a global session store, which manages session information for every user who visits the site. NoSQL is well suited to storing web application session data, which is usually large; since session data is unstructured, it is easier to keep in schema-less documents than in relational database records. Web and mobile applications must also store user profiles to enable online transactions, user authentication, user preferences and so on. Adoption of web and mobile applications has increased tremendously in recent years, and a relational database, limited to a single server, cannot handle such a large and daily-growing volume of user profile data. NoSQL becomes the way out, as its capacity can be increased simply by adding servers, which makes scaling cost-effective.

  • Mobile Applications

Ever since smartphones became ubiquitous, mobile applications have faced an emergent problem of growth and volume. With a NoSQL database, mobile application development can start small and expand easily as the number of users increases, which is very difficult with a relational database. Since NoSQL databases store data in schema-less form, application developers can update their apps without major modifications to the database. Mobile app companies like Kobo and Playtika use NoSQL to serve millions of users across the world.

  • Third-Party Data Aggregation in E-commerce

For commercial and other business-related reasons, e-commerce and consumer goods companies often track sales data and shoppers’ purchase histories from stores. A NoSQL database like MongoDB can handle the massive amounts of data generated at high speed from many such data sources.

  • Internet of Things

Today, millions of devices are connected to the internet, ranging from smart surveillance cameras in homes to sensors in smart cars. These devices generate large volumes of data for analysis and decision-making, by humans and by the devices themselves. Relational databases cannot handle such Big Data. NoSQL permits organizations to scale concurrent access to data from millions of connected devices and systems, to store huge amounts of data, and still to meet the required performance.

  • Social Gaming

Data-intensive applications such as social games can grow to millions of users. Such growth in both users and data requires a database system that can store the data and scale to accommodate a growing user base, and NoSQL is suitable for such applications. NoSQL has been used by mobile gaming companies such as Electronic Arts, Zynga and Tencent.[6]

To narrow the scope of this paper, we shall focus on MongoDB, an open-source database that uses a document-oriented data model and a non-structured query language. It is one of the most widely used NoSQL systems today. As a NoSQL database, it does not use the rows and columns found in relational (SQL) databases.

MongoDB – A NoSQL Database

Architecture

MongoDB’s architecture is built on collections and documents. The basic unit of data is a document, a set of key–value pairs, and documents in the same collection may have different fields and structures. MongoDB stores documents in a format called BSON, a binary representation of JSON documents. The data model is highly elastic, letting you combine and store data of many types without compromising on powerful indexing options, data access, or validation rules, and there is no downtime when you dynamically modify schemas. In practice this means you can concentrate on making your data work harder rather than spending time preparing the data for the database.[4]

[Figure: MongoDB architecture]

Source:https://www.youtube.com/watch?v=HsOZn7eGx68 [8]

Core Architecture Principles [7]

  • Documents and MongoDB Query Language: The Fastest Way to Innovate

Built around JSON-like documents, document databases are intuitive and flexible for developers to work with. They promise higher developer productivity and faster evolution with application needs. As developers have experienced these benefits, the document data model has become the most popular alternative to the tabular model used by traditional relational databases. Four main advantages of MongoDB are:

Intuitive: Faster and Easier for Developers: Documents in the database map directly to the objects in your code, so they are much more natural to work with.

Flexible Schema: Dynamically Adapt to Change: A document’s schema is dynamic and self-describing, so you don’t need to pre-define it in the database. Fields can vary from document to document and you can modify the structure at any time, allowing you to continuously integrate new application functionality without wrestling with disruptive schema migrations. If a new field needs to be added, it can be created without affecting other documents in the collection, without updating a central system catalog and without taking the database offline.
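A minimal sketch of this schema flexibility, using plain Python dicts to stand in for documents (the field names are illustrative):

```python
# A collection is just a set of documents; no schema is declared up front.
users = [
    {"_id": 1, "name": "Ada"},                       # original document shape
    {"_id": 2, "name": "Grace", "email": "g@x.io"},  # new field, no migration
]

# Older documents simply lack the new field; reads tolerate both shapes.
emails = [user.get("email") for user in users]
```

Adding the `email` field required no migration step and no change to the first document.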

Universal: JSON Documents are Everywhere: Lightweight, language-independent, and human readable, JSON has become an established standard for data communication and storage. Documents are a superset of all other data models so you can structure data any way your application needs – rich objects, key-value pairs, tables, geospatial and time-series data, and the nodes and edges of a graph.

Powerful: Serve Any Workload: An important difference between databases is the expressivity of the query language, the richness of indexing, and data integrity controls. The MongoDB Query Language is comprehensive and expressive. Ad hoc queries, indexing, and real-time aggregations provide powerful ways to access, group, transform, and analyze your data.
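As a rough illustration of how MQL-style filter documents such as {"age": {"$gt": 40}} are evaluated against documents, here is a toy matcher supporting only equality, $gt and $lt; the real query engine is far more capable, and this is not MongoDB code:

```python
def matches(doc, query):
    """Evaluate a tiny subset of MQL-style filters ($gt, $lt, and
    exact equality) against a single document (a plain dict)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
        elif value != cond:
            return False
    return True

people = [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]
found = [p["name"] for p in people if matches(p, {"age": {"$gt": 40}})]
```

The filter itself is just a document, which is why the same data model serves both the records and the queries over them.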

  • A multi-cloud, global database: Freedom and Flexibility

To harness the tremendous rate of innovation in the cloud and reduce the risk of lock-in, project teams should build their applications on data platforms that deliver a consistent experience across any environment. MongoDB can be run anywhere – from developer laptops to mainframes, from private clouds to the public cloud.

Broadest Reach – Private, Hybrid, Public Clouds: With a common deployment model, it enables you to take advantage of unique capabilities in each platform without changing a single line of application code and without the heavy lift and risk of complex database migrations.

MongoDB Atlas – Fully Automated Database as a Service: MongoDB Atlas is the global cloud database service for modern applications. You can deploy fully managed MongoDB across AWS, Azure, or Google Cloud with best-in-class automation and proven practices that guarantee availability, scalability, and compliance with security standards. 

MongoDB Run by You, with Tools from Us: If you need to run the database on your own self-managed infrastructure for business or regulatory requirements, MongoDB offers on-premises management tools. These tools can be used to power a MongoDB database behind a single application, or to build your own private database service and expose it to your development teams. 

Distributed Architecture – Scalable, Resilient and Mission Critical: Through replica sets and native sharding, MongoDB enables you to scale out your applications with always-on availability. You can distribute data for low-latency user access, while enforcing data sovereignty controls for data privacy regulations such as the GDPR.

Availability and Data Protection with Replica Sets: MongoDB replica sets enable you to create up to 50 copies of your data, which can be provisioned across separate nodes, data centers, and geographic regions. Replica sets are predominantly designed for resilience.

Scale-Up, Scale-Out, Scale Across Storage Tiers: MongoDB can be scaled vertically by moving up to larger instance sizes. As a distributed system, MongoDB can perform a rolling restart of the replica set to enable you to move between different instances without application downtime. Sharding with MongoDB allows you to seamlessly scale the database as your applications grow beyond the hardware limits of a single server, and it does so without adding complexity to the application. 

Privacy and Security: With the digital economy becoming so essential for economic prosperity, it’s no surprise that governments and enterprises around the world are responding to growing public concern for the safety of personal data.

  • MongoDB Cloud: Unified developer experience for modern applications

Modern data architectures are not limited to the transactional database. Many applications also require analytics and search functionality, which often requires teams to learn, deploy, and manage additional systems. If you’re building mobile apps, you’ll need to deal with data on the device and syncing it to the backend. You may also find yourself building data visualizations, writing a lot of glue code to move data between data services, or creating and operating custom data access APIs.

In MongoDB Cloud, the database is fully integrated with other data services, with automatic syncing, data tiering, and federated query. Search indexes run alongside the database and are automatically kept in sync. Aged data can be auto-archived to cloud storage, providing fully managed data tiering while retaining access. Queries are automatically routed to the appropriate data tier without requiring you to think about data movement, replication, or ETL. MongoDB Cloud can even automatically sync backend data to an embedded database on mobile devices.

Central to MongoDB Cloud is MongoDB Atlas. MongoDB Cloud extends Atlas with other data services that work with it seamlessly, giving you more ways of working with data.

MongoDB Atlas Search: Atlas Search makes it easy to create fast, relevant, full-text search capabilities on top of your data in the cloud, and is built on top of Apache Lucene, the industry-standard library.

MongoDB Atlas Data Lake: Atlas Data Lake brings a serverless, scalable data lake to the cloud platform with an on-demand query service that enables you to analyze data in cloud object storage (Amazon S3) in-place using the MongoDB Query Language (MQL).

MongoDB Realm for Data at the Edge: The Realm Mobile Database extends your data foundation out to the edge of the network and is fully integrated with the MongoDB Cloud. Realm is a lightweight database embedded directly on the client device. Realm helps solve the unique challenges of building for mobile, making it simple to store data on-device and enabling data access even when offline.

 MongoDB meets the demands of modern apps with a data platform built on three core architectural foundations:

  1.     The document data model and MongoDB Query Language, giving developers the fastest way to innovate in building transactional, operational and analytical applications.
  2.     A multi-cloud, global database, giving developers the freedom to run their applications anywhere with the flexibility to move across private and public clouds as requirements evolve – without having to change a single line of code.
  3.     The MongoDB Cloud, providing a unified developer experience for modern applications that span cloud to edge, in database, search, and the data lake, backed by comprehensive and integrated application services.

Data Storage

  • The storage engine is the primary component of MongoDB responsible for managing data. MongoDB supports multiple storage engines, as different engines perform better for specific workloads; choosing the engine most suited to your application can significantly impact its performance. The default storage engine is WiredTiger.

WiredTiger (Default): Well suited to most workloads and recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing and compression, among other features.

Encrypted: Encryption at rest, when used in conjunction with transport encryption and good security policies, can help ensure compliance with security and privacy standards.

3rd Party: MongoDB supports 3rd-party storage engines, which can define their own storage strategies based on the database workload.

  • The journal is a log that helps the database recover in the event of a hard shutdown. Several configurable options allow the journal to strike a balance between performance and reliability that works for your particular use case.

With journaling, the recovery process:

  1. Looks in the data files to find the identifier of the last checkpoint.
  2. Searches the journal files for the record that matches that identifier.
  3. Applies the operations recorded in the journal files since the last checkpoint.
  • GridFS is a versatile storage system that is suited to handling large files, such as those exceeding the 16 MB document size limit.
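The journal recovery steps listed above can be sketched as a small simulation; the journal format and operations here are hypothetical, not MongoDB's actual on-disk layout:

```python
# Hypothetical journal: each entry carries an identifier and an operation.
journal = [
    {"id": 1, "op": ("set", "a", 1)},
    {"id": 2, "op": ("set", "b", 2)},   # last checkpoint covers up to id 2
    {"id": 3, "op": ("set", "a", 9)},
    {"id": 4, "op": ("del", "b", None)},
]

# Step 1: the data files tell us the identifier of the last checkpoint,
# and give us the state of the data as of that checkpoint.
last_checkpoint_id = 2
data = {"a": 1, "b": 2}

# Step 2: find the journal record matching the checkpoint identifier.
start = next(i for i, e in enumerate(journal) if e["id"] == last_checkpoint_id)

# Step 3: replay every operation recorded after the checkpoint.
for entry in journal[start + 1:]:
    action, key, value = entry["op"]
    if action == "set":
        data[key] = value
    else:
        data.pop(key, None)
```

After replay, the in-memory state reflects all writes that were journaled after the last checkpoint, which is exactly what a hard shutdown would otherwise have lost.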

A storage API layer supports different platforms and languages in performing lookup, read and write operations against the data storage. A data layer, the document data model, is defined on top of the storage layer to efficiently store documents and collections in MongoDB. The data layer is effectively made up of collections of documents[10] stored as BSON (Binary JSON) objects, and indexes are also represented at this layer.

Data Access (Query) [8]

The query engine in MongoDB is robust and supported from many languages. MQL (the MongoDB Query Language) is effectively a JSON-based query language that involves the following components:

  • Command Parser/Validator: This component performs schema validation and document validation, and ensures that restrictions are met before a database instruction is executed.
  • DML: Each instruction is evaluated to identify whether it is a document insertion, an update, the creation of an index, and so on.
  • READ Operation: Handles lookup/select queries that return a collection of documents, supporting the various find commands.
  • WRITE Operation: Handles document insertions and updates in the database.
  • Query Planner: For a select query that returns a large collection of documents, the query planner analyzes the query and chooses an efficient execution plan.
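The planner's index-selection step can be caricatured as follows; real plan selection also weighs statistics and trial execution, and the index definitions here are invented for the example:

```python
def plan_query(query_fields, indexes):
    """Pick the first index whose leading field appears in the query;
    otherwise fall back to a full collection scan. Each index is a
    tuple of field names, leading field first."""
    for index in indexes:
        if index[0] in query_fields:
            return {"stage": "IXSCAN", "index": index}
    return {"stage": "COLLSCAN", "index": None}

indexes = [("age", "name"), ("email",)]
plan = plan_query({"age": {"$gt": 40}}, indexes)
```

A query on an unindexed field would fall through to the collection-scan branch, which is the slow path the Performance Advisor (discussed below) tries to eliminate.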

Comparison of MongoDB versus RDBMS

Category | MongoDB | RDBMS
Architecture | Schema-less or dynamic schema | Adheres to a database schema, which must be designed correctly
Architecture | Horizontally scalable by adding more servers | Vertically scalable by increasing RAM
Architecture | Emphasizes the CAP theorem (Consistency, Availability, Partition tolerance) [9] | Emphasizes the ACID properties (Atomicity, Consistency, Isolation, Durability) [9]
Architecture | Flexible data model | Pre-defined, well-thought-out data model
Architecture | Less vulnerable to security threats like SQL injection [9] | Quite vulnerable to SQL injection [9]
Data Storage | Suitable for hierarchical data storage | Best suited for relational data model storage
Access | Supports JSON-based queries as well as SQL | Supports only the SQL query language
Access | Deep query capabilities including full-text search, regular expressions, dynamic queries, aggregation and sorting [9] | Query capabilities primarily use SQL, with joins across relational tables along with transactional support and triggers [9]
Access | Large improvement in query performance due to the document data model [9] | Less performant; focuses primarily on integrity due to its transactional architecture [9]
Application | Suitable for large logging and content management applications; not very useful for relational data models | Suitable for the relational model, where data integrity and durability are highly desired

Performance Optimization

The Performance Advisor monitors queries that MongoDB considers slow and suggests new indexes to improve query performance. The threshold for slow queries varies based on the average time of operations on the cluster to provide recommendations pertinent to the workload. [11]

Recommended indexes are accompanied by sample queries, grouped by query shape (a combination of query predicate, sort, and projection), that were run against a collection that would benefit from the suggested index.

The Performance Advisor monitors queries that take longer than 100 milliseconds to execute and groups these queries into common query shapes. The Performance Advisor calculates the inefficiency of each query shape by considering the following aggregated metrics from queries which match the shape [12]:

  • Amount of time spent executing the query.
  • Number of documents scanned.
  • Number of documents returned.

To establish recommended indexes, the Performance Advisor uses these metrics in a formula to calculate the Impact, or performance improvement that creating an index matching that query shape would cause. The Performance Advisor compares the amount of time spent executing index-specific operations to the total operational latency in the deployment. When the Performance Advisor suggests indexes, the indexes are ranked by their Impact score.
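MongoDB does not publish the exact Impact formula, but the idea of ranking query shapes by their aggregated metrics can be sketched like this; the scoring function below is an assumption, not the Performance Advisor's real calculation:

```python
def inefficiency(shape):
    """Illustrative score: work performed per document returned,
    weighted by execution time. A shape that scans many documents
    to return few is a strong index candidate."""
    returned = max(shape["docs_returned"], 1)  # avoid dividing by zero
    return shape["exec_millis"] * shape["docs_scanned"] / returned

shapes = [
    {"name": "find by email", "exec_millis": 120,
     "docs_scanned": 50_000, "docs_returned": 1},
    {"name": "find by _id", "exec_millis": 2,
     "docs_scanned": 1, "docs_returned": 1},
]

# Rank shapes so the most wasteful one is suggested first.
ranked = sorted(shapes, key=inefficiency, reverse=True)
```

The unindexed email lookup scans fifty thousand documents to return one, so it dominates the ranking; creating an index matching that shape would have the largest impact.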

Common Reasons for Slow Queries:

If a query is slow, common reasons include:

  • The query is unsupported by current indexes.
  • Some documents in the collection have large array fields that are costly to search and index.
  • One query retrieves information from multiple collections with $lookup.

Index Considerations[13]:

Indexes improve read performance, but a large number of indexes can negatively impact write performance since indexes must be updated during writes. If your collection already has several indexes, consider this tradeoff of read and write performance when deciding whether to create new indexes. Examine whether a query for such a collection can be modified to take advantage of existing indexes, as well as whether a query occurs often enough to justify the cost of a new index.

As we develop and operate applications with MongoDB, we may need to analyze the performance of the application and its database. When we encounter degraded performance, it is often a function of database access strategies, hardware availability, and the number of open database connections.

Some users may experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a result of poor schema design patterns. The Locking Performance section below discusses how these can impact MongoDB’s internal locking.

Performance issues may indicate that the database is operating at capacity and that it is time to add additional capacity to the database; in particular, the application’s working set should fit in the available physical memory.

In some cases, performance issues may be temporary and related to an abnormal traffic load. As mentioned in the Number of Connections section, scaling can help ease excessive traffic. Database profiling can help identify which operations are causing the degradation.

Locking Performance:

MongoDB uses a locking system to ensure data set consistency. If certain operations are long-running or a queue forms, performance will degrade as requests and operations wait for the lock.

Lock-related slowdowns can be intermittent. To see if the lock has been affecting our performance, we can look at the ‘locks’ and ‘globalLock’ sections of the ‘serverStatus’ output. Dividing ‘locks.timeAcquiringMicros’ by ‘locks.acquireWaitCount’ can give an approximate average wait time for a particular lock mode.

‘locks.deadlockCount’ provides the number of times the lock acquisitions encountered deadlocks.

If ‘globalLock.currentQueue.total’ is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that may be affecting performance.

If ‘globalLock.totalTime’ is high relative to uptime, the database has existed in a lock state for a significant amount of time.
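The average-wait calculation described above can be performed directly on a serverStatus-style document; the structure mirrors the fields named in the text, but the numbers below are made up:

```python
# Hypothetical excerpt of serverStatus output for the Global lock,
# write ("W") mode only.
server_status = {
    "locks": {
        "Global": {
            "timeAcquiringMicros": {"W": 1_500_000},
            "acquireWaitCount": {"W": 300},
        }
    },
    "globalLock": {"currentQueue": {"total": 12}},
}

g = server_status["locks"]["Global"]

# Approximate average wait per acquisition for the W lock mode.
avg_wait_micros = g["timeAcquiringMicros"]["W"] / g["acquireWaitCount"]["W"]

# Requests currently queued behind the global lock.
queued = server_status["globalLock"]["currentQueue"]["total"]
```

Here each waiting acquisition averaged 5 milliseconds, and a dozen requests are queued; consistently high values for either figure would point to a concurrency problem.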

Long queries can result from ineffective use of indexes; non-optimal schema design; poor query structure; system architecture issues; or insufficient RAM resulting in disk reads.

Number of Connections:

In some cases, the number of connections between the applications and the database can overwhelm the ability of the server to handle requests. The following fields in the ‘serverStatus’ document can provide insight:

‘connections’ is a container for the following two fields:

  • ‘connections.current’ the total number of current clients connected to the database instance.
  • ‘connections.available’ the total number of unused connections available for new clients.

If there are numerous concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then we will need to increase the capacity of our deployment.

For read-heavy applications, we can increase the size of our replica set and distribute read operations to secondary members. 

For write-heavy applications, we can deploy ‘sharding’ and add one or more ‘shards’ to a ‘sharded cluster’ to distribute load among ‘mongod’ instances. (Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.)
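Hashed sharding can be sketched as a deterministic mapping from a shard key value to a shard; this toy uses MD5 rather than MongoDB's actual hash function, and the key values are invented:

```python
import hashlib

def route(shard_key_value, num_shards):
    """Hashed-sharding sketch: a deterministic hash of the shard key
    value decides which shard stores the document."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always routes to the same shard, so both reads and
# writes for a document land on a single mongod instance.
shard = route("user-42", 3)
```

Because the hash spreads keys evenly, adding shards spreads write load across more machines, which is the horizontal-scaling path described above.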

There are two methods for addressing system growth: vertical and horizontal scaling.

  • Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.
  • Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required.

Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without corresponding workload, is often indicative of a driver or other configuration error.

Unless constrained by system-wide limits, the maximum number of incoming connections supported by MongoDB is configured with the ‘maxIncomingConnections’ setting. On Unix-based systems, system-wide limits can be modified using the ulimit command, or by editing our system’s /etc/sysctl file. 

Database Profiling:

The database profiler collects detailed information about database commands executed against a running mongod instance. This includes CRUD operations as well as configuration and administration commands. The profiler writes the data it collects to the ‘system.profile’ collection, a capped collection in each profiled database. The profiler’s output can help to identify inefficient queries and operations.

We can enable and configure profiling for individual databases or for all databases on a mongod instance. Profiler settings affect only a single ‘mongod’ instance and will not propagate across a replica set or sharded cluster.

The following profiling levels are available:

  • Level 0: The profiler is off and does not collect any data. This is the default profiler level.
  • Level 1: The profiler collects data for operations that take longer than the value of slowms.
  • Level 2: The profiler collects data for all operations.
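The three levels can be mirrored in a small sketch; 'slowms' defaults to 100 milliseconds here, matching the slow-query threshold mentioned earlier, and the operation timings are made up:

```python
def should_profile(op_millis, level, slowms=100):
    """Mirror the three profiler levels: 0 is off, 1 captures only
    slow operations, 2 captures everything."""
    if level == 0:
        return False
    if level == 1:
        return op_millis > slowms
    return True

# At level 1, only the operations slower than slowms are captured.
captured = [ms for ms in (5, 150, 80, 300) if should_profile(ms, level=1)]
```

Level 2 is useful for short diagnostic sessions, but capturing every operation adds overhead, which is why level 1 with a tuned slowms is the usual choice.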

A comparison of the MongoDB and SQL approaches to optimization

  • Query rewrite based on heuristics, cost or both
    MongoDB: Unsupported. MongoDB’s queries are simplistic find(), save(), remove(), update() methods.
    SQL: Supported. Rewrites range from eliminating unnecessary predicates to subquery flattening, converting appropriate LEFT OUTER JOINs to INNER JOINs, and folding derived tables.

  • Index selection
    MongoDB: Supported. MongoDB’s optimizer tries to pick a suitable index for each portion of the query.
    SQL: Supported. The optimizer selects the optimal index(es) for each table and, depending on the index selected, chooses which predicates to push down, checks whether the query is covered, and decides on a sort and pagination strategy.

  • Join reordering
    MongoDB: Unsupported. MongoDB’s $lookup is part of the convoluted aggregation framework, where the query is written like a Unix pipeline, a procedural approach.
    SQL: Supported. (A INNER JOIN B INNER JOIN C) is equivalent to (B INNER JOIN C INNER JOIN A); the optimizer has to determine the most efficient way to sequence these joins.

  • Join type selection
    MongoDB: Unsupported, since there is only one type of join in MongoDB. MongoDB has a constrained left outer join via the $lookup operator; arrays are unsupported in the join condition.
    SQL: Supported. Databases can implement multiple join algorithms: nested loop, hash, sort merge, zigzag, star (snowflake), etc. Depending on structure and cost, the optimizer decides the join algorithm for each join operation.

Transaction Management 

[Figure: Concurrency control]

Concurrent control

Concurrency control is another feature of software management practice that has improved the efficiency of MongoDB’s technical operation. Higher efficiency implies that the software plays a critical role in data processing. Through concurrency control, the system ensures that conflicting transactions are executed one at a time (Kemme, 2018). The major aim is to create better relationships that help increase both the efficiency and the productivity of the software. Agile software implementation has made this process better and more efficient; in particular, it makes it possible to optimize through cloud computing and other aspects of cloud storage. Finally, it makes it possible to add as many servers as the user wants.

[Figure: Lock acquisition]
Comparison of MongoDB’s and RDBMS’ approach to technical considerations

Transaction management

Both RDBMSs and MongoDB can handle large data sets, and both capture relationships within the data that make the software more efficient for data analysis and access (Niyizamwiyitira & Lundberg, 2017). In particular, when a new data record is created, a relationship can be established that makes it easy for the user to record the data and keeps the data accessible. Normalization can be achieved in the process, as the process is easier and less complicated than in an RDBMS. It therefore becomes efficient to handle transactions so that the systems are as productive and efficient as MongoDB.

Concurrent control

In terms of concurrency control, there is a critical difference between MongoDB and an RDBMS. MongoDB is built around many short transactions, controlling each one so that integrity and confidentiality are preserved (Kemme, 2018). An RDBMS, on the other hand, holds bulky transactions and data during execution. Because an RDBMS can handle bulky data, it is often paired with distributed databases that improve the productivity of the database.

Strengths of MongoDB

MongoDB achieves high efficiency through concurrency control. Because each short transaction is processed in isolation, integrity is preserved (Niyizamwiyitira & Lundberg, 2017), which makes the system easier to maintain and keeps it secure.

MongoDB also offers high availability: data files are easily accessible because related data records are stored together.

Weaknesses of MongoDB 

Because the database system processes one transaction at a time, the amount of work that can be completed is lower. The software also lacks some of the advantages of distributed databases (Niyizamwiyitira & Lundberg, 2017). Therefore, there is a risk of reduced workflow or slower data-record processing.

As with any other database system, MongoDB is exposed to security risks despite the high efficiency and data security it provides (Kemme, 2018). In particular, because the system handles data through concurrency control, an intruder may find time to access files: if transactions are accessed one by one there is a risk, although a minimal one.

Concurrent Control

In a multi-user environment, multiple transactions occurring at the same time can have negative implications. The problems that can occur include the Lost Update Problem, the Dirty Read Problem, the Phantom Read Problem, and the Unrepeatable Read Problem. In this environment, concurrency control is necessary to ensure correct results.

Database management systems take one of two approaches to concurrency control: optimistic or pessimistic. Optimistic concurrency control runs a transaction to completion without checking for conflicts along the way; once the transaction is ready to commit, the control checks for conflicts. Pessimistic concurrency control takes a preventative approach, placing locks wherever conflicts might occur before completing a transaction.
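The optimistic approach described above is commonly implemented with a version number checked at commit time. The following is a minimal Python sketch of that idea; the class name and API are invented for illustration.

```python
import threading

# Sketch of optimistic concurrency control via version checking.
# VersionedRecord and its methods are hypothetical, for illustration only.

class VersionedRecord:
    def __init__(self, value):
        self.value, self.version = value, 0
        self._lock = threading.Lock()   # guards only the brief commit step

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        """Succeed only if nobody committed since our read (no conflict)."""
        with self._lock:
            if self.version != read_version:
                return False            # conflict detected at commit time
            self.value, self.version = new_value, self.version + 1
            return True

rec = VersionedRecord(10)
v1, ver1 = rec.read()                      # transaction A reads
v2, ver2 = rec.read()                      # transaction B reads concurrently
assert rec.commit(v1 + 5, ver1) is True    # A commits first
assert rec.commit(v2 - 3, ver2) is False   # B sees the conflict, must retry
```

A pessimistic scheme would instead have transaction B block on a lock at read time, trading concurrency for the guarantee that its commit cannot fail.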

RDBMS

Concurrent control techniques ensure the integrity of the database by making transactions execute as if only one were running at a time. The control reorders the requests of different transactions into a schedule of requests to be serviced. Locking mechanisms deny other transactions access at the file, record, and field levels until the current transaction is completed.

MongoDB

To ensure consistency, MongoDB uses locks to prevent multiple users from updating the same data at the same time. Write operations on a single document are atomic.

Multiple granularity locking is used to allow global and collection-level locks. Any locks implemented at the document level are done by individual storage engines. [14]

Reader-writer locks are used to allow concurrent readers shared access to a database or collection. MongoDB also uses intent locks to lock at higher levels of the hierarchy.

Lock modes in MongoDB

  • R – Shared (S) lock
  • W – Exclusive (X) lock
  • r – Intent Shared (IS) lock
  • w – Intent Exclusive (IX) lock

Lock requests are queued in order, and all compatible shared and intent-shared requests are granted at the same time. Once those locks have drained, exclusive locks are granted.
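The granting rule above follows the standard multiple-granularity lock compatibility matrix, which the sketch below encodes in Python. This is the textbook matrix for IS/IX/S/X modes (MongoDB's r/w/R/W map onto them as listed above); the function is an illustration, not MongoDB's implementation.

```python
# Textbook multiple-granularity lock compatibility matrix
# (requested mode, held mode) -> can they coexist?
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

def can_grant(requested, held):
    """A request is granted only if it is compatible with every held lock."""
    return all(COMPATIBLE[(requested, h)] for h in held)

assert can_grant("IS", ["S", "IS"])   # shared and intent-shared coexist
assert not can_grant("X", ["IS"])     # exclusive must wait for everything
```

This shows why MongoDB can drain a whole run of queued shared and intent-shared requests together: they are mutually compatible, while any exclusive request must wait for all of them.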

Security

It is important to understand that database security works in conjunction with other network services to provide security to the information on the database. To identify security issues, we will focus on how the database stores information. To store information securely a database must provide CIA (confidentiality, integrity, and availability).

Confidentiality refers to only authorized users having access to the data, integrity is the accuracy of that data, and availability means authorized users can access the data when they need it.[16] Control measures also play a significant role in providing security; they include access control, inference control, flow control, and data encryption.

MongoDB

Encryption – Conversion of data into ciphertext to facilitate confidentiality.

  • SSL/TLS-based transport encryption for data in transit.
  • Storage- and application-level encryption for data at rest.
  • No native data-file encryption in the open-source version.

Authentication – Verifies an identity to access the system.

  • No option to isolate permissions: if users have access to part of the system, there are no boundaries.[15]
  • SCRAM (Salted Challenge Response Authentication Mechanism).
  • SSL with X.509 certificates for intra-cluster authentication.
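The salted, iterated hashing at the heart of SCRAM can be sketched in a few lines of Python. This is a simplified illustration of the storage side only (the server keeps a derived key, never the password), not the full RFC 5802 challenge-response exchange; the function names are invented for the example.

```python
import hashlib, hmac, os

# Simplified sketch of the salted-key idea behind SCRAM (not the full
# challenge-response protocol). Names here are illustrative only.

def derive_key(password, salt, iterations=4096):
    # PBKDF2 with a per-user salt makes precomputed-hash attacks impractical.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

# Registration: the server stores the salt and derived key, never the password.
salt = os.urandom(16)
stored_key = derive_key("s3cret", salt)

# Verification: re-derive from the stored salt and compare in constant time.
def verify(password_attempt):
    return hmac.compare_digest(stored_key, derive_key(password_attempt, salt))

assert verify("s3cret")
assert not verify("wrong")
```

Real SCRAM goes further: the client proves possession of the derived key via an HMAC over a nonce exchange, so the key itself is never sent over the wire either.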

Authorization – Specifying the user's privileges or access to resources.

  • The database enforces authorization at the database level rather than at the collection level.

Auditing – Tracks the use of resources and ensures users without authorization do not gain access to data.

  • Auditing features in the Enterprise version record audit events; the open-source version provides only basic auditing of data operations.

In comparison, relational database management systems have stronger security features: role-based security, access control, data encryption, a well-defined schema, and strict adherence to the ACID properties. The disadvantage of these security features is that they slow down data access.

Conclusion

In a nutshell, modern applications require highly scalable, hierarchical data storage, and NoSQL databases like MongoDB play a very important role in these kinds of applications.

Traditional relational database systems like Oracle are still the preferred storage for a highly relational data model in large monolithic applications.

REFERENCES

[1] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[2] https://www.dataversity.net/a-brief-history-of-non-relational-databases/#

[3] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[4] https://intellipaat.com/blog/what-is-mongodb/

[5] www.geeksforgeeks.org/use-of-nosql-in-industry/

[6] www.geeksforgeeks.org/use-of-nosql-in-industry/

[7] Architecture Guide – Core Principles

[8] MongoDB Internals Presentation

[9] MongoDB Architecture Comparisons with RDBMS

[10] Terminologies

[11] https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/

[12] https://blog.couchbase.com/query-optimization-in-nosql-couchbase-mongodb/

[13] https://docs.atlas.mongodb.com/performance-advisor/

[14] https://docs.mongodb.com/manual/faq/concurrency/

[15] https://www.researchgate.net/publication/337854725_A_Comparative_Study_of_NOSQL_System_Vulnerabilities_with_Big_Data

[16] https://www.softwaretestinghelp.com/sql-vs-nosql/

[17] Kemme, B. (2018). Replicated database concurrency control. Encyclopedia of Database Systems, 3175-3176. https://doi.org/10.1007/978-1-4614-8265-9_311

[18] Niyizamwiyitira, C., & Lundberg, L. (2017). Performance evaluation of SQL and NoSQL database management systems in a cluster. International Journal of Database Management Systems, 9(6), 1-24. https://doi.org/10.5121/ijdms.2017.9601