Contents

1. An Overview of MongoDB

2. NoSQL Business Use Cases

3. The Architecture, Data Storage, and Access
a. Architecture, data storage, and access

4. Comparison of SQL RDBMS and MongoDB
a. Their approach to data storage and access, including both strengths and weaknesses

5. How MongoDB Achieves Performance Optimization
a. A description of the performance optimization approach
b. A comparison of MongoDB and RDBMS approaches to optimization, including both strengths and weaknesses

6. Other Considerations
a. An overview of the technical aspects of data management systems (i.e., transaction management, concurrency control, backup and recovery, security)
b. A description of MongoDB’s approach to the technical considerations reviewed in 6.a
c. A comparison of MongoDB and RDBMS approaches to the technical considerations reviewed in 6.a, including both strengths and weaknesses

7. Conclusion

8. List of References / Bibliography

Overview of NoSQL databases

NoSQL databases are also referred to as “non-relational,” “NoSQL DBs,” or simply “NoSQL” to highlight the fact that they handle huge volumes of rapidly changing, unstructured data differently from a relational (SQL) database, which uses rows and tables. The term NoSQL was first used in 1998 by Carlo Strozzi when naming his lightweight, open-source relational database, which did not use SQL.[1] The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to describe non-relational databases. Relational databases are often referred to as SQL systems. The term NoSQL can mean either “No SQL” or, in the more commonly accepted interpretation, “Not only SQL,” which emphasizes that some systems support SQL-like query languages.

NoSQL developed, at least initially, as a response to web data, the need to process unstructured data, and the need for faster processing. The NoSQL model typically uses a distributed database system, meaning one that runs across multiple computers. A non-relational system can be quicker for such workloads, organizes data in an ad hoc way, and processes large amounts of differing kinds of data. For large, unstructured data sets, NoSQL databases are often a better choice than relational databases because of their speed and flexibility. NoSQL systems can handle both structured and unstructured data, and they can process unstructured Big Data quickly.[2]

Four core features of NoSQL, shown in the following list, apply to most NoSQL databases. The list compares NoSQL to traditional relational DBMS:

Schema agnostic: A database schema is the description of all possible data and data structures in a relational database. With a NoSQL database, a schema isn’t required, giving you the freedom to store information without doing up-front schema design.
Nonrelational: Relations in a database establish connections between tables of data. For example, a list of transaction details can be connected to a separate list of delivery details. With a NoSQL database, this information is stored as an aggregate — a single record with everything about the transaction, including the delivery address.
Commodity hardware: Some databases are designed to operate best (or only) with specialized storage and processing hardware. With a NoSQL database, cheap off-the-shelf servers can be used. Adding more of these cheap servers allows NoSQL databases to scale to handle more data.
Highly distributable: Distributed databases can store and process a set of information on more than one device. With a NoSQL database, a cluster of servers can be used to hold a single large database.
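
To make the “nonrelational” point concrete, here is a minimal sketch in plain Python (no database involved). The field names and values are hypothetical; the idea is only to show the same order held as two linked relational tables and as a single aggregate record.

```python
# Relational style: two separate "tables" linked by a shared key (hypothetical data).
transactions = [{"txn_id": 1, "item": "book", "amount": 12.50}]
deliveries = [{"txn_id": 1, "address": "1 Main St", "city": "Springfield"}]


def join_delivery(txn_id):
    """Reassemble one order by matching rows across both tables."""
    txn = next(t for t in transactions if t["txn_id"] == txn_id)
    delivery = next(d for d in deliveries if d["txn_id"] == txn_id)
    delivery_fields = {k: v for k, v in delivery.items() if k != "txn_id"}
    return {**txn, "delivery": delivery_fields}


# Aggregate style: one record holds everything about the transaction.
order_aggregate = {
    "txn_id": 1,
    "item": "book",
    "amount": 12.50,
    "delivery": {"address": "1 Main St", "city": "Springfield"},
}

# The join rebuilds exactly what the aggregate already stores in one place.
same_content = join_delivery(1) == order_aggregate
```

With the aggregate, a single read returns the whole order; with the relational layout, the application or the database has to perform the join first.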

There are more than 250 NoSQL databases; the following are among the leading ones:

NoSQL Database Types	Leading Databases
Columnar	DataStax, Apache Cassandra, HBase, Apache Accumulo
Key-value	Basho Riak, Redis, Voldemort, Aerospike, Oracle
Graph	Neo4j, Ontotext’s GraphDB (formerly OWLIM), MarkLogic, OrientDB
Data Document	MongoDB, MarkLogic, CouchDB, FoundationDB, IBM
Search engine	Apache Solr, Elasticsearch, MarkLogic
Hybrid	OrientDB, MarkLogic, ArangoDB [3]

Business applications or use cases of NoSQL Databases

Content and Metadata Store: Many companies, such as publishing houses, need large storage capacity to hold digital content, e-books, and similar material, and to bring various learning tools together on a single platform. Amazon, for instance, uses ElastiCache for Redis, which is a key-value NoSQL database. In content-based applications, metadata is accessed very frequently and must be returned with low response times. NoSQL provides fast access to data and the flexibility to store different types of content when building such applications.

Ad Targeting: Displaying ads or offers on the current web page is a decision with a direct impact on revenue. To determine which group of users to target and where on a web page to display ads, the platform gathers behavioral and demographic characteristics of users. A NoSQL database enables ad companies to track user details and place ads very quickly, increasing the probability of clicks. Google Ads, Meta Ads, and Amazon Advertising are some of the ad targeting companies that use NoSQL.[5]

Session/User Profile Store: Managing session data with relational databases can be cumbersome, especially when applications are growing rapidly. In such cases, the right approach is to use a global session store, which manages session information for any user who visits the site. NoSQL is suitable for storing such web application session information, which is usually large in size. Since session data is unstructured, it is easier to store it in schemaless documents than in relational database records. Web and mobile applications also need to store user profiles to enable online transactions, user authentication, user preferences, and so on. In recent times, the adoption of web and mobile applications has increased tremendously. A relational database that is limited to a single server struggles to handle such a large and daily growing volume of user profile data. NoSQL is a practical way out, because its capacity can be increased by adding servers, which makes scaling cost-effective.

Mobile Applications: Since smartphones have become ubiquitous, mobile applications face an emergent problem of growth and volume. With a NoSQL database, mobile application development can start small and expand easily as the number of users increases, which is much harder with relational databases. Since NoSQL databases store data in a schema-less form, the application developer can update the apps without making major modifications to the database. Mobile app companies like Kobo, Playtika, Snapchat, and Uber use NoSQL and serve millions of users across the world.

Third-Party Data Aggregation in E-commerce: For commercial and other business-related reasons, e-commerce and consumer goods companies often collect and track sales data and shoppers’ purchase histories from stores. A NoSQL database like MongoDB can handle this massive amount of data, which is generated at high speed from many data sources.

Internet of Things: Today, millions of devices are connected to the internet, ranging from smart surveillance cameras in homes to sensors in smart cars. These devices generate large volumes of data for analysis and decision-making by humans and by the devices themselves. Relational databases struggle to handle such Big Data. NoSQL permits organizations to expand concurrent access to data from millions of connected devices and systems, store huge amounts of data, and meet the required performance.

Social Gaming: Social games are data-intensive applications whose user base can grow to millions. Such growth in the number of users and in the amount of data requires a database system that can store the data and scale to accommodate a growing number of users. NoSQL is suitable for such applications and has been used by mobile gaming companies like Electronic Arts, Zynga, and Tencent.[6]

To narrow down the scope of this post, I will focus on MongoDB, an open-source database that uses a document-oriented data model and its own non-SQL query language. It is one of the most widely used NoSQL systems today. As a NoSQL database, it does not use the rows and columns found in SQL-based relational databases.

MongoDB – A NoSQL Database

Architecture

MongoDB’s architecture is built on collections and documents. The basic unit of data is the document, which consists of a set of key–value pairs, and documents in the same collection can have different fields and structures. Documents are stored in a format called BSON, a binary representation of JSON documents. MongoDB’s data model is highly flexible: it lets you combine and store data of many different types without compromising on powerful indexing options, data access, and validation rules. Schemas can be modified dynamically without downtime. This means you can concentrate on making your data work harder rather than spending time preparing the data for the database.[4]

Source: https://www.youtube.com/watch?v=HsOZn7eGx68 [8]

Core Architecture Principles [7]

Documents and MongoDB Query Language: The Fastest Way to Innovate

Built around JSON-like documents, document databases are both intuitive and flexible for developers to work with. They promise higher developer productivity and faster evolution with application needs. As developers have experienced these benefits, the document data model has become the most popular alternative to the tabular model used by traditional relational databases. Four main advantages of MongoDB are:

Intuitive: Faster and Easier for Developers: Documents in the database directly map to the objects in your code, so they are much more natural to work with.

Flexible Schema: Dynamically Adapt to Change: A document’s schema is dynamic and self-describing, so you don’t need to first pre-define it in the database. Fields can vary from document to document, and you modify the structure at any time, allowing you to continuously integrate new application functionality, without wrestling with disruptive schema migrations. If a new field needs to be added, it can be created without affecting all other documents in the collection, without updating a central system catalog, and without taking the database offline.

Universal: JSON Documents are Everywhere: Lightweight, language-independent, and human-readable, JSON has become an established standard for data communication and storage. Documents are a superset of all other data models, so you can structure data any way your application needs – rich objects, key-value pairs, tables, geospatial and time-series data, and the nodes and edges of a graph.

Powerful: Serve any Workload: An important difference between databases is the expressivity of the query language, the richness of indexing, and data integrity controls. The MongoDB Query Language is comprehensive and expressive. Ad hoc queries, indexing, and real-time aggregations provide powerful ways to access, group, transform, and analyze your data.
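
The flexible-schema advantage above can be illustrated with a small sketch in plain Python, where a list of dictionaries stands in for a collection. The document contents and the `add_field` helper are hypothetical; the point is only that adding a field to one document leaves the others untouched and needs no central schema change.

```python
# A "collection" whose documents do not share the same set of fields.
customers = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace"},
]


def add_field(docs, doc_id, field, value):
    """Add a new field to a single document, leaving the others as they are."""
    for doc in docs:
        if doc["_id"] == doc_id:
            doc[field] = value
            return doc
    raise KeyError(f"no document with _id={doc_id}")


add_field(customers, 2, "loyalty_tier", "gold")

# Each document describes its own structure through its keys.
fields_per_document = {doc["_id"]: sorted(doc) for doc in customers}
```

After the call, document 2 carries the new field while document 1 is unchanged, which mirrors how a new field can be introduced without migrating every document in the collection.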

A multi-cloud, global database: Freedom and Flexibility

To harness the tremendous rate of innovation in the cloud and reduce the risk of lock-in, project teams should build their applications on data platforms that deliver a consistent experience across any environment. MongoDB can be run anywhere – from developer laptops to mainframes, from private clouds to the public cloud.

Broadest Reach – Private, Hybrid, Public Clouds: With a common deployment model, it enables you to take advantage of unique capabilities in each platform without changing a single line of application code and without the heavy lift and risk of complex database migrations.

MongoDB Atlas – Fully Automated Database as a Service: MongoDB Atlas is the global cloud database service for modern applications. You can deploy fully managed MongoDB across AWS, Azure, or Google Cloud with best-in-class automation and proven practices that guarantee availability, scalability, and compliance with security standards.

MongoDB Run by You, with Tools from Us: If you need to run the database on your own self-managed infrastructure for business or regulatory requirements, MongoDB offers on-premises management tools. These tools can be used to power a MongoDB database behind a single application, or to build your own private database service and expose it to your development teams.

Distributed Architecture – Scalable, Resilient, and Mission Critical: Through replica sets and native sharding, MongoDB enables you to scale out your applications with always-on availability. You can distribute data for low-latency user access, while enforcing data sovereignty controls for data privacy regulations such as the GDPR.

Availability and Data Protection with Replica Sets: MongoDB replica sets enable you to create up to 50 copies of your data, which can be provisioned across separate nodes, data centers, and geographic regions. Replica sets are predominantly designed for resilience.

Scale-Up, Scale-Out, Scale Across Storage Tiers: MongoDB can be scaled vertically by moving up to larger instance sizes. As a distributed system, MongoDB can perform a rolling restart of the replica set to enable you to move between different instances without application downtime. Sharding with MongoDB allows you to seamlessly scale the database as your applications grow beyond the hardware limits of a single server, and it does so without adding complexity to the application.

Privacy and Security: With the digital economy becoming so essential for economic prosperity, it’s no surprise that governments and enterprises around the world are responding to growing public concern for the safety of personal data.

MongoDB Cloud: Unified developer experience for modern applications

Modern data architectures are not limited to the transactional database. Many applications also require analytics and search functionality, which often requires teams to learn, deploy, and manage additional systems. If you’re building mobile apps, you’ll need to deal with data on the device and sync it to the backend. You may also find yourself building data visualizations, writing a lot of glue code to move data between data services, or creating and operating custom data access APIs. In MongoDB Cloud, the database is fully integrated with other data services with automatic syncing, data tiering, and federated query. Search indexes run alongside the database and are automatically kept in sync. Aged data can be auto-archived to cloud storage, providing fully managed data tiering while retaining access. Queries are automatically routed to the appropriate data tier without requiring you to think about data movement, replication, or ETL. MongoDB Cloud can even automatically sync backend data to an embedded database on mobile devices. Central to MongoDB Cloud is MongoDB Atlas. MongoDB Cloud extends Atlas with other data services that work with it seamlessly, giving you more ways of working with data

MongoDB Atlas Search: Atlas Search makes it easy to create fast, relevant, full-text search capabilities on top of your data in the cloud, and is built on top of Apache Lucene, the industry standard library.

MongoDB Atlas Data Lake: Atlas Data Lake brings a serverless, scalable data lake to the cloud platform with an on-demand query service that enables you to analyze data in cloud object storage (Amazon S3) in-place using the MongoDB Query Language (MQL).

MongoDB Realm for Data at the Edge: The Realm Mobile Database extends your data foundation out to the edge of the network and is fully integrated with the MongoDB Cloud. Realm is a lightweight database embedded directly on the client device. Realm helps solve the unique challenges of building for mobile, making it simple to store data on-device and enabling data access even when offline.

MongoDB meets the demands of modern apps with a data platform built on three core architectural foundations:

The document data model and MongoDB Query Language give developers the fastest way to innovate in building transactional, operational, and analytical applications.
A multi-cloud, global database, giving developers the freedom to run their applications anywhere with the flexibility to move across private and public clouds as requirements evolve – without having to change a single line of code.
The MongoDB Cloud provides a unified developer experience for modern applications that span cloud to edge, in database, search, and the data lake, backed by comprehensive & integrated application services.

Data Storage

The storage engine is the primary component of MongoDB responsible for managing data.

MongoDB supports multiple storage engines, because different engines perform better for specific workloads. Choosing the appropriate storage engine for your use case can significantly impact the performance of your applications. The default storage engine is WiredTiger.

WiredTiger (Default): It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features.

Encrypted: Encryption at rest, when used in conjunction with transport encryption and good security policies, can help ensure compliance with security and privacy standards.

Third Party: MongoDB can work with third-party storage engines, which can define their own storage strategies based on the database workload.

The journal is a log that helps the database recover in the event of a hard shutdown. Several configurable options allow the journal to strike a balance between performance and reliability that works for your particular use case.

With journaling, the recovery process:

Looks in the data files to find the identifier of the last checkpoint.
Searches in the journal files for the record that matches the identifier of the last checkpoint.
Applies the operations in the journal files since the last checkpoint.
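
The three recovery steps above can be sketched as a toy model in plain Python. This is not how WiredTiger stores its files; the `data_file` and `journal` structures and their field names are assumptions made purely for illustration.

```python
def recover(data_file, journal):
    """Replay journal records written after the last checkpoint."""
    # Step 1: find the identifier of the last checkpoint in the data files.
    checkpoint_id = data_file["last_checkpoint_id"]

    # Step 2: locate the journal record that matches that identifier.
    record_ids = [record["id"] for record in journal]
    first_unapplied = record_ids.index(checkpoint_id) + 1

    # Step 3: apply every operation recorded since that checkpoint.
    state = dict(data_file["state"])
    for record in journal[first_unapplied:]:
        key, value = record["op"]
        state[key] = value
    return state


# Hypothetical state on disk after a hard shutdown: the checkpoint covers
# journal records 1 and 2, while records 3 and 4 exist only in the journal.
data_file = {"last_checkpoint_id": 2, "state": {"a": 1, "b": 2}}
journal = [
    {"id": 1, "op": ("a", 1)},
    {"id": 2, "op": ("b", 2)},
    {"id": 3, "op": ("b", 20)},
    {"id": 4, "op": ("c", 3)},
]

recovered_state = recover(data_file, journal)
```

The recovered state contains the checkpointed data plus the two later writes, which is the balance between checkpoint frequency and journal replay that the configuration options tune.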

GridFS is a versatile storage system that is suited to handling large files, such as those exceeding the 16 MB document size limit.
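
GridFS works by splitting a file into chunks and storing each chunk as its own document. The sketch below imitates that idea in plain Python rather than calling a MongoDB driver. The 255 KiB chunk size matches the GridFS default, and the `files_id`, `n`, and `data` keys follow the layout of GridFS chunk documents; the helper names and the small 600 KiB payload are assumptions for illustration.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split a byte string into ordered chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


# A small 600 KiB payload keeps the example fast; a real use case would be
# a file larger than the 16 MB document size limit.
payload = bytes(600 * 1024)
chunks = split_into_chunks("file-1", payload)
round_trip_ok = reassemble(chunks) == payload
```

Because every chunk is an ordinary document well below the size limit, the file as a whole can be arbitrarily large.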

A storage API layer is provided so that different platforms and languages can perform all lookup, read, and write operations against the data storage. A data layer, the document data model, is defined on top of the storage layer to store documents and collections efficiently in MongoDB. The data layer is effectively made up of collections of documents[10] stored as BSON (Binary JSON) objects. Indexes are also represented at the data layer.

Data Access (Query) [8]

The query engine in MongoDB is robust and is accessible from many programming languages. MQL (MongoDB Query Language) is effectively a JSON-based query language, and the query engine handles the following components:

Command Parser/Validator: This component performs schema validation and document validation, and ensures restrictions are properly met before the database instruction set can be executed.
DML: All instruction sets are evaluated to identify whether the database instruction is a document insertion, an update, the creation of an index, etc.
READ Operation: The READ operation handles lookup/select queries to return a collection of documents. It supports various find commands.
WRITE Operation: The WRITE operation handles document insertions and updates in the databases.
Query Planner: If the query is a select query that returns a large collection of documents, the query planner chooses how to execute it, for example which index to use.
Query Analyzer: This component is responsible for interpreting the query, working together with the query planner.

Comparison of MongoDB versus RDBMS

Comparison Criteria	MongoDB	RDBMS
Architecture	Document-oriented and flexible (schema-less by default) with optional schema validation; supports nested documents and arrays.	Relational and schema-based with predefined tables, columns, and constraints; schema changes typically require planned migrations.
Architecture	Designed for horizontal scalability via sharding and distributed clusters to handle large datasets and high write throughput.	Traditionally scales vertically for strong single-node performance; horizontal scaling is possible (replication, partitioning) but generally more complex.
Architecture	Encourages embedding related data to reduce joins and improve read latency for common access patterns; supports ad-hoc joins via `$lookup`.	Uses a normalized model optimized for complex joins and enforces referential integrity through foreign keys and constraints.
Architecture	Provides single-document atomicity natively; supports multi-document ACID transactions (since MongoDB 4.0) with trade-offs in distributed environments.	Provides mature, native ACID transactions across multiple tables with established isolation levels and robust transaction management.
Data Storage	Well-suited for hierarchical and document-based data storage	Best suited for relational data model storage
Access	Supports JSON-based queries and provides SQL-like query interfaces	Relies on SQL for querying, with strong transactional support and optimized join operations [9]
Access	Provides deep query capabilities, including full-text search, regular expressions, dynamic queries, aggregation, and sorting [9]	Offers mature query optimizers and indexing for relational queries; full-text search and complex nested-document queries often require extensions or additional tooling
Application	Suitable for large-scale logging, content management, and applications requiring flexibility	Highly effective for relational data models where data integrity and durability are critical

Performance Optimization

The Performance Advisor monitors queries that MongoDB considers slow and suggests new indexes to improve query performance. The threshold for slow queries varies based on the average time of operations on the cluster to provide recommendations pertinent to the workload. [11]

Recommended indexes are accompanied by sample queries, grouped by query shape (a combination of a query predicate, sort, and projection), that were run against a collection that would benefit from the suggested index.

The Performance Advisor monitors queries that take longer than 100 milliseconds to execute and groups these queries into common query shapes. The Performance Advisor calculates the inefficiency of each query shape by considering the following aggregated metrics from queries that match the shape [12]:

Amount of time spent executing the query.
Number of documents scanned.
Number of documents returned.

To establish recommended indexes, the Performance Advisor uses these metrics in a formula to calculate the Impact, or performance improvement, that creating an index matching that query shape would cause. The Performance Advisor compares the amount of time spent executing index-specific operations to the total operational latency in the deployment. When the Performance Advisor suggests indexes, the indexes are ranked by their Impact score.
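
The exact Impact formula is not given here, so the following sketch does not reproduce it. It only illustrates the grouping step in plain Python: queries are aggregated by shape, and documents scanned per document returned is used as an assumed stand-in for inefficiency. The log entries and shape labels are hypothetical.

```python
from collections import defaultdict


def summarize_by_shape(query_log):
    """Aggregate time, documents scanned, and documents returned per query shape."""
    totals = defaultdict(lambda: {"millis": 0, "scanned": 0, "returned": 0, "count": 0})
    for query in query_log:
        entry = totals[query["shape"]]
        entry["millis"] += query["millis"]
        entry["scanned"] += query["docs_scanned"]
        entry["returned"] += query["docs_returned"]
        entry["count"] += 1

    summary = {}
    for shape, entry in totals.items():
        # Assumed inefficiency measure (not the Performance Advisor's formula):
        # documents scanned for each document actually returned.
        scan_ratio = entry["scanned"] / max(entry["returned"], 1)
        summary[shape] = {**entry, "scan_ratio": scan_ratio}
    return summary


# Hypothetical slow-query log: all entries exceed 100 ms.
query_log = [
    {"shape": "find:status", "millis": 150, "docs_scanned": 10000, "docs_returned": 10},
    {"shape": "find:status", "millis": 250, "docs_scanned": 10000, "docs_returned": 10},
    {"shape": "find:_id", "millis": 120, "docs_scanned": 1, "docs_returned": 1},
]

summary = summarize_by_shape(query_log)
least_efficient_shape = max(summary, key=lambda shape: summary[shape]["scan_ratio"])
```

A shape that scans a thousand documents for each one it returns is a natural candidate for a new index, which is the intuition behind ranking suggestions by impact.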

Common Reasons for Slow Queries:

If a query is slow, common reasons include:

The query is unsupported by the current indexes.
Some documents in the collection have large array fields that are costly to search and index.
One query retrieves information from multiple collections with $lookup.

Index Considerations[13]:

Indexes improve read performance, but a large number of indexes can negatively impact write performance since indexes must be updated during writes. If your collection already has several indexes, consider this tradeoff of read and write performance when deciding whether to create new indexes. Examine whether a query for such a collection can be modified to take advantage of existing indexes, as well as whether a query occurs often enough to justify the cost of a new index.
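
The read/write tradeoff can be seen in a toy model. The `IndexedCollection` class below is a hypothetical stand-in written in plain Python, with dictionaries playing the role of indexes; it is not how MongoDB implements indexes, but it shows why every additional index adds work to each write.

```python
class IndexedCollection:
    """Toy collection with one dictionary-based index per indexed field."""

    def __init__(self, indexed_fields):
        self.docs = []
        self.indexes = {field: {} for field in indexed_fields}
        self.index_updates = 0  # extra work performed on writes

    def insert(self, doc):
        self.docs.append(doc)
        # Every index covering a field of this document must be updated.
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], []).append(doc)
                self.index_updates += 1

    def find(self, field, value):
        """Return (matching documents, access path used)."""
        if field in self.indexes:
            return self.indexes[field].get(value, []), "index"
        # No index on this field: fall back to scanning every document.
        return [d for d in self.docs if d.get(field) == value], "scan"


orders = IndexedCollection(["status", "city"])
for i in range(3):
    orders.insert({"_id": i, "status": "open", "city": "Oslo", "note": f"n{i}"})

open_orders, status_path = orders.find("status", "open")
noted_orders, note_path = orders.find("note", "n1")
```

Three inserts with two indexes cost six index updates, while only the indexed field avoids a full scan on reads. Whether a third index on `note` is worthwhile depends on how often that query runs.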

As we develop and operate applications with MongoDB, we may need to analyze the performance of the application and its database. When we encounter degraded performance, it is often a function of database access strategies, hardware availability, and the number of open database connections.

Some users may experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a result of poor schema design patterns. The Locking Performance section below discusses how these can impact MongoDB’s internal locking.

Performance issues may indicate that the database is operating at capacity and that it is time to add additional capacity to the database. In particular, the application’s working set should fit in the available physical memory.

In some cases, performance issues may be temporary and related to an abnormal traffic load. As discussed in the Number of Connections section below, scaling can help ease excessive traffic. Database profiling can help us understand which operations are causing the degradation.

Locking Performance:

MongoDB uses a locking system to ensure data set consistency. If certain operations are long-running or a queue forms, performance will degrade as requests and operations wait for the lock.

Lock-related slowdowns can be intermittent. To see whether locking has been affecting performance, we can look at the ‘locks’ and ‘globalLock’ sections of the ‘serverStatus’ output. Dividing ‘locks.timeAcquiringMicros’ by ‘locks.acquireWaitCount’ gives an approximate average wait time for a particular lock mode. ‘locks.deadlockCount’ provides the number of times lock acquisitions encountered deadlocks. If ‘globalLock.currentQueue.total’ is consistently high, a large number of requests may be waiting for a lock, which indicates a possible concurrency issue that may be affecting performance. If ‘globalLock.totalTime’ is high relative to uptime, the database has been in a lock state for a significant amount of time. Long-running queries can result from ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM that forces reads from disk.
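
The division described above is easy to get wrong by hand, so here is a small sketch of the arithmetic in Python. The `sample_status` dictionary is a hypothetical, heavily trimmed excerpt shaped like ‘serverStatus’ output, and the helper name is an assumption; in practice the document would come from running the command against a live server.

```python
def average_lock_wait_micros(server_status, lock_type, mode):
    """Approximate average wait, in microseconds, for one lock type and mode."""
    lock = server_status["locks"][lock_type]
    wait_count = lock.get("acquireWaitCount", {}).get(mode, 0)
    if wait_count == 0:
        return 0.0  # nothing ever waited in this mode
    return lock["timeAcquiringMicros"][mode] / wait_count


# Hypothetical excerpt: four write-intent ("w") acquisitions on collection
# locks had to wait, for a total of 2000 microseconds.
sample_status = {
    "locks": {
        "Collection": {
            "acquireWaitCount": {"w": 4},
            "timeAcquiringMicros": {"w": 2000},
        }
    },
    "globalLock": {"currentQueue": {"total": 12, "readers": 5, "writers": 7}},
}

avg_write_wait = average_lock_wait_micros(sample_status, "Collection", "w")
avg_read_wait = average_lock_wait_micros(sample_status, "Collection", "r")
queued_operations = sample_status["globalLock"]["currentQueue"]["total"]
```

Here the average write wait works out to 500 microseconds. Tracking this figure and the queue total over time, rather than at a single moment, is what reveals an intermittent locking problem.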

Number of Connections:

In some cases, the number of connections between the applications and the database can overwhelm the ability of the server to handle requests. The following fields in the ‘serverStatus’ document can provide insight:

‘connections’ is a container for the following two fields:

‘connections.current’: the total number of current clients connected to the database instance.
‘connections.available’: the total number of unused connections available for new clients.

If there are numerous concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then we will need to increase the capacity of our deployment.
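
One simple way to read the two ‘connections’ fields together is as a utilization ratio. The sketch below is plain Python over a hypothetical excerpt of ‘serverStatus’ output; the 80% warning threshold is an arbitrary assumption for illustration, not a MongoDB recommendation.

```python
def connection_utilization(server_status):
    """Fraction of the connection capacity that is currently in use."""
    connections = server_status["connections"]
    capacity = connections["current"] + connections["available"]
    return connections["current"] / capacity if capacity else 0.0


# Hypothetical excerpt of serverStatus output.
sample_status = {"connections": {"current": 800, "available": 200}}

utilization = connection_utilization(sample_status)
WARNING_THRESHOLD = 0.8  # assumed threshold, chosen only for this example
needs_attention = utilization >= WARNING_THRESHOLD
```

A ratio that stays near the limit under normal load suggests adding capacity, while a sudden spike without matching workload points more toward a driver or configuration problem.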

For read-heavy applications, we can increase the size of our replica set and distribute read operations to secondary members.

For write-heavy applications, we can deploy ‘sharding’ and add one or more ‘shards’ to a ‘sharded cluster’ to distribute load among ‘mongod’ instances. (Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.)

There are two methods for addressing system growth: vertical and horizontal scaling.

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.
Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required.
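
Horizontal scaling depends on a rule that decides which server holds which document. The sketch below shows one such rule in plain Python: hash the shard key and take the result modulo the number of shards. This is a simplification made for illustration. MongoDB’s hashed sharding assigns ranges of hashed values to shards rather than using a modulo, and the `shard_for` helper is hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(shard_key, num_shards):
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


NUM_SHARDS = 3

# Route 3000 hypothetical user IDs and count how many land on each shard.
placement = Counter(shard_for(user_id, NUM_SHARDS) for user_id in range(3000))
```

Because the hash spreads keys roughly evenly, each server ends up holding about a third of the data and receiving about a third of the writes, which is what lets capacity grow by adding servers.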

Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without corresponding workload, is often indicative of a driver or other configuration error. Unless constrained by system-wide limits, the maximum number of incoming connections supported by MongoDB is configured with the ‘maxIncomingConnections’ setting. On Unix-based systems, system-wide limits can be modified using the ulimit command or by editing our system’s /etc/sysctl file.

Database Profiling:

The database profiler collects detailed information about database commands executed against a running mongod instance. This includes CRUD operations as well as configuration and administration commands. The profiler writes all the data it collects to the ‘system.profile’ collection, a capped collection in each profiled database. The profiler’s output can help to identify inefficient queries and operations.

We can enable and configure profiling for individual databases or for all databases on a mongod instance. Profiler settings affect only a single ‘mongod’ instance and will not propagate across a replica set or sharded cluster.

The following profiling levels are available:

Level Description

0 The profiler is off and does not collect any data. This is the default profiler level.
1 The profiler collects data for operations that take longer than the value of slowms.
2 The profiler collects data for all operations.
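
The three levels amount to a small decision rule, sketched here in Python. The `should_profile` function is a hypothetical helper that mirrors the table; the default of 100 milliseconds for slowms matches MongoDB’s default.

```python
def should_profile(level, operation_millis, slowms=100):
    """Decide whether an operation would be recorded at a given profiling level."""
    if level == 0:
        return False  # profiler is off
    if level == 1:
        return operation_millis > slowms  # only slow operations
    if level == 2:
        return True  # every operation
    raise ValueError(f"unknown profiling level: {level}")


# The same 150 ms and 20 ms operations evaluated at each level.
decisions = {
    level: (should_profile(level, 150), should_profile(level, 20))
    for level in (0, 1, 2)
}
```

Level 1 is usually the practical choice for production troubleshooting, because level 2 records every operation and adds noticeable overhead.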

A comparison of MongoDB and SQL approaches to optimization

Description	MongoDB	SQL
Query rewrite based on heuristics, cost or both	Largely unsupported. MongoDB’s basic queries use simple methods such as find(), save(), remove(), and update().	Supported. Rewrites range from eliminating unnecessary predicates to subquery flattening, converting appropriate LEFT OUTER JOINs to INNER JOINs, folding derived tables, etc.
Index selection	Supported. MongoDB’s optimizer tries to pick a suitable index for each portion of the query where an index can be used.	Supported. The optimizer selects the optimal index(es) for each table and, depending on the index selected, chooses the predicates to push down, determines whether the query is covered, and decides on the sort and pagination strategy.
Join reordering	Unsupported. MongoDB’s $lookup is a stage in the aggregation framework, where the query is written procedurally, like a Unix pipeline.	Supported. (A INNER JOIN B INNER JOIN C) is equivalent to (B INNER JOIN C INNER JOIN A), and the optimizer determines the most efficient order in which to sequence these joins.
Join type selection	Unsupported, since MongoDB has only one type of join: a constrained left outer join via the $lookup operator.	Supported. Databases can implement multiple join algorithms: nested loop, hash, sort-merge, zigzag, star (snowflake), etc. Depending on the structure and cost, the optimizer decides which join algorithm to use for each join operation.
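
To show what the single join type looks like in practice, the sketch below writes a $lookup stage as a Python dictionary and then emulates its left outer join behavior over in-memory lists. The collection and field names are hypothetical, and the emulation is a simplified model rather than MongoDB’s implementation.

```python
# A $lookup stage as it would appear in an aggregation pipeline
# (collection and field names are hypothetical).
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def left_outer_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate a left outer join: every local document is kept, matched or not."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 10, "customer_id": 1}, {"_id": 11, "customer_id": 99}]
customers = [{"_id": 1, "name": "Ada"}]

spec = lookup_stage["$lookup"]
joined_orders = left_outer_lookup(
    orders, customers, spec["localField"], spec["foreignField"], spec["as"]
)
```

The second order has no matching customer, yet it still appears in the result with an empty array, which is the defining property of a left outer join. There is no equivalent choice between inner, right, or full joins, and no optimizer decision about the join algorithm.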

Transaction Management

Concurrency control

Concurrency control is the part of a data management system that keeps data consistent when many operations run at the same time, and it has a direct effect on how efficiently MongoDB processes data. Rather than literally running one transaction at a time, concurrency control ensures that concurrent transactions do not interfere with one another, so the outcome is the same as if each had run in isolation (Kemme, 2018). In MongoDB, the WiredTiger storage engine provides a document-level concurrency model, so operations on different documents do not have to wait for each other. The aim is to increase both the efficiency and the throughput of the database. Capacity can also be increased by scaling out: through replica sets and sharding, more servers can be added as the workload grows, including in cloud deployments.

Comparison of MongoDB’s and RDBMS’s approaches to technical considerations

Transaction management

Both an RDBMS and MongoDB can represent relationships between data, although only the RDBMS is relational in the strict sense. Both can handle large data sets, and both use relationships between records to make data analysis and access more efficient (Niyizamwiyitira & Lundberg, 2017). When a new record is created, a relationship can be established so that the data is easy to record and to retrieve. The two differ in how they structure that data: an RDBMS relies on normalization, which spreads related data across several tables, whereas MongoDB does not require normalization and can keep related data together in a single document. As a result, many MongoDB transactions touch only one document, which keeps transaction handling simple and efficient.

Concurrency control

In terms of concurrency control, MongoDB and an RDBMS differ in the kind of work they favor. MongoDB applies a series of short operations to a document one at a time, which preserves the integrity of the data (Kemme, 2018). An RDBMS, on the other hand, typically holds larger transactions that span many rows and tables while they execute. Because an RDBMS can handle such bulky transactions, it suits workloads in which many related changes must succeed or fail together.

Strengths of MongoDB

Concurrency control is efficient. For instance, conflicting writes to the same document are applied one at a time, which preserves data integrity (Niyizamwiyitira & Lundberg, 2017). This makes the data easier to maintain and to keep consistent.

MongoDB also offers high availability. In particular, data files are easily accessible because related data records are linked to one another.

Weaknesses of MongoDB

Because conflicting operations are handled one at a time, throughput can fall when many transactions compete for the same data (Niyizamwiyitira & Lundberg, 2017). There is therefore a risk of reduced workflow or slower write speeds under heavy contention.

As with any other database system, there is exposure to security risk despite the efficiency and data security that MongoDB provides (Kemme, 2018). In particular, while transactions wait their turn to be processed, there is a window, though a small one, in which an intruder could attempt to access the files.

Concurrency Control

In a multi-user environment, multiple transactions that run at the same time can interfere with one another. The problems that can occur include the lost update, dirty read, phantom read, and unrepeatable read problems. In this environment, concurrency control is necessary to ensure correct results.
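
As an illustration of the first of these problems, the sketch below replays a lost update as a fixed sequence of steps. The account record, the amounts, and the two transactions T1 and T2 are hypothetical; the interleaving is written out by hand rather than produced by real threads so that the outcome is the same on every run.

```python
# A shared record, standing in for a row or a document.
account = {"balance": 100}

# Step 1: transactions T1 and T2 both read the same starting value.
read_by_t1 = account["balance"]
read_by_t2 = account["balance"]

# Step 2: T1 writes its result, computed from its own read (100 + 50).
account["balance"] = read_by_t1 + 50

# Step 3: T2 writes its result, computed from a now-stale read (100 + 25).
# This silently overwrites T1's update.
account["balance"] = read_by_t2 + 25

lost_update_balance = account["balance"]

# What a serial execution (T1, then T2) would have produced.
serial_balance = 100 + 50 + 25
```

The final balance reflects only T2's deposit; T1's deposit has been lost even though both transactions reported success.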

Database management systems take one of two approaches to concurrency control: optimistic or pessimistic. Under optimistic concurrency control, a transaction runs to completion without checking for conflicts along the way; only when the transaction is ready to commit does the system check for conflicts, and it rolls the transaction back if one is found. Pessimistic concurrency control takes a preventative approach: it places locks on the data a transaction uses so that conflicts cannot occur while the transaction runs.
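
The two approaches can be sketched side by side in Python. This is a simplified illustration under stated assumptions, not the mechanism of any particular database: the Record class, its version counter, and the VersionConflict exception are hypothetical names introduced here. The optimistic path validates a version number at commit time; the pessimistic path holds a lock for the whole read-modify-write.

```python
import threading


class VersionConflict(Exception):
    """Raised when a record changed between read and commit."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    # Optimistic: read freely, validate only at commit time.
    def read(self):
        return self.value, self.version

    def commit_if_unchanged(self, new_value, expected_version):
        with self._lock:  # guards only the brief check-and-write
            if self.version != expected_version:
                raise VersionConflict("record was modified; retry")
            self.value = new_value
            self.version += 1

    # Pessimistic: hold the lock for the entire read-modify-write.
    def locked_update(self, fn):
        with self._lock:
            self.value = fn(self.value)
            self.version += 1


record = Record(100)

# Two optimistic transactions read the same version of the record.
v1, ver1 = record.read()
v2, ver2 = record.read()
record.commit_if_unchanged(v1 + 50, ver1)      # first commit succeeds
try:
    record.commit_if_unchanged(v2 + 25, ver2)  # stale version is rejected
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
    v2, ver2 = record.read()                   # retry with fresh data
    record.commit_if_unchanged(v2 + 25, ver2)

optimistic_result = record.value

# A pessimistic update cannot conflict, because it never releases the lock
# between reading and writing.
record.locked_update(lambda v: v - 75)
pessimistic_result = record.value
```

The optimistic path costs nothing when conflicts are rare but must retry when one occurs; the pessimistic path never retries but makes every other writer wait.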

RDBMS

Concurrency control techniques ensure the integrity of the database by making concurrent transactions produce the same result as if they had executed one at a time. The scheduler reorders the requests of different transactions into a schedule of requests to be serviced. Locking mechanisms deny other transactions access at the file, record, or field level until the current transaction is completed.

MongoDB

To ensure consistency, MongoDB uses locks to prevent multiple users from updating the same data at the same time. Write operations on a single document are atomic.

Multiple granularity locking allows operations to lock at the global, database, or collection level. Locking below the collection level, such as at the document level, is left to the individual storage engines. [14]

Reader-writer locks give concurrent readers shared access to a database or collection while giving a writer exclusive access. MongoDB also uses intent locks: before locking a resource at a finer granularity, an operation places an intent lock on each higher level.

Lock modes in MongoDB

Lock Mode	Description
R	Shared (S) lock
W	Exclusive (X) lock
r	Intent Shared (IS) lock
w	Intent Exclusive (IX) lock

Lock requests are generally queued in order. However, when one request is granted, all other compatible requests, such as shared and intent shared requests, are granted at the same time. Once those locks have drained, the waiting exclusive locks are granted.
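
The granting behavior described above can be sketched as follows. The compatibility table is the standard one for multiple granularity locking (in MongoDB's notation, r = IS, w = IX, R = S, W = X). The functions can_grant and grant_batch and the sample queue are hypothetical, and the sketch is a simplified model of the queueing policy, not MongoDB's actual lock manager.

```python
# Standard multi-granularity compatibility: for each held mode, the set of
# modes that may be held by others at the same time.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


def grant_batch(queue):
    """Grant the first queued request plus every later compatible one.

    Returns (granted, still_waiting), both in original queue order.
    """
    granted, waiting = [], []
    for mode in queue:
        if not granted or can_grant(mode, granted):
            granted.append(mode)
        else:
            waiting.append(mode)
    return granted, waiting


# Requests waiting after an exclusive lock has just been released.
queue = ["IS", "IS", "X", "X", "S", "IS"]

first_batch, remaining = grant_batch(queue)
second_batch, remaining_after = grant_batch(remaining)
```

In this model the shared and intent shared requests are granted together, ahead of the exclusive requests queued before some of them, and the exclusive requests are then granted one at a time.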

Security

It is important to understand that database security works in conjunction with other network services to protect the information in the database. To identify security issues, we will focus on how the database stores information. To store information securely, a database must provide confidentiality, integrity, and availability (CIA).

Confidentiality means that only authorized users have access to the data. Integrity is the accuracy of that data, and availability means that authorized users can access the data when they need it.[16] Control measures also play a significant role in providing security. They include access control, inference control, flow control, and data encryption.

MongoDB

Encryption – Conversion of data into ciphertext to protect confidentiality.

TLS/SSL-based transport encryption for data in transit.
Storage-level and application-level encryption for data at rest.
No native data file encryption in the Community edition.

Authentication – Verifies a user’s identity before granting access to the system.

No option to isolate permissions. If users have access to part of the system, there are no boundaries.[15]
SCRAM (Salted Challenge Response Authentication Mechanism).
TLS/SSL with X.509 certificates for intra-cluster authentication.
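
The idea behind a salted challenge-response mechanism such as SCRAM can be sketched with Python's standard library. This is a simplified illustration, not the full SCRAM protocol and not MongoDB's implementation: the message exchange is reduced to a single auth_message, password normalization is omitted, and the function names, the sample password, and the iteration count are assumptions made for the example. The point it shows is that the server stores only a salt and a derived key, and the client proves knowledge of the password without sending it.

```python
import hashlib
import hmac
import os


def derive_keys(password, salt, iterations):
    """Derive the client key and the stored key from a salted password."""
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    return client_key, stored_key


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def make_client_proof(password, salt, iterations, auth_message):
    """Client side: answer the challenge without revealing the password."""
    client_key, stored_key = derive_keys(password, salt, iterations)
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    return xor_bytes(client_key, signature)


def server_verify(proof, stored_key, auth_message):
    """Server side: recover the client key from the proof and check it."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    recovered_client_key = xor_bytes(proof, signature)
    candidate = hashlib.sha256(recovered_client_key).digest()
    return hmac.compare_digest(candidate, stored_key)


# Registration: the server keeps the salt, iteration count, and stored key.
salt = os.urandom(16)
iterations = 4096  # illustrative value only
_, server_stored_key = derive_keys("correct horse", salt, iterations)

# Login: a fresh random challenge makes each proof valid only once.
auth_message = os.urandom(16)
good = server_verify(
    make_client_proof("correct horse", salt, iterations, auth_message),
    server_stored_key, auth_message)
bad = server_verify(
    make_client_proof("wrong guess", salt, iterations, auth_message),
    server_stored_key, auth_message)
```

Because the challenge changes on every login, a proof captured from one session cannot simply be replayed in another.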

Authorization – Specifies a user’s privileges or access to resources.

MongoDB’s built-in roles enforce authorization at the database level rather than at the collection level.

Auditing – Tracks the use of resources to make sure that users without authorization do not have access to data.

The Enterprise version includes auditing features that record audit events. The open-source version offers only basic auditing of data operations.

In comparison, relational database management systems have stronger security features. Their advantages include role-based security, access control, data encryption, a well-defined schema, and strict adherence to ACID properties. The disadvantage of these security features is that they slow data access.

Conclusion

In a nutshell, modern applications require highly scalable hierarchical data storage, and NoSQL databases like MongoDB play a very important role in these kinds of applications.

Traditional relational database systems like Oracle are still the preferred storage for a highly relational data model in large monolithic applications.

REFERENCES

[1] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[2] https://www.dataversity.net/a-brief-history-of-non-relational-databases/#

[3] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[4] https://intellipaat.com/blog/what-is-mongodb/

[5] www.geeksforgeeks.org/use-of-nosql-in-industry/

[6] www.geeksforgeeks.org/use-of-nosql-in-industry/

[7] Architecture Guide – Core Principles

[8] MongoDB Internals Presentation

[9] MongoDB Architecture Comparisons with RDBMS

[10] Terminologies

[11] https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/

[12] https://blog.couchbase.com/query-optimization-in-nosql-couchbase-mongodb/

[13] https://docs.atlas.mongodb.com/performance-advisor/

[14] https://docs.mongodb.com/manual/faq/concurrency/

[15] https://www.researchgate.net/publication/337854725_A_Comparative_Study_of_NOSQL_System_Vulnerabilities_with_Big_Data

[16] https://www.softwaretestinghelp.com/sql-vs-nosql/

[17] Kemme, B. (2018). Replicated database concurrency control. Encyclopedia of Database Systems, 3175-3176. https://doi.org/10.1007/978-1-4614-8265-9_311

[18] Niyizamwiyitira, C., & Lundberg, L. (2017). Performance Evaluation of SQL and NoSQL Database Management Systems in a Cluster. International Journal of Database Management Systems, 9(6), 01-24. https://doi.org/10.5121/ijdms.2017.9601
