
Overview of NoSQL databases

NoSQL databases, also called “non-relational” databases, “NoSQL DBs,” or simply “NoSQL,” can handle huge volumes of rapidly changing, unstructured data. They do this in ways that differ from a relational (SQL) database, which stores data in tables of rows and columns. The acronym NoSQL was first used in 1998 by Carlo Strozzi to name his lightweight, open-source “relational” database, which did not use SQL.[1] The name resurfaced in 2009, when Eric Evans and Johan Oskarsson used it to describe non-relational databases. Because relational databases are often referred to as SQL systems, the term NoSQL can be read either as “No SQL” or, more commonly, as “Not only SQL.” The second reading emphasizes that some of these systems support SQL-like query languages.

NoSQL developed, at least initially, as a response to web data, the need to process unstructured data, and the need for faster processing. Most NoSQL systems are designed as distributed databases, meaning they run across multiple computers. A non-relational system can be faster for many workloads, organizes data in a flexible, ad hoc way, and can process large amounts of data of differing kinds. For general research on large, unstructured data sets, NoSQL databases are often the better choice compared with relational databases because of their speed and flexibility. NoSQL systems can handle both structured and unstructured data, and they can process unstructured Big Data quickly.[2]


Content and Metadata Store: Many companies, such as publishing houses, need to store large amounts of data, including digital content and e-books, and to merge various learning tools into a single platform. Amazon, for instance, uses ElastiCache for Redis, a key-value NoSQL database. In content-based applications, metadata is accessed very frequently and needs short response times. NoSQL provides fast access to data and the flexibility to store different types of content, which suits content-based applications.

Ad Targeting: Deciding which ads or offers to display on the current web page directly affects revenue. To determine which group of users to target and where on the page to display ads, the platform gathers users’ behavioral and demographic characteristics. A NoSQL database enables ad companies to track user details and place ads very quickly, increasing the probability of clicks. Google Ads, Meta Ads, and Amazon Advertising are some of the ad targeting platforms that use NoSQL.[5]

Session/User Profile Store: Managing session data in a relational database can be cumbersome, especially when an application is growing rapidly. In such cases, a better approach is a global session store, which manages session information for every user who visits the site. NoSQL is well suited to storing web application session information, which is often large. Because session data is loosely structured, it is easier to store it in schemaless documents than in relational database records. Web and mobile applications also need to store user profiles to support online transactions, user authentication, user preferences, and so on. Adoption of web and mobile applications has increased tremendously in recent years. A relational database limited to a single server can struggle with the volume of user profile data, which grows daily. NoSQL offers a way out because its capacity can be increased by adding servers, which makes scaling cost-effective.
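The following is a rough illustration of a schemaless session store. It keeps each session as a free-form dictionary with an expiry time. It is a hypothetical in-memory class, not a MongoDB or driver API. In MongoDB the same idea is usually handled with a TTL index, which removes documents after a set period. The injectable `clock` parameter is an assumption added only so that expiry can be demonstrated.

```python
import time
from typing import Any, Callable, Optional


class SessionStore:
    """Hypothetical in-memory session store. Each session is a schemaless dict
    stored with an expiry time, loosely mimicking a collection with a TTL index."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._sessions: dict[str, dict[str, Any]] = {}

    def put(self, session_id: str, data: dict[str, Any]) -> None:
        # No schema is enforced: each session may carry different fields.
        self._sessions[session_id] = {
            "data": data,
            "expires_at": self._clock() + self.ttl_seconds,
        }

    def get(self, session_id: str) -> Optional[dict[str, Any]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        if self._clock() >= entry["expires_at"]:
            del self._sessions[session_id]  # expired: drop it lazily on read
            return None
        return entry["data"]


# Usage with a fake clock so expiry is predictable.
now = [0.0]
store = SessionStore(ttl_seconds=30, clock=lambda: now[0])
store.put("s1", {"user": "ana", "cart": [101, 102]})
store.put("s2", {"user": "ben", "theme": "dark", "locale": "pt-PT"})
```

The two sessions have different fields, and neither required a schema change.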

Mobile Applications: Since smartphones have become ubiquitous, mobile applications face problems of growth and data volume. With a NoSQL database, mobile application development can start small and expand easily as the number of users increases, which is much harder with a relational database. Because NoSQL databases store data in a schema-less form, developers can update their apps without major modifications to the database. Mobile app companies such as Kobo, Playtika, Snapchat, and Uber use NoSQL to serve millions of users around the world.


Third-Party Data Aggregation in E-commerce: For commercial and other business reasons, e-commerce and consumer goods companies often collect sales data and shoppers’ purchase histories from stores. Because of the massive amount of data involved, a NoSQL database such as MongoDB can handle data generated at high speed from many data sources.

Internet of Things: Today, millions of devices are connected to the internet, ranging from smart surveillance cameras in homes to sensors in smart cars. These devices generate large volumes of data for analysis and decision-making by both humans and the devices themselves. Relational databases can struggle to handle such Big Data. NoSQL lets organizations scale concurrent access to data from millions of connected devices and systems, store huge amounts of data, and meet the required performance.

Social Gaming: Data-intensive applications such as social games can grow to millions of users. Such growth in users and data requires a database system that can store the data and scale to accommodate a growing number of users. NoSQL is suitable for such applications and has been used by gaming companies such as Electronic Arts, Zynga, and Tencent.[6]

To narrow the scope of this post, I will focus on MongoDB, an open-source database that uses a document-oriented data model and its own query language rather than SQL. It is one of the most widely used NoSQL systems today. As a NoSQL database, it does not use the rows and columns found in relational (SQL) databases.

MongoDB’s architecture is built on collections and documents. The basic unit of data is a document, which consists of a set of key-value pairs. Documents in the same collection can have different fields and structures. MongoDB stores documents in BSON, a binary representation of JSON-like documents. Its flexible data model lets you combine and store data of different types without giving up powerful indexing options, data access, and validation rules. Schemas can be modified dynamically without downtime. In practice, this means you can spend more time making your data work for you and less time preparing it for the database.[4]

Figure: MongoDB architecture

Built around JSON-like documents, document databases are both intuitive and flexible for developers to work with. They promise higher developer productivity and faster evolution with application needs. As developers have experienced these benefits, the document data model has become the most popular alternative to the tabular model used by traditional relational databases. Four main advantages of MongoDB are:

Flexible Schema: Dynamically Adapt to Change: A document’s schema is dynamic and self-describing, so you don’t need to predefine it in the database. Fields can vary from document to document, and you can modify the structure at any time. This lets you continuously integrate new application functionality without disruptive schema migrations. A new field can be added without affecting the other documents in the collection, without updating a central system catalog, and without taking the database offline.
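The following sketch shows what adding a field without a migration looks like from the application’s side. It assumes a hypothetical `products` collection held in memory. New documents carry the field, old ones do not, and readers fall back to a default. The optional `add_field_if_missing` backfill is an illustrative helper, not a MongoDB function.

```python
from typing import Any

# Hypothetical collection written before the "eco_label" field existed.
products: list[dict[str, Any]] = [
    {"_id": 1, "name": "Pen", "price": 1.5},
    {"_id": 2, "name": "Pad", "price": 3.0},
]

# New application code starts writing an extra field; old documents are untouched.
products.append({"_id": 3, "name": "Ink", "price": 4.0, "eco_label": True})


def eco_label(doc: dict[str, Any]) -> bool:
    # Readers treat a missing field as a default instead of requiring a migration.
    return doc.get("eco_label", False)


def add_field_if_missing(collection: list[dict[str, Any]], field: str, default: Any) -> int:
    """Optional lazy backfill: set `field` only where it is absent and return
    how many documents changed (roughly the documents an update filtered
    with {"$exists": False} would target)."""
    changed = 0
    for doc in collection:
        if field not in doc:
            doc[field] = default
            changed += 1
    return changed
```

The backfill is optional. Because readers already apply a default, the application works whether or not old documents are ever updated.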

Universal: JSON Documents are Everywhere: Lightweight, language-independent, and human-readable, JSON has become an established standard for data communication and storage. Documents can represent many other data models, so you can structure data any way your application needs: rich objects, key-value pairs, tables, geospatial and time-series data, and the nodes and edges of a graph.

Powerful: Serve any Workload: An important difference between databases is the expressivity of the query language, the richness of indexing, and data integrity controls. The MongoDB Query Language is comprehensive and expressive. Ad hoc queries, indexing, and real-time aggregations provide powerful ways to access, group, transform, and analyze your data. 
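Aggregations in MongoDB are expressed as pipelines of stages. The `pipeline` list below uses real MQL stage syntax (`$match`, `$group`, `$sum`). Assuming a driver such as PyMongo, it could be passed to `collection.aggregate(pipeline)`. To keep the example runnable without a server, `run_pipeline` is a hypothetical toy evaluator that understands only equality matches and `$sum`. The `orders` data is made up.

```python
from collections import defaultdict
from typing import Any

# Made-up sample documents.
orders = [
    {"customer": "ana", "status": "paid", "total": 30},
    {"customer": "ben", "status": "paid", "total": 12},
    {"customer": "ana", "status": "paid", "total": 8},
    {"customer": "ana", "status": "refunded", "total": 50},
]

# MQL pipeline syntax: keep paid orders, then total revenue and count per customer.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "revenue": {"$sum": "$total"}, "orders": {"$sum": 1}}},
]


def run_pipeline(docs: list[dict], stages: list[dict]) -> list[dict]:
    """Toy evaluator: supports equality-only $match and $group with $sum."""
    for stage in stages:
        op, spec = next(iter(stage.items()))
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups: dict[Any, dict[str, Any]] = defaultdict(dict)
            for d in docs:
                acc = groups[d[key_field]]
                for out_field, expr in spec.items():
                    if out_field == "_id":
                        continue
                    operand = expr["$sum"]
                    # "$field" sums a field's values; a number adds a constant per document.
                    value = d[operand.lstrip("$")] if isinstance(operand, str) else operand
                    acc[out_field] = acc.get(out_field, 0) + value
            docs = [{"_id": key, **fields} for key, fields in groups.items()]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


result = run_pipeline(orders, pipeline)
```

Here `result` holds one document per customer. The refunded order is filtered out by `$match` before `$group` runs, so Ana’s revenue is 38 from two orders.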

To harness the tremendous rate of innovation in the cloud and reduce the risk of lock-in, project teams should build their applications on data platforms that deliver a consistent experience across any environment. MongoDB can be run anywhere – from developer laptops to mainframes, from private clouds to the public cloud.

Broadest Reach – Private, Hybrid, Public Clouds: With a common deployment model, it enables you to take advantage of unique capabilities in each platform without changing a single line of application code and without the heavy lift and risk of complex database migrations.

MongoDB Atlas – Fully Automated Database as a Service: MongoDB Atlas is the global cloud database service for modern applications. You can deploy fully managed MongoDB across AWS, Azure, or Google Cloud with best-in-class automation and proven practices that guarantee availability, scalability, and compliance with security standards. 

MongoDB Run by You, with Tools from Us: If you need to run the database on your own self-managed infrastructure for business or regulatory requirements, MongoDB offers on-premises management tools. These tools can be used to power a MongoDB database behind a single application, or to build your own private database service and expose it to your development teams. 

Distributed Architecture – Scalable, Resilient, and Mission Critical: Through replica sets and native sharding, MongoDB enables you to scale out your applications with always-on availability. You can distribute data for low-latency user access while enforcing data sovereignty controls for data privacy regulations such as the GDPR.

Scale-Up, Scale-Out, Scale Across Storage Tiers: MongoDB can be scaled vertically by moving to larger instance sizes. As a distributed system, MongoDB can perform a rolling restart of the replica set, letting you move between instance sizes without application downtime. Sharding lets you scale the database horizontally as your application grows beyond the hardware limits of a single server, without adding complexity to the application.
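The following sketch illustrates how sharding spreads data across servers by modeling ranged sharding. Each chunk covers a range of shard-key values and belongs to one shard, and a router looks up the chunk for a given key. The `ChunkMap` class, the bounds, and the shard names are hypothetical. MongoDB also offers hashed sharding, and its balancer logic is not modeled here.

```python
import bisect


class ChunkMap:
    """Hypothetical range-sharding map for one integer shard key.
    Chunk i covers [bounds[i], bounds[i + 1]); the last chunk is open-ended."""

    def __init__(self, bounds: list[int], owners: list[str]):
        if len(bounds) != len(owners) or bounds != sorted(bounds):
            raise ValueError("bounds must be sorted and match owners one-to-one")
        self.bounds = list(bounds)
        self.owners = list(owners)

    def route(self, key: int) -> str:
        # Like a query router (mongos): a known shard key targets exactly one shard.
        if key < self.bounds[0]:
            raise ValueError("shard key below the lowest chunk bound")
        return self.owners[bisect.bisect_right(self.bounds, key) - 1]

    def split(self, at: int, new_owner: str) -> None:
        # Split the chunk containing `at` and give the upper part to new_owner,
        # loosely mirroring chunk splitting plus migration as a cluster grows.
        if at in self.bounds:
            raise ValueError("split point is already a chunk boundary")
        i = bisect.bisect_right(self.bounds, at)
        self.bounds.insert(i, at)
        self.owners.insert(i, new_owner)


chunks = ChunkMap(bounds=[0, 1000, 5000, 20000], owners=["shardA", "shardB", "shardA", "shardC"])
```

After `chunks.split(10000, "shardD")`, keys from 10000 up to 19999 are routed to the new shard. Keys just below 10000 stay where they were, so the application code that calls `route` does not change.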

Privacy and Security: With the digital economy becoming so essential for economic prosperity, it’s no surprise that governments and enterprises around the world are responding to growing public concern for the safety of personal data.

Modern data architectures are not limited to the transactional database. Many applications also require analytics and search functionality, which often requires teams to learn, deploy, and manage additional systems. If you’re building mobile apps, you’ll need to deal with data on the device and sync it to the backend. You may also find yourself building data visualizations, writing a lot of glue code to move data between data services, or creating and operating custom data access APIs. In MongoDB Cloud, the database is fully integrated with other data services through automatic syncing, data tiering, and federated queries. Search indexes run alongside the database and are automatically kept in sync. Aged data can be auto-archived to cloud storage, providing fully managed data tiering while retaining access. Queries are automatically routed to the appropriate data tier without requiring you to think about data movement, replication, or ETL. MongoDB Cloud can even automatically sync backend data to an embedded database on mobile devices. Central to MongoDB Cloud is MongoDB Atlas, which MongoDB Cloud extends with other data services that work with it seamlessly, giving you more ways of working with data.

MongoDB Atlas Search: Atlas Search makes it easy to build fast, relevant, full-text search on top of your data in the cloud. It is built on Apache Lucene, the industry-standard search library.

MongoDB Atlas Data Lake: Atlas Data Lake brings a serverless, scalable data lake to the cloud platform with an on-demand query service that enables you to analyze data in cloud object storage (Amazon S3) in-place using the MongoDB Query Language (MQL).

MongoDB Realm for Data at the Edge: The Realm Mobile Database extends your data foundation out to the edge of the network and is fully integrated with the MongoDB Cloud. Realm is a lightweight database embedded directly on the client device. Realm helps solve the unique challenges of building for mobile, making it simple to store data on-device and enabling data access even when offline.

MongoDB supports multiple storage engines because different engines perform better for specific workloads. Choosing the appropriate storage engine for your use case can significantly affect your application’s performance. The default storage engine is WiredTiger.

WiredTiger (Default): It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features.

The Performance Advisor monitors queries that MongoDB considers slow and suggests new indexes to improve query performance. The threshold for slow queries varies with the average operation time on the cluster, so that recommendations are pertinent to the workload.[11]

Recommended indexes are accompanied by sample queries that were run against a collection that would benefit from the suggested index. These samples are grouped by query shape, which is the combination of a query’s predicate, sort, and projection.

The Performance Advisor monitors queries that take longer than 100 milliseconds to execute and groups these queries into common query shapes. The Performance Advisor calculates the inefficiency of each query shape by considering the following aggregated metrics from queries that match the shape [12]:

To establish recommended indexes, the Performance Advisor uses these metrics in a formula to calculate the Impact, or performance improvement, that creating an index matching that query shape would cause. The Performance Advisor compares the amount of time spent executing index-specific operations to the total operational latency in the deployment. When the Performance Advisor suggests indexes, the indexes are ranked by their Impact score.
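The grouping step can be sketched as follows, assuming a hypothetical log in which each operation records a duration in `millis` and the query’s filter, sort, and projection. Queries over the 100 ms threshold are reduced to their shape (field names without values) and ranked by total time spent. That total is only a stand-in, because the actual Impact formula used by the Performance Advisor is not reproduced here.

```python
from collections import defaultdict

SLOW_MS = 100  # threshold from the text; Atlas may adapt it per cluster


def query_shape(query: dict) -> tuple:
    """Reduce a query to its shape: predicate field names, sort spec, and
    projection field names, with concrete values stripped out."""
    return (
        tuple(sorted(query.get("filter", {}))),
        tuple(query.get("sort", {}).items()),  # sort order is part of the shape
        tuple(sorted(query.get("projection", {}))),
    )


def rank_shapes(log: list[dict]) -> list[tuple[tuple, int, float]]:
    """Group slow queries by shape and return (shape, count, total_ms),
    ranked by total time spent (a stand-in for the Impact score)."""
    durations: dict[tuple, list[float]] = defaultdict(list)
    for entry in log:
        if entry["millis"] > SLOW_MS:
            durations[query_shape(entry["query"])].append(entry["millis"])
    ranked = [(shape, len(ms), sum(ms)) for shape, ms in durations.items()]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Made-up log entries for illustration.
log = [
    {"millis": 250, "query": {"filter": {"status": "A", "qty": 5}}},
    {"millis": 180, "query": {"filter": {"qty": 9, "status": "B"}}},
    {"millis": 40, "query": {"filter": {"status": "C"}}},
    {"millis": 120, "query": {"filter": {"sku": "x"}, "sort": {"date": -1}}},
]
ranked = rank_shapes(log)
```

The first two entries share a shape even though their values differ. The 40 ms query falls under the threshold and is ignored.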

Indexes improve read performance, but a large number of indexes can negatively impact write performance since indexes must be updated during writes. If your collection already has several indexes, consider this tradeoff of read and write performance when deciding whether to create new indexes. Examine whether a query for such a collection can be modified to take advantage of existing indexes, as well as whether a query occurs often enough to justify the cost of a new index.
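The read/write tradeoff can be seen in a toy model. `TinyCollection` below is a hypothetical in-memory structure with single-field indexes. A lookup on an indexed field examines only the matching entries, while every insert must update every index. The counters are illustrative and say nothing about real MongoDB costs.

```python
from collections import defaultdict


class TinyCollection:
    """Hypothetical collection with single-field secondary indexes, used only
    to show that indexes speed up reads but add work to every write."""

    def __init__(self, indexed_fields: list[str]):
        self.docs: dict[int, dict] = {}
        # field -> value -> set of document ids
        self.indexes = {field: defaultdict(set) for field in indexed_fields}
        self.index_updates = 0  # index maintenance work performed by writes

    def insert(self, doc_id: int, doc: dict) -> None:
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():  # every index is touched
            index[doc.get(field)].add(doc_id)
            self.index_updates += 1

    def find(self, field: str, value) -> tuple[list[int], int]:
        """Return (matching ids, number of documents examined)."""
        if field in self.indexes:
            ids = sorted(self.indexes[field].get(value, set()))
            return ids, len(ids)  # index lookup: only matches are examined
        ids = sorted(i for i, d in self.docs.items() if d.get(field) == value)
        return ids, len(self.docs)  # collection scan: every document is examined
```

A collection with four indexes does four times the index maintenance of one with a single index, for the same inserts.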

As we develop and operate applications with MongoDB, we may need to analyze the performance of the application and its database. When we encounter degraded performance, it is often a function of database access strategies, hardware availability, and the number of open database connections.

Some users may experience performance limitations because of inadequate or inappropriate indexing strategies or poor schema design patterns. Both can also increase contention for MongoDB’s internal locks.

Performance issues may indicate that the database is operating at capacity and that it is time to add additional capacity to the database. In particular, the application’s working set should fit in the available physical memory.

In some cases, performance issues may be temporary and related to abnormal traffic load. As discussed in the paragraph on connections below, scaling can help absorb excessive traffic. Database profiling can help identify which operations are causing the degradation.

Lock-related slowdowns can be intermittent. To see whether locking has affected performance, we can look at the ‘locks’ and ‘globalLock’ sections of the ‘serverStatus’ output. Dividing ‘locks.timeAcquiringMicros’ by ‘locks.acquireWaitCount’ gives an approximate average wait time for a particular lock mode. ‘locks.deadlockCount’ gives the number of times lock acquisitions encountered deadlocks. If ‘globalLock.currentQueue.total’ is consistently high, a large number of requests may be waiting for a lock, which indicates a possible concurrency issue affecting performance. If ‘globalLock.totalTime’ is high relative to uptime, the database has spent a significant amount of time in a locked state. Long-running queries can result from ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM that forces disk reads.
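The metric arithmetic described above can be written as a small function over a dictionary shaped like `serverStatus` output. In that output, the `locks` section is keyed by resource (for example `Global` or `Collection`) and then by lock mode. With PyMongo, the input could be obtained with `client.admin.command("serverStatus")`. The sample values below are made up. The function computes only the average-wait and queue figures, not the `totalTime`/uptime comparison.

```python
def lock_report(status: dict) -> dict:
    """Summarize lock contention from a serverStatus-shaped dict."""
    report: dict = {}
    for resource, stats in status.get("locks", {}).items():
        waits = stats.get("acquireWaitCount", {})
        micros = stats.get("timeAcquiringMicros", {})
        for mode, count in waits.items():
            if count:
                # timeAcquiringMicros / acquireWaitCount = approximate average wait (µs).
                report[f"{resource}.{mode}.avg_wait_us"] = micros.get(mode, 0) / count
    queue = status.get("globalLock", {}).get("currentQueue", {})
    report["queued_total"] = queue.get("total", 0)  # requests currently waiting for locks
    return report


# Made-up sample in the shape of serverStatus output.
sample = {
    "locks": {
        "Global": {"acquireWaitCount": {"W": 4, "r": 0}, "timeAcquiringMicros": {"W": 2000}},
        "Collection": {"acquireWaitCount": {"w": 10}, "timeAcquiringMicros": {"w": 500}},
    },
    "globalLock": {"currentQueue": {"total": 3, "readers": 1, "writers": 2}},
}
report = lock_report(sample)
```

Modes with no recorded waits are skipped rather than divided by zero. Tracking these figures over time is what reveals whether a high queue is consistent or only a spike.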

Spikes in the number of connections can also result from application or driver errors. All officially supported MongoDB drivers implement connection pooling, which lets clients use and reuse connections more efficiently. An extremely high number of connections, particularly without a corresponding workload, often indicates a driver or other configuration error. Unless constrained by system-wide limits, the maximum number of incoming connections MongoDB accepts is configured with the ‘net.maxIncomingConnections’ setting. On Unix-based systems, system-wide limits can be modified with the ulimit command or by editing the system’s /etc/sysctl.conf file.
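Connection pooling can be sketched with a bounded queue. A fixed number of connections is opened once, then handed out and returned, so the server never sees more than `max_size` connections from this client. The `ConnectionPool` class and the `fake_connect` callback are hypothetical. Real drivers manage this internally and typically open connections lazily rather than all up front. PyMongo, for example, exposes a `maxPoolSize` option on `MongoClient`.

```python
import queue
from contextlib import contextmanager
from typing import Callable


class ConnectionPool:
    """Hypothetical fixed-size pool: connections are created once and reused."""

    def __init__(self, connect: Callable[[], object], max_size: int):
        self._idle: queue.Queue = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks while all connections are in use; raises queue.Empty after timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it for reuse instead of closing it


opened: list[object] = []


def fake_connect() -> object:
    """Hypothetical stand-in for opening a network connection."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, max_size=2)
for _ in range(50):  # 50 requests share the same two connections
    with pool.connection() as conn:
        pass
```

Fifty requests open only two connections. A sudden jump in server-side connection counts without a matching jump in traffic therefore suggests that pooling is misconfigured or bypassed somewhere.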

Database Profiling:

The database profiler collects detailed information about database commands executed against a running mongod instance. This includes CRUD operations as well as configuration and administration commands. The profiler writes all the data it collects to the ‘system.profile’ collection, a capped collection in each profiled database. Its output can help identify inefficient queries and operations.

We can enable and configure profiling for individual databases or for all databases on a mongod instance. Profiler settings affect only a single ‘mongod’ instance and will not propagate across a replica set or sharded cluster.
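The profiling levels can be modeled as follows. Level 0 turns the profiler off, level 1 records only operations slower than `slowms` (100 ms by default), and level 2 records everything. A bounded `deque` stands in for the capped `system.profile` collection, so the oldest entries are discarded first. The `Profiler` class is hypothetical. On a real server, profiling is enabled per database with the `profile` command, for example `db.command("profile", 1, slowms=100)` in PyMongo.

```python
from collections import deque


class Profiler:
    """Hypothetical model of profiling levels: 0 = off, 1 = only operations
    slower than slowms, 2 = all operations. A bounded deque plays the role
    of the capped system.profile collection."""

    def __init__(self, level: int = 0, slowms: int = 100, capacity: int = 1000):
        if level not in (0, 1, 2):
            raise ValueError("level must be 0, 1, or 2")
        self.level = level
        self.slowms = slowms
        self.system_profile: deque = deque(maxlen=capacity)  # oldest entries drop first

    def record(self, op: str, ns: str, millis: int) -> None:
        if self.level == 2 or (self.level == 1 and millis > self.slowms):
            self.system_profile.append({"op": op, "ns": ns, "millis": millis})
```

Because the model, like the real profiler, holds one instance’s settings, each `mongod` in a replica set would need its own configuration.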

Figure: Concurrency control

Concurrency control is one of the features that improve the efficiency of MongoDB, and it plays a critical role in data processing. It ensures that concurrent operations leave the data in the same state as if they had been executed one at a time (Kemme, 2018). Its major aim is to let many operations proceed in parallel while preserving correctness, which increases both the efficiency and the productivity of the database. Because MongoDB is designed to scale out, it can also be deployed in the cloud, and servers can be added as demand grows.

Figure: Lock acquisition

RDBMSs and MongoDB have some things in common in practice. Both can handle large data sets, and both let applications express relationships between data, which makes data analysis and access more efficient (Niyizamwiyitira & Lundberg, 2017). When a new record is created, a relationship can be established, for example by embedding related data in the same document or by storing a reference to another document. This makes the data easy to record and access. Unlike an RDBMS, which relies on normalization, MongoDB does not require data to be normalized, so the process is simpler. As a result, MongoDB can handle transactions efficiently.

In terms of concurrency control, MongoDB and RDBMSs differ in emphasis. MongoDB guarantees atomicity for operations on a single document, so many short, single-document writes can each be applied safely while preserving integrity (Kemme, 2018). An RDBMS, on the other hand, is designed around larger multi-statement transactions that may span many rows and tables during execution.

Concurrency control contributes to high efficiency. For instance, by ensuring that conflicting operations on the same data are applied one at a time, it preserves integrity (Niyizamwiyitira & Lundberg, 2017). This makes the system easier to maintain and more secure.

MongoDB also provides high availability: through replica sets, data remains accessible even if a server fails.

Serializing conflicting operations, however, can lower throughput, because operations that touch the same data must wait for one another (Niyizamwiyitira & Lundberg, 2017). Under heavy contention, there is therefore a risk of slower workflows or slower record writes.

As with any other database system, MongoDB is exposed to security risks despite its efficiency and data protection features (Kemme, 2018). Concurrency control protects the integrity of data, but it is not a security mechanism in itself. Access control, authentication, and encryption are still needed to keep intruders out.

In a multi-user environment, multiple transactions running at the same time can interfere with each other. The problems that can occur include the lost update problem, the dirty read problem, the phantom read problem, and the unrepeatable (non-repeatable) read problem. In this environment, concurrency control is necessary to ensure correct results.
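The lost update problem is easiest to see with the steps written out. The two functions below are a hypothetical, single-threaded illustration in which two transactions each add to the same balance. In the first, their reads and writes are interleaved; in the second, they run one after the other.

```python
def lost_update_schedule(balance: int) -> int:
    """T1 adds 10 and T2 adds 20, interleaved as:
    T1 read, T2 read, T1 write, T2 write."""
    t1_read = balance       # T1 reads the current balance
    t2_read = balance       # T2 reads the same value before T1 writes
    balance = t1_read + 10  # T1 writes its result
    balance = t2_read + 20  # T2 overwrites it, so T1's update is lost
    return balance


def serial_schedule(balance: int) -> int:
    """The same two transactions run one after the other."""
    balance = balance + 10  # T1
    balance = balance + 20  # T2 reads T1's result
    return balance
```

Starting from 100, the interleaved schedule ends at 120 while the serial one ends at 130. The difference is T1’s deposit of 10, which is silently lost.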

Database management systems use two broad approaches to concurrency control: optimistic and pessimistic. Optimistic concurrency control runs a transaction to completion without checking for conflicts along the way. When the transaction is ready to commit, it checks whether a conflicting change occurred and, if so, aborts or retries. Pessimistic concurrency control takes a preventive approach: it places locks on data where conflicts might occur before the transaction uses that data.
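Both approaches can be sketched in a few lines. `VersionedRecord` follows the optimistic pattern. A writer remembers the version it read, and the commit is rejected if another transaction committed first, in which case the writer retries. `LockedRecord` follows the pessimistic pattern by locking before it reads. Both classes are hypothetical illustrations using Python’s `threading.Lock`, not MongoDB internals. For comparison, MongoDB’s documentation describes WiredTiger as using optimistic concurrency control for most reads and writes, transparently retrying an operation when a write conflict is detected.

```python
import threading


class VersionedRecord:
    """Optimistic control: validate the version at commit time."""

    def __init__(self, value: int):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # keeps read and validate-and-write steps atomic

    def read(self) -> tuple[int, int]:
        with self._lock:
            return self.value, self.version

    def try_commit(self, expected_version: int, new_value: int) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # someone else committed first: conflict detected
            self.value = new_value
            self.version += 1
            return True


def optimistic_add(record: VersionedRecord, amount: int) -> int:
    """Retry until the commit validates; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        value, version = record.read()
        if record.try_commit(version, value + amount):
            return attempts


class LockedRecord:
    """Pessimistic control: lock before reading, so no conflict can occur."""

    def __init__(self, value: int):
        self.value = value
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        with self._lock:
            self.value = self.value + amount
```

The optimistic version does no waiting when conflicts are rare but pays for retries when they are common. The pessimistic version always pays the locking cost but never retries.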

Concurrency control techniques ensure the integrity of the database by making concurrent transactions behave as if they had executed one at a time. The scheduler orders the requests of different transactions into a schedule of requests to be serviced. Locking mechanisms deny other transactions access at the file, record, or field level until the current transaction is completed.

To ensure consistency, MongoDB uses locking to prevent multiple clients from modifying the same data at the same time. Write operations on a single document are atomic.

MongoDB uses multiple granularity locking, which allows operations to lock at the global, database, or collection level. Concurrency control below the collection level, such as at the document level, is implemented by the individual storage engines.[14]

Reader-writer locks allow concurrent readers to access a database or collection, while writers get exclusive access. MongoDB also uses intent locks at higher levels of the hierarchy to signal that an operation intends to read or write at a finer granularity.
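The interplay of shared, exclusive, and intent locks follows the standard multiple-granularity compatibility table, sketched below. A writer to a single document takes intent-exclusive (IX) locks at the global, database, and collection levels. As a result, two writers to different documents in the same collection do not block each other, while a request for an exclusive (X) lock on that collection must wait. The `COMPATIBLE` table and `can_grant` function illustrate the textbook scheme and are not MongoDB’s lock manager.

```python
# Standard multiple-granularity lock compatibility:
# IS = intent shared, IX = intent exclusive, S = shared, X = exclusive.
COMPATIBLE: dict[str, set[str]] = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested: str, held: list[str]) -> bool:
    """Grant a lock only if it is compatible with every lock that other
    operations already hold on the same resource."""
    return all(mode in COMPATIBLE[requested] for mode in held)


# Locks a single-document write would hold on the resources above the document.
DOCUMENT_WRITE_PATH = [("global", "IX"), ("database", "IX"), ("collection", "IX")]
```

With this table, a second document writer’s IX request on the collection is granted alongside the first writer’s IX. A shared (S) request for a whole-collection read, however, conflicts with any IX currently held.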

It is important to understand that database security works in conjunction with other network services to protect the information in the database. To identify security issues, we will focus on how the database stores information. To store information securely, a database must provide CIA: confidentiality, integrity, and availability.

Confidentiality means that only authorized users have access to the data, integrity refers to the accuracy of that data, and availability means that authorized users can access the data when they need it.[16] Control measures also play a significant role in providing security. They include access control, inference control, flow control, and data encryption.
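Access control can be illustrated with a small role-based check. Each role grants a set of actions, and a user may perform an action only if one of their roles grants it. The roles below are loosely modeled on MongoDB’s built-in `read` and `readWrite` roles, but the user names and the exact action sets are simplified assumptions.

```python
# Hypothetical role definitions, simplified from MongoDB's built-in roles.
ROLES: dict[str, set[str]] = {
    "read": {"find"},
    "readWrite": {"find", "insert", "update", "remove"},
}

# Hypothetical user-to-role assignments.
USERS: dict[str, list[str]] = {"analyst": ["read"], "app": ["readWrite"]}


def is_authorized(user: str, action: str) -> bool:
    """Allow an action only if at least one of the user's roles grants it;
    unknown users have no roles and are denied everything."""
    return any(action in ROLES[role] for role in USERS.get(user, []))
```

Denying by default, as the empty role list for unknown users does here, is what keeps the check on the confidentiality side of the CIA triad.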

In comparison, relational database management systems have stronger security features. These include role-based security, access control, data encryption, a well-defined schema, and strict adherence to ACID properties. The disadvantage of these features is that they can slow down data access.

[1] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[2] https://www.dataversity.net/a-brief-history-of-non-relational-databases/

[3] https://learning.oreilly.com/library/view/nosql-for-dummies/9781118905746/05_9781118905746-ch01.xhtml

[4] https://intellipaat.com/blog/what-is-mongodb/

[5] https://www.geeksforgeeks.org/use-of-nosql-in-industry/

[6] https://www.geeksforgeeks.org/use-of-nosql-in-industry/

[7] Architecture Guide – Core Principles

[8] MongoDB Internals Presentation

[9] MongoDB Architecture Comparisons with RDBMS

[10] Terminologies

[11] https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/

[12] https://blog.couchbase.com/query-optimization-in-nosql-couchbase-mongodb/

[13] https://docs.atlas.mongodb.com/performance-advisor/

[14] https://docs.mongodb.com/manual/faq/concurrency/

[15] https://www.researchgate.net/publication/337854725_A_Comparative_Study_of_NOSQL_System_Vulnerabilities_with_Big_Data

[16] https://www.softwaretestinghelp.com/sql-vs-nosql/

[17] Kemme, B. (2018). Replicated database concurrency control. Encyclopedia of Database Systems, 3175-3176. https://doi.org/10.1007/978-1-4614-8265-9_311

[18] Niyizamwiyitira, C., & Lundberg, L. (2017). Performance Evaluation of SQL and NoSQL Database Management Systems in a Cluster. International Journal of Database Management Systems, 9(6), 1-24. https://doi.org/10.5121/ijdms.2017.9601


Eduinfomark

Education, Reflection, Scholarship