MongoDB – walk-through – Part 2

You cannot use undefined in query documents. Consider the following document inserted into the people collection:
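
A minimal sketch of the behavior (the people document and values are illustrative):

db.people.insert({ name: "Sally", age: null })
db.people.find({ age: null })        // matches the document above
db.people.find({ age: undefined })   // error: undefined cannot be used in a query document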

NumberLong is a 64-bit signed integer. You must pass the value as a string in quotation marks, or it will be interpreted as a floating-point number, resulting in a loss of accuracy.
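
For example (the value is illustrative):

NumberLong("2090845886852")   // quoted: stored as a 64-bit integer
NumberLong(2090845886852)     // unquoted: parsed as a double first, so values above 2^53 can lose precision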

To install specific MongoDB component sets, you can specify them in the ADDLOCAL argument using a comma-separated list including one or more of the following component sets:
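
A hedged sketch of an unattended install (the .msi file name and install location are illustrative; the exact component-set names for your version are listed in the MongoDB installation docs, and ADDLOCAL="ALL" installs everything):

msiexec.exe /q /i mongodb-win32-x86_64-3.4.7-signed.msi INSTALLLOCATION="C:\MongoDB" ADDLOCAL="ALL"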

If the mongo shell does not accept the name of the collection, for instance if the name contains a space, hyphen, or starts with a number, you can use an alternate syntax to refer to the collection, as in the following:
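
For instance, assuming a hypothetical collection named 3test:

db.getCollection("3test").find()
db["3test"].find()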

The db.collection.find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents that match the query.
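
For example (collection and filter are illustrative):

var myCursor = db.users.find({ type: "admin" });   // assigned to a variable, so nothing is printed yet
myCursor                                           // typing the variable name iterates and prints up to 20 documents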

To format the printed result, you can append the .pretty() method to the operation, as in the following:
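
(The restaurants collection is illustrative.)

db.restaurants.find().pretty()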

db.serverStatus()

To see the implementation of a method in the shell, type the db.<method name> without the parentheses (()), as in the following example, which will return the implementation of the method db.updateUser():
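
db.updateUser   // no parentheses: the shell prints the function body instead of calling it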

To list the available modifier and cursor handling methods, use the db.collection.find().help() command:

The following table maps the most common mongo shell helpers to their JavaScript equivalents:
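
A partial mapping (common helpers only; not exhaustive):

  • show dbs, show databases  →  db.adminCommand('listDatabases')
  • use <db>                  →  db = db.getSiblingDB('<db>')
  • show collections          →  db.getCollectionNames()
  • show users                →  db.getUsers()
  • show roles                →  db.getRoles({ showBuiltinRoles: true })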

To print all items in a result cursor in mongo shell scripts, use the following idiom:
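
For example (the collection name is illustrative):

var myCursor = db.users.find();
while (myCursor.hasNext()) {
   printjson(myCursor.next());
}

or, more compactly, myCursor.forEach(printjson).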

Use the db.getLastError() and db.getLastErrorObj() methods to return error information.

BSON is a binary representation of JSON with additional type information.

Cursors: queries return iterable objects, called cursors, that hold the full result set.

Distributed Queries: how sharded clusters and replica sets affect the performance of read operations.

MongoDB queries exhibit the following behavior:

  • All queries in MongoDB address a single collection.
  • The order of documents returned by a query is not defined unless you specify a sort().
  • Operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update.
  • In the aggregation pipeline, the $match pipeline stage provides access to MongoDB queries (see the sketch after this list).
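
For instance, a minimal sketch (collection and field names are illustrative):

db.orders.aggregate([
  { $match: { status: "A" } },                                  // same query syntax as find()
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])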

By default, the server will automatically close the cursor after 10 minutes of inactivity, or if the client has exhausted the cursor. To override this behavior in the mongo shell, you can use the cursor.noCursorTimeout() method.
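
For example (the collection is illustrative):

var myCursor = db.users.find().noCursorTimeout();   // remember to close it with myCursor.close() when done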

After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust the cursor’s results.

For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

To see how many documents remain in the batch as you iterate the cursor, you can use the objsLeftInBatch() method, as in the following example:
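
(The inventory collection is illustrative.)

var myCursor = db.inventory.find();
var firstDoc = myCursor.hasNext() ? myCursor.next() : null;   // fetches the first batch
myCursor.objsLeftInBatch();                                   // number of documents left in the current batch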

An index covers a query when both of the following apply:

  • all the fields in the query are part of an index, and
  • all the fields returned in the results are in the same index.
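
For example, a minimal sketch (collection, index, and field names are illustrative):

db.users.createIndex({ user_id: 1, status: 1 })
db.users.find({ user_id: "abc123" }, { status: 1, _id: 0 })   // both the filter and the projection use only indexed fields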

Indexes are typically available in RAM or located sequentially on disk.

Index restrictions

An index cannot cover a query if:

  • any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.

  • any of the indexed fields in the query predicate or returned in the projection are fields in embedded documents. For example, consider a collection users with documents of the following form:
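
{ _id: 1, user: { login: "tester" } }

Here an index on { "user.login": 1 } cannot cover a query such as db.users.find({ "user.login": "tester" }, { "user.login": 1, _id: 0 }), because user.login is a field in an embedded document.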

To determine whether a query is a covered query, use the db.collection.explain() or the explain() method and review the results.
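
For instance, reusing the illustrative index and query from above:

db.users.find({ user_id: "abc123" }, { status: 1, _id: 0 }).explain("executionStats")

In the output, an IXSCAN stage with no FETCH stage above it (and totalDocsExamined of 0) indicates the query was covered by the index.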

The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs.

In MongoDB, write operations target a single collection. All write operations in MongoDB are atomic on the level of a single document.

No insert, update, or delete can affect more than one document atomically.

MongoDB provides the following methods for inserting documents into a collection:

  • db.collection.insertOne()
  • db.collection.insertMany()
  • db.collection.insert()
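
For example (collection and field names are illustrative):

db.users.insertOne({ name: "sue", age: 26, status: "pending" })
db.users.insertMany([
  { name: "bob", age: 42, status: "A" },
  { name: "ahn", age: 22, status: "A" }
])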

The updateOne(), updateMany(), and replaceOne() operations accept the upsert parameter. When upsert: true, if no document in the collection matches the filter, a new document is created based on the information passed to the operation.

Update statement:
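
A minimal sketch of an upsert (collection and fields are illustrative):

db.users.updateOne(
  { name: "xyz" },
  { $set: { status: "D" } },
  { upsert: true }             // inserts { name: "xyz", status: "D" } if no document matches
)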

db.collection.replaceOne() replaces a single document.

SQL equivalent:

Operations performed by an update are atomic within a single document. For example, you can safely use the $inc and $mul operators to modify frequently-changed fields in concurrent applications.
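
For example (collection and fields are illustrative):

db.products.updateOne({ sku: "abc123" }, { $inc: { quantity: -2 } })   // atomic decrement
db.products.updateOne({ sku: "abc123" }, { $mul: { price: 1.25 } })    // atomic multiply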

MongoDB provides the following methods for deleting documents from a collection:

  • db.collection.deleteOne()
  • db.collection.deleteMany()
  • db.collection.remove()
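
For example (collection and filter are illustrative):

db.orders.deleteOne({ status: "D" })    // deletes at most one matching document
db.orders.deleteMany({ status: "D" })   // deletes every matching document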

The db.collection.save() method can either update an existing document or insert a document if the document cannot be found by the _id field.

The following bulkWrite() inserts several documents, performs an update, and then deletes several documents.
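
A sketch (the characters collection and document contents are illustrative):

db.characters.bulkWrite([
  { insertOne: { document: { _id: 4, char: "Dithras", class: "barbarian", lvl: 4 } } },
  { insertOne: { document: { _id: 5, char: "Taeln", class: "fighter", lvl: 3 } } },
  { updateOne: { filter: { char: "Eldon" }, update: { $set: { status: "Critical Injury" } } } },
  { deleteMany: { filter: { lvl: { $lt: 2 } } } }
])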

Capped collections in MongoDB

https://docs.mongodb.com/manual/core/capped-collections/

Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

A projection can explicitly include several fields.
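
For example (collection and fields are illustrative):

db.restaurants.find({}, { name: 1, cuisine: 1 })   // returns only _id, name, and cuisine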

Collections with validation compare each inserted or updated document against the criteria specified in the validator option. Depending on the validationLevel and validationAction, MongoDB either returns a warning, or refuses to insert or update the document if it fails to meet the specified criteria.
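
A minimal sketch (the collection name, fields, and rules are illustrative):

db.createCollection("contacts", {
  validator: { $or: [
    { phone: { $type: "string" } },
    { email: { $regex: /@mongodb\.com$/ } }
  ]},
  validationAction: "warn"   // log a warning instead of rejecting the write
})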

$eq example
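
(The inventory collection is illustrative.)

db.inventory.find({ qty: { $eq: 20 } })   // equivalent to db.inventory.find({ qty: 20 })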

Use the $elemMatch operator to specify multiple criteria on the elements of an array such that at least one array element satisfies all the specified criteria.
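
For example (the survey collection and fields are illustrative):

db.survey.find({ results: { $elemMatch: { product: "xyz", score: { $gte: 8 } } } })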

If you do not know the index position of the document in the array, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the embedded document.

The following example selects all documents where points is an array with at least one embedded document that contains a field points whose value is less than or equal to 55:
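
(Assuming documents such as { _id: 1, name: "sue", points: [ { points: 96, bonus: 20 }, { points: 35, bonus: 10 } ] }.)

db.users.find({ "points.points": { $lte: 55 } })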

In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most-often done by using a standard auto-generated ObjectId.

MongoDB indexes use a B-tree data structure. B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. B-trees are a good example of a data structure for external memory. It is commonly used in databases and filesystems.

 

Disadvantages of B-trees

The maximum key length cannot be changed without completely rebuilding the database. This has led many database systems to truncate full human names to 70 characters.

A B-tree insertion example, iteration by iteration. The nodes of this B-tree have at most 3 children (Knuth order 3).

Initial construction of B-Trees

Most tree operations (search, insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree. A B-tree is a “fat” tree: its height is kept low by packing the maximum possible number of keys into each node, and a node’s size is generally kept equal to the disk block size. Because h is low for a B-tree, the total number of disk accesses for most operations is reduced significantly compared to balanced binary search trees such as AVL trees and red-black trees.

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. As a result, there is a practical maximum for vertical scaling. Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. MongoDB supports horizontal scaling through sharding.

The following graphic describes the interaction of components within a sharded cluster:

 

MongoDB – walk-through – (Part 1)

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. MongoDB obviates (removes a need or difficulty) the need for an Object-Relational Mapping (ORM) layer to facilitate development.

Download RoboMongo Here: https://robomongo.org/

Install MongoDB for Windows

MongoDB requires a data directory to store all data. MongoDB’s default data directory path is \data\db.

Create a configuration file. The file must set systemLog.path. Include additional configuration options as appropriate.

For example, create a file at C:\mongodb\mongod.cfg that specifies both systemLog.path and storage.dbPath:
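
(The log and data paths below are illustrative.)

systemLog:
    destination: file
    path: c:\mongodb\log\mongod.log
storage:
    dbPath: c:\mongodb\data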

Install the MongoDB service by starting mongod.exe with the --install option and the --config option to specify the previously created configuration file.
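
For example, from an Administrator command prompt (the binary path and version are illustrative):

"C:\Program Files\MongoDB\Server\3.4\bin\mongod.exe" --config "C:\mongodb\mongod.cfg" --install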

Starting and stopping the service
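
Assuming the default service name MongoDB, the service can be started and stopped with:

net start MongoDB
net stop MongoDB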

Mongoimport

mongoimport --db test --collection restaurants --drop --file primer-dataset.json

Help command section
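
A few of the shell’s built-in help commands (the collection name is illustrative):

help                    // top-level help
db.help()               // database-level methods
db.restaurants.help()   // collection-level methods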

Insert document
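
For example, a sketch using the restaurants collection imported above (field values are illustrative):

db.restaurants.insert({
  name: "Sun Bakery Trattoria",
  borough: "Manhattan",
  cuisine: "Italian"
})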

All queries in MongoDB have the scope of a single collection. This means you can only return results from a single collection (a list of documents).

Query by a Field in an Embedded Document
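
For example, using dot notation on the embedded address document (field names follow the primer dataset):

db.restaurants.find({ "address.zipcode": "10075" })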

Query by a Field in an Array
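
For example (grades is an array of embedded documents in the primer dataset):

db.restaurants.find({ "grades.grade": "B" })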

Greater Than Operator ($gt)
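
For example:

db.restaurants.find({ "grades.score": { $gt: 30 } })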

You can specify a logical conjunction (AND) for a list of query conditions by separating the conditions with a comma in the conditions document.
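
For example, cuisine AND zip code:

db.restaurants.find({ cuisine: "Italian", "address.zipcode": "10075" })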

You can specify a logical disjunction (OR) for a list of query conditions by using the $or query operator.
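
For example, cuisine OR zip code:

db.restaurants.find({ $or: [ { cuisine: "Italian" }, { "address.zipcode": "10075" } ] })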

To specify an order for the result set, append the sort() method to the query.
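
For example, ascending by borough and then by zip code:

db.restaurants.find().sort({ borough: 1, "address.zipcode": 1 })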

You can use the update() method to update documents of a collection. The method accepts as its parameters:

  • a filter document to match the documents to update,
  • an update document to specify the modification to perform, and
  • an options parameter (optional).
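
For example, a minimal sketch (filter and update values are illustrative):

db.restaurants.update(
  { name: "Juni" },                         // filter
  { $set: { cuisine: "American (New)" } }   // update document
)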

Some update operators, such as $set, will create the field if the field does not exist.

Update an Embedded Field

To update a field within an embedded document, use the dot notation. When using the dot notation, enclose the whole dotted field name in quotes.
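
For example (the restaurant_id value is illustrative):

db.restaurants.update(
  { restaurant_id: "41156888" },
  { $set: { "address.street": "East 31st Street" } }
)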

By default, the update() method updates a single document. To update multiple documents, use the multi option in the update() method.
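
For example (the filter values are illustrative):

db.restaurants.update(
  { "address.zipcode": "10016", cuisine: "Other" },
  { $set: { cuisine: "Category To Be Determined" } },
  { multi: true }
)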

To replace the entire document except for the _id field, pass an entirely new document as the second argument to the update() method.

After the following update, the modified document will only contain the _id, name, and address fields; i.e. the document will not contain the restaurant_id, cuisine, grades, and borough fields:
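
(The restaurant_id value and the replacement document are illustrative.)

db.restaurants.update(
  { restaurant_id: "41704620" },
  {
    name: "Vella 2",
    address: {
      coord: [ -73.9557413, 40.7720266 ],
      building: "1480",
      street: "2 Avenue",
      zipcode: "10075"
    }
  }
)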

If no document matches the update condition, the default behavior of the update method is to do nothing. By specifying the upsert option to true, the update operation either updates matching document(s) or inserts a new document if no matching document exists.
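
For example (filter and fields are illustrative):

db.restaurants.update(
  { name: "Pizza Rat's Pizzeria" },
  { $set: { borough: "Manhattan", cuisine: "Pizza" } },
  { upsert: true }
)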

To specify a remove condition, use the same structure and syntax as the query conditions.

By default, the remove() method removes all documents that match the remove condition. Use the justOne option to limit the remove operation to only one of the matching documents.
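
For example:

db.restaurants.remove({ borough: "Queens" }, { justOne: true })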

Remove All Documents

To remove all documents from a collection, pass an empty conditions document {} to the remove() method.
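
For example:

db.restaurants.remove({})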

 

Drop a Collection

To remove all documents from a collection, it may be more efficient to drop the entire collection, including the indexes, and then recreate the collection and rebuild the indexes. Use the drop() method to drop a collection, including any indexes.
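
For example:

db.restaurants.drop()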

In MongoDB, write operations are atomic on the level of a single document. If a single remove operation removes multiple documents from a collection, the operation can interleave with other write operations on that collection.

Interleaving means a second operation can start before the first one has finished, and execution can switch back and forth between them (or among several concurrent operations).

This could cause inconsistency in the system, but such cases are handled by the database system.

MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.

To create an index on a field or fields, pass to the createIndex() method an index key specification document that lists the fields to index and the index type for each field:

  • For an ascending index type, specify 1 for <type>.
  • For a descending index type, specify -1 for <type>.

Create an ascending index on the "cuisine" field of the restaurants collection.
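
db.restaurants.createIndex({ cuisine: 1 })   // 1 specifies ascending order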

Compound indexes are indexes on multiple fields.
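
For example (the second field and its sort order are illustrative):

db.restaurants.createIndex({ cuisine: 1, "address.zipcode": -1 })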

Drivers for MongoDB are the client libraries that handle the interface between the application and the MongoDB servers and deployments. Drivers are responsible for managing connections to MongoDB standalone instances, replica sets, or sharded clusters. Drivers provide the methods and interfaces that applications use to interact with MongoDB as well as handle the translation of documents between BSON objects and native mapping structures.

The advantages of using documents are:

  • Documents (i.e. objects) correspond to native data types in many programming languages.
  • Embedded documents and arrays reduce need for expensive joins.
  • Dynamic schema supports fluent polymorphism.

MongoDB’s replication facility, called replica set, provides:

  • automatic failover and
  • data redundancy.

A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.

MongoDB provides horizontal scalability as part of its core functionality:

  • Sharding distributes data across a cluster of machines.
  • Tag aware sharding allows for directing data to specific shards, such as to take into consideration geographic distribution of the shards.

If a collection does not exist, MongoDB creates the collection when you first store data for that collection.

MongoDB provides the db.createCollection() method to explicitly create a collection with various options, such as setting the maximum size or the document validation rules. If you are not specifying these options, you do not need to explicitly create the collection, since MongoDB creates new collections when you first store data for them.

Starting in MongoDB 3.2, however, you can enforce document validation rules for a collection during update and insert operations.

Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

cap
1. put a lid or cover on.
“he capped his pen”
2. place a limit or restriction on (prices, expenditure, or other activity).
“council budgets will be capped”
synonyms: set a limit on, limit, restrict.

Consider the following potential use cases for capped collections:

  • Store log information generated by high-volume systems. Inserting documents in a capped collection without an index is close to the speed of writing log information directly to a file system. Furthermore, the built-in first-in-first-out property maintains the order of events, while managing storage use.
  • Cache small amounts of data in a capped collection. Since caches are read rather than write heavy, you would either need to ensure that this collection always remains in the working set (i.e. in RAM) or accept some write penalty for the required index or indexes.

If you plan to update documents in a capped collection, create an index so that these update operations do not require a collection scan. If an update or a replacement operation changes the document size, the operation will fail. For this reason, capped collections are best suited to write-and-forget storage, such as error logs or a cache, where documents are not updated over time.

You cannot delete documents from a capped collection. To remove all documents from a collection, use the drop() method to drop the collection and recreate the capped collection. You cannot shard a capped collection.

Use natural ordering to retrieve the most recently inserted elements from the collection efficiently. This is (somewhat) analogous to tail on a log file.
You must create capped collections explicitly using the db.createCollection() method.

You may also specify a maximum number of documents for the collection using the max field as in the following document. The size argument is always required, even when you specify max number of documents. MongoDB will remove older documents if a collection reaches the maximum size limit before it reaches the maximum document count.
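
For example (the collection name and limits are illustrative):

db.createCollection("log", { capped: true, size: 5242880, max: 5000 })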

If you perform a find() on a capped collection with no ordering specified, MongoDB guarantees that the ordering of results is the same as the insertion order.

To retrieve documents in reverse insertion order:
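
(The collection name is illustrative.)

db.cappedCollection.find().sort({ $natural: -1 })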

Use the isCapped() method to determine if a collection is capped, as follows:
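
db.log.isCapped()   // returns true or false (the collection name is illustrative)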

You can convert a non-capped collection to a capped collection with the convertToCapped command:
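
(The collection name and size are illustrative.)

db.runCommand({ convertToCapped: "mycoll", size: 100000 })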

This command obtains a global write lock and will block other operations until it has completed.

For additional flexibility when expiring data, consider MongoDB’s TTL indexes, as described in Expire Data from Collections by Setting TTL. These indexes allow you to expire and remove data from normal collections using a special index type, based on the value of a date-typed field and a TTL value for the index.
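
For example (collection, field name, and TTL are illustrative):

db.log_events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })   // documents expire one hour after their createdAt time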

TTL indexes are not compatible with capped collections.

BSON contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.

Documents have the following restrictions on field names:

  • The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
  • The field names cannot start with the dollar sign ($) character.
  • The field names cannot contain the dot (.) character.
  • The field names cannot contain the null character.

The maximum BSON document size is 16 megabytes.

To store documents larger than the maximum size, MongoDB provides the GridFS API.

You can access the creation time of the ObjectId, using the ObjectId.getTimestamp() method.
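
For example:

var myId = ObjectId();
myId.getTimestamp()   // returns the creation time as an ISODate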

BSON strings are UTF-8. UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode. The encoding is variable-length and uses 8-bit code units.

The BSON timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type.

BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future. BSON Date type is signed. Negative values represent dates before 1970.

var mydate2 = ISODate() is equivalent to the following:
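
var mydate1 = new Date()   // both constructors return a BSON Date wrapping the current time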