MongoDB – walk-through – Part 2

You cannot use undefined in query documents. Consider the following document inserted into the people collection:

NumberLong is a 64-bit signed integer. You must put quotation marks around the value passed to NumberLong(), or it will be interpreted as a floating point number, resulting in a loss of accuracy.
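The loss of accuracy comes from JavaScript numbers, not from MongoDB itself. Here is a minimal sketch in plain JavaScript, runnable in Node.js without a database. BigInt is only a stand-in for the 64-bit value that NumberLong would store, and the sample value is chosen for illustration:

```javascript
// 2^53 + 1 cannot be represented exactly as a double-precision float.
const asText = "9007199254740993";

// Parsing the digits as a number silently rounds to the nearest double.
const asNumber = Number(asText);

// Keeping the digits in a string (what the quotation marks achieve) lets an
// integer type parse them exactly. BigInt is a stand-in for NumberLong here.
const asInteger = BigInt(asText);

console.log(asNumber.toString());  // rounded value
console.log(asInteger.toString()); // exact value
```

The two printed values differ in the last digit, which is the kind of silent change the quotation marks prevent.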

To install specific MongoDB component sets, you can specify them in the ADDLOCAL argument using a comma-separated list including one or more of the following component sets:

If the mongo shell does not accept the name of the collection, for instance if the name contains a space, hyphen, or starts with a number, you can use an alternate syntax to refer to the collection, as in the following:
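The restriction comes from JavaScript identifier rules, so it can be illustrated without a server. The sketch below uses a plain object as a stand-in for the shell's db object. The collection names are hypothetical; the two access forms mirror the shell's db["name"] and db.getCollection("name"):

```javascript
// Stand-in for the shell's db object; the collection names are hypothetical.
const db = {
  "my collection": { name: "my collection" },
  "2018-logs": { name: "2018-logs" },
  getCollection(name) {
    return this[name];
  },
};

// db.my collection and db.2018-logs are not valid JavaScript expressions,
// so bracket notation or getCollection() is used instead.
const viaBrackets = db["my collection"];
const viaMethod = db.getCollection("2018-logs");

console.log(viaBrackets.name, "|", viaMethod.name);
```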

The db.collection.find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents that match the query.

To format the printed result, you can append .pretty() to the operation, as in the following:


To see the implementation of a method in the shell, type db.<method name> without the parentheses (()), as in the following example, which returns the implementation of the method db.updateUser():
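This works because a JavaScript function referenced without parentheses is a value that can be printed as source text. Here is a small sketch in plain JavaScript; greet is a hypothetical helper, not a shell method:

```javascript
// Hypothetical helper standing in for a shell method.
function greet(name) {
  return "hello " + name;
}

// Referencing the function without calling it yields the function itself;
// converting it to a string gives its source text.
const source = greet.toString();
console.log(source);
```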

To list the available modifier and cursor handling methods, use the db.collection.find().help() command:

The following table maps the most common mongo shell helpers to their JavaScript equivalents:

To print all items in a result cursor in mongo shell scripts, use the following idiom:
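The idiom is a hasNext()/next() loop. Here is a sketch that runs without a server, using an in-memory stand-in for the cursor and for the shell's printjson() helper. makeCursor and the sample documents are assumptions made for illustration:

```javascript
// Minimal in-memory stand-in for a shell cursor (not the real implementation).
function makeCursor(docs) {
  let position = 0;
  return {
    hasNext: () => position < docs.length,
    next: () => docs[position++],
  };
}

// Stand-in for the shell's printjson() helper; also records what it printed.
const printed = [];
function printjson(doc) {
  const text = JSON.stringify(doc);
  printed.push(text);
  console.log(text);
}

const cursor = makeCursor([{ _id: 1 }, { _id: 2 }, { _id: 3 }]);

// The idiom itself: keep calling next() while hasNext() is true.
while (cursor.hasNext()) {
  printjson(cursor.next());
}
```

In the shell, the cursor would come from db.collection.find() rather than makeCursor().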

Use the db.getLastError() and db.getLastErrorObj() methods to return error information.

BSON is a binary representation of JSON with additional type information.

Cursors: Queries return iterable objects, called cursors, that hold the full result set.

Distributed Queries: Describes how sharded clusters and replica sets affect the performance of read operations.

MongoDB queries exhibit the following behavior:

  • All queries in MongoDB address a single collection.
  • The order of documents returned by a query is not defined unless you specify a sort().
  • Operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update.
  • In an aggregation pipeline, the $match pipeline stage provides access to MongoDB queries.

By default, the server will automatically close the cursor after 10 minutes of inactivity, or if the client has exhausted the cursor. To override this behavior in the mongo shell, you can use the cursor.noCursorTimeout() method.

After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust the cursor’s results.

For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

To see how many documents remain in the batch as you iterate the cursor, you can use the objsLeftInBatch() method, as in the following example:
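The batch bookkeeping can be modelled in memory. The sketch below is an assumption-laden stand-in, not the shell's implementation. makeBatchedCursor and the batch size of 3 are hypothetical; only hasNext(), next(), and objsLeftInBatch() mirror real cursor method names:

```javascript
// In-memory model of a cursor that receives documents in batches.
function makeBatchedCursor(docs, batchSize) {
  let fetched = 0; // how many documents have been pulled from the "server"
  let batch = [];  // documents of the current batch not yet returned
  const fetchBatch = () => {
    batch = docs.slice(fetched, fetched + batchSize);
    fetched += batch.length;
  };
  fetchBatch(); // the first batch arrives with the query response
  return {
    hasNext() {
      // Fetch the next batch only when the current one is used up.
      if (batch.length === 0 && fetched < docs.length) fetchBatch();
      return batch.length > 0;
    },
    next() {
      if (!this.hasNext()) throw new Error("cursor exhausted");
      return batch.shift();
    },
    objsLeftInBatch: () => batch.length,
  };
}

const docs = Array.from({ length: 5 }, (_, i) => ({ _id: i }));
const myCursor = makeBatchedCursor(docs, 3);

const myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
const left = myCursor.objsLeftInBatch();
console.log(myFirstDocument, left);
```

After one document is read from a batch of three, two remain in the batch. The count drops to zero before the next batch is requested.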

An index covers a query when both of the following apply:

  • all the fields in the query are part of an index, and
  • all the fields returned in the results are in the same index.
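The two conditions can be written as a simple check. The sketch below only models the rule as stated here and ignores the restrictions on arrays and embedded documents. isCovered and the index on type and item are hypothetical:

```javascript
// True when every queried field and every returned field is in the index.
function isCovered(indexFields, queryFields, projectedFields) {
  const inIndex = (field) => indexFields.includes(field);
  return queryFields.every(inIndex) && projectedFields.every(inIndex);
}

// Hypothetical index { type: 1, item: 1 } on a hypothetical collection.
const index = ["type", "item"];

// Projection { item: 1, _id: 0 }: only indexed fields are returned.
const covered = isCovered(index, ["type", "item"], ["item"]);

// Projection { item: 1 }: _id is returned by default, and it is not in
// this index, so the query is not covered.
const notCovered = isCovered(index, ["type", "item"], ["item", "_id"]);

console.log(covered, notCovered);
```

The second case is a common pitfall: unless the projection excludes _id, the server has to read the documents to return it.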

Indexes are typically available in RAM or located sequentially on disk.

Index restrictions

An index cannot cover a query if:

  • any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.

  • any of the indexed fields in the query predicate or returned in the projection are fields in embedded documents. For example, consider a collection users with documents of the following form:

To determine whether a query is a covered query, use the db.collection.explain() or the cursor.explain() method and review the results.

The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs.

In MongoDB, write operations target a single collection. All write operations in MongoDB are atomic on the level of a single document.

No insert, update, or delete can affect more than one document atomically.

MongoDB provides the following methods for inserting documents into a collection:

  • db.collection.insertOne()
  • db.collection.insertMany()
  • db.collection.insert()

The updateOne(), updateMany(), and replaceOne() operations accept the upsert parameter. When upsert : true, if no document in the collection matches the filter, a new document is created based on the information passed to the operation.

update statement:
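The upsert behavior can be modelled on an in-memory array. This is a sketch under stated assumptions, not the server's logic. It supports only equality filters and $set, the return shape is hypothetical, and the _id generation that a real upsert performs is omitted:

```javascript
// In-memory model of updateOne() with an equality-only filter and $set.
function updateOne(collection, filter, update, options = {}) {
  const matches = (doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value);
  const existing = collection.find(matches);
  if (existing) {
    Object.assign(existing, update.$set);
    return { matched: 1, inserted: false };
  }
  if (options.upsert) {
    // The new document combines the filter's fields with the update.
    collection.push({ ...filter, ...update.$set });
    return { matched: 0, inserted: true };
  }
  return { matched: 0, inserted: false };
}

const people = [{ name: "Ann", age: 30 }];
const r1 = updateOne(people, { name: "Ann" }, { $set: { age: 31 } }, { upsert: true });
const r2 = updateOne(people, { name: "Bob" }, { $set: { age: 25 } }, { upsert: true });
const r3 = updateOne(people, { name: "Cy" }, { $set: { age: 40 } });
console.log(people, r1, r2, r3);
```

The first call updates the existing document, the second creates one because nothing matched, and the third does nothing because upsert was not requested.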

db.collection.replaceOne() replaces a single document.

sql equivalent:

Operations performed by an update are atomic within a single document. For example, you can safely use the $inc and $mul operators to modify frequently-changed fields in concurrent applications.

MongoDB provides the following methods for deleting documents from a collection:

  • db.collection.deleteOne()
  • db.collection.deleteMany()
  • db.collection.remove()

The db.collection.save() method can either update an existing document or insert a document if the document cannot be found by the _id field.

The following bulkWrite() inserts several documents, performs an update, and then deletes several documents.

Capped collections in MongoDB

Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.
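The circular-buffer analogy can be sketched in a few lines. CappedBuffer is a hypothetical class that limits by document count only. A real capped collection is sized in bytes, with an optional maximum document count:

```javascript
// Ring-buffer model of a capped collection limited by document count.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.nextSlot = 0; // where the next insert lands
    this.count = 0;
  }

  insert(doc) {
    // Once the buffer is full, this overwrites the oldest document.
    this.slots[this.nextSlot] = doc;
    this.nextSlot = (this.nextSlot + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Returns the documents in insertion order, oldest first.
  find() {
    const start = this.count < this.maxDocs ? 0 : this.nextSlot;
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.maxDocs]);
    }
    return out;
  }
}

const log = new CappedBuffer(3);
for (const n of [1, 2, 3, 4, 5]) log.insert({ n });
const remaining = log.find().map((d) => d.n);
console.log(remaining);
```

With room for three documents and five inserts, the two oldest are overwritten and the rest come back in insertion order.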

A projection can explicitly include several fields.
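As a rough model, an inclusion projection copies the listed fields plus _id unless _id is explicitly excluded. The project function and the sample document below are hypothetical and handle top-level fields only:

```javascript
// Models an inclusion projection such as { name: 1, age: 1 }.
function project(doc, projection) {
  const out = {};
  // _id is included by default unless the projection sets _id: 0.
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const doc = { _id: 7, name: "Ann", age: 30, email: "ann@example.com" };
const withId = project(doc, { name: 1, age: 1 });
const withoutId = project(doc, { name: 1, age: 1, _id: 0 });
console.log(withId, withoutId);
```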

Collections with validation compare each inserted or updated document against the criteria specified in the validator option. Depending on the validationLevel and validationAction, MongoDB either returns a warning, or refuses to insert or update the document if it fails to meet the specified criteria.

$eq example

Use the $elemMatch operator to specify multiple criteria on the elements of an array such that at least one array element satisfies all the specified criteria.
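The difference from listing the same criteria without $elemMatch is easiest to see on data. The sketch below models both behaviors with plain array predicates. The scores field and the documents are hypothetical:

```javascript
// Hypothetical documents with an array field named scores.
const docs = [
  { _id: 1, scores: [70, 90] }, // 90 > 80 and 70 < 85, but no element does both
  { _id: 2, scores: [82, 95] }, // 82 satisfies both conditions
];

const gt80 = (x) => x > 80;
const lt85 = (x) => x < 85;

// Like { scores: { $elemMatch: { $gt: 80, $lt: 85 } } }:
// a single element must satisfy every condition.
const elemMatch = docs.filter((d) => d.scores.some((x) => gt80(x) && lt85(x)));

// Like { scores: { $gt: 80, $lt: 85 } } without $elemMatch:
// each condition may be satisfied by a different element.
const anyCombination = docs.filter(
  (d) => d.scores.some(gt80) && d.scores.some(lt85)
);

console.log(elemMatch.map((d) => d._id), anyCombination.map((d) => d._id));
```

Only the second document passes the $elemMatch form, while both documents pass the form without it.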

If you do not know the index position of the document in the array, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the embedded document.

The following example selects all documents where points is an array with at least one embedded document that contains the field points whose value is less than or equal to 55:
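In plain JavaScript terms, a condition such as { "points.points": { $lte: 55 } } behaves like the filter below. The sample documents are hypothetical, and the sketch covers only the array case described above:

```javascript
// Hypothetical documents; points is an array of embedded documents.
const students = [
  { _id: 1, points: [{ points: 55, bonus: 20 }, { points: 56, bonus: 25 }] },
  { _id: 2, points: [{ points: 81, bonus: 8 }, { points: 60, bonus: 20 }] },
];

// Match when any embedded document's points field is <= 55.
const matched = students.filter(
  (doc) =>
    Array.isArray(doc.points) &&
    doc.points.some((element) => element.points <= 55)
);

console.log(matched.map((doc) => doc._id));
```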

In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated ObjectId.

MongoDB indexes use a B-tree data structure. A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. B-trees are a good example of a data structure for external memory. They are commonly used in databases and filesystems.


Disadvantages of B-trees

Maximum key length cannot be changed without completely rebuilding the database. This led to many database systems truncating full human names to 70 characters.

A B-tree insertion example with each iteration. The nodes of this B-tree have at most 3 children (Knuth order 3).

Initial construction of B-Trees

Most of the tree operations (search, insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree. A B-tree is a fat tree. The height of a B-tree is kept low by putting the maximum possible number of keys in each node. Generally, the B-tree node size is kept equal to the disk block size. Since h is low for a B-tree, total disk accesses for most operations are reduced significantly compared to balanced binary search trees such as AVL trees and red-black trees.
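To get a feel for the difference in height, the sketch below counts the fewest levels needed to hold a given number of keys when every node is completely full. The fan-out of 100 and the key count are assumptions for illustration. Real B-tree nodes are only guaranteed to be about half full, so actual heights can be somewhat larger:

```javascript
// Fewest levels a search tree needs to hold n keys when every node
// may have up to maxChildren children (and maxChildren - 1 keys).
function minLevels(n, maxChildren) {
  let levels = 0;
  let capacity = 0; // keys that fit in a full tree with this many levels
  while (capacity < n) {
    levels += 1;
    capacity = Math.pow(maxChildren, levels) - 1;
  }
  return levels;
}

const keys = 1000000;
const binaryLevels = minLevels(keys, 2);  // best case for a binary tree
const btreeLevels = minLevels(keys, 100); // hypothetical 100-way B-tree nodes
console.log(binaryLevels, btreeLevels);
```

For one million keys, a perfectly packed binary tree needs 20 levels and a 100-way tree needs 4. If each level costs one disk access, that is the reduction the paragraph above describes.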

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. Because a single machine can only be made so powerful, there is a practical maximum for vertical scaling. Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. MongoDB supports horizontal scaling through sharding.
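The idea of dividing a dataset over several servers can be illustrated with a toy router. This is a deliberate simplification and not MongoDB's algorithm. MongoDB assigns ranges of shard key values (chunks) to shards, whereas this sketch takes a hash modulo the shard count. hashString, shardFor, and the userId key are hypothetical:

```javascript
// Toy hash: folds the characters of a string into an unsigned 32-bit value.
function hashString(text) {
  let hash = 0;
  for (const ch of text) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep within 32 bits
  }
  return hash;
}

// Maps a shard key value to one of shardCount shards.
function shardFor(shardKeyValue, shardCount) {
  return hashString(String(shardKeyValue)) % shardCount;
}

// Three arrays stand in for three servers.
const shards = [[], [], []];
for (let userId = 1; userId <= 9; userId++) {
  shards[shardFor(userId, shards.length)].push({ userId });
}

console.log(shards.map((shard) => shard.length));
```

Each document lands on exactly one shard, and the same key always routes to the same shard. This is what lets capacity grow by adding servers.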

The following graphic describes the interaction of components within a sharded cluster:

