Content
- Relationships
- Database References
- Covered Queries
- Analyzing Queries
- Atomic Operations
- Advanced Indexing & Indexing Limitations
- Text Search
- Regular Expression
- ObjectId
- Map Reduce (= Aggregation)
- RockMongo
- GridFS
- Capped Collections
- Auto-Increment Sequence
Reference Sites
Summary
Relationships
- Relationships in MongoDB represent how various documents are logically related to each other.
- Relationships can be modeled via the Embedded and the Referenced (Manual References) approaches.
- Such relationships can be either 1:1, 1:N, N:1 or N:N.
Let us consider the case of storing addresses for users. One user can have multiple addresses, making this a 1:N relationship.
In the referenced design, each entity is stored as its own document:
# user document −
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"name": "Tom Hanks",
"contact": "987654321",
"dob": "01-01-1991"
}
# address document −
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
Modeling Embedded Relationships
> db.users.insert({
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}
]
})
> db.users.findOne({"name":"Tom Benzamin"},{"address":1})
Modeling Referenced Relationships (Manual References)
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000"),
ObjectId("52ffc4a5d85242602e000001")
]
}
# in the mongo shell
>var result = db.users.findOne({"name":"Tom Benzamin"},{"address_ids":1})
>var addresses = db.address.find({"_id":{"$in":result["address_ids"]}})
Database References
Manual References
- Store the referenced document's id inside other document
DBRefs
- A document contains references from different collections
- A DBRef has three fields:

Field | Description |
---|---|
$ref | Specifies the collection of the referenced document |
$id | Specifies the _id field of the referenced document |
$db | Optional; contains the name of the database in which the referenced document lies |
- Example :
Store different types of addresses (home, office, mailing, etc.) in different collections (address_home, address_office, address_mailing, etc).
- The following example shows an address DBRef field. It specifies that the referenced address document lies in the address_home collection under the tutorialspoint database and has an _id of 534009e4d852427820000002.
# 'user' document having DBRef field address
{
"_id":ObjectId("53402597d852426020000002"),
"address": {
"$ref": "address_home",
"$id": ObjectId("534009e4d852427820000002"),
"$db": "tutorialspoint"
},
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin"
}
# in the mongo shell
>var user = db.users.findOne({"name":"Tom Benzamin"})
>var dbRef = user.address
>db[dbRef.$ref].findOne({"_id":(dbRef.$id)})
# will return following 'address' document present in address_home collection
{
"_id" : ObjectId("534009e4d852427820000002"),
"building" : "22 A, Indiana Apt",
"pincode" : 123456,
"city" : "Los Angeles",
"state" : "California"
}
Covered Queries
- A covered query is a query in which −
- All the fields in the query are part of an index.
- All the fields returned by the query are in the same index.
- For such queries, MongoDB matches the query conditions and returns the result using the index alone, without actually looking inside the documents. Since indexes are usually kept in RAM, fetching data from indexes is much faster than fetching data by scanning documents.
- Remember that an index cannot cover a query if −
- Any of the indexed fields is an array
- Any of the indexed fields is a subdocument
Using Covered Queries
- For the following covered query, MongoDB does not go looking into the database documents.
- Instead it fetches the required data from the indexed data, which is very fast.
# Following document exist in the users collection −
{
"_id": ObjectId("53402597d852426020000003"),
"contact": "987654321",
"dob": "01-01-1991",
"gender": "M",
"name": "Tom Benzamin",
"user_name": "tombenzamin"
}
# First, create a compound index for the users collection on the fields gender and user_name using the following query −
>db.users.createIndex({gender:1,user_name:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
# Now, this index will cover the following query −
>db.users.find({gender:"M"},{user_name:1,_id:0})
{ "user_name" : "tombenzamin" }
- Since the index does not include the _id field, we need to explicitly exclude it from the result set, as MongoDB returns the _id field in every query by default.
- So the following query would not have been covered by the index created above −
>db.users.find({gender:"M"},{user_name:1})
{ "_id" : ObjectId("53402597d852426020000003"), "user_name" : "tombenzamin" }
Analyzing Queries
- Analyzing queries is a very important aspect of measuring how effective the database and indexing design is.
- We will learn about the frequently used $explain and $hint operations.
Operator | Description |
---|---|
$explain | Provides information on the query, indexes used in a query and other statistics. It is very useful when analyzing how well your indexes are optimized. |
$hint | This operator forces the query optimizer to use the specified index to run a query. This is particularly useful when you want to test performance of a query with different indexes. |
- First, create a compound index for the users collection on the fields gender and user_name using the following query (here it already exists, so the server reports so) −
>db.users.createIndex({gender:1,user_name:1})
{
"numIndexesBefore" : 2,
"numIndexesAfter" : 2,
"note" : "all indexes already exist",
"ok" : 1
}
explain()
- The explain() query returns the following analyzed result −
>db.users.find({gender:"M"},{user_name:1,_id:0}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"indexFilterSet" : false,
"parsedQuery" : {
"gender" : {
"$eq" : "M"
}
},
"queryHash" : "B4037D3C",
"planCacheKey" : "DEAAE17C",
"winningPlan" : {
"stage" : "PROJECTION_COVERED",
"transformBy" : {
"user_name" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"gender" : 1,
"user_name" : 1
},
"indexName" : "gender_1_user_name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"gender" : [ ],
"user_name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"gender" : [
"[\"M\", \"M\"]"
],
"user_name" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Krishna",
"port" : 27017,
"version" : "4.2.1",
"gitVersion" : "edf6d45851c0b9ee15548f0f847df141764a317e"
},
"ok" : 1
}
Explanation of above result
- We will now look at the key fields in this result set −
- queryPlanner.winningPlan shows the plan MongoDB selected for the query.
- A winning plan of PROJECTION_COVERED over an IXSCAN input stage means the query was answered from the index alone (a covered query); a COLLSCAN stage would instead indicate a full collection scan without any index.
- indexName gives the name of the index used (here gender_1_user_name_1), and indexBounds shows the key ranges that were scanned.
- rejectedPlans lists candidate plans that were considered but not chosen.
- Older MongoDB versions reported different fields carrying equivalent information − indexOnly (whether the query was covered), cursor (BtreeCursor plus the index name, or BasicCursor for a full scan without indexes), n (documents returned), nscannedObjects (documents scanned) and nscanned (documents or index entries scanned).
hint()
- For example, the following query specifies the index on fields gender and user_name to be used for this query −
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1})
{ "user_name" : "tombenzamin" }
# To analyze the above query using $explain −
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"indexFilterSet" : false,
"parsedQuery" : {
"gender" : {
"$eq" : "M"
}
},
"queryHash" : "B4037D3C",
"planCacheKey" : "DEAAE17C",
"winningPlan" : {
"stage" : "PROJECTION_COVERED",
"transformBy" : {
"user_name" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"gender" : 1,
"user_name" : 1
},
"indexName" : "gender_1_user_name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"gender" : [ ],
"user_name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"gender" : [
"[\"M\", \"M\"]"
],
"user_name" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Krishna",
"port" : 27017,
"version" : "4.2.1",
"gitVersion" : "edf6d45851c0b9ee15548f0f847df141764a317e"
},
"ok" : 1
}
Atomic Operations
- findAndModify performs the find and the update below as a single atomic operation on the existing data −
- Find a product whose product_available is > 0.
- If found,
- Decrease product_available
- Add the customer info to product_bought_by.
>db.products.findAndModify({
query:{_id:1,product_available:{$gt:0}},
update:{
$inc:{product_available:-1},
$push:{product_bought_by:{customer:"rob",date:"9-Jan-2014"}}
}
})
- Existing Data
>db.createCollection("products")
{ "ok" : 1 }
> db.products.insert(
{
"_id":1,
"product_name": "Samsung S3",
"category": "mobiles",
"product_total": 5,
"product_available": 3,
"product_bought_by": [
{
"customer": "john",
"date": "7-Jan-2014"
},
{
"customer": "mark",
"date": "8-Jan-2014"
}
]
}
)
WriteResult({ "nInserted" : 1 })
>
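A minimal sketch (plain JavaScript, no server) of what the atomic check-and-update buys us: the availability test and the decrement happen as one step, so the last unit can only be sold once. The buyOne helper is hypothetical shorthand for the findAndModify call above:

```javascript
// Sketch (no server): findAndModify guarantees that the check
// (product_available > 0) and the update run as one step, so two
// buyers can never both succeed on the last unit.
const product = {
  _id: 1,
  product_name: "Samsung S3",
  product_available: 1,
  product_bought_by: [],
};

function buyOne(doc, customer, date) {
  // In the real command this check-and-update is atomic on the server.
  if (doc.product_available <= 0) return null;
  doc.product_available -= 1;
  doc.product_bought_by.push({ customer, date });
  return doc;
}

console.log(buyOne(product, "rob", "9-Jan-2014") !== null); // true: first buyer succeeds
console.log(buyOne(product, "alice", "9-Jan-2014"));        // null: stock exhausted
```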
Advanced Indexing & Indexing Limitations
Query Limitations
- Indexes cannot be used by queries which use −
- Regular expressions or negation operators like $nin, $not, etc.
- Arithmetic operators like $mod, etc.
- $where clause
- Hence, it is always advisable to check the index usage for your queries.
Maximum Ranges
- A collection cannot have more than 64 indexes.
- The length of a fully qualified index name (including its namespace) cannot be longer than 127 bytes (this limit was removed in MongoDB 4.2).
- A compound index can have maximum 31 fields indexed.
Preparation
- Following document contains an address sub-document and a tags array.
db.users.insert(
{
"address": {
"city": "Los Angeles",
"state": "California",
"pincode": "123"
},
"tags": [
"music",
"cricket",
"blogs"
],
"name": "Tom Benzamin"
}
)
Indexing Array Fields
- Create an index on 'tags' array.
>db.users.createIndex({"tags":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
>
- Can search on the tags field of the collection like this.
> db.users.find({tags:"cricket"}).pretty()
{
"_id" : ObjectId("5dd7c927f1dd4583e7103fdf"),
"address" : {
"city" : "Los Angeles",
"state" : "California",
"pincode" : "123"
},
"tags" : [
"music",
"cricket",
"blogs"
],
"name" : "Tom Benzamin"
}
>
# To verify that the proper index is used, run the following explain command −
# In the output, an IXSCAN input stage with "indexName" : "tags_1" confirms that the index is used
# (older versions reported this as "cursor" : "BtreeCursor tags_1")
>db.users.find({tags:"cricket"}).explain()
Indexing Sub-Document Fields
- Remember that the query expression has to follow the order of the index specified (queries on a prefix of the indexed fields are also supported).
- Create an index on all three fields of the sub-document.
>db.users.createIndex({"address.city":1,"address.state":1,"address.pincode":1})
{
"numIndexesBefore" : 4,
"numIndexesAfter" : 4,
"note" : "all indexes already exist",
"ok" : 1
}
>
- Can search for any of the sub-document fields utilizing this index.
> db.users.find({"address.city":"Los Angeles"}).pretty()
{
"_id" : ObjectId("5dd7c927f1dd4583e7103fdf"),
"address" : {
"city" : "Los Angeles",
"state" : "California",
"pincode" : "123"
},
"tags" : [
"music",
"cricket",
"blogs"
],
"name" : "Tom Benzamin"
}
# Queries on a prefix of the indexed fields, in order, are also supported −
>db.users.find({"address.city":"Los Angeles","address.state":"California"}).pretty()
Text Search
- MongoDB supports text indexes to search inside string content, starting from version 2.4.
- Starting from version 2.6, text search is enabled by default.
# Existing document
> db.posts.insert({
"post_text": "enjoy the mongodb articles on tutorialspoint",
"tags": ["mongodb", "tutorialspoint"]
})
Title | Syntax | Example |
---|---|---|
Creating Text Index | db.COLLECTIONNAME.createIndex({FIELD_NAME:"text"}) | > db.posts.createIndex({post_text:"text"}) |
Using Text Index | db.COLLECTIONNAME.find({$text:{$search:"SEARCH_KEYWORD"}}).pretty() | # Search posts whose post_text contains 'tutorialspoint' > db.posts.find({$text:{$search:"tutorialspoint"}}).pretty() |
Deleting Text Index | > db.COLLECTIONNAME.dropIndex("INDEX_NAME") | # Firstly, see the index name: > db.posts.getIndexes() # Then drop it: > db.posts.dropIndex("post_text_text") { "nIndexesWas" : 2, "ok" : 1 } |
Regular Expression
Optimizing Regular Expression Queries
- If the document fields are indexed, the query will make use of the indexed values to match the regular expression. This makes the search very fast compared to a regular expression scan of the whole collection.
- If the regular expression is a prefix expression, all the matches start with certain string characters. For e.g., if the regex expression is ^tut, then the query has to search only those strings that begin with tut.
Title | Syntax | Example |
---|---|---|
Using regex Expression | db.COLLECTIONNAME.find({FIELD_NAME:{$regex:"SEARCH_KEYWORD"}}).pretty() | > db.posts.find({post_text:{$regex:"tutorialspoint"}}).pretty() or > db.posts.find({post_text:/tutorialspoint/}) # Also works on array fields: > db.posts.find({tags:{$regex:"tutorial"}}) |
Using regex Expression, Case Insensitive (use the $options parameter with value i) | db.COLLECTIONNAME.find({FIELD_NAME:{$regex:"SEARCH_KEYWORD", $options:"i"}}) | > db.posts.find({post_text:{$regex:"tutorialspoint", $options:"i"}}) |
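Why a prefix regex like ^tut can use an index can be sketched as a range scan: assuming sorted index keys, every match falls in the range ["tut", "tuu"). This is a simplification of the planner's actual bounds computation:

```javascript
// Sketch: an anchored regex /^tut/ can be rewritten as a range scan
// over sorted index keys — every match lies in ["tut", "tuu"), so only
// that slice of the index needs to be examined.
function prefixBounds(prefix) {
  const last = prefix.slice(-1);
  const upper =
    prefix.slice(0, -1) + String.fromCharCode(last.charCodeAt(0) + 1);
  return { lower: prefix, upper };
}

const { lower, upper } = prefixBounds("tut");
const sortedKeys = ["alpha", "tutorial", "tutorialspoint", "zebra"];
const matched = sortedKeys.filter((k) => k >= lower && k < upper);
console.log(matched); // ["tutorial", "tutorialspoint"]
```

An unanchored regex like /tutorial/ gives no such bounds, which is why it must scan every index entry (or every document).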
ObjectId
Creating New ObjectId
# Generate a new ObjectId
>newObjectId = ObjectId()
# will return the following uniquely generated id −
ObjectId("5349b4ddd2781d08c09890f3")
# Instead of MongoDB generating the ObjectId, you can also provide a 12-byte id −
>myObjectId = ObjectId("5349b4ddd2781d08c09890f4")
# Converting ObjectId to String
>newObjectId.str
# The above code will return the string format of the ObjectId −
5349b4ddd2781d08c09890f3
Creating Timestamp of a Document
# Get the creation time of a document using getTimestamp method −
>ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
# will return the creation time of this document in ISO date format −
ISODate("2014-04-12T21:49:17Z")
Map Reduce (= Aggregation)
- Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
- Syntax
>db.collection.mapReduce(
function() {emit(key,value);}, //map function
function(key,values) {return reduceFunction}, { //reduce function
out: collection,
query: document,
sort: document,
limit: number
}
)
keyword | Description |
---|---|
map | a javascript function that maps a value with a key and emits a key-value pair |
reduce | a javascript function that reduces or groups all the documents having the same key |
out | specifies the location of the map-reduce query result |
query | specifies the optional selection criteria for selecting documents |
sort | specifies the optional sort criteria |
limit | specifies the optional maximum number of documents to be returned |
- Example
# existing 'post' document :
{
"post_text": "tutorialspoint is an awesome website for tutorials",
"user_name": "mark",
"status":"active"
}
# From posts collection, retrieve all the active posts,
# group them on the basis of user_name and then count the number of posts by each user.
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)}, {
query:{status:"active"},
out:"post_total"
}
)
- Result shows that a total of 4 documents matched the query (status:"active"),
the map function emitted 4 documents with key-value pairs and finally
the reduce function grouped mapped documents having the same keys into 2.
{
"result" : "post_total",
"timeMillis" : 9,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 2,
"output" : 2
},
"ok" : 1,
}
- To see the result of this mapReduce query
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)}, {
query:{status:"active"},
out:"post_total"
}
).find()
{ "_id" : "tom", "value" : 2 }
{ "_id" : "mark", "value" : 2 }
RockMongo
- RockMongo is a MongoDB administration tool using which you can manage your server, databases, collections, documents, indexes, and a lot more.
- It provides a very user-friendly way for reading, writing, and creating documents.
- It is similar to PHPMyAdmin tool for PHP and MySQL.
- Refer details in tutorial site.
GridFS
- GridFS is the MongoDB specification for storing and retrieving large files such as images, audio files, video files, etc.
- It is a kind of file system for storing files, but its data is stored within MongoDB collections.
- GridFS has the capability to store files even greater than the document size limit of 16MB.
- GridFS by default uses two collections to store the file's metadata and its chunks:
- fs.files : serves as the parent document holding the file's metadata
- fs.chunks : holds the data chunks; the 'files_id' field links each chunk to its parent
- Each chunk is identified by its unique _id ObjectId field.
# Sample of fs.files collection : The document specifies the file name, chunk size, uploaded date, and length
{
"filename": "test.txt",
"chunkSize": NumberInt(261120),
"uploadDate": ISODate("2014-04-13T11:32:33.557Z"),
"md5": "7b762939321e146569b07f72c62cca4f",
"length": NumberInt(646)
}
# Sample of fs.chunks document
{
"files_id": ObjectId("534a75d19f54bfec8a2fe44b"),
"n": NumberInt(0),
"data": "Mongo Binary Data"
}
Adding Files to GridFS
# Go to the MongoDB bin folder
# Store an mp3 file
> mongofiles.exe -d gridfs put song.mp3
# Get all files
> db.fs.files.find()
{
_id: ObjectId('534a811bf8b4aa4d33fdf94d'),
filename: "song.mp3",
chunkSize: 261120,
uploadDate: new Date(1397391643474), md5: "e4f53379c909f7bed2e9d631e15c1c41",
length: 10401959
}
# Find the chunks: e.g. if the query returns 40 documents, the whole mp3 file was divided into 40 chunks of data.
> db.fs.chunks.find({files_id:ObjectId('534a811bf8b4aa4d33fdf94d')})
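The chunk count follows directly from the file length and the chunk size; a quick sketch confirming the 40 chunks mentioned above:

```javascript
// Sketch: how many fs.chunks documents a GridFS file produces.
// With the default chunkSize of 261120 bytes, the 10401959-byte mp3
// above becomes ceil(10401959 / 261120) chunks.
const chunkSize = 261120;
const fileLength = 10401959;

function chunkCount(length, size) {
  return Math.ceil(length / size);
}

console.log(chunkCount(fileLength, chunkSize)); // 40
```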
Capped Collections
- Capped collections are fixed-size circular collections that follow the insertion order to support high performance for create, read, and delete operations.
- By circular, it means that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without any explicit command.
- Capped collections restrict updates to the documents if the update results in increased document size.
- Since capped collections store documents in the order of the disk storage, this ensures that the document size does not exceed the size allocated on the disk.
- Capped collections are best for storing log information, cache data, or any other high volume data.
Important points of capped collections
- We cannot delete documents from a capped collection.
- There are no default indexes present in a capped collection, not even on the _id field.
- While inserting a new document, MongoDB does not have to actually look for a place to accommodate it on disk. It can blindly insert the new document at the tail of the collection. This makes insert operations in capped collections very fast.
- Similarly, while reading documents MongoDB returns them in the same order as present on disk. This makes the read operation very fast.
Creating Capped Collection
# Add 'capped' option as 'true' and specifying 'the maximum size of collection in bytes'.
> db.createCollection("cappedLogCollection",{capped:true,size:10000})
# Use 'max' to limit the number of documents
> db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})
# check whether a collection is capped or not
> db.cappedLogCollection.isCapped()
# Convert an existing collection to capped
# Following will convert existing collection posts to a capped collection.
> db.runCommand({"convertToCapped":"posts",size:10000})
Querying Capped Collection
- By default, a find query on a capped collection will display results in insertion order.
# Retrieve the documents in reverse order
> db.cappedLogCollection.find().sort({$natural:-1})
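The circular eviction behaviour can be sketched with a plain array standing in for the collection (cappedInsert is a hypothetical helper, not a MongoDB API):

```javascript
// Sketch (no server): a capped collection with max: 3 — once full,
// each insert silently evicts the oldest document, and a plain read
// returns insertion order while {$natural: -1} reverses it.
function cappedInsert(collection, doc, max) {
  collection.push(doc);
  if (collection.length > max) collection.shift(); // evict oldest
  return collection;
}

const logs = [];
["a", "b", "c", "d"].forEach((msg) => cappedInsert(logs, { msg }, 3));
console.log(logs.map((d) => d.msg));                 // ["b", "c", "d"] — "a" evicted
console.log([...logs].reverse().map((d) => d.msg));  // ["d", "c", "b"] — $natural: -1
```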
Auto-Increment Sequence
- MongoDB does not have out-of-the-box auto-increment functionality, like SQL databases.
- By default, it uses the 12-byte ObjectId for the _id field as the primary key to uniquely identify the documents.
- However, there may be scenarios where we may want the _id field to have some auto-incremented value other than the ObjectId.
- Since this is not a default feature in MongoDB, we will programmatically achieve this functionality by using a counters collection, as suggested by the MongoDB documentation.
Example of creating documents with an incremented sequence
- Make the _id field an auto-incremented integer sequence starting from 1, 2, 3, 4 up to n for the following products document.
{
"_id":1,
"product_name": "Apple iPhone",
"category": "mobiles"
}
- Steps for making the _id field an auto-incremented integer sequence:
- Insert the initial sequence document in the counters collection, keyed by the sequence name as its _id
- Create a JavaScript function that increments and returns the sequence value
- Insert new documents using that function for the _id field
# Insert the initial sequence document, with 'productid' as its key
# (the counters collection is created automatically on the first insert)
> db.counters.insert({
"_id":"productid",
"sequence_value": 0
})
WriteResult({ "nInserted" : 1 })
# Create a JavaScript function that increments and returns the sequence value
> function getNextSequenceValue(sequenceName){
var sequenceDocument = db.counters.findAndModify({
query:{_id: sequenceName },
update: {$inc:{sequence_value:1}},
new:true
});
return sequenceDocument.sequence_value;
}
# Insert 2 documents using the JavaScript function
>db.products.insert({
"_id":getNextSequenceValue("productid"),
"product_name":"Apple iPhone",
"category":"mobiles"
})
>db.products.insert({
"_id":getNextSequenceValue("productid"),
"product_name":"Samsung S3",
"category":"mobiles"
})
# Result
> db.products.find()
{ "_id" : 1, "product_name" : "Apple iPhone", "category" : "mobiles"}
{ "_id" : 2, "product_name" : "Samsung S3", "category" : "mobiles" }