Content
- Relationships
- Database References
- Covered Queries
- Analyzing Queries
- Atomic Operations
- Advanced Indexing & Indexing Limitations
- Text Search
- Regular Expression
- ObjectId
- Map Reduce (= Aggregation)
- RockMongo
- GridFS
- Capped Collections
- Auto-Increment Sequence
Reference Sites
Summary
Relationships
- Relationships in MongoDB represent how various documents are logically related to each other.
- Relationships can be modeled via the Embedded and the Referenced (Manual References) approaches.
- Such relationships can be either 1:1, 1:N, N:1 or N:N.
Let us consider the case of storing addresses for users. One user can have multiple addresses, making this a 1:N relationship.
In the referenced design, each entity is stored as its own document:
# user document −
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"name": "Tom Hanks",
"contact": "987654321",
"dob": "01-01-1991"
}
# address document −
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
Modeling Embedded Relationships
> db.users.insert({
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}
]
})
> db.users.findOne({"name":"Tom Benzamin"},{"address":1})
Modeling Referenced Relationships (Manual References)
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000"),
ObjectId("52ffc4a5d85242602e000001")
]
}
# in the mongo shell
>var result = db.users.findOne({"name":"Tom Benzamin"},{"address_ids":1})
>var addresses = db.address.find({"_id":{"$in":result["address_ids"]}})
Database References
Manual References
- Store the referenced document's id inside other document
DBRefs
- A document contains references from different collections
- A DBRef has three fields:

Field | Description |
---|---|
$ref | Specifies the collection of the referenced document |
$id | Specifies the _id field of the referenced document |
$db | Optional; contains the name of the database in which the referenced document lies |
- Example :
Store different types of addresses (home, office, mailing, etc.) in different collections (address_home, address_office, address_mailing, etc).
- The following example shows an address DBRef field. It specifies that the referenced address document lies in the address_home collection under the tutorialspoint database and has an _id of 534009e4d852427820000002.
# 'user' document having DBRef field address
{
"_id":ObjectId("53402597d852426020000002"),
"address": {
"$ref": "address_home",
"$id": ObjectId("534009e4d852427820000002"),
"$db": "tutorialspoint"
},
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin"
}
# in the mongo shell
>var user = db.users.findOne({"name":"Tom Benzamin"})
>var dbRef = user.address
>db[dbRef.$ref].findOne({"_id":(dbRef.$id)})
# will return following 'address' document present in address_home collection
{
"_id" : ObjectId("534009e4d852427820000002"),
"building" : "22 A, Indiana Apt",
"pincode" : 123456,
"city" : "Los Angeles",
"state" : "California"
}
Covered Queries
- A covered query is a query in which −
- All the fields in the query are part of an index.
- All the fields returned by the query are in the same index.
- For such queries, MongoDB matches the query conditions and returns the result using the index alone, without actually looking inside the documents. Since indexes are usually kept in RAM, fetching data from indexes is much faster than fetching data by scanning documents.
- Remember that an index cannot cover a query if −
- Any of the indexed fields is an array
- Any of the indexed fields is a subdocument
Using Covered Queries
- For the following covered query, MongoDB does not go looking into the database documents.
- Instead it fetches the required data from the indexed data, which is very fast.
# Following document exist in the users collection −
{
"_id": ObjectId("53402597d852426020000003"),
"contact": "987654321",
"dob": "01-01-1991",
"gender": "M",
"name": "Tom Benzamin",
"user_name": "tombenzamin"
}
# First, create a compound index for the users collection on the fields gender and user_name using the following query −
>db.users.createIndex({gender:1,user_name:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
# Now, this index will cover the following query −
>db.users.find({gender:"M"},{user_name:1,_id:0})
{ "user_name" : "tombenzamin" }
- Since the index does not include the _id field, we need to explicitly exclude it from the result set, as MongoDB returns the _id field in every query by default.
- So the following query would not have been covered by the index created above −
>db.users.find({gender:"M"},{user_name:1})
{ "_id" : ObjectId("53402597d852426020000003"), "user_name" : "tombenzamin" }
Analyzing Queries
- Analyzing queries is a very important aspect of measuring how effective the database and indexing design is.
- We will learn about the frequently used $explain and $hint operations.
Operator | Description |
---|---|
$explain | Provides information on the query, indexes used in a query and other statistics. It is very useful when analyzing how well your indexes are optimized. |
$hint | This operator forces the query optimizer to use the specified index to run a query. This is particularly useful when you want to test performance of a query with different indexes. |
- First, create a compound index for the users collection on the fields gender and user_name using the following query (here it already exists, so the server reports so) −
>db.users.createIndex({gender:1,user_name:1})
{
"numIndexesBefore" : 2,
"numIndexesAfter" : 2,
"note" : "all indexes already exist",
"ok" : 1
}
explain()
- The explain() query returns the following analyzed result −
>db.users.find({gender:"M"},{user_name:1,_id:0}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"indexFilterSet" : false,
"parsedQuery" : {
"gender" : {
"$eq" : "M"
}
},
"queryHash" : "B4037D3C",
"planCacheKey" : "DEAAE17C",
"winningPlan" : {
"stage" : "PROJECTION_COVERED",
"transformBy" : {
"user_name" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"gender" : 1,
"user_name" : 1
},
"indexName" : "gender_1_user_name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"gender" : [ ],
"user_name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"gender" : [
"[\"M\", \"M\"]"
],
"user_name" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Krishna",
"port" : 27017,
"version" : "4.2.1",
"gitVersion" : "edf6d45851c0b9ee15548f0f847df141764a317e"
},
"ok" : 1
}
Explanation of above result
- We will now look at the key fields in this result set −
- queryPlanner.winningPlan shows the plan MongoDB selected for the query.
- A winning plan of PROJECTION_COVERED over an IXSCAN input stage means the query was answered from the index alone (a covered query); a COLLSCAN stage would instead indicate a full collection scan without any index.
- indexName gives the name of the index used (here gender_1_user_name_1), and indexBounds shows the key ranges that were scanned.
- rejectedPlans lists candidate plans that were considered but not chosen.
- Older MongoDB versions reported different fields carrying equivalent information − indexOnly (whether the query was covered), cursor (BtreeCursor plus the index name, or BasicCursor for a full scan without indexes), n (documents returned), nscannedObjects (documents scanned) and nscanned (documents or index entries scanned).
hint()
- For example, the following query specifies the index on fields gender and user_name to be used for this query −
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1})
{ "user_name" : "tombenzamin" }
# To analyze the above query using $explain −
>db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"indexFilterSet" : false,
"parsedQuery" : {
"gender" : {
"$eq" : "M"
}
},
"queryHash" : "B4037D3C",
"planCacheKey" : "DEAAE17C",
"winningPlan" : {
"stage" : "PROJECTION_COVERED",
"transformBy" : {
"user_name" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"gender" : 1,
"user_name" : 1
},
"indexName" : "gender_1_user_name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"gender" : [ ],
"user_name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"gender" : [
"[\"M\", \"M\"]"
],
"user_name" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Krishna",
"port" : 27017,
"version" : "4.2.1",
"gitVersion" : "edf6d45851c0b9ee15548f0f847df141764a317e"
},
"ok" : 1
}
Atomic Operations
- findAndModify performs the find and the update below as a single atomic operation on the existing data −
- Find a product whose product_available is > 0.
- If found,
- Decrease product_available
- Add the customer info to product_bought_by.
>db.products.findAndModify({
query:{_id:1,product_available:{$gt:0}},
update:{
$inc:{product_available:-1},
$push:{product_bought_by:{customer:"rob",date:"9-Jan-2014"}}
}
})
- Existing Data
>db.createCollection("products")
{ "ok" : 1 }
> db.products.insert(
{
"_id":1,
"product_name": "Samsung S3",
"category": "mobiles",
"product_total": 5,
"product_available": 3,
"product_bought_by": [
{
"customer": "john",
"date": "7-Jan-2014"
},
{
"customer": "mark",
"date": "8-Jan-2014"
}
]
}
)
WriteResult({ "nInserted" : 1 })
>
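A minimal sketch (plain JavaScript, no server) of what the atomic check-and-update buys us: the availability test and the decrement happen as one step, so the last unit can only be sold once. The buyOne helper is hypothetical shorthand for the findAndModify call above:

```javascript
// Sketch (no server): findAndModify guarantees that the check
// (product_available > 0) and the update run as one step, so two
// buyers can never both succeed on the last unit.
const product = {
  _id: 1,
  product_name: "Samsung S3",
  product_available: 1,
  product_bought_by: [],
};

function buyOne(doc, customer, date) {
  // In the real command this check-and-update is atomic on the server.
  if (doc.product_available <= 0) return null;
  doc.product_available -= 1;
  doc.product_bought_by.push({ customer, date });
  return doc;
}

console.log(buyOne(product, "rob", "9-Jan-2014") !== null); // true: first buyer succeeds
console.log(buyOne(product, "alice", "9-Jan-2014"));        // null: stock exhausted
```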
Advanced Indexing & Indexing Limitations
Query Limitations
- Indexes cannot be used by queries which use −
- Regular expressions or negation operators like $nin, $not, etc.
- Arithmetic operators like $mod, etc.
- $where clause
- Hence, it is always advisable to check the index usage for your queries.
Maximum Ranges
- A collection cannot have more than 64 indexes.
- The length of a fully qualified index name (including its namespace) cannot be longer than 127 bytes (this limit was removed in MongoDB 4.2).
- A compound index can have maximum 31 fields indexed.
Preparation
- Following document contains an address sub-document and a tags array.
db.users.insert(
{
"address": {
"city": "Los Angeles",
"state": "California",
"pincode": "123"
},
"tags": [
"music",
"cricket",
"blogs"
],
"name": "Tom Benzamin"
}
)
Indexing Array Fields
- Create an index on 'tags' array.
>db.users.createIndex({"tags":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
>
- Can search on the tags field of the collection like this.
> db.users.find({tags:"cricket"}).pretty()
{
"_id" : ObjectId("5dd7c927f1dd4583e7103fdf"),
"address" : {
"city" : "Los Angeles",
"state" : "California",
"pincode" : "123"
},
"tags" : [
"music",
"cricket",
"blogs"
],
"name" : "Tom Benzamin"
}
>
# To verify that the proper index is used, run the following explain command −
# In the output, an IXSCAN input stage with "indexName" : "tags_1" confirms that the index is used
# (older versions reported this as "cursor" : "BtreeCursor tags_1")
>db.users.find({tags:"cricket"}).explain()
Indexing Sub-Document Fields
- Remember that the query expression has to follow the order of the index specified (queries on a prefix of the indexed fields are also supported).
- Create an index on all three fields of the sub-document.
>db.users.createIndex({"address.city":1,"address.state":1,"address.pincode":1})
{
"numIndexesBefore" : 4,
"numIndexesAfter" : 4,
"note" : "all indexes already exist",
"ok" : 1
}
>
- Can search for any of the sub-document fields utilizing this index.
> db.users.find({"address.city":"Los Angeles"}).pretty()
{
"_id" : ObjectId("5dd7c927f1dd4583e7103fdf"),
"address" : {
"city" : "Los Angeles",
"state" : "California",
"pincode" : "123"
},
"tags" : [
"music",
"cricket",
"blogs"
],
"name" : "Tom Benzamin"
}
# Queries on a prefix of the indexed fields, in order, are also supported −
>db.users.find({"address.city":"Los Angeles","address.state":"California"}).pretty()
Text Search
- MongoDB supports text indexes to search inside string content, starting from version 2.4.
- Starting from version 2.6, text search is enabled by default.
# Existing document
> db.posts.insert({
"post_text": "enjoy the mongodb articles on tutorialspoint",
"tags": ["mongodb", "tutorialspoint"]
})
Title | Syntax | Example |
---|---|---|
Creating Text Index | db.COLLECTIONNAME.createIndex({FIELD_NAME:"text"}) | > db.posts.createIndex({post_text:"text"}) |
Using Text Index | db.COLLECTIONNAME.find({$text:{$search:"SEARCH_KEYWORD"}}).pretty() | # Search posts whose post_text contains 'tutorialspoint' > db.posts.find({$text:{$search:"tutorialspoint"}}).pretty() |
Deleting Text Index | > db.COLLECTIONNAME.dropIndex("INDEX_NAME") | # Firstly, see the index name: > db.posts.getIndexes() # Then drop it: > db.posts.dropIndex("post_text_text") { "nIndexesWas" : 2, "ok" : 1 } |
Regular Expression
Optimizing Regular Expression Queries
- If the document fields are indexed, the query will make use of the indexed values to match the regular expression. This makes the search very fast compared to a regular expression scan of the whole collection.
- If the regular expression is a prefix expression, all the matches start with certain string characters. For e.g., if the regex expression is ^tut, then the query has to search only those strings that begin with tut.
Title | Syntax | Example |
---|---|---|
Using regex Expression | db.COLLECTIONNAME.find({FIELD_NAME:{$regex:"SEARCH_KEYWORD"}}).pretty() | > db.posts.find({post_text:{$regex:"tutorialspoint"}}).pretty() or > db.posts.find({post_text:/tutorialspoint/}) # Also works on array fields: > db.posts.find({tags:{$regex:"tutorial"}}) |
Using regex Expression, Case Insensitive (use the $options parameter with value i) | db.COLLECTIONNAME.find({FIELD_NAME:{$regex:"SEARCH_KEYWORD", $options:"i"}}) | > db.posts.find({post_text:{$regex:"tutorialspoint", $options:"i"}}) |
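Why a prefix regex like ^tut can use an index can be sketched as a range scan: assuming sorted index keys, every match falls in the range ["tut", "tuu"). This is a simplification of the planner's actual bounds computation:

```javascript
// Sketch: an anchored regex /^tut/ can be rewritten as a range scan
// over sorted index keys — every match lies in ["tut", "tuu"), so only
// that slice of the index needs to be examined.
function prefixBounds(prefix) {
  const last = prefix.slice(-1);
  const upper =
    prefix.slice(0, -1) + String.fromCharCode(last.charCodeAt(0) + 1);
  return { lower: prefix, upper };
}

const { lower, upper } = prefixBounds("tut");
const sortedKeys = ["alpha", "tutorial", "tutorialspoint", "zebra"];
const matched = sortedKeys.filter((k) => k >= lower && k < upper);
console.log(matched); // ["tutorial", "tutorialspoint"]
```

An unanchored regex like /tutorial/ gives no such bounds, which is why it must scan every index entry (or every document).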
ObjectId
Creating New ObjectId
# Generate a new ObjectId
>newObjectId = ObjectId()
# will return the following uniquely generated id −
ObjectId("5349b4ddd2781d08c09890f3")
# Instead of MongoDB generating the ObjectId, you can also provide a 12-byte id −
>myObjectId = ObjectId("5349b4ddd2781d08c09890f4")
# Converting ObjectId to String
>newObjectId.str
# The above code will return the string format of the ObjectId −
5349b4ddd2781d08c09890f3
Creating Timestamp of a Document
# Get the creation time of a document using getTimestamp method −
>ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
# will return the creation time of this document in ISO date format −
ISODate("2014-04-12T21:49:17Z")
Map Reduce (= Aggregation)
- Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
- Syntax
>db.collection.mapReduce(
function() {emit(key,value);}, //map function
function(key,values) {return reduceFunction}, { //reduce function
out: collection,
query: document,
sort: document,
limit: number
}
)
keyword | Description |
---|---|
map | a javascript function that maps a value with a key and emits a key-value pair |
reduce | a javascript function that reduces or groups all the documents having the same key |
out | specifies the location of the map-reduce query result |
query | specifies the optional selection criteria for selecting documents |
sort | specifies the optional sort criteria |
limit | specifies the optional maximum number of documents to be returned |
- Example
# existing 'post' document :
{
"post_text": "tutorialspoint is an awesome website for tutorials",
"user_name": "mark",
"status":"active"
}
# From posts collection, retrieve all the active posts,
# group them on the basis of user_name and then count the number of posts by each user.
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)}, {
query:{status:"active"},
out:"post_total"
}
)
- Result shows that a total of 4 documents matched the query (status:"active"),
the map function emitted 4 documents with key-value pairs and finally
the reduce function grouped mapped documents having the same keys into 2.
{
"result" : "post_total",
"timeMillis" : 9,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 2,
"output" : 2
},
"ok" : 1,
}
- To see the result of this mapReduce query
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)}, {
query:{status:"active"},
out:"post_total"
}
).find()
{ "_id" : "tom", "value" : 2 }
{ "_id" : "mark", "value" : 2 }
RockMongo
- RockMongo is a MongoDB administration tool using which you can manage your server, databases, collections, documents, indexes, and a lot more.
- It provides a very user-friendly way for reading, writing, and creating documents.
- It is similar to PHPMyAdmin tool for PHP and MySQL.
- Refer details in tutorial site.
GridFS
- GridFS is the MongoDB specification for storing and retrieving large files such as images, audio files, video files, etc.
- It is a kind of file system for storing files, but its data is stored within MongoDB collections.
- GridFS has the capability to store files even greater than the document size limit of 16MB.
- GridFS by default uses two collections to store the file's metadata and its chunks:
- fs.files : serves as the parent document holding the file's metadata
- fs.chunks : holds the data chunks; the 'files_id' field links each chunk to its parent
- Each chunk is identified by its unique _id ObjectId field.
# Sample of fs.files collection : The document specifies the file name, chunk size, uploaded date, and length
{
"filename": "test.txt",
"chunkSize": NumberInt(261120),
"uploadDate": ISODate("2014-04-13T11:32:33.557Z"),
"md5": "7b762939321e146569b07f72c62cca4f",
"length": NumberInt(646)
}
# Sample of fs.chunks document
{
"files_id": ObjectId("534a75d19f54bfec8a2fe44b"),
"n": NumberInt(0),
"data": "Mongo Binary Data"
}
Adding Files to GridFS
# Go to the MongoDB bin folder
# Store an mp3 file
> mongofiles.exe -d gridfs put song.mp3
# Get all files
> db.fs.files.find()
{
_id: ObjectId('534a811bf8b4aa4d33fdf94d'),
filename: "song.mp3",
chunkSize: 261120,
uploadDate: new Date(1397391643474), md5: "e4f53379c909f7bed2e9d631e15c1c41",
length: 10401959
}
# Find the chunks: e.g. if the query returns 40 documents, the whole mp3 file was divided into 40 chunks of data.
> db.fs.chunks.find({files_id:ObjectId('534a811bf8b4aa4d33fdf94d')})
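The chunk count follows directly from the file length and the chunk size; a quick sketch confirming the 40 chunks mentioned above:

```javascript
// Sketch: how many fs.chunks documents a GridFS file produces.
// With the default chunkSize of 261120 bytes, the 10401959-byte mp3
// above becomes ceil(10401959 / 261120) chunks.
const chunkSize = 261120;
const fileLength = 10401959;

function chunkCount(length, size) {
  return Math.ceil(length / size);
}

console.log(chunkCount(fileLength, chunkSize)); // 40
```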
Capped Collections
- Capped collections are fixed-size circular collections that follow the insertion order to support high performance for create, read, and delete operations.
- By circular, it means that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without any explicit command.
- Capped collections restrict updates to the documents if the update results in increased document size.
- Since capped collections store documents in the order of the disk storage, this ensures that the document size does not exceed the size allocated on the disk.
- Capped collections are best for storing log information, cache data, or any other high volume data.
Important points of capped collections
- We cannot delete documents from a capped collection.
- There are no default indexes present in a capped collection, not even on the _id field.
- While inserting a new document, MongoDB does not have to actually look for a place to accommodate it on disk. It can blindly insert the new document at the tail of the collection. This makes insert operations in capped collections very fast.
- Similarly, while reading documents MongoDB returns them in the same order as present on disk. This makes the read operation very fast.
Creating Capped Collection
# Add 'capped' option as 'true' and specifying 'the maximum size of collection in bytes'.
> db.createCollection("cappedLogCollection",{capped:true,size:10000})
# Use 'max' to limit the number of documents
> db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})
# check whether a collection is capped or not
> db.cappedLogCollection.isCapped()
# Convert an existing collection to capped
# Following will convert existing collection posts to a capped collection.
> db.runCommand({"convertToCapped":"posts",size:10000})
Querying Capped Collection
- By default, a find query on a capped collection will display results in insertion order.
# Retrieve the documents in reverse order
> db.cappedLogCollection.find().sort({$natural:-1})
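The circular eviction behaviour can be sketched with a plain array standing in for the collection (cappedInsert is a hypothetical helper, not a MongoDB API):

```javascript
// Sketch (no server): a capped collection with max: 3 — once full,
// each insert silently evicts the oldest document, and a plain read
// returns insertion order while {$natural: -1} reverses it.
function cappedInsert(collection, doc, max) {
  collection.push(doc);
  if (collection.length > max) collection.shift(); // evict oldest
  return collection;
}

const logs = [];
["a", "b", "c", "d"].forEach((msg) => cappedInsert(logs, { msg }, 3));
console.log(logs.map((d) => d.msg));                 // ["b", "c", "d"] — "a" evicted
console.log([...logs].reverse().map((d) => d.msg));  // ["d", "c", "b"] — $natural: -1
```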
Auto-Increment Sequence
- MongoDB does not have out-of-the-box auto-increment functionality, like SQL databases.
- By default, it uses the 12-byte ObjectId for the _id field as the primary key to uniquely identify the documents.
- However, there may be scenarios where we may want the _id field to have some auto-incremented value other than the ObjectId.
- Since this is not a default feature in MongoDB, we will programmatically achieve this functionality by using a counters collection, as suggested by the MongoDB documentation.
Example of creating documents with an incremented sequence
- Make the _id field an auto-incremented integer sequence starting from 1, 2, 3, 4 up to n for the following products document.
{
"_id":1,
"product_name": "Apple iPhone",
"category": "mobiles"
}
- Steps for making the _id field an auto-incremented integer sequence:
- Insert the initial sequence document in the counters collection, keyed by the sequence name as its _id
- Create a JavaScript function that increments and returns the sequence value
- Insert new documents using that function for the _id field
# Insert the initial sequence document, with 'productid' as its key
# (the counters collection is created automatically on the first insert)
> db.counters.insert({
"_id":"productid",
"sequence_value": 0
})
WriteResult({ "nInserted" : 1 })
# Create a JavaScript function that increments and returns the sequence value
> function getNextSequenceValue(sequenceName){
var sequenceDocument = db.counters.findAndModify({
query:{_id: sequenceName },
update: {$inc:{sequence_value:1}},
new:true
});
return sequenceDocument.sequence_value;
}
# Insert 2 documents using the JavaScript function
>db.products.insert({
"_id":getNextSequenceValue("productid"),
"product_name":"Apple iPhone",
"category":"mobiles"
})
>db.products.insert({
"_id":getNextSequenceValue("productid"),
"product_name":"Samsung S3",
"category":"mobiles"
})
# Result
> db.products.find()
{ "_id" : 1, "product_name" : "Apple iPhone", "category" : "mobiles"}
{ "_id" : 2, "product_name" : "Samsung S3", "category" : "mobiles" }