Before diving into the Aggregation Pipeline, ensure you are familiar with CRUD operations, creating a cluster, connecting to MongoDB, and understanding documents in MongoDB.
An aggregation pipeline consists of one or more stages that process the documents.
Each stage performs an operation on the documents. A stage can filter, group, count, or sum documents.
The first stage is the source for the stages that follow: whatever the first stage returns becomes the input of the next stage, and so on down the pipeline.
Setup
To get started, first create an account on MongoDB Atlas, then set up a cluster and connect to it from VS Code. For a better experience, install the "MongoDB for VS Code" extension.
Next, create a collection in your database and name it "users".
Go to your collection and select "users".
Insert some data into your collection by clicking on "insert document". Make sure at least 3 or 4 documents are present in your users collection. This is not mandatory, but it helps in understanding the concept.
Above the documents, you'll see an option called "aggregation". Click on that.
Now, all the documents will be displayed.
At the top right of the documents, you'll see a toggle with two options, Stage and Text. Stage is selected by default.
You can use the Stage option, but for better understanding I prefer Text.
After clicking on "Text," the window will split into two parts. On the left side, you'll write the code for the aggregation pipeline, and on the right side, the output will be displayed. We will write the code inside the array in the form of objects.
$match
$match: Filters documents based on a specified query predicate. Matched documents are passed to the next pipeline stage.
This is very helpful when you want to keep only the documents in which a field has a specific value. Every stage must be written inside its own object.
[
  {
    $match: {
      username: "Rim"
    }
  }
]
This $match stage returns every document whose username is "Rim".
Each object inside the array is one stage. Here the only stage is a $match stage.
$count
$count: Passes a document to the next stage that contains a count of the number of documents input to the stage.
[
  {
    $match: {
      username: "Rim",
    }
  },
  {
    $count: "foundUser"
  }
]
After finding the result for the specific username, the second stage counts how many documents are in the result. It then returns this count as a key-value pair, so we need to provide a key name. I used "foundUser" as the key name.
Other stages are chained in the same way.
$group
The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key.
A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key. See below for usage examples.
In the $group stage output, the _id field is set to the group key for that document.
For example, suppose I want to see which favorite fruits the users have. Every user has a favorite fruit, and some users share the same one. The $group stage goes through all the users and groups them by favorite fruit, so each distinct fruit appears only once in the output.
In my collection the total number of users is 55, but when they are grouped by favorite fruit, the output shows only the unique fruits.
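Here is a minimal sketch of that grouping. It assumes each user document stores the fruit in a field called favoriteFruit (the same field name used in the $project example later in this post); the variable name groupByFruit is only for illustration.

```javascript
// Assumption: every user document has a favoriteFruit field.
const groupByFruit = [
  {
    $group: {
      _id: "$favoriteFruit", // group key: one output document per distinct fruit
    },
  },
];

// To run it in mongosh or a VS Code playground:
// db.users.aggregate(groupByFruit)
```

Each output document has only an _id field, and its value is one of the distinct fruits.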
Let's explore this further. I want to know how many documents fall into each group.
So, we will count the documents in each group using the $sum operator.
Every operator name starts with $.
Basically, count is a field where we store the result of $sum. We pass 1 to $sum, so every document in a group adds 1 to that group's count. For example, each time a document has "Banana" as its favorite fruit, the count of the "Banana" group goes up by 1.
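A sketch of the counting step could look like this. The field name favoriteFruit is assumed from the later examples, and count is simply the name I picked for the new field.

```javascript
// Assumption: every user document has a favoriteFruit field.
const countByFruit = [
  {
    $group: {
      _id: "$favoriteFruit", // group key
      count: { $sum: 1 },    // add 1 for every document that lands in this group
    },
  },
];

// db.users.aggregate(countByFruit)
```

Each output document now has two fields: _id (the fruit) and count (how many users chose it).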
Now I want to see only the top 5 documents, so I have to use another stage.
$sort
Sorts all input documents and returns them to the pipeline in sorted order.
The count field does not exist in the original documents, but we need to sort on it. As mentioned earlier, the output of one stage is the source for the next stage. This means that we write the $sort stage right below the $group stage that creates count, and $sort then works on the result of that stage.
For descending order, we use -1.
And for ascending order, we use 1.
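Putting grouping, counting, and sorting together might look like the sketch below. As before, favoriteFruit and count are assumed field names.

```javascript
// Assumption: every user document has a favoriteFruit field.
const sortedFruitCounts = [
  {
    $group: {
      _id: "$favoriteFruit",
      count: { $sum: 1 },
    },
  },
  {
    // count exists only in the output of the $group stage above
    $sort: { count: -1 }, // -1 = descending, 1 = ascending
  },
];

// db.users.aggregate(sortedFruitCounts)
```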
Now I am able to group them, count them, and sort them. But what if I want to get only the top 5 or top 3? In this case we have to use another stage.
$limit
Limits the number of documents passed to the next stage in the pipeline. In this case, if I want to get only the top 2, I pass that number to $limit. That's it.
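As a sketch, the "top 2 fruits" pipeline could be written like this, again assuming the favoriteFruit field and a count field created by $group.

```javascript
// Assumption: every user document has a favoriteFruit field.
const topTwoFruits = [
  {
    $group: {
      _id: "$favoriteFruit",
      count: { $sum: 1 },
    },
  },
  { $sort: { count: -1 } }, // most popular fruit first
  { $limit: 2 },            // keep only the first two documents
];

// db.users.aggregate(topTwoFruits)
```

The order matters here: $limit comes after $sort, otherwise you would keep two arbitrary groups instead of the two largest.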
$unwind
Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
{
  $unwind:
    {
      path: <field path>,
    }
}
In path, you pass the name of the field to unwind, which must be an array. And yes, don't forget to put $ in front of the field name to refer to that field.
Basically, use $unwind when you need one document per element of an array. It is not the only way to work with arrays; for some tasks, such as counting the elements of an array, you can use the $addFields stage instead.
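Here is a small sketch of $unwind on the users collection. It assumes the documents have a tags array (the field used in the $addFields example in this post); the tag values in the comments are made up for illustration.

```javascript
// Assumption: user documents look like { username: "Rim", tags: ["a", "b"] }.
const unwindTags = [
  {
    $unwind: {
      path: "$tags", // the $ prefix marks tags as a field path
    },
  },
];

// db.users.aggregate(unwindTags)
// The sample document above would come out as two documents:
// one with tags: "a" and one with tags: "b".
```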
$addFields
Adds new fields to documents. $addFields outputs documents that contain all existing fields from the input documents and newly added fields.
For example:
[
  {
    $match: {
      username: "Rim"
    }
  },
  {
    $addFields: {
      numberOfValue: {
        $size: {
          $ifNull: ["$tags", []]
        }
      }
    }
  }
]
In this example, I match the username "Rim" and then check how many values are in the tags field (which is an array).
First, $addFields adds an extra field named numberOfValue to the document. Then, the $size operator counts the values in the tags field.
Don't get confused by the $ifNull operator. $ifNull checks whether the tags field exists and is not null; if so, it returns the tags array, otherwise it returns the empty array []. You can also write this without $ifNull, but what if the tags field doesn't exist in some document? Then $size has no array to count and fails with an error. For that case I use $ifNull, so a missing tags field gives a size of 0.
Q) I want to see the username and age of users whose gender is male and whose favorite fruit is Banana.
In this case, the user doesn't want to see the entire document, just the username and age. For this we use another stage named $project.
$project
Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
In simple terms, $project shapes each document according to your needs: it returns only the fields you specify (plus _id, which is included by default unless you set _id: 0).
For example:
[
  {
    $match: {
      gender: "Male",
      favoriteFruit: "Banana"
    }
  },
  {
    $project: {
      username: 1,
      age: 1,
    }
  }
]
First, $match filters the documents according to the given query. Then $project returns only the username and age from each matching document. As you already know, 1 means true, which here means "include this field in the output".
$lookup (aggregation)
This is one of the most important stages. Basically, all the stages are important, but this one has a special place.
It performs a left outer join to a collection in the same database to filter in documents from the "joined" collection for processing. The $lookup stage adds a new array field to each input document. The new array field contains the matching documents from the "joined" collection. The $lookup stage passes these reshaped documents to the next stage.
This is used to get data from another collection. Suppose you have a collection named books and another named authors, and you want to get the details of the author for each book. In this case, the $lookup stage is helpful.
Syntax:
{
  $lookup:
    {
      from: <collection to join>,
      localField: <field from the input documents>,
      foreignField: <field from the documents of the "from" collection>,
      as: <output array field>
    }
}
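For the books and authors case, a sketch could look like the one below. The field names are assumptions: I am supposing each book stores its author's id in a field called author_id and that it matches the _id of a document in authors. The output field is named authorDetails.

```javascript
// Assumptions (hypothetical schema):
//   books:   { title: ..., author_id: ... }
//   authors: { _id: ..., name: ... }
const booksWithAuthors = [
  {
    $lookup: {
      from: "authors",         // collection to join
      localField: "author_id", // field in books
      foreignField: "_id",     // field in authors
      as: "authorDetails",     // new array field added to each book
    },
  },
];

// Run it on the books collection:
// db.books.aggregate(booksWithAuthors)
```

Every book document comes back with an extra authorDetails array that holds the matching author documents (an empty array if nothing matched).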
But authorDetails comes back as an array. What if the frontend needs an object instead of an array? What do you do then?
Since you're already familiar with $addFields, we will use it to add a new field, together with another operator that converts the array to an object. This can be done in a couple of ways. I'll show you both.
$first
This operator returns the first element of an array. You just have to give it the field name. Because the array holds a single author document, the field now contains that object instead of an array.
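A sketch of this approach, under the same assumed books/authors schema (author_id in books, _id in authors). As far as I know, $first as an array operator needs MongoDB 4.4 or later.

```javascript
// Assumption: books store the author's id in author_id (hypothetical field name).
const authorAsObject = [
  {
    $lookup: {
      from: "authors",
      localField: "author_id",
      foreignField: "_id",
      as: "authorDetails",
    },
  },
  {
    $addFields: {
      // overwrite the array with its first element
      authorDetails: { $first: "$authorDetails" },
    },
  },
];

// db.books.aggregate(authorAsObject)
```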
Another way,
$arrayElemAt
Returns the element at the specified array index. Here you have to pass the field name and the index number.
With index 0 it does the same job as $first.
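The same idea with $arrayElemAt could be sketched like this, again assuming the hypothetical author_id field in books.

```javascript
// Assumption: books store the author's id in author_id (hypothetical field name).
const authorAsObjectByIndex = [
  {
    $lookup: {
      from: "authors",
      localField: "author_id",
      foreignField: "_id",
      as: "authorDetails",
    },
  },
  {
    $addFields: {
      // [ <array field>, <index> ]: index 0 is the first element
      authorDetails: { $arrayElemAt: ["$authorDetails", 0] },
    },
  },
];

// db.books.aggregate(authorAsObjectByIndex)
```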
I hope you now understand the Aggregation pipeline. If not, let me know which topic is bothering you.
That’s all
Happy Coding 😊😊