I have an Item collection that could hold anywhere from thousands to hundreds of thousands of documents, and I want to run geospatial queries on it. With Mongoose there are two options: find() and the Aggregation Pipeline. My implementations of both are shown below:
Mongoose Model
To start, here are the relevant properties of my Mongoose Model:
const mongoose = require('mongoose');

// Define the schema
const itemSchema = new mongoose.Schema({
  // Firebase UID (in addition to the Mongo ObjectID)
  owner: {
    type: String,
    required: true,
    ref: 'User'
  },
  // ... Some more fields
  numberOfViews: {
    type: Number,
    required: true,
    default: 0
  },
  numberOfLikes: {
    type: Number,
    required: true,
    default: 0
  },
  // GeoJSON point
  location: {
    type: {
      type: String,
      default: 'Point',
      required: true
    },
    // Stored as [longitude, latitude]
    coordinates: {
      type: [Number],
      required: true
    }
  }
}, {
  timestamps: true
});

// 2dsphere index
itemSchema.index({ location: '2dsphere' });

// Create the model
const Item = mongoose.model('Item', itemSchema);
Find Query
// These variables are populated based on URL Query Parameters.
const match = {};
const sort = {};

// Query to make.
const query = {
  location: {
    $near: {
      // Maximum distance in meters (GeoJSON point)
      $maxDistance: parseInt(req.query.maxDistance, 10),
      $geometry: {
        type: 'Point',
        // parseFloat rather than parseInt, which would truncate to whole degrees
        coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
      }
    }
  },
  ...match
};

// Pagination and Sorting
const options = {
  limit: parseInt(req.query.limit, 10),
  skip: parseInt(req.query.skip, 10),
  sort
};

const items = await Item.find(query, undefined, options).lean().exec();
res.send(items);
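Because the find query is built directly from URL query parameters, the parsing step can be isolated and checked on its own. The following is only a minimal sketch under stated assumptions: buildNearQuery is a hypothetical helper name, and the default page size of 20 and the cap of 100 are made-up values for illustration. It parses the coordinates as floats, rejects missing or out-of-range values, and returns the filter and options that would be passed to Item.find().

```javascript
// Hypothetical helper: turns raw URL query parameters into the filter and
// options for Item.find(). The default (20) and cap (100) are assumptions.
function buildNearQuery(params, match = {}, sort = {}) {
  const lng = parseFloat(params.lng);
  const lat = parseFloat(params.lat);
  if (!Number.isFinite(lng) || lng < -180 || lng > 180) {
    throw new RangeError('lng must be a number between -180 and 180');
  }
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError('lat must be a number between -90 and 90');
  }

  // $maxDistance is expressed in meters when the point is GeoJSON.
  const maxDistance = Number(params.maxDistance);
  if (!Number.isFinite(maxDistance) || maxDistance < 0) {
    throw new RangeError('maxDistance must be a non-negative number');
  }

  // Fall back to the assumed defaults when limit/skip are missing or invalid.
  const limit = Math.min(Math.max(parseInt(params.limit, 10) || 20, 1), 100);
  const skip = Math.max(parseInt(params.skip, 10) || 0, 0);

  const filter = {
    location: {
      $near: {
        $maxDistance: maxDistance,
        // GeoJSON order is [longitude, latitude]
        $geometry: { type: 'Point', coordinates: [lng, lat] }
      }
    },
    ...match
  };

  const options = { limit, skip };
  // Only pass a sort when one was requested; $near already orders by distance.
  if (Object.keys(sort).length > 0) options.sort = sort;

  return { filter, options };
}
```

In the route this would be used as: const { filter, options } = buildNearQuery(req.query, match, sort); followed by Item.find(filter, undefined, options).lean().exec().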
Aggregation Pipeline
Suppose the distance also needs to be calculated:
// These variables are populated based on URL Query Parameters.
const query = {};
const sort = {};

const geoSpatialQuery = {
  $geoNear: {
    near: {
      type: 'Point',
      // parseFloat rather than parseInt, which would truncate to whole degrees
      coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
    },
    distanceField: 'distance',
    // Maximum distance in meters (GeoJSON point)
    maxDistance: parseInt(req.query.maxDistance, 10),
    query,
    spherical: true
  }
};

// Stages run in the order listed: sort, then skip, then limit.
const items = await Item.aggregate([
  geoSpatialQuery,
  { $sort: { distance: -1, ...sort } },
  { $skip: parseInt(req.query.skip, 10) },
  { $limit: parseInt(req.query.limit, 10) }
]).exec();
res.send(items);
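Unlike the options object passed to find(), aggregation stages are applied strictly in the order they appear, so the position of $sort, $skip and $limit changes which documents come back. A minimal sketch of a pipeline builder that makes that order explicit is shown here; buildGeoNearPipeline and its parameter names are hypothetical, and the inputs are assumed to be already parsed numbers.

```javascript
// Hypothetical helper: assembles the stages for a paginated $geoNear query.
// Function and parameter names are illustrative only.
function buildGeoNearPipeline({ lng, lat, maxDistance, limit, skip }, query = {}, sort = {}) {
  const pipeline = [
    {
      // $geoNear has to be the first stage of the pipeline.
      $geoNear: {
        near: { type: 'Point', coordinates: [lng, lat] },
        distanceField: 'distance',
        maxDistance,
        query,
        spherical: true
      }
    }
  ];

  // $geoNear already returns documents nearest-first, so an explicit $sort
  // is only added when the caller asks for a different order.
  if (Object.keys(sort).length > 0) {
    pipeline.push({ $sort: sort });
  }

  // Skip before limit, so that each page contains up to `limit` documents.
  pipeline.push({ $skip: skip }, { $limit: limit });
  return pipeline;
}
```

The result would be passed straight to Item.aggregate(pipeline).exec().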
Edit - Example Document Amended
Here is an example of a document with all of its properties from the Item collection:
{
  "_id": "5cd08927c19d1dd118d39a2b",
  "imagePaths": {
    "standard": {
      "images": [
        "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-aafe69c7-f93e-411e-b75d-319042068921-standard.jpg",
        "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-397c95c6-fb10-4005-b511-692f991341fb-standard.jpg",
        "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-e54db72e-7613-433d-8d9b-8d2347440204-standard.jpg",
        "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-c767f54f-7d1e-4737-b0e7-c02ee5d8f1cf-standard.jpg"
      ],
      "profile": "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-standard-profile.jpg"
    },
    "thumbnail": "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-thumbnail.jpg",
    "medium": "users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-medium.jpg"
  },
  "location": {
    "type": "Point",
    "coordinates": [
      -110.8571443,
      35.4586858
    ]
  },
  "numberOfViews": 0,
  "numberOfLikes": 0,
  "monetarySellingAmount": 9000,
  "exchangeCategories": [
    "Math"
  ],
  "itemCategories": [
    "Sports"
  ],
  "title": "My title",
  "itemDescription": "A description",
  "exchangeRadius": 10,
  "owner": "zbYmcwsGhcU3LwROLWa4eC0RRgG3",
  "reports": [],
  "createdAt": "2019-05-06T19:21:13.217Z",
  "updatedAt": "2019-05-06T19:21:13.217Z",
  "__v": 0
}
Questions
Based on the above, I wanted to ask a few questions.
1. Is there a performance difference between my implementations of the normal Mongoose Query and the use of the Aggregation Pipeline?
2. Is it correct to say that near and geoNear are pretty much similar to nearSphere when using the 2dsphere index with GeoJSON - except that geoNear provides extra data and default limiting? That is, although having different units, both queries - conceptually - would show relevant data within a specific radius from some location, despite the fact the field is called radius for nearSphere and maxDistance with near/geoNear.
3. With my example above, how might the performance loss of using skip be mitigated but still be able to achieve pagination in both querying and aggregation?
4. The find() function allows an optional parameter to determine which fields will be returned. The Aggregation Pipeline takes a $project stage to do the same. Is there a specific order where $project should be used in the pipeline to optimize speed/efficiency, or does it not matter?
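To make the question about skip more concrete, one possible shape for skip-free pagination is to resume from the distance of the last document on the previous page instead of counting past earlier documents. This is only a sketch and not a tested solution: buildNextPagePipeline and the cursor object are hypothetical, it relies on the minDistance option of $geoNear, and it assumes documents rarely share exactly the same distance.

```javascript
// Hypothetical sketch of distance-based ("keyset") pagination with $geoNear.
// `cursor` describes the last item of the previous page: { distance, id }.
// cursor.id should already be an ObjectId, since aggregate() does not cast values.
function buildNextPagePipeline({ lng, lat, maxDistance, limit }, cursor = null, query = {}) {
  const geoNear = {
    near: { type: 'Point', coordinates: [lng, lat] },
    distanceField: 'distance',
    maxDistance,
    spherical: true,
    query: { ...query }
  };

  if (cursor) {
    // Resume from the last distance seen instead of skipping documents.
    geoNear.minDistance = cursor.distance;
    // Exclude the last returned document, which sits exactly at minDistance.
    // Assumption: ties at the same distance are rare; a stricter version
    // would track every _id already returned at that distance.
    geoNear.query._id = { $ne: cursor.id };
  }

  // $geoNear returns nearest-first, so a plain $limit yields the next page.
  return [{ $geoNear: geoNear }, { $limit: limit }];
}
```

The first page would be requested with no cursor; for each later page the client would send back the distance and _id of the last item it received.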
I hope this style of question is permitted as per the Stack Overflow rules. Thank you.