1. Introduction:
The purpose of this Project is to help people in exploring better facilities around their neighborhood. It will help people making smart and efficient decision on selecting great neighborhood out of numbers of other neighborhoods in Kolkata,West Bengal.
Lots of people come to Kolkata for job or educational purposeses and needed lots of research for good housing prices and reputated schools for their children. This project is for those people who are looking for better neighborhoods. For ease of accessing to Cafe, School, Super market, medical shops, grocery shops, mall, theatre, hospital, like minded people, etc.
It will help people to get awareness of the area and neighborhood before moving to a new city, state, country or place for their work or to start a new fresh life.
2. Data Section
Data Link: https://en.wikipedia.org/wiki/Category:Municipal_wards_of_Kolkata
Will use these links which are listed on wikipedia for each wards. Dataset consisting of latitude and longitude, zip codes.
Foursquare API Data:
We will need data about different venues in different neighborhoods of that specific borough. In order to gain that information we will use “Foursquare” locational information. Foursquare is a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even photos. As such, the foursquare location platform will be used as the sole data source since all the stated required information can be obtained through the API.
After finding the list of neighborhoods, we then connect to the Foursquare API to gather information about venues inside each and every neighborhood. For each neighborhood, we have chosen the radius to be 2000 meter.
The data retrieved from Foursquare contained information of venues within a specified distance of the longitude and latitude of the postcodes. The information obtained per venue as follows:
1. Neighborhood 2. Neighborhood Latitude 3. Neighborhood Longitude 4. Venue 5. Name of the venue e.g. the name of a store or restaurant 6. Venue Latitude 7. Venue Longitude 8. Venue Category
Map of wards in Kolkata
3. Methodology Section
Clustering Approach:
To compare the similarities of two Postal code neighbourhood, we decided to segment them, and group them into clusters to find similar neighborhoods in a big city like Kolkata. To be able to do that, we need to cluster data which is a form of unsupervised machine learning: k-means clustering algorithm
Using K-Means Clustering Approach
Most Common venues near Neighborhood
Work Flow:
Using credentials of Foursquare API features of near-by places of the neighborhoods would be mined. Due to http request limitations the number of places per neighborhood parameter would reasonably be set to 100 and the radius parameter would be set to 2000.
4. Results Section
Map of Clusters in Kolkata
The Location:
Kolkata is a popular destination for new immigrants in West Bengal to reside. Being home to various religious groups and places of worship. Although immigration has become a hot topic over the past few years with more governments seeking more restrictions on immigrants and refugees, the general trend of immigration into Kolkata has been one of on the rise.
Foursquare API:
This project have used Four-square API as its prime data gathering source as it has a database of millions of places, especially their places API which provides the ability to perform location search, location sharing and details about a business.
5. Discussion Section
Problem Which Tried to Solve:
The major purpose of this project, is to suggest a better neighborhood/place in a new city for the person who are shiffting there. Social presence in society in terms of like minded people. Connectivity to the airport, bus stand, city center, markets and other daily needs things nearby.Sorted list of house in terms of housing prices in a ascending or descending order Sorted list of schools in terms of location, fees, rating and reviews<h2>6. Conclusion Section</h2>In this project, using k-means cluster algorithm I separated the neighborhood into 10(Ten) different clusters and for 103 different lattitude and logitude from dataset, which have very-similar neighborhoods around them. Using the charts above results presented to a particular neighborhood based on average house prices and school rating have been made.
I feel rewarded with the efforts and believe this course with all the topics covered is well worthy of appreciation. This project has shown me a practical application to resolve a real situation that has impacting personal and financial impact using Data Science tools. The mapping with Folium is a very powerful technique to consolidate information and make the analysis and decision better with confidence.
Future Works:
This project can be continued for making it more precise in terms to find best house in Kolkata. Best means on the basis of all required things(daily needs or things we need to live a better life around and also in terms of cost effective.But there is not enough data about schools and hospitals in foursquare api for the location Kolkata. so thinkking to use google maps api for getting the best results.
Libraries Which are Used to Develope the Project:
Pandas: For creating and manipulating dataframes.
Folium: Python visualization library would be used to visualize the neighborhoods cluster distribution of using interactive leaflet map.
Scikit Learn: For importing k-means clustering.
JSON: Library to handle JSON files.
XML: To separate data from presentation and XML stores data in plain text format.
Geocoder: To retrieve Location Data.
Beautiful Soup and Requests: To scrap and library to handle http requests.
Matplotlib: Python Plotting Module.