What to do with Spatial Datasets?

Hritik Mehta
Mar 28, 2021

What are shapefiles, how do you manipulate them, and most importantly, how do you visualize them?

Sooner or later a Data Science project may involve spatial datasets, and you will need a good working knowledge of them. As an example, I would like to present one of my projects at the Center of Analytical Finance, Indian School of Business. It required me to work extensively with spatial datasets: manipulating shapefiles, using those shapefiles to extract nightlight and rainfall data with Google Earth Engine, and building visualization tools. I started working on this in December 2020, and at the beginning of my internship I was completely unfamiliar with these terms. I struggled a lot because I couldn't find a single source that covered all of this in one place. I want to put everything I know (to the best of my knowledge) here, so that Dec2020-Me can save time and understand the crucial things needed to jump-start such a project.

This blog will take you through spatial datasets, shapefiles, GeoDataFrames (Python), MapShaper, DataWrapper, and Google Earth Engine code.

  1. Spatial Data: Spatial data, also known as geospatial data, is information about a physical object that can be represented by numerical values in a geographic coordinate system. Generally speaking, spatial data represents the location, size, and shape of an object on planet Earth, such as a building, a pincode area, a post office, a district, or a country.
  2. Shapefile: A shapefile is a geospatial vector data format for geographic information system (GIS) software. The format can spatially describe vector features: points, lines, and polygons. It works as a collection of files, of which three are mandatory: “.shp” (the geometry), “.shx” (the index), and “.dbf” (the attribute table). You will need all three when using MapShaper (to manipulate and convert shapefiles), Google Earth Engine code (to extract data from Google Earth Engine using shapefiles), and DataWrapper (to visualize and create interactive maps; it requires a GeoJSON file that MapShaper creates from the mandatory files). In Python you point the reader at the “.shp” file, which is enough to get started with EDA, but keep the other files next to it, because the attributes live in the “.dbf” and more advanced visualization will need them. Software such as ArcGIS is dedicated to handling shapefiles; I’ll try to discuss it in the future.
  3. GeoPandas: Just as Pandas does for ordinary tables, GeoPandas makes life easy in Python by letting us work with spatial data frames. It allows us to edit existing shapefiles and create new ones. It feels just like handling data frames, except that each record carries additional information about the location and shape of the entity on the Earth's surface. This additional information is stored in the “geometry” column of the GeoDataFrame. We can find the area or the centroid of a polygon, or create a polygon from given latitudes and longitudes (I’ll write a dedicated blog on this).
  4. MapShaper: Honestly, I still need to explore its full potential. For now, I turn to it for compressing shapefiles, that is, reducing the level of detail in the polygon boundaries. It’s also a great tool for previewing your shapefiles and converting them into GeoJSON or other formats.
  5. DataWrapper: This is the best visualization tool I have come across. From charts to interactive maps, it can do it all.
    The image below shows why I was talking about compressing shapefiles and converting them to GeoJSON.
    If you are looking to create a dashboard with many maps and charts, DataWrapper is a little disappointing because it has no built-in feature for that, but you can use the embed code generated for each map or chart and place those on your dashboard webpage.
    You can visit https://www.youtube.com/watch?v=adUpZXL4Ja0 for a DataWrapper tutorial.
Dashboard of DataWrapper
  6. Google Earth Engine Code: You can’t rely only on census data collected once every 10 years, or simply accept government records that may have been manipulated, for your study. Economists are looking for new, higher-frequency proxies for macroeconomic measures such as GDP, population, growth in living standards, wildlife, etc. Through Google Earth Engine, Google has done a great job of providing a range of spatial datasets for research purposes. Using these datasets together with our own customized shapefiles, we can extract night lights, rainfall, or forest-cover data at anything from a daily to a yearly frequency. It requires some knowledge of JavaScript to give instructions to Google Earth Engine.
    You can check out this playlist https://www.youtube.com/watch?v=adUpZXL4Ja0 for tutorials.
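To make the GeoPandas and GeoJSON points above a bit more concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the corner coordinates, the name "demo_square", the choice of UTM zone 44N (EPSG:32644) as the metre-based projection, and the simplification tolerance. The simplification step uses GeoPandas' own `simplify` method, which is only a rough stand-in for what MapShaper does, not the same algorithm or settings.

```python
import json

import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical corner points as (longitude, latitude) pairs: roughly a
# one-degree square in southern India. Note the order is x, y = lon, lat.
corners = [(78.0, 17.0), (79.0, 17.0), (79.0, 18.0), (78.0, 18.0)]

# A one-row GeoDataFrame. The shape lives in the "geometry" column and the
# CRS says the numbers are plain latitude/longitude (WGS 84).
gdf = gpd.GeoDataFrame(
    {"name": ["demo_square"]},
    geometry=[Polygon(corners)],
    crs="EPSG:4326",
)

# Areas computed in degrees are meaningless, so reproject to a metre-based
# CRS first. EPSG:32644 (UTM zone 44N) is an assumption that suits this
# longitude range; pick one that fits your own region.
projected = gdf.to_crs(epsg=32644)
area_km2 = projected.geometry.area.iloc[0] / 1e6

# Centroid computed in the projected CRS, then converted back to lon/lat.
centroid = projected.geometry.centroid.to_crs(epsg=4326).iloc[0]

# Reduce boundary detail (tolerance is in CRS units, here degrees).
# A plain square has nothing to remove, but real boundaries shrink a lot.
simplified = gdf.copy()
simplified["geometry"] = gdf.geometry.simplify(tolerance=0.01)

# GeoJSON text of the kind a mapping tool can import.
geojson_text = simplified.to_json()

print(f"area: {area_km2:.0f} km^2")
print(f"centroid: lon={centroid.x:.3f}, lat={centroid.y:.3f}")
print(json.loads(geojson_text)["type"])
```

With a real shapefile you would start from `gpd.read_file("path/to/your_file.shp")` instead of building the polygon by hand (the path here is a placeholder), and `simplified.to_file("output.geojson", driver="GeoJSON")` would write the result to disk.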

Hritik Mehta

Exploring Data Science and Finance as career options, keeping my desire to write and interact with others alive. Always eager to learn from peers.