
Geospatial Insights Unlocked: Excel Data Integration into Superset using FME.

Context and Case study

In today’s booming business intelligence landscape, Superset stands out as a versatile, web-based solution ideal for FME users.

Despite the prevalence of BI tools, Excel remains a go-to for its flexibility.

In this tutorial, we’ll show you how to seamlessly integrate Excel data into Superset, unlocking powerful geospatial insights with ease. Using a complex Excel file from Helsinki city as our example, we’ll focus on street winter care survey data, specifically regarding bicycles, to create an insightful dashboard.

Input Data

ANSWERS TO THE SURVEY REGARDING WINTER ROUTES MAINTENANCE OF THE CITY OF HELSINKI

HELSINKI BICYCLE LANES 2025

Main steps

  1. Data Preparation: Read and clean/adapt a complex Excel file
  2. Geospatial Analysis: Enrich the data using geospatial analysis techniques
  3. Superset Preparation: Prepare the data for integration into Superset
  4. Visualization: Publish the data as charts in Superset, utilizing Deck.gl charts for enhanced visualization.

Data Preparation: Playing with a complex Excel file

Extract from study Excel

Excel input

The Excel input comprises data collected from a survey in which respondents marked locations they traversed by car, bicycle, or on foot, along with feedback on the winter care conditions they encountered. Winter care is categorized into three aspects: slippage (stones on ice), snow plowing (pushed to the side), and snow removal. Each aspect can be rated good or bad.

Evidently exported from a web map survey tool, the Excel file contains:

  • Characteristics of surveyed individuals stored in one sheet.
  • Reported points categorized by type of winter care, quality and transportation mode (e.g., snow_removal_bad_bicycle), spread across multiple sheets.

This setup presents several common challenges, including special characters (such as ä) and spaces in sheet and column names, geometries represented as text, points dispersed across different sheets rather than carried as multiple attributes, incomplete data for optional fields, and non-standard attribute values.
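To illustrate the naming problem: before any tool can use these sheets comfortably, names containing accents and spaces need normalizing. A minimal Python sketch of that kind of cleanup (the sample name below is hypothetical, not taken from the actual file):

```python
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Strip accents, lowercase, and collapse spaces/odd characters
    into underscores so the name is safe as a column identifier."""
    # Decompose accented characters (ä -> a + combining mark), drop the marks
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    # Collapse anything that is not alphanumeric into single underscores
    return re.sub(r"[^0-9a-z]+", "_", ascii_name.lower()).strip("_")
```

FME's AttributeRenamer and BulkAttributeRenamer handle the same job without code.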

FME data preparation

FME is perfect for data prep because it:

  • Merges data from various sources
  • Checks attribute values for accuracy
  • Renames attributes easily
  • Filters data to focus on what’s needed
  • Transforms geometries as required

For this task, we employed:

  • AttributeCreator to generate attributes based on input sheet names
  • Connections for merging points across sheets
  • GeometryReplacer to interpret geometry fields accurately
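Conceptually, the AttributeCreator-plus-merge step boils down to parsing each sheet name into attributes and tagging every point with them. A toy Python sketch of that logic (the sheet layout is simplified; actual sheet names follow the `snow_removal_bad_bicycle` pattern described above):

```python
def sheet_to_attributes(sheet_name: str) -> dict:
    """Split a sheet name like 'snow_removal_bad_bicycle' into the
    attributes derived from it: winter-care category, quality, and
    transportation mode."""
    parts = sheet_name.split("_")
    return {
        "care_type": "_".join(parts[:-2]),  # e.g. 'snow_removal'
        "quality": parts[-2],               # 'good' or 'bad'
        "mode": parts[-1],                  # 'bicycle', 'car', 'foot'
    }

def merge_sheets(sheets: dict) -> list:
    """Merge the points of every sheet into one list, tagging each
    point with the attributes encoded in its sheet name."""
    merged = []
    for name, points in sheets.items():
        attrs = sheet_to_attributes(name)
        merged.extend({**attrs, **point} for point in points)
    return merged
```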
State of the data after preparation

Enrich the data using geospatial analysis

FME steps for the Spatial Merge

To enhance insights, we sought to associate survey points with corresponding pathways. Leveraging an open dataset from Helsinki City outlining planned bicycle routes for 2025, we executed the following steps:

  1. Read the shapefile containing pathway data.
  2. Determine the closest pathway to each survey point.
  3. Integrate point details with their respective pathways.

Executing this process with FME involves:

  • Utilizing a reader or Creator + FeatureReader to specify the data source.
  • Ensuring consistency in coordinate systems for both points and pathways using the Reprojector.
  • Assigning unique identifiers to pathways, if absent, facilitated by the UniqueIdentifierGenerator transformer.
  • Identifying the closest pathway to each point via the NeighborFinder.
  • Merging point attributes back into the pathway dataset using the FeatureMerger transformer.
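The core of the NeighborFinder step can be sketched as a plain nearest-segment search. The following is a toy stand-in, not FME's implementation, and it assumes both datasets are already in the same projected coordinate system:

```python
import math

def point_segment_dist(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of the point onto the segment to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def closest_pathway(point, pathways):
    """Return the id of the pathway (a list of vertices) closest to
    the survey point."""
    px, py = point
    best_id, best_d = None, float("inf")
    for pid, verts in pathways.items():
        for (ax, ay), (bx, by) in zip(verts, verts[1:]):
            d = point_segment_dist(px, py, ax, ay, bx, by)
            if d < best_d:
                best_id, best_d = pid, d
    return best_id
```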
Result after merger

Superset Preparation

Superset accepts various formats, but we find PostgreSQL most convenient due to its compatibility, ability to handle large volumes, and support for multiple users.

For map charts, Superset’s capabilities are somewhat restricted; most of them rely on the Deck.gl framework:

https://deck.gl/docs/api-reference/core/layer

Geometries

There are three kinds of geometry representations at the moment:

  • Latitude and longitude attributes for points.
  • A JSON-based format for lines, documented as “Paths”: [[X0,Y0],[X1,Y1],…,[Xn,Yn]].
  • GeoJSON, sometimes wrapped in chart-specific objects, for example:
{"type":"Feature","properties":{"name":"West Oakland (WOAK)"},"geometry":{"type":"Point","coordinates":[-122.294582, 37.804675]}}

In FME it translates into:

  • Using CoordinateExtractor to extract latitude and longitude.
  • Employing CoordinateConcatenator and AttributeCreator to generate Path geometry.
  • Good luck with GeoJSON… we could not make it work; we found no valid example 🤷
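The first two conversions are simple enough to sketch in a few lines of Python. This mirrors what the CoordinateExtractor and CoordinateConcatenator + AttributeCreator combinations produce in FME (coordinate values here are illustrative):

```python
import json

def to_deckgl_path(vertices):
    """Serialize a polyline's vertices into the JSON 'path' string
    that Deck.gl path layers expect: [[X0,Y0],[X1,Y1],...]."""
    return json.dumps([[x, y] for x, y in vertices])

def to_lat_lon(point):
    """Extract latitude/longitude attributes for a point,
    assuming (lon, lat) order on input."""
    lon, lat = point
    return {"lon": lon, "lat": lat}
```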

Colors

As for colors, basic map charts offer limited options for color maps and conditional values. To address this, there are three potential solutions:

  • Utilize JavaScript in Superset, though this deviates from the no-code approach.
  • Use FME to generate and store color values, then call them with a documented JavaScript snippet.
  • Just wait for a better update…

In FME, the steps to get the right color values in an attribute (let’s call it “color”) are the following:

  1. Using FeatureColorSetter to color data (options include random, rule-based, and ramps).
  2. Employing HTMLColorCalculator to extract colors in the correct format for the “color” attribute.
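The end result of those two transformers is simply an HTML hex string per feature. A toy rule-based equivalent in Python (the color ramp below is hypothetical, not the one used in our workspace):

```python
# Hypothetical rule-based ramp: green for good, red for bad
QUALITY_COLORS = {"good": "#2ca02c", "bad": "#d62728"}

def set_color(feature: dict) -> dict:
    """Attach a 'color' attribute based on the reported quality,
    defaulting to grey for missing or unknown values."""
    feature["color"] = QUALITY_COLORS.get(feature.get("quality"), "#808080")
    return feature
```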

Example of FeatureColorSetter settings

End result in FME viewer

PostgreSQL

For practical reasons, we opted to store our data in one of our server databases using PostgreSQL. FME’s PostgreSQL writers facilitate table generation and data manipulation seamlessly, all without needing SQL expertise. While more complex tasks may require dynamic SQL code generation, such intricacies are beyond the scope of this article.

In our setup, we created two tables:

  1. snow_plowing_points: This table focuses on a specific aspect of maintenance analysis, namely snow plowing.
  2. snow_bicycle_ways: Here, we store data related to cycling routes along with their maintenance attributes.

Setting up these tables is a straightforward process. With just a few clicks, you can add the PostgreSQL writer, configure your connection details, and define the attributes for your tables. Thanks to FME’s automation capabilities, this setup typically takes around 2 minutes to complete.
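For readers curious about what the writer produces behind the scenes, the generated DDL is roughly of this shape. A Python sketch that builds it (the column names and types are hypothetical, chosen to match the attributes discussed above, not copied from FME's actual output):

```python
def create_table_sql(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement of the kind a PostgreSQL
    writer issues for a simple, non-spatial schema."""
    cols = ", ".join(f"{name} {pgtype}" for name, pgtype in columns.items())
    return f"CREATE TABLE {table} ({cols});"

ddl = create_table_sql(
    "snow_plowing_points",
    {"id": "integer", "quality": "text", "color": "text",
     "lat": "double precision", "lon": "double precision"},
)
```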

Example of table definition in FME

Publish as charts in Superset with Deck.gl charts

Example of Dashboard

Publishing charts in Superset with Deck.gl enhances the visualization of our data. Below are the steps to create an interactive dashboard and explore our geospatial insights:

  • Install Superset

https://www.linkedin.com/pulse/installing-apache-superset-docker-widows-os-lasmart-europe-zgkqf/

On a Windows environment, we recommend the docker-compose approach, as shown in the tutorial above.

  • Add your mapbox token into the config.py file to be able to see the background map
  • If you want to play with your own colors for path data, set “ENABLE_JAVASCRIPT_CONTROLS”: True in the same config file.
  • Add your database as a new dataset. For our setup, where PostgreSQL is hosted on our local machine, we specify its path using host.docker.internal in the SQLAlchemy URI.
  • Add Dataset and Charts: Import your dataset (table) and create charts. Choose from various Deck.gl chart options to suit your visualization needs.
  • Dashboard Creation: Complete your dashboard by adding classic charts to complement the geospatial insights.
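For the database connection step, the SQLAlchemy URI looks like this (user, password, and database name below are placeholders; `host.docker.internal` lets a Dockerized Superset reach a PostgreSQL instance running on the host machine):

```python
# Placeholder credentials -- replace with your own connection details
DB_USER = "superset"
DB_PASS = "secret"
DB_NAME = "snow_survey"

sqlalchemy_uri = (
    f"postgresql+psycopg2://{DB_USER}:{DB_PASS}"
    f"@host.docker.internal:5432/{DB_NAME}"
)
```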
Dashboard Example

Enjoy the interactivity of the dashboard. For instance, clicking on elements of the pivot table chart automatically filters other charts relying on the same dataset.

Conclusion

In summary, integrating Excel data into Superset with FME streamlines the path to valuable insights, especially in geospatial analysis.

By following the steps outlined in this tutorial, businesses can efficiently transform complex data into actionable visualizations. From cleaning and enriching data to creating interactive dashboards, this approach offers a user-friendly solution for extracting meaningful information.

As we embrace these tools, we unlock the potential for smarter decision-making and improved operational efficiency.