Wenjie Wu's Portfolio

Introduction

WindNebula is a web data visualization application designed for an environmental problem posed by the IEEE Visual Analytics Science and Technology (VAST) Challenge 2017. The VAST Challenge is an annual contest that aims to advance the field of visual analytics through competition. In the summer of 2017, I teamed up with two developers, a designer, a data scientist and two supervisors to tackle two of the competition's challenges. This article introduces the challenge I mainly worked on, for which I was the lead developer and designer. To our surprise, we won two awards at the IEEE VIS Conference 2017: the Aesthetic Design Award and the Sponsor Award.

Click here to see the live website

Process Overview

The project lasted about two months. At the beginning, we spent a few weeks and several meetings exploring and analyzing the dataset, while coming up with many ideas for visualizing the data. Next, we built prototypes to test whether these ideas would work, holding weekly meetings to review the prototypes and brainstorm new ideas. Once we had decided which visualization forms and features to use for the final submission, we applied the visual design and developed the final version.

The problem and project goal

The problem was that the population of a precious bird, the Rose-Crested Blue Pipit, was declining rapidly in a large nature reserve. The reserve is located in a mid-size city, and a small industrial area with four light-manufacturing factories lies next to it. The map of the nature reserve is shown below. The industrial area is at the bottom of the map, and nine sensors placed around it monitor the chemicals emitted by the factories, which could be harmful to the birds. The goal of the challenge was to determine which factories might be contributing to the birds' decline. The challenge also asked us to assess the performance of the nine sensors and to identify any sensor that behaved abnormally.

Datasets

The challenge provided two datasets. The first contains the chemical readings from the nine sensors: four different chemicals in total, recorded every hour. The second contains the wind speed and wind direction in the area, recorded every three hours. Both datasets cover three months, and they share no attribute other than the timestamp.
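The only shared attribute is the timestamp, and the two series are sampled at different rates (hourly vs. every three hours). Some alignment step is therefore needed before readings can be related to wind. Below is a minimal sketch, not the team's actual code. It assumes each reading is matched to the nearest wind record within 90 minutes. The tuple layouts, the `chem_A` label and the sample values are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Hypothetical row layouts (not the challenge's actual file format):
#   reading: (time, sensor_id, chemical, value), hourly
#   wind:    (time, direction_deg, speed), every 3 hours, sorted by time


def nearest_wind_index(t, wind_times, max_gap=timedelta(minutes=90)):
    """Index of the wind record closest in time to t, or None if none is within max_gap."""
    i = bisect_left(wind_times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(wind_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(wind_times[j] - t))
    return best if abs(wind_times[best] - t) <= max_gap else None


def join_on_time(readings, winds):
    """Attach the nearest wind direction and speed to each chemical reading."""
    wind_times = [w[0] for w in winds]
    joined = []
    for t, sensor, chem, value in readings:
        j = nearest_wind_index(t, wind_times)
        if j is not None:  # drop readings with no wind record close enough
            _, direction, speed = winds[j]
            joined.append((t, sensor, chem, value, direction, speed))
    return joined


# Illustrative sample data (made up).
winds = [(datetime(2016, 4, 1, 0), 90.0, 2.0), (datetime(2016, 4, 1, 3), 180.0, 3.5)]
readings = [
    (datetime(2016, 4, 1, 1), 1, "chem_A", 0.5),
    (datetime(2016, 4, 1, 2), 1, "chem_A", 0.7),
    (datetime(2016, 4, 1, 8), 1, "chem_A", 0.1),
]
joined = join_on_time(readings, winds)
```

Nearest-record matching is only the simplest choice. Interpolating wind between the three-hourly records would be a reasonable alternative.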

Iterations

At the beginning, we plotted the sensor data and the wind data separately, using box plots for the sensor data and a circular heatmap for the wind data. These revealed a few patterns. For example, the fourth sensor behaved differently from the other sensors, and there was little northeasterly wind in the area.
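A circular wind heatmap is essentially a 2-D histogram over direction sectors and speed bands. The sketch below shows one way to compute the cell counts. The sector count, the speed-band edges and the `(direction, speed)` input layout are illustrative assumptions, not the values used in WindNebula.

```python
from collections import Counter


def wind_rose_counts(winds, n_sectors=16, speed_edges=(2.0, 4.0, 6.0)):
    """Count (direction_deg, speed) wind records per (sector, speed band) cell.

    Sector 0 is centred on north (0 deg). The band index is the number of
    edges in speed_edges that the speed reaches or exceeds.
    """
    width = 360.0 / n_sectors
    counts = Counter()
    for direction, speed in winds:
        # Shift by half a sector so each sector is centred on its compass direction.
        sector = int(((direction % 360.0) + width / 2) // width) % n_sectors
        band = sum(speed >= edge for edge in speed_edges)
        counts[(sector, band)] += 1
    return counts
```

With the default 16 sectors, sector 2 is centred on 45°. Its row of cells is where a lack of northeasterly wind would show up as low counts.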

These two graphs did not reveal many significant patterns, so we explored another approach: a contour map. For this plot, we applied the Gaussian diffusion equation and linear regression, combining the sensor data and the wind data to work out which factory produced which chemical. To our disappointment, we failed to find a clear result. A likely reason is that some sensors were not working properly, and with inaccurate data this approach could not lead us to the right answer.
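The exact diffusion model used is not stated here. For reference, a common form is the steady-state Gaussian plume equation, shown below on the assumption that this, or a close variant, is what "Gaussian diffusion equation" refers to. It gives the concentration C at a point x downwind, y crosswind and z above the ground, for a source emitting at rate Q from effective height H. The other terms are the wind speed u and the dispersion coefficients σ_y(x) and σ_z(x).

```latex
C(x, y, z) = \frac{Q}{2\pi u \,\sigma_y \sigma_z}
  \exp\!\left(-\frac{y^2}{2\sigma_y^2}\right)
  \left[\exp\!\left(-\frac{(z - H)^2}{2\sigma_z^2}\right)
      + \exp\!\left(-\frac{(z + H)^2}{2\sigma_z^2}\right)\right]
```

C is linear in Q. A sensor's reading can therefore be modeled as a weighted sum of the factories' unknown emission rates, which is one reason linear regression pairs naturally with this model.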

We also plotted the results of the linear regression. Their disordered pattern further suggested that linear regression may not be a good fit for this problem and this data.
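Here is a concrete version of the regression step. Suppose each reading is treated as a weighted sum of unknown factory emission rates, with the weights coming from a dispersion model evaluated for that hour's wind. The rates can then be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python. The weight matrix is hypothetical input, and this illustrates the technique rather than reproducing the team's implementation. A faulty sensor contributes rows that no consistent set of rates can fit, which is one way the disordered results described above can arise.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: emission rates are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_emissions(weights, readings):
    """Least-squares emission rates q such that readings ~ weights @ q.

    weights[i][k]: hypothetical dispersion weight of factory k for reading i.
    """
    k = len(weights[0])
    # Normal equations: (W^T W) q = W^T y
    wtw = [[sum(row[i] * row[j] for row in weights) for j in range(k)] for i in range(k)]
    wty = [sum(row[i] * y for row, y in zip(weights, readings)) for i in range(k)]
    return solve(wtw, wty)
```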

Next, we drew the sensor scatter plots side by side, with the chemical reading on the y-axis and time on the x-axis. The abnormal sensor was much easier to spot than in the box plots we had used at the beginning. However, this plot could not show the relationship between factories and sensors, and it was not connected to the wind data.

To connect the sensor data and the wind data, we turned to another technique: brushing and linking. The GIF below shows how it works. The left side is a wind plot in a polar coordinate system, where the angle represents the wind direction and the distance from the pole represents the wind speed. The right side is a 4 by 9 scatter plot matrix, where the four rows correspond to the four chemicals and the nine columns to the nine sensors. Brushing dots on one side highlights the corresponding dots on the other side. In this plot, we found that the high readings in some of the scatter plots on the right seemed to share the same wind direction, which suggests that they may come from the same factory.
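The brushing-and-linking interaction comes down to two steps. First, map each wind record to a point in the polar plot. Second, use the timestamps of the brushed points to highlight the matching readings in the scatter plot matrix. Below is a minimal sketch of that logic, independent of any charting library. The function names, the rectangular brush and the north-up, clockwise angle convention are assumptions for illustration.

```python
import math


def wind_to_xy(direction_deg, speed):
    """Polar placement: angle = wind direction (clockwise from north), radius = speed.

    Returns (x, y) in a y-up coordinate system, so north points up.
    """
    theta = math.radians(direction_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


def brush(points, x0, x1, y0, y1):
    """Keys (e.g. timestamps) of points that fall inside a rectangular brush."""
    return {key for key, (x, y) in points.items() if x0 <= x <= x1 and y0 <= y <= y1}


def linked_highlight(readings, selected_times):
    """Pair each (time, sensor, chemical, value) reading with a highlight flag."""
    return [(r, r[0] in selected_times) for r in readings]
```

Keying the link on timestamps is the natural choice here, because the timestamp is the only attribute the two datasets share.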

We were excited about these findings, but the plot was still somewhat obscure, especially for people without a deep understanding of the data. Building on it, we arrived at our final idea. We drew a polar scatter plot for each sensor from the wind direction and chemical readings, placed each plot at its sensor's geographic position, and also drew the four factories in the same area. In this plot (the image below), the chemical emission patterns are much clearer and more intuitive. With the guidelines attached to the factories (shown when hovering over a factory), it was easy to see which factories emitted which chemicals. Each color represents a different chemical.
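The factory guidelines rest on simple geometry. If the wind blows from a factory toward a sensor, high readings at that sensor should cluster around the compass bearing from the sensor to the factory. The sketch below assumes a flat map grid with y pointing north. It also assumes the wind direction is recorded as the direction the wind blows from, so check the dataset's convention before relying on it. The coordinates, the factory names and the 20° tolerance are hypothetical.

```python
import math


def bearing(from_xy, to_xy):
    """Compass bearing in degrees (clockwise from north) from one map point to another.

    Assumes a flat grid where +y is north and +x is east.
    """
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angular_gap(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def likely_sources(sensor_xy, factories, wind_from_deg, tolerance=20.0):
    """Names of factories lying upwind of the sensor for the given wind direction.

    wind_from_deg is assumed to be the direction the wind blows FROM.
    """
    return [name for name, xy in factories.items()
            if angular_gap(bearing(sensor_xy, xy), wind_from_deg) <= tolerance]
```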

We also designed a detail view for inspecting the data of each sensor. This view lets users discover finer-grained information, such as the month in which a given factory emitted large amounts of a particular chemical.
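A per-month summary of the kind this detail view supports can be computed with a simple group-by. The sketch below keeps the maximum reading per (month, sensor, chemical) and then picks the month with the highest peak. The row layout and the sample values are made up. Attributing a peak to a specific factory would still require combining it with the wind direction, as in the polar plots.

```python
from datetime import datetime


def monthly_peaks(readings):
    """Maximum reading per (month, sensor, chemical) from (time, sensor, chemical, value) rows."""
    peaks = {}
    for t, sensor, chem, value in readings:
        key = (t.strftime("%Y-%m"), sensor, chem)
        if key not in peaks or value > peaks[key]:
            peaks[key] = value
    return peaks


def peak_month(peaks, sensor, chem):
    """Month with the highest peak reading for one sensor and chemical, or None."""
    months = {m: v for (m, s, c), v in peaks.items() if s == sensor and c == chem}
    return max(months, key=months.get) if months else None


# Illustrative sample rows (made up).
sample = [
    (datetime(2016, 4, 2, 5), 3, "chem_A", 1.2),
    (datetime(2016, 4, 9, 1), 3, "chem_A", 4.0),
    (datetime(2016, 8, 1, 0), 3, "chem_A", 2.5),
]
peaks = monthly_peaks(sample)
```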

Video

Watch this video to explore the whole visualization system.

WindNebula

A Web Visual Analytics System for Environmental Data