I started this journey knowing that I wanted to work with flight data of some sort. The idea came to me over the Thanksgiving holiday, when friends told me stories about their delayed flights. I related to their experience: every time I flew out of Pudong International Airport, my flight was delayed by at least an hour.
I found inspiration in other projects, such as one that analyzed the flight patterns of all airlines out of O’Hare Airport over the course of a year. I realized that this scale was too large for a project like mine, with too many data points that I didn’t know how to scrape. Instead, I decided to focus on flights departing from Pudong International Airport to destinations within China. I first tried narrowing it down to all the flights of one airline, but the data was still too broad, in the sense that I wasn’t sure what I wanted to represent with it. I toyed with the idea of focusing on outbound flights from Shanghai to Beijing (all airlines, both airports, so that I could compare the average times between the two), and then analyzing average delays by the hour.
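To make the "average delays by the hour" idea concrete, here is a minimal sketch of how that could be computed. The sample records, the timestamp format, and the function name `average_delay_by_hour` are all made up for illustration; the real data would need its own parsing.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format, not taken from any real data source

# Hypothetical sample records: (scheduled departure, actual departure).
flights = [
    ("2015-11-20 08:05", "2015-11-20 08:50"),
    ("2015-11-20 08:40", "2015-11-20 09:55"),
    ("2015-11-20 17:15", "2015-11-20 17:15"),
    ("2015-11-20 17:30", "2015-11-20 19:00"),
]


def average_delay_by_hour(records):
    """Return {scheduled hour: mean delay in minutes} for the given records."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of delays, flight count]
    for scheduled, actual in records:
        s = datetime.strptime(scheduled, FMT)
        a = datetime.strptime(actual, FMT)
        delay_minutes = (a - s).total_seconds() / 60
        bucket = totals[s.hour]
        bucket[0] += delay_minutes
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


if __name__ == "__main__":
    for hour, avg in average_delay_by_hour(flights).items():
        print(f"{hour:02d}:00  average delay {avg:.0f} min")
```

Grouping by the scheduled hour rather than the actual one is a choice made here so that a badly delayed flight still counts toward the hour it was supposed to leave.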
At first, it was difficult to find flight data that went back further than the past week. I even emailed FlightStats to try to get this data. Eventually I was able to create an account and look up the data I needed.
My greatest struggle was finding a way to represent my data. There were so many ideas for the type of data I could work with, but I always got stuck on the same question: how to visualize it. I read numerous blogs, and they all said that the way to begin visualizing is to first know what kind of data you want to represent. I had that box checked, but I just didn’t know what I wanted to show or how to go from there. I tried a color-coded timetable. It would certainly have represented the data directly, but it didn’t have much to show, and it wasn’t interesting or original. I then went back to the drawing board and collected more data. I expanded to flights out of Pudong not just to Beijing, but to the top 5 busiest airports in China, which include Shenzhen, Guangzhou, Chengdu and Hong Kong.
I tried many ideas, such as programming something more interactive in Processing. I wanted users to be able to click to bring up a certain week’s average delays from Sunday to Saturday, and to compare them week to week. But I struggled greatly just to lay down the grid and call up the correct color-coded row with the key press feature. I tried, but I was not successful.
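Looking back, the logic behind that grid can be separated from the drawing. What follows is only a rough sketch in Python rather than Processing, with invented delay values, arbitrary color thresholds, and hypothetical helper names (`delay_color`, `row_for_key`). It shows how a number key could select one week’s row and color-code each day.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Hypothetical average delays in minutes: one row per week, one column per day.
weekly_delays = [
    [10, 35, 20, 5, 70, 95, 40],
    [0, 15, 45, 30, 60, 120, 25],
]


def delay_color(minutes):
    """Map an average delay to a color name (thresholds are arbitrary assumptions)."""
    if minutes < 15:
        return "green"
    if minutes < 60:
        return "yellow"
    return "red"


def row_for_key(key, grid):
    """Return the color-coded row for a number key ('1' = first week), or None."""
    if len(key) != 1 or key not in "123456789":
        return None  # ignore anything that is not a single digit from 1 to 9
    index = int(key) - 1
    if index >= len(grid):
        return None  # no data for that week
    return [(day, delay_color(m)) for day, m in zip(DAYS, grid[index])]


if __name__ == "__main__":
    print(row_for_key("2", weekly_delays))
```

In a Processing sketch, the same lookup would sit inside the key press handler, with the returned colors used to fill one rectangle per day.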
I also tried a mapping idea that used twine to represent the number of flights out, and the number of delayed flights, to Beijing, but the visualization looked too messy. Another idea was to use push pins on a map: the more twine wrapped around an airport’s pin, the greater the number of delays. The hardest part was that, with so many delays to show, I couldn’t represent the passage of time, or present the information in a way that viewers would want or need.
In the end, I went with a poster-like idea, the kind you’d find on a bulletin board. It was supposed to give the statistics for flights out of Pudong to four other airports, noting flights that were on time, delayed and cancelled. The feedback I received was that it wasn’t clear or clean enough. To represent that sort of data, precision and accuracy are key. I should have used a program to compute the pie charts rather than drawing them by hand. There were still open questions about the data I chose to represent.
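For what it’s worth, computing the pie slices takes very little code. Here is a minimal sketch, assuming I had simple counts of on-time, delayed and cancelled flights for one route. The numbers below are invented, and `pie_slices` is a hypothetical helper, not something from the original project.

```python
def pie_slices(counts):
    """Return {label: (percent, angle in degrees)} for a dict of category counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a pie chart from zero flights")
    return {
        label: (round(100 * n / total, 1), round(360 * n / total, 1))
        for label, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for a single route, for illustration only.
    example = {"on time": 120, "delayed": 60, "cancelled": 20}
    for label, (percent, angle) in pie_slices(example).items():
        print(f"{label:>9}: {percent:5.1f}%  ({angle:5.1f} degrees)")
```

The angles could be measured out with a protractor for a hand-made poster, or the counts could be handed to a plotting library such as matplotlib, whose `pyplot.pie` function draws the chart directly.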
I still have a long way to go and more to learn, but I’m not giving up just yet.