Alexandra Gore

How AI relates to cherry blossom season

Sakura (Cherry Blossoms)

The article "Humans, AI compete over cherry blossom dates" inspired me to dig into sakura data.

Introduction

Flower viewing and how I enjoy Sakura

Flower viewing is a popular pastime in Japan: special trips are often planned to visit prime locations (such as city parks or countryside farms) when the flowers are in full bloom. Families and coworkers sometimes pack a picnic and spend an afternoon celebrating in the park; a junior office worker may even be tasked with arriving early to stake out a prime picnic spot!

Sakura is among the first flowers to bloom in spring. It is followed by others such as lilac and lavender, which bloom later, into summer or even early fall.

Living in Japan, I find that sakura season feels like a celebration. The blossoms mark a visual transition out of a long, snowy winter. I also enjoy trying the limited-edition, sakura-flavored treats and drinks that restaurants and convenience stores offer only during the blossoming season.

Enough of that; what about the data?

Getting the timing right

If you're planning a special outing to view the blossoms, arriving a day or two early or late can make the difference between mediocre pictures and truly outstanding ones. And if you're a foreign tourist planning a trip to Japan, it helps to have an estimate of the blooming dates well in advance so you can plan your travel dates and itinerary around them!

Data sources

The main data page, from which the blossoming data were collected, is the Japan Meteorological Agency (JMA) page at https://www.data.jma.go.jp/sakura/data/index.html.

The GitHub project I created, which I reference below, is linked here.

The data in the data/raw folder of the GitHub project were generated by copying and pasting pages from the data file, which can be found by searching the main data page for さくら開花.

The data file さくら開花 (translation: Sakura Flowering) contains yearly information on the first flowering dates of sakura at different locations in Japan.

If you're not interested in replicating the data cleaning process by running the sakura_flowering_processing.ipynb notebook from the GitHub project, you can simply use data/flowering.csv, which contains tidy flowering data. Each row contains the following columns:

  • day: first flowering date, encoded as month * 100 + day (e.g., 405 for April 5)
  • year: year of collection, as a four-digit year (%Y)
  • l_code: categorical variable [401-945], code for the point of collection
  • l_name: categorical variable, name of the point of collection in kanji
  • rm: categorical variable [6,7,8], represents different data collection methods (N.B.: this is my understanding after attempting to translate the Japanese explanation text)
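Because day packs the month and the day of the month into a single integer, it has to be split apart before bloom timing can be compared across years. A minimal sketch, assuming day has already been read as an integer (e.g. 405); the helper names below are hypothetical and not taken from the project's notebook:

```python
from datetime import date


def decode_flowering_date(year: int, day_code: int) -> date:
    """Turn an encoded `day` value (month * 100 + day) into a calendar date."""
    month, day = divmod(day_code, 100)  # e.g. 405 -> (4, 5)
    return date(year, month, day)


def day_of_year(year: int, day_code: int) -> int:
    """Day of the year (1 = January 1) for an encoded `day` value."""
    return decode_flowering_date(year, day_code).timetuple().tm_yday
```

For example, decode_flowering_date(2018, 405) gives April 5, 2018. Day-of-year values shift by one in leap years, which matters little for rough trends but is worth keeping in mind.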

Missing (NA) values for day and rm are marked with "-".
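Since "-" stands in for missing values, it needs special handling when loading the file. A minimal standard-library sketch, assuming data/flowering.csv has a header row with the column names listed above and is UTF-8 encoded (both are assumptions; check the actual file). load_flowering is a hypothetical helper, and rows with a missing value keep None in that field instead of being dropped:

```python
import csv
from typing import Iterable, Optional


def _parse_optional_int(value: str) -> Optional[int]:
    """Convert a field to int, treating the '-' marker (or a blank) as missing."""
    value = value.strip()
    return None if value in ("-", "") else int(value)


def load_flowering(lines: Iterable[str]) -> list[dict]:
    """Parse flowering rows from any iterable of CSV lines, e.g. an open file."""
    records = []
    # Assumes a header row naming the columns day, year, l_code, l_name, rm.
    for row in csv.DictReader(lines):
        records.append(
            {
                "day": _parse_optional_int(row["day"]),  # may be None
                "year": int(row["year"]),
                "l_code": int(row["l_code"]),
                "l_name": row["l_name"],
                "rm": _parse_optional_int(row["rm"]),  # may be None
            }
        )
    return records
```

To use it on the real file, open data/flowering.csv with encoding="utf-8" and newline="" and pass the file object to load_flowering. Keeping None rather than dropping rows leaves the choice of how to treat gaps to each analysis.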

Project Ideas

On the main data page, similar data files are listed for other varieties of flowers. The data cleaning process described above and the included data cleaning script may be useful for studying the blooming patterns of other flowers.

Add weather information and create a machine learning model to predict sakura blossoming dates for 2019!
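Before adding weather features, it can help to have a naive baseline that a machine learning model should beat. A minimal sketch, assuming you already have (year, day-of-year) pairs for a single location with missing values removed: fit an ordinary least-squares trend line of bloom day against year and extrapolate it to the target year. This is a swapped-in baseline, not the approach from the article that inspired this post, and predict_bloom_day is a hypothetical name:

```python
from statistics import mean


def predict_bloom_day(history: list[tuple[int, int]], target_year: int) -> float:
    """Extrapolate a least-squares linear trend of bloom day-of-year vs. year."""
    if len(history) < 2:
        raise ValueError("need at least two years of data")
    years = [y for y, _ in history]
    days = [d for _, d in history]
    x_bar, y_bar = mean(years), mean(days)
    # Sum of squared deviations of the years; zero if all years are identical.
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - x_bar) * (d - y_bar) for x, d in zip(years, days))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * target_year
```

The result is a fractional day-of-year; round it before converting back to a calendar date. A weather-informed model could then be judged by whether it predicts held-out years more accurately than this trend line.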

Visualize historical data in interesting ways to identify patterns and trends.