18th-19th April 2020
Today’s task is to predict the number of new COVID-19 cases and deaths for every country in the world. The data for this task comes from the ECDC. You may use all the data we provide as well as any public data, so we expect you to find interesting dependencies in the data that lead to a good predictive model.
You can use all available historical data and should predict the number of new cases and deaths for the next 2 weeks starting from this Sunday.
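To make the forecast horizon concrete, here is a minimal sketch of the 14 target dates together with a very naive baseline. It is not one of the starter pipelines from our materials. The function names, the 7-day averaging window, and the start date of Sunday 19 April 2020 are assumptions made for illustration only.

```python
from datetime import date, timedelta


def forecast_dates(start, horizon=14):
    """Return the `horizon` consecutive days to predict, beginning at `start`."""
    return [start + timedelta(days=i) for i in range(horizon)]


def naive_forecast(history, horizon=14, window=7):
    """Repeat the mean of the last `window` observations for every future day.

    `history` is a list of daily counts (new cases or new deaths) for one country.
    This is a hypothetical baseline, useful only as a sanity check for real models.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


# Assumed start date: the Sunday of the event weekend.
dates = forecast_dates(date(2020, 4, 19))
predictions = naive_forecast([120, 135, 150, 160, 158, 170, 181, 190])
```

Any real model should beat this flat baseline; if it does not, that usually points to a problem in the pipeline rather than in the data.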
As the leaderboard metric, we will use a weighted MSLE (mean squared logarithmic error) over both time series (cases and deaths), calculated as follows:
MSLE(new_cases)/mean(log(new_cases)) + MSLE(deaths)/mean(log(deaths))
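The formula above can be sketched in a few lines of Python so that you can score your own validation splits. This is an illustration under stated assumptions, not the official scoring script. It assumes the common MSLE convention of log(1 + x), so that days with zero counts are defined, and it assumes the normalising mean is taken over the true values. The function names are hypothetical.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x) (an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


def weighted_msle(cases_true, cases_pred, deaths_true, deaths_pred):
    """MSLE(cases) / mean(log(cases)) + MSLE(deaths) / mean(log(deaths))."""

    def term(y_true, y_pred):
        # Normalise by the mean log of the true series (assumed, see lead-in).
        scale = sum(math.log1p(v) for v in y_true) / len(y_true)
        if scale == 0:
            raise ValueError("mean log of the true values is zero")
        return msle(y_true, y_pred) / scale

    return term(cases_true, cases_pred) + term(deaths_true, deaths_pred)


score = weighted_msle([100, 120, 90], [110, 115, 95], [5, 7, 4], [6, 6, 4])
```

Lower is better, and a perfect forecast scores 0. Dividing each MSLE by the mean log of its series keeps the much larger case counts from dominating the death counts in the sum.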
In our materials, you can find a notebook showing how to download the dataset from the ECDC and enrich it with country indicators from the World Bank. We have also prepared simple ARIMA and LSTM pipelines that you can use as a starting point.
A sample submission .csv file can be found on our GitHub.
1. Accuracy of the model (60%)
How accurate will the model be for the next 14 days?
2. Originality of the solution (30%)
How original is the solution? Does it have more features than provided by us? Does it include ideas published in recent papers on COVID-19?
3. Presentation (10%)
How good was the presentation delivery? Was it easy to follow? Was it engaging?
We will build a Leaderboard of all Teams’ scores based on their models’ prediction accuracy. We will then select the top 10% of Teams and evaluate their code.