First week at Datern: Looking at CO₂ data

AY Berhe
4 min read · Jun 19, 2021

By Adonay Berhe, Chun Him Yeung, Peter Castellucci

The three of us are part of the summer cohort at Datern’s internship programme, which provides opportunities to fast-growth companies and students who are all about making a difference with data.

As part of our first week of training at Datern, the cohort split into groups of three. Each group had a day to source and analyse a dataset. This analysis would form the basis of a presentation to our Datern cohort.

We would showcase our use of data visualization (Power BI) and statistics (coding in Python). More interestingly, we would need to work effectively in groups and improve our data communication for a general audience.

These skills — technical and non-technical — will help us in our later internships.

Our chosen research question was “Which countries contribute most to global CO2 emissions?”, with room for further analysis.

After some difficulty and trial-and-error, we decided on two datasets.

The first was ‘Global CO2 Emissions’ by ‘worldometers.info’ (available on Kaggle), which covers CO2 emissions per country (tons, 2016). The second was ‘Data on CO2 and Greenhouse Gas Emissions’ by ‘Our World in Data’ (available on GitHub).

Using Power BI, we produced some informative graphs showing the breakdown of global CO2 emissions by continent and the startling scale of China’s CO2 emissions from coal, which make up almost 70% of Asia’s emissions!

Figure 1: (Power BI, Stacked Bar Chart) CO2 Emissions per continent, based on 2016 data
Figure 2: (Power BI, Bar Chart) Coal CO2 Emissions in Asia, based on 2016 data
Figure 3: (Power BI, Line Chart) CO2 Emissions per year (Asia and Europe)

Looking at the visualizations inspired us to use Python to fit ARIMA (Autoregressive Integrated Moving Average) models, which would allow us to project CO2 emissions for particular countries.

Figure 4: (Python, ARIMA) Model fit for China’s CO2 Emissions, based on OWID dataset
Figure 5: (Python, ARIMA) Model Fit for UK’s CO2 Emissions based on OWID dataset
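
To give a feel for what an ARIMA fit involves, here is a minimal hand-rolled sketch of the simplest useful case, ARIMA(1,1,0): difference the series once, fit a first-order autoregression to the differences by least squares, forecast the differences, and add them back up. This is an illustration only. The model order, the function names and the numbers are all assumptions, and the series is made up rather than taken from the OWID dataset. In practice a library such as statsmodels would handle order selection and estimation.

```python
# Hand-rolled ARIMA(1,1,0) sketch: difference once, fit AR(1) to the
# differences by least squares, forecast, then integrate back to levels.
# The series at the bottom is invented for illustration (not OWID data).

def difference(series):
    """First difference: the 'I' (d=1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of d[t] = c + phi * d[t-1]."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # If the differences are constant there is nothing to regress on.
    phi = sxy / sxx if sxx else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Project `steps` values beyond the end of `series`."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    projected = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # next predicted change
        level += last_diff               # undo the differencing
        projected.append(level)
    return projected


# Hypothetical yearly emissions, in arbitrary units.
emissions = [3.0, 3.2, 3.5, 3.9, 4.2, 4.6, 5.1, 5.5, 5.8, 6.3, 6.9, 7.2]
projection = forecast(emissions, steps=5)
print([round(value, 2) for value in projection])
```

Because each projected value depends only on the fitted history, a forecast like this simply extends past behaviour; it has no knowledge of policy or economic changes.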

Next, in Python, we ran a statistical analysis of the 2016 CO2 emissions data, broken down by country.

The group’s aim was to analyse some explanatory variables (GDP and life expectancy) to link a country’s economic development to its CO2 emissions.

  • The correlation between a country’s CO2 and its life expectancy, based on the 2016 data, was 0.28
  • Regressing a country’s CO2 on both its life expectancy and GDP gave a coefficient of 0.65
  • Regressing a country’s consumption-based* CO2 on both its life expectancy and GDP gave a coefficient of 0.87

*N.B. Consumption-based CO2 emissions take into account the effects of trade, encompassing the emissions from final domestic consumption and those caused by the production of its imports.
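
For readers who want to reproduce this kind of figure, the sketch below computes a Pearson correlation and the R² of a two-predictor linear regression in plain Python. Treating the regression figures above as R² values is an assumption on our part, since the write-up does not say which statistic was reported. The data are invented for illustration and the helper names are hypothetical.

```python
# Pearson correlation and R^2 of a least-squares regression on two predictors.
# All numbers are made up for illustration; they are not the 2016 dataset.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    w = solve(xtx, xty)
    fitted = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical per-country values.
co2 = [0.5, 1.8, 4.5, 7.0, 9.5, 15.0]    # tons per capita
life = [60, 68, 72, 76, 81, 79]          # years
gdp = [1.5, 4.0, 9.0, 20.0, 42.0, 55.0]  # thousand USD per capita

print("correlation(CO2, life expectancy):", round(pearson(co2, life), 2))
print("R^2(CO2 ~ life expectancy + GDP):", round(r_squared(co2, [life, gdp]), 2))
```

Adding a second predictor can never lower R², so a jump from a single-variable correlation to a two-variable regression figure should be read with that in mind.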

We offer the following hypotheses based on these results.

  • Life expectancy and GDP are limited predictors of a country’s CO2 but stronger predictors of its consumption-based CO2
  • As countries develop, they may increase their CO2 contribution
  • Our time-series analysis suggests that China’s CO2 levels per capita will continue to increase exponentially and may double by 2100
  • The same analysis suggests that the UK’s contribution may decrease over this period

We acknowledge the following limitations in our methodology, which we may address in future work.

  • Time-series analysis relies on historical data rather than present conditions; it assumes that underlying factors (population growth; industrialization; environmental policy) will continue as before
  • In our analysis, we had to leave out data with missing values; this excluded African countries from our analysis of global trends
  • Future work could also look at other pollutants, for instance methane, and see if they follow similar trends
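
The missing-value limitation in the list above is easy to audit before any modelling. The sketch below drops incomplete rows and then reports how much of each continent survives, which makes a gap like the one we hit visible early. The rows, field names and values are hypothetical.

```python
# Drop rows with missing values, then check which continents lose countries.
# Rows are hypothetical; None marks a missing value.
from collections import Counter

rows = [
    {"country": "A", "continent": "Africa", "co2": 0.4, "gdp": None},
    {"country": "B", "continent": "Africa", "co2": None, "gdp": 2.1},
    {"country": "C", "continent": "Asia", "co2": 7.1, "gdp": 10.2},
    {"country": "D", "continent": "Europe", "co2": 5.6, "gdp": 41.0},
    {"country": "E", "continent": "Europe", "co2": 6.0, "gdp": None},
]


def is_complete(row):
    """True when no field in the row is missing."""
    return all(value is not None for value in row.values())


kept = [row for row in rows if is_complete(row)]
before = Counter(row["continent"] for row in rows)
after = Counter(row["continent"] for row in kept)

# Share of each continent's countries that survive the filter.
retained = {continent: after[continent] / before[continent] for continent in before}
for continent, share in sorted(retained.items()):
    print(f"{continent}: kept {after[continent]} of {before[continent]} ({share:.0%})")
```

A continent with a retained share near zero is a sign that conclusions about global trends do not cover it.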

After we presented this analysis to our cohort, our fellow Datern colleagues received the technical side of our work well.

From the feedback, we learnt to scope our problem better and to focus on a target audience. For instance, are we looking to inform policymakers in China or speak to the public in the UK?

We are at the start of our journey as ‘Data Scientists’ and in our first week with Datern. Such projects are invaluable for building commercial awareness and problem-structuring skills.

More practice and group projects will help us make the transition from university graduates to capable data science interns!
