Eat less sugar,
you're sweet enough already

cs_team_beta_hw2 - Autumn 2018

Open Food Facts



  • 681'568 products in Open Food Facts

  • 98'663 brands in Open Food Facts

  • 6'589 categories in Open Food Facts

  • 1'109 pieces of sugar in Open Food Facts CH


# Abstract

Many people consume too much sugar, often without realizing it. It is not uncommon for products such as low-fat yogurt, BBQ sauce, ketchup or fruit juice to contain added or hidden sugar. To put this into perspective, some sauces may contain more than 50 per cent sugar. (*Ref1) According to the "AHA Scientific Statement" (*Ref2), the major sources of added sugars are regular soft drinks, sugars and candy, cakes, cookies and pies, fruit drinks, and dairy desserts and milk products:

| Food category | Contribution to added sugar intake (% of total added sugar consumed) |
|---|---|
| Regular soft drinks | 33.0 |
| Sugars and candy | 16.1 |
| Cakes, cookies, pies | 12.9 |
| Fruit drinks (fruitades and fruit punch) | 9.7 |
| Dairy desserts and milk products (ice cream, sweetened yogurt, and sweetened milk) | 8.6 |
| Other grains (cinnamon toast and honey-nut waffles) | 5.8 |

It's easy to consume too much sugar, and the drawbacks are many. Overconsumption can lead to obesity and increase the risk of heart disease, diabetes (*Ref3) and even cancer.

Therefore, it's crucial to limit the consumption of foods with high amounts of added sugars. According to the American Heart Association (AHA) (*Ref4), the recommended maximum amount of added sugars per day is:

  • Men: 150 calories per day

    (37.5 grams or 9 teaspoons or ~6.30 pieces of sugar)

  • Women: 100 calories per day

    (25 grams or 6 teaspoons or ~4.20 pieces of sugar)

The US dietary guidelines advise people to limit their added sugar intake to less than 10% of their daily calorie intake. For a person eating 2,000 calories per day, this would equal 50 grams of sugar, or about 12.5 teaspoons or 8.4 pieces of sugar. (*Ref5)
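The figures quoted above follow from simple arithmetic. A minimal sketch of that conversion, assuming 4 kcal per gram of sugar, roughly 4 g per teaspoon, and about 5.95 g per piece of sugar (a value inferred from the ratios in the text, not stated in the references); the function name `sugar_budget` is hypothetical:

```python
# Assumed conversion factors (not taken from the cited references):
KCAL_PER_GRAM = 4.0        # approximate energy content of sugar
GRAMS_PER_TEASPOON = 4.0   # approximate
GRAMS_PER_PIECE = 5.95     # inferred from the figures quoted in the text


def sugar_budget(calories):
    """Convert a calorie allowance for added sugar into grams, teaspoons and pieces."""
    grams = calories / KCAL_PER_GRAM
    return {
        "grams": grams,
        "teaspoons": grams / GRAMS_PER_TEASPOON,
        "pieces": grams / GRAMS_PER_PIECE,
    }


if __name__ == "__main__":
    limits = {
        "AHA, men": 150,
        "AHA, women": 100,
        "10% of a 2,000-calorie diet": 0.10 * 2000,
    }
    for label, kcal in limits.items():
        b = sugar_budget(kcal)
        print(f"{label}: {b['grams']:.1f} g, "
              f"{b['teaspoons']:.1f} tsp, {b['pieces']:.2f} pieces")
```

With these assumed factors the AHA limit for men works out to about 9.4 teaspoons and the limit for women to about 6.3, which the text rounds to 9 and 6.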

“Keeping intake of free sugars to less than 10% of total energy intake reduces the risk of overweight, obesity and tooth decay,” says Dr Francesco Branca, Director of WHO’s Department of Nutrition for Health and Development. (*Ref6)

Based on the information above, we would like to explore the "Open Food Facts Database", apply the data analysis techniques and best practices acquired in the Applied Data Analysis course at EPFL in Lausanne in Autumn 2018, and find meaningful, insightful relationships between the different dimensions used to categorize the data, or between those dimensions and external metrics.

# Research questions

Many different dimensions and metrics could be used to analyse and slice the data: the dates when a product was added and last modified, the producing country, the brand, the categories (e.g. snack, dessert), the country or store where the product was purchased, the size and weight of the product, the amounts of fat, sugar, vitamins and chemicals, the ingredients as text, etc. We will explore these dimensions and relate some of them to each other or to generic metrics such as the GDP or life expectancy of the producing or consuming country (against vitamins, salt, sugar and other chemicals), and explore which country or store is the biggest producer or consumer of products by brand or ingredient. We will also check how some of the metrics evolve over time and perform some data cleaning and enrichment.


Main Research Questions

What is the most sugary product, and what is its nutrition score?

Is there a relation between the amount of fat, sugar and salt in products?

Does the amount of sugar in a product depend on the brand type (low-cost, high-cost, organic, etc.)?

How many products contain more sugar than recommended by the WHO, and how does that relate to the category type?
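One way the fat, sugar and salt question could be approached is a pairwise Pearson correlation. The following is a minimal sketch in plain Python, not the project's actual analysis; the function names are hypothetical, and the field names `fat_100g`, `sugars_100g` and `salt_100g` are assumed to match the Open Food Facts export:

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(rows, fields=("fat_100g", "sugars_100g", "salt_100g")):
    """Correlate every pair of fields, using only rows where both values exist.

    Assumes each pair has at least two usable rows with non-constant values.
    """
    result = {}
    for a, b in combinations(fields, 2):
        pairs = [(r[a], r[b]) for r in rows
                 if r.get(a) is not None and r.get(b) is not None]
        xs, ys = zip(*pairs)
        result[(a, b)] = pearson(xs, ys)
    return result
```

A coefficient near +1 or -1 would suggest a strong linear relation between two nutrients, while a value near 0 would suggest none; it says nothing about non-linear relations.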



We were originally very ambitious in our plans for the data. However, during the exploratory phase of the project we encountered problems: categories were missing, values were outside valid ranges, the naming system differed between contributors, and so on. In essence, the data was quite messy. This matched what we had heard during lectures, that data cleaning is one of the most time-consuming tasks for a data scientist, but it nevertheless came as a slight surprise. We therefore decided to spend more time on data exploration and data cleaning, time that we could have used for analysis had the database been more structured in the first place.
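As an illustration of the kind of checks described above (not the project's actual cleaning code), values reported per 100 g can be rejected when they fall outside the range 0 to 100, and free-text labels can be normalized before comparison. The function names are hypothetical and the field names are assumed to match the Open Food Facts export:

```python
def clean_per_100g(rows, fields=("fat_100g", "sugars_100g", "salt_100g")):
    """Split rows into (valid, rejected) using a 0..100 range check per field."""
    valid, rejected = [], []
    for row in rows:
        ok = True
        for field in fields:
            value = row.get(field)
            if value is None:
                continue  # missing values are tolerated in this sketch
            if not 0 <= value <= 100:
                ok = False  # a per-100 g amount cannot be negative or exceed 100 g
                break
        (valid if ok else rejected).append(row)
    return valid, rejected


def normalize_label(text):
    """Collapse whitespace and case so contributor spellings compare equal."""
    return " ".join(text.split()).casefold()
```

Rejected rows are returned rather than silently dropped, so they can be inspected before deciding whether to discard or repair them.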

Given all the problems we encountered with the data, we recommend some caution when interpreting the results. After all, if the underlying data is flawed, so are the conclusions drawn from it.

# Sources

Realised in the context of the EPFL course Applied Data Analysis (CS-401), taught by Robert West. The Jupyter notebooks and sources can be found here.
