681'568 Products | 98'663 Brands | 6'589 Categories | 1'109 Pieces of Sugar |
Many people consume too much sugar, often without realizing it. It is not uncommon for products such as low-fat yogurt, BBQ sauce, ketchup or fruit juice to contain added or hidden sugar. To put this into perspective, some sauces may contain more than 50 per cent sugar (*Ref1). According to the "AHA Scientific Statement" (*Ref2), the major sources of added sugars are regular soft drinks, sugars and candy, cakes, cookies and pies, fruit drinks, and dairy desserts and milk products:
Food categories | Contribution to added sugar intake (% of total added sugar consumed) |
Regular soft drinks | 33.0 |
Sugars and candy | 16.1 |
Cakes, cookies, pies | 12.9 |
Fruit drinks (fruitades and fruit punch) | 9.7 |
Dairy desserts and milk products (ice cream, sweetened yogurt, and sweetened milk) | 8.6 |
Other grains (cinnamon toast and honey-nut waffles) | 5.8 |
It's easy to consume too much sugar, and the drawbacks are many. Overconsumption can lead to obesity and increase the risk of heart disease, diabetes (*Ref3) and even cancer.
Therefore, it's crucial to limit the consumption of foods with high amounts of added sugars. According to the American Heart Association (AHA) (*Ref4), the recommended maximum amount of added sugars per day is:
Men: 37.5 grams (9 teaspoons, or ~6.30 pieces of sugar)
Women: 25 grams (6 teaspoons, or ~4.20 pieces of sugar)
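Both limits use the same conversion factors, which can be recovered from the figures themselves. The sketch below is illustrative only: the constants are derived from the numbers quoted on this page (25 g = 6 teaspoons = ~4.20 pieces), not from an official definition, and the function name `sugar_equivalents` is made up for this example.

```python
# Conversion factors implied by the figures quoted above (assumption:
# "25 grams or 6 teaspoons or ~4.20 pieces of sugar" is taken at face value).
GRAMS_PER_TEASPOON = 25 / 6    # about 4.17 g per teaspoon
GRAMS_PER_PIECE = 25 / 4.2     # about 5.95 g per piece of sugar


def sugar_equivalents(grams):
    """Express an amount of sugar in grams, teaspoons and pieces of sugar."""
    return {
        "grams": grams,
        "teaspoons": round(grams / GRAMS_PER_TEASPOON, 2),
        "pieces": round(grams / GRAMS_PER_PIECE, 2),
    }


if __name__ == "__main__":
    for limit in (37.5, 25):
        print(sugar_equivalents(limit))
```

The same helper can be applied to the sugar content of any product to express it in pieces of sugar.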
“Keeping intake of free sugars to less than 10% of total energy intake reduces the risk of overweight, obesity and tooth decay,” says Dr Francesco Branca, Director of WHO’s Department of Nutrition for Health and Development.(*Ref6)
Based on the information above, we explore the Open Food Facts database, applying the data analysis techniques and best practices acquired in the Applied Data Analysis course at EPFL in Lausanne (autumn 2018), to uncover meaningful and insightful relationships between the dimensions used to categorize the data and external metrics.
There are many dimensions and metrics that can be used to analyse and slice the data: the dates when a product was added and last modified, country of production, brand, categories (e.g. snack, dessert), the country and store where the product was purchased, size and weight of the product, amounts of fat, sugar, vitamins and chemicals, ingredients as text, etc. We will explore these dimensions and relate some of them to each other, or to generic metrics such as the GDP and life expectancy of the producing or consuming country (set against vitamins, salt, sugar and other chemicals), and explore which country or store is the biggest producer or consumer of products by brand or ingredient. We will also check how some of the metrics evolve over time and perform some data cleaning and enrichment.
Main Research Questions
What is the most sugary product, and what is its nutrition score?
Is there a relation between the amounts of fat, sugar and salt in products?
Does the amount of sugar in a product depend on the brand type (low-cost, high-cost, organic, etc.)?
How many products contain more sugar than recommended by the WHO, and how does that relate to the category type?
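As an illustration of how the last question could be approached, here is a minimal sketch that computes, per category, the share of products above a sugar threshold. It is not the method used in the notebook: the threshold is a hypothetical choice (25 g of sugar per 100 g, borrowed from the 25 g daily figure quoted earlier), and the record keys `category` and `sugars_100g` are assumed names for this example.

```python
from collections import Counter

# Hypothetical threshold (assumption): a product counts as "high sugar" when
# 100 g of it contains more than 25 g of sugar.
HIGH_SUGAR_G_PER_100G = 25.0


def high_sugar_share_by_category(products, threshold=HIGH_SUGAR_G_PER_100G):
    """Return {category: fraction of its products above the threshold}.

    Records with a missing category or sugar value are skipped.
    """
    totals, high = Counter(), Counter()
    for product in products:
        sugar = product.get("sugars_100g")
        category = product.get("category")
        if sugar is None or category is None:
            continue
        totals[category] += 1
        if sugar > threshold:
            high[category] += 1
    return {category: high[category] / totals[category] for category in totals}


# Made-up records, for illustration only.
sample = [
    {"category": "Sodas", "sugars_100g": 10.6},
    {"category": "Candies", "sugars_100g": 60.0},
    {"category": "Candies", "sugars_100g": 20.0},
    {"category": "Candies", "sugars_100g": None},
]

if __name__ == "__main__":
    print(high_sugar_share_by_category(sample))
```

Reporting a share rather than a raw count keeps categories with very different numbers of products comparable.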
We were originally very ambitious with our plans for the data. However, during the exploratory phase of the project we encountered problems: categories were missing, values were outside of valid ranges, the naming system differed between contributors, and so on. In essence, the data was quite messy. This matched what we had heard during lectures, namely that data cleaning is one of the most time-consuming tasks for a data scientist, but it nevertheless came as a slight surprise. As a result, we spent more time on data exploration and data cleaning, time that we could have used for analysis had the database been more structured in the first place.
Given all the problems we encountered with the data, we recommend some caution when interpreting the results. After all, if the underlying data is flawed, then so are the conclusions drawn from it.
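Two of the problems described above, out-of-range values and inconsistent naming, lend themselves to simple rule-based cleaning. The following is a sketch under stated assumptions rather than the notebook's actual cleaning code: it assumes nutrient values are expressed in grams per 100 g (so anything outside 0 to 100 is invalid), and the helper names `clean_per_100g` and `normalize_label` are invented for this example.

```python
import math


def clean_per_100g(value):
    """Return a per-100 g value as a float in [0, 100], or None if unusable.

    Assumption: the value is in grams per 100 g of product, so it cannot be
    negative or exceed 100.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # missing or non-numeric entry
    if math.isnan(number) or not 0.0 <= number <= 100.0:
        return None
    return number


def normalize_label(label):
    """Collapse whitespace and case so contributor variants compare equal."""
    return " ".join(label.split()).lower()


if __name__ == "__main__":
    print([clean_per_100g(v) for v in ["12.5", "", None, "250", "-3", "nan"]])
    print(normalize_label("  Fruit   JUICES "))
```

Returning `None` instead of dropping the record keeps the decision about how to treat missing values (discard, impute, report) separate from the validity check.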
The Open Food Facts database contains information on food products from around the world. For each item, the database stores its generic name, quantity, type of packaging, brand, category, manufacturing or processing locations, countries and stores where the product is sold, list of ingredients, ...
Realised in the context of the EPFL course Applied Data Analysis (CS-401), taught by Robert West. The Jupyter notebook and sources can be found here.