Countries Unicorn Olympics – Bar Chart Race

Hello!

I love visualization stuff. That makes things less complicated and easy to understand.🖼️

Yeah, from childhood, we learn by modeling, playing with toys, coloring, and drawing. Visualization helps us understand phenomena and numbers better.👶🏻

Today I’m happy to share that I’ve created my first bar chart race visualization. I was planning to prepare it for a looong time. This type of dynamic graph, often seen across various platforms, has always intrigued me. They are fancy, huh, aren’t they?🤩

Link to Unicorn race

I thought a lot about what it should be, and after a short discussion with myself found it: Unicorn🦄 startups by country. Yes, my passion startups! As we’ve discussed before, a ‘Unicorn’ refers to a startup company valued at over 1 billion dollars.

In this blog post, you will get answer to following to following questions:

Insights about countries🔢
How to prepare a bar chart race?📊
Data cleansing and standardization🧹

Yeah, let’s go!🚀

Congratulations! As this blog follows the brevity principle(quality of being brief), you are at the second level!😊

Insights about Countries🔢

The performance of startups across various countries exhibits dynamic changes over time. Our data covers 1218 unicorns spans from 2012 to 2023, segmented by quarters, providing a comprehensive view of these fluctuations taken from CBinsights. We can see main players are the United States of America and China. The harsh competition lasted until 2020 when the United States emerged as the leader. As per the statistics for the 4th quarter of 2023, the rankings stand as follows:

U.S.A – 653
China – 172
India – 71
United Kingdom – 53
Germany – 31
France – 24
Israel – 23
Canada – 21
Singapore -17
Brazil – 16

How to prepare a bar chart race?📊

There are many tools for preparing, such as (1)Flourish.studio, which I used for preparing that bar chart race. This is a comprehensive suite of visualization tools, including templates for bar chart races that you can fully customize. Even with the free version, you can do a lot except perhaps publish it as an HTML file.

(2)FabDev: This is a straightforward bar chart race generator. You simply upload a CSV file with all the necessary information, add the duration in seconds and the number of bars you want to display. Then, click on the “Generate Bar Chart Race” button to see a preview.

(3) Zoho Sheets: Similar to Google Sheets, but with extra features. One of these is adding animated race charts for easy data visualization. It’s simple to use if you’re familiar with spreadsheets like Google Sheets or Microsoft Excel.

Wow, you are at the third level! I bet you will read till the end! The show must go on!🌠

Data cleansing and standardization🧹

Undoubtedly, one of the most challenging aspects of creating a bar chart race is sourcing valid data and preparing it for visualization. For this project, I utilized CBinsights, which provides a comprehensive list of startups. However, there are numerous other resources available for data collection.

For instance, Statista offers a wealth of statistical data on a wide range of topics, while the World Bank provides access to vast databases on global development indicators.

Crunchbase: This platform provides information and insights on innovative companies and the people behind them.

data.world: This platform has many datasets available on companies. These datasets cover a wide range of topics, including diversity in technology companies, profitability of public companies, annual plastic packing volumes for major companies, and more.

University of Washington: This site provides a list of 21 places to find free datasets for data science projects. The datasets cover a wide range of topics and are suitable for different types of data science projects2.

Kaggle: This platform offers a dataset of over 7 million companies and vast majority of other types of data.

Datarade: This site provides industry data and databases. They represents themselves as “The easy way to find, compare, and access data products from 500+ premium data providers across the globe.”

However, still, these sources can still be invaluable for obtaining accurate and up-to-date data for your visualizations. To process this data, I utilized the Python🐍 programming language along with the Pandas🐼 library, which is excellent for data manipulation and analysis. Here’s the Python code I used to tidy up the data:

import pandas as pd
from google.colab import files
# Upload the Excel file
uploaded = files.upload()

# Read the Excel file into a DataFrame
df = pd.read_excel(io.BytesIO(uploaded['data.xlsx']), sheet_name='Sheet1')

# Dictionary mapping countries to regions
country_to_region = {
    'Argentina': 'South America',
    'Australia': 'Oceania',
    'Austria': 'Europe',
    'Belgium': 'Europe',
    'Bermuda': 'North America',
    'Brazil': 'South America',
    'Canada': 'North America',
    'Cayman Islands': 'North America',
    'Chile': 'South America',
    'China': 'Asia',
    'Colombia': 'South America',
    'Croatia': 'Europe',
    'Czech Republic': 'Europe',
    'Denmark': 'Europe',
    'Ecuador': 'South America',
    'Egypt': 'Africa',
    'Estonia': 'Europe',
    'Finland': 'Europe',
    'France': 'Europe',
    'Germany': 'Europe',
    'Greece': 'Europe',
    'Hong Kong': 'Asia',
    'India': 'Asia',
    'Indonesia': 'Asia',
    'Ireland': 'Europe',
    'Israel': 'Asia',
    'Italy': 'Europe',
    'Japan': 'Asia',
    'Liechtenstein': 'Europe',
    'Lithuania': 'Europe',
    'Luxembourg': 'Europe',
    'Malaysia': 'Asia',
    'Mexico': 'North America',
    'Netherlands': 'Europe',
    'Nigeria': 'Africa',
    'Norway': 'Europe',
    'Philippines': 'Asia',
    'Senegal': 'Africa',
    'Seychelles': 'Africa',
    'Singapore': 'Asia',
    'South Africa': 'Africa',
    'South Korea': 'Asia',
    'Spain': 'Europe',
    'Sweden': 'Europe',
    'Switzerland': 'Europe',
    'Thailand': 'Asia',
    'Turkey': 'Asia',
    'United Arab Emirates': 'Asia',
    'United Kingdom': 'Europe',
    'United States': 'North America',
    'Vietnam': 'Asia'
}

df['Region'] = df['Country Name'].map(country_to_region)

# Base URL of the flag images
base_url = 'https://public.flourish.studio/country-flags/svg/'

# Dictionary mapping countries to codes
country_to_code = {
    'Argentina': 'AR',
    'Australia': 'AU',
    'Austria': 'AT',
    'Belgium': 'BE',
    'Bermuda': 'BM',
    'Brazil': 'BR',
    'Canada': 'CA',
    'Cayman Islands': 'KY',
    'Chile': 'CL',
    'China': 'CN',
    'Colombia': 'CO',
    'Croatia': 'HR',
    'Czech Republic': 'CZ',
    'Denmark': 'DK',
    'Ecuador': 'EC',
    'Egypt': 'EG',
    'Estonia': 'EE',
    'Finland': 'FI',
    'France': 'FR',
    'Germany': 'DE',
    'Greece': 'GR',
    'Hong Kong': 'HK',
    'India': 'IN',
    'Indonesia': 'ID',
    'Ireland': 'IE',
    'Israel': 'IL',
    'Italy': 'IT',
    'Japan': 'JP',
    'Liechtenstein': 'LI',
    'Lithuania': 'LT',
    'Luxembourg': 'LU',
    'Malaysia': 'MY',
    'Mexico': 'MX',
    'Netherlands': 'NL',
    'Nigeria': 'NG',
    'Norway': 'NO',
    'Philippines': 'PH',
    'Senegal': 'SN',
    'Seychelles': 'SC',
    'Singapore': 'SG',
    'South Africa': 'ZA',
    'South Korea': 'KR',
    'Spain': 'ES',
    'Sweden': 'SE',
    'Switzerland': 'CH',
    'Thailand': 'TH',
    'Turkey': 'TR',
    'United Arab Emirates': 'AE',
    'United Kingdom': 'GB',
    'United States': 'US',
    'Vietnam': 'VN'
}

df['Code'] = df['Country Name'].map(country_to_code)

# Create the flag image URLs
df['Flag URL'] = base_url + df['Code'].str.lower() + '.svg'

# Group the data by quarter and country and count the number of occurrences
result = df.groupby(['Date with quarter:', 'Country Name', 'Region', 'Flag URL']).size().reset_index(name='Counts')

# Replace 'Q1-', 'Q2-', 'Q3-', and 'Q4-' with '1Q', '2Q', '3Q', and '4Q' respectively
result['Date with quarter:'] = result['Date with quarter:'].str.replace('Q1-', '1Q').str.replace('Q2-', '2Q').str.replace('Q3-', '3Q').str.replace('Q4-', '4Q')

# Convert the 'Date with quarter:' column to a PeriodIndex
result['Date with quarter:'] = pd.PeriodIndex(result['Date with quarter:'], freq='Q')

# Sort the result by 'Date with quarter:'
result = result.sort_values('Date with quarter:')

# Pivot the result to get a DataFrame with countries as rows, quarters as columns, and counts as values
pivot_table = result.pivot(index=['Country Name', 'Region', 'Flag URL'], columns='Date with quarter:', values='Counts').fillna(0)

# Calculate the cumulative sum over the quarters for each country
pivot_table_cumulative = pivot_table.cumsum(axis=1)

# Write the result to a new Excel file
pivot_table_cumulative.to_excel('pivot_table_cumulative.xlsx')

# Download the new Excel file
files.download('pivot_table_cumulative.xlsx')

So, long journey short, I hope you found it useful.😊 I am planning to prepare more data visualization posts in the future. Stay curious, and if you have any comments, feel free to drop them! Peace!🕊️

Dear Human, you nailed it! Take care, and if you are reading this, have a great day!🌻

Insights about Countries🔢

How to prepare a bar chart race?📊

Data cleansing and standardization🧹

For now, free!

Leave a Reply Cancel reply