sqlbelle


8 Data Sets for your Tableau / Data / Analytics Projects

If you’re looking to practice your data visualization and analysis skills, here are some data resources you can consider.

Many of the data sets are fairly clean and won’t require too much cleaning/transforming. There are some though that will require you to flex your data wrangling muscles.

1. Kaggle

If you have been working in data for a while, you may have already heard of Kaggle. Kaggle is a machine learning and data science community with over a million members. The platform provides data sets, tools and competitions for its members. It was acquired by Google in 2017 and, at the time of this writing, has over 50K public data sets.

Don’t worry: while Kaggle sounds like haggle, you’re not going to have to haggle for data. Once you sign up for an account, the data sets are free to use for learning. Make sure to check out the competitions, where you can gain not only confidence and bragging rights but also monetary prizes.

Check it out: Kaggle

2. Tableau Public Resources

When you go to Tableau Public, you will see a tab for Resources. On this page, there is a link for Sample Data, which leads to a list of data sets curated specifically with Tableau products in mind. There are data sets on:

  • Entertainment (Netflix, the Pokemon Index)

  • Sports (FIFA anyone?)

  • Public Data

  • Education

  • Government

  • Science (CO2 Emissions)

  • Lifestyle (Star Wars, Baby Names)

  • Technology

  • Health

  • Business

Check it out: Tableau Public Sample Data

3. Open Data portals

According to the Open Data Institute, “Open Data” is

data that’s available for everyone to access, use and share

Many countries, states, provinces and cities have their own open data portals. The aim is to democratize the data and allow the public to explore the data — for education, for inspiration, for problem-solving, for social good.

An example of this is the City of Vancouver’s Open Data portal. There are many data sets under different themes — demographics, business and economy, culture and education, and many others.

I particularly like that they provide data sets with geographic data — GeoJSON, KML and SHP files. These are readily readable in Tableau, and you can easily practice the new map layers (and reflect back on times when this was SO HARD to do).
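Before loading a GeoJSON file into Tableau, it can help to take a quick look at what it contains. The snippet below is only a minimal sketch using Python's standard `json` module. The sample features and property names are made up for illustration and are not taken from any real open data portal.

```python
import json
from collections import Counter

def summarize_geojson(text):
    """Summarize a GeoJSON FeatureCollection given as a JSON string."""
    data = json.loads(text)
    features = data.get("features", [])
    # Count geometry types (Point, Polygon, ...); features may have a null geometry.
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type", "None") for feature in features
    )
    # Collect every property name that appears on at least one feature.
    property_names = sorted(
        {key for feature in features for key in (feature.get("properties") or {})}
    )
    return {
        "feature_count": len(features),
        "geometry_types": dict(geometry_types),
        "property_names": property_names,
    }

# Hypothetical sample data, for illustration only.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-123.1, 49.3]},
      "properties": {"name": "Sample Park", "type": "park"}
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-123.2, 49.2], [-123.1, 49.2], [-123.1, 49.3], [-123.2, 49.2]]]
      },
      "properties": {"name": "Sample Area"}
    }
  ]
}
"""

if __name__ == "__main__":
    print(summarize_geojson(SAMPLE))
```

For a downloaded file, you could read its contents with `open(path, encoding="utf-8").read()` and pass the text to the same function.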

Check out some of these open data portals:

4. Knoema

Knoema is another portal that curates data sets from authoritative data sources. Their data, according to their website, is geared towards

users with interests in statistics and data analysis, visual storytelling and making infographics and data-driven presentations

They also claim to have:

the most comprehensive collection of global decision-making data in the world

Knoema also provides integration with other authoritative providers for categories such as card transactions, clickstream, DNS, energy, and more. It also recently launched an integration with Snowflake Data Marketplace.

Check it out: Knoema

5. data.world

data.world offers a platform where people and communities can collaborate and share data. As of the time of this writing, there are almost 200K data sets in data.world.

data.world’s mission is:

To create the most meaningful, collaborative, and abundant data resource in the world.

Check it out: data.world

6. #MakeoverMonday (data.world)

data.world is also where #MakeoverMonday data sets are hosted. If you are not familiar with #MakeoverMonday, it is a weekly social project founded by Tableau Zen Masters Andy Kriebel and Eva Murray that aims to help the community improve their visualization and analysis skills.

Every week, a new data set is published. Anyone is encouraged to participate: to analyze and visualize the data set, to share initial work and perspectives, to see the different ways others analyzed and visualized the same data set, and to receive feedback and suggestions from others.

Check it out: MakeoverMonday

7. #RWFD — Real World Fake Data (data.world)

data.world makes a third appearance in this list because this next set of data fulfills a specific need.

One of the challenges with personal projects is that we often get data sets that are not reflective of “real life” data sets in different verticals. #RWFD, or Real World Fake Data, is the brainchild of Mark Bradbourne, and its inspiration is drawn from MakeoverMonday.

This provides a real opportunity for many to have a peek at the types of data we can expect in different industries and sectors, such as Human Resources (HR), Insurance, Healthcare, Help Desk, Social Media and Education, to name a few.

Check it out: #RWFD — Real World Fake Data

8. Mockaroo

If you need to create your own data set using your own fields but with fake data, you can use a service like Mockaroo. The free tier allows you to download 1,000 records at a time (which you can do multiple times).

If you are familiar with other data tools, you can take this to the next level. You can create a database that has purely mocked up data but potentially mimics fields you can find in your own system.
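If you would rather generate a small mock data set locally, the idea can be sketched with Python's standard library. This is not Mockaroo's API, only a stand-in that illustrates the same concept. The field names, departments, and value ranges are hypothetical, so swap in fields that mimic your own system.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical department list, for illustration only.
DEPARTMENTS = ["HR", "Finance", "IT", "Sales"]

def make_rows(n, seed=42):
    """Generate n fake employee records; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)
    rows = []
    for employee_id in range(1, n + 1):
        rows.append({
            "employee_id": employee_id,
            "department": rng.choice(DEPARTMENTS),
            # A random hire date within one year of the start date.
            "hire_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            # A salary between 40,000 and 120,000 in steps of 500.
            "salary": rng.randrange(40000, 120001, 500),
        })
    return rows

def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    print(to_csv(make_rows(5)))
```

Writing the result of `to_csv(make_rows(1000))` to a `.csv` file gives you a data set you can open in Tableau or load into a database table.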

Check it out: Mockaroo

Additional ideas

There are many other resources for data. The list above is just the start. Some additional ideas you can consider are:

  1. Wearables: You can use data that you generate with your wearable devices. There is a fascinating TED Talk by Talithia Williams on owning your body’s data.

  2. Social Media: You can download your own data from social media platforms like LinkedIn and Twitter. Web scraping can be an option if you are familiar with it. FAIR WARNING: Be aware that there are restrictions on what data you can scrape, how you can scrape it, and what you can use or store.

What other data sets do you use?

Original Post in Medium