How to Create Dummy Data in Python

Blog: Think Data Analytics Blog

Dummy data is randomly generated data that can be substituted for live data. Whether you are a developer, software engineer, or data scientist, you sometimes need dummy data to test what you have built, whether that is a web app, a mobile app, or a machine learning model.

If you are using Python, you can use the Faker package to create dummy data of many types, for example dates, transactions, names, text, and times. Faker is a simple Python package that generates fake data of different types.

The Faker package is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.

In this article, you will learn different ways to create dummy data using the Faker Python package.

How to Install Faker to Create Dummy Data

You can install the package with pip as follows:

pip install Faker

Note: From version 4.0.0, Faker dropped support for Python 2 and from version 5.0.0 it only supports Python 3.6 and above.

Create Faker Generator for Dummy Data

To create and initialize a Faker generator, instantiate the Faker class.

from faker import Faker

fake = Faker()

Now you can start creating whatever dummy data you need.

Create Names

You can use the name() method to create fake full names.

for _ in range(10):
    print(fake.name())

Mathew Brown
Mrs. Julie Chavez
Calvin Little
Manuel Ponce
Alyssa Jackson DVM
Amy Delgado
Matthew Smith
Sarah Rojas
Crystal Werner
Tina Moore

Note: You can also use the first_name() method to create the first name and the last_name() method to create the last name.

Create Dates and Times

If you are working with dates, Faker provides different ways to create fake dates and times. The following examples show 10 different ways to create dummy date and time data.

print(fake.date_between(start_date="-3y", end_date="-1y"))  # a date between 3 years ago and 1 year ago
print(fake.month())
print(fake.date_time())
print(fake.year())
print(fake.month_name())
print(fake.date_time_this_year())
print(fake.time())
print(fake.timezone())
print(fake.day_of_week())
print(fake.time_object())

2019-05-31
02
2012-05-31 17:53:01
2002
November
2021-06-30 00:34:48
08:17:51
Africa/Gaborone
Thursday
17:59:37

Create a Personal Profile 

If you want to create fake personal and identity information, you can use the profile() and simple_profile() methods from the Faker library.

The simple_profile method creates a fake basic profile with personal information such as name, gender, mail, and address.

generateProfile = Faker()

generateProfile.simple_profile() 

{'username': 'qfowler',
 'name': 'Matthew Greene',
 'sex': 'M',
 'address': 'USNV Lopez\nFPO AA 45803',
 'mail': 'fwest@yahoo.com',
 'birthdate': datetime.date(1995, 8, 14)}

The profile method creates a fuller fake profile that adds fields such as job, company, residence, blood_group, current_location, and others.

generateProfile.profile() 

{'job': 'Designer, television/film set',
 'company': 'Murillo, Short and Townsend',
 'ssn': '893-14-6729',
 'residence': '6596 Daniel Spring Suite 910\nJonesborough, ID 59049',
 'current_location': (Decimal('4.2622025'), Decimal('-39.109752')),
 'blood_group': 'O-',
 'website': ['https://hardin-johnson.org/',
  'https://patterson.com/',
  'https://george-snyder.info/'],
 'username': 'samuelbooth',
 'name': 'Shawna Spencer',
 'sex': 'F',
 'address': '125 Darrell Extension Suite 575\nPort Michaelbury, PA 12381',
 'mail': 'nicole97@gmail.com',
 'birthdate': datetime.date(1989, 11, 25)}

You can also create more than one profile and save the profile data in a pandas DataFrame for analysis. In the following example, we will create 1000 profiles with just 3 lines of code.

import pandas as pd 

generateProfile = Faker()

# generate 1000 profiles 
data = [generateProfile.profile() for i in range(1000)]

# save profiles in pandas dataframe
df = pd.DataFrame(data)

print(df)

Let’s observe the column names of the 1000 profiles created.

print(df.columns)  

Index(['job', 'company', 'ssn', 'residence', 'current_location', 'blood_group',
       'website', 'username', 'name', 'sex', 'address', 'mail', 'birthdate'], dtype='object')

We have 13 columns in the dataset. Now you can use the dummy data you generate for data analysis and visualization.

Create Sentence & Paragraph Data

If you are working on a software project, you can use the Faker library to generate fake text data to test features in your web or mobile app. The examples below cover 4 of its methods for creating text data.

(a) Create a Single Paragraph

generateText = Faker()

generateText.text() 

'Goal everything traditional to. Suggest stage stop international. Hold line south across new charge national.\nClose money commercial success force. Five decision even environment notice every.'

(b) Create Multiple Paragraphs

generateTexts = Faker()

generateTexts.texts()

['Together require growth wind picture raise. Production task tree consumer recognize personal.',
 'Be six whose answer. Mr oil successful under particular option.\nStep nor once rise. Eye thank try stay only test service. Then senior within capital action. Gun already entire sign garden.',
 'Painting now term direction. Will inside natural bar purpose major.\nOther hear subject do their. Institution between education would laugh example on. Real statement kid specific able foreign.']

(c) Create a Single Sentence

generateSentence = Faker()

generateSentence.sentence() 

'Pass front responsibility.'

(d) Create Multiple Sentences

generateSentences = Faker()

generateSentences.sentences() 

['Maintain take star someone could kitchen employee.',
 'Pay should own word begin.',
 'Citizen place although old despite stay.']

Create Localized Data

The Faker library supports the creation of localized data. Pass the locale as an argument to the Faker class; if you pass nothing, it defaults to the en_US locale.

You can find a list of localized providers in the Faker documentation.

In the following example, we will create 10 Chinese names using the zh_CN locale.

fake_local = Faker('zh_CN')

for _ in range(10):
    print(fake_local.name()) 

李小红
赵桂香
陈小红
罗建华
宋华
刘秀芳
郭秀华
朱秀云
金艳
侯琴

From version 3.0.0 onward, you can also set multiple locales.

multiple_fake = Faker(['uk_UA', 'en_US', 'ja_JP'])

for _ in range(10):
    print(multiple_fake.city())

長生郡長生村
Christieland
Rileyshire
長生郡白子町
Port Curtisborough
Pruittview
селище Одарка
хутір Богодар
село Альберт
横浜市都筑区

In the above example, we created cities from 3 different locales.

Create the Same Fake Data

To reproduce the same fake data on every run, seed the generator before generating the data.

myGenerator = Faker()

myGenerator.random.seed(1234)

for i in range(10):
    print(myGenerator.country()) 

Slovakia (Slovak Republic)
Kazakhstan
Brazil
Albania
Bermuda
United States Minor Outlying Islands
Western Sahara
Wallis and Futuna
Sri Lanka
Mozambique

Note: You can use any integer as a seed.

References

  1. Faker GitHub
  2. Faker Documentation

If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!

You can also find me on Twitter @Davis_McDavid.

Original Source

The post How to Create Dummy Data in Python appeared first on Big Data, Data Analytics, IOT, Software Testing, Blockchain, Data Lake – Submit Your Guest Post.
