Friday, 9 April 2021

 Sampling

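sample() is a built-in function of Python's random module that returns a list of a given length, containing items chosen from a sequence such as a list, tuple, or string. It is used for random sampling without replacement, so each element of the population is picked at most once. (Older Python versions also accepted a set, but since Python 3.11 a set must first be converted to a sequence, for example with sorted().)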
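A minimal illustration of random.sample() from the standard library. The list of colours is made up for the example; only random.sample() itself comes from Python.

```python
import random

# A small population to draw from (made-up values for illustration)
population = ["red", "green", "blue", "yellow", "black"]

# Draw 3 items without replacement: no element is picked twice
picked = random.sample(population, 3)
print(picked)

# On Python 3.11+ a set must be turned into a sequence first;
# sorted() also gives a deterministic order before sampling
colours = {"red", "green", "blue"}
print(random.sample(sorted(colours), 2))
```

Because the draw is random, the printed lists change from run to run.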

Sampling is the process of selecting a subset of units from a known population, usually at random. It allows us to obtain information and draw conclusions about the population from the statistics of those units (i.e. the sample), without having to study the entire population.
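To make the idea concrete, here is a small sketch that estimates a population mean from a sample. The population is simulated (hypothetical heights drawn from a normal distribution) so that the true mean is known and can be compared with the estimate.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated measurements
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Study only 500 randomly chosen units instead of the whole population
sample = rng.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean lands close to the population mean even though only 0.5% of the units were examined.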

Sampling is performed for multiple reasons, including:

  • Cases where it is impossible to study the entire population due to its size
  • Cases where the sampling process involves destructive testing of samples
  • Cases where there are time and cost constraints

To solve a sampling problem in Python, we follow this procedure:
  • First, we need a Bayesian network.
  • Then we import pomegranate and Counter (from the collections module).
  • Next, we write a generate_sample function.
  • Inside the function, we loop over all states, assuming they are in topological order.
  • If a state is a non-root node, we sample it conditional on its parents: "sample[state.name] = state.distribution.sample(parent_values=parents)".
  • Otherwise, we sample it from its own distribution: "sample[state.name] = state.distribution.sample()".
  • We return the sample.
  • We reject every sample that does not match the evidence (here, samples where the train is not delayed).
  • Finally, we compute the distribution of the values in the remaining samples.
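The steps above can be sketched without pomegranate, using plain dictionaries for the same Rain, Maintenance, Train and Appointment network that the post builds later. This is a hypothetical, dependency-free sketch: the NETWORK layout and the helper names (draw, generate_sample, rejection_sample) are assumptions for illustration, not pomegranate's API.

```python
import random
from collections import Counter

# Each node maps to (parent names, CPT). The CPT maps a tuple of parent
# values to a {value: probability} dictionary. Nodes are listed in
# topological order, so parents always come before their children.
NETWORK = {
    "rain": ((), {(): {"none": 0.7, "light": 0.2, "heavy": 0.1}}),
    "maintenance": (("rain",), {
        ("none",): {"yes": 0.4, "no": 0.6},
        ("light",): {"yes": 0.2, "no": 0.8},
        ("heavy",): {"yes": 0.1, "no": 0.9},
    }),
    "train": (("rain", "maintenance"), {
        ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
        ("none", "no"): {"on time": 0.9, "delayed": 0.1},
        ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
        ("light", "no"): {"on time": 0.7, "delayed": 0.3},
        ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
        ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
    }),
    "appointment": (("train",), {
        ("on time",): {"attend": 0.9, "miss": 0.1},
        ("delayed",): {"attend": 0.6, "miss": 0.4},
    }),
}


def draw(distribution, rng):
    """Pick one value from a {value: probability} dictionary."""
    values = list(distribution)
    weights = list(distribution.values())
    return rng.choices(values, weights=weights)[0]


def generate_sample(rng):
    """Sample every node in topological order, conditioning on its parents."""
    sample = {}
    for name, (parents, cpt) in NETWORK.items():
        parent_values = tuple(sample[p] for p in parents)  # () for root nodes
        sample[name] = draw(cpt[parent_values], rng)
    return sample


def rejection_sample(query, evidence, n, rng):
    """Keep only samples that agree with the evidence, then count the query."""
    kept = []
    for _ in range(n):
        sample = generate_sample(rng)
        if all(sample[var] == val for var, val in evidence.items()):
            kept.append(sample[query])
    return Counter(kept)


rng = random.Random(0)
counts = rejection_sample("appointment", {"train": "delayed"}, 20_000, rng)
print(counts)
```

Among the kept samples, the share of "attend" should come out close to 0.6, because in this network Appointment depends only on Train.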

My name is Happy Khatun. I am a student of City University. This blog is the easiest way to learn Python programming in Bangladesh. This course is conducted at City University by our most honorable teacher Nuruzzaman Faruqui.

In this blog you will find a line-by-line explanation of Python code. Anyone can gather knowledge about Python programming here and overcome their fear of it.

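Here we will discuss a sampling problem. The problem is a Bayesian network with four random variables: Rain (none, light, heavy), Maintenance (yes, no), Train (on time, delayed) and Appointment (attend, miss). Rain affects both Maintenance and Train, Maintenance affects Train, and Train affects whether we attend or miss the appointment.

From this network we will estimate how many times the appointment is attended and how many times it is missed when the train is delayed.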

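To solve this problem in Python, we first create a model file named model.py that defines the Bayesian network. This code uses the older pomegranate API (Node, DiscreteDistribution, ConditionalProbabilityTable); pomegranate 1.0 replaced that API, so a pre-1.0 release is needed to run it: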


from pomegranate import *

# Rain node has no parent
rain = Node(DiscreteDistribution({
"none": 0.7,
"light": 0.2,
"heavy": 0.1
}), name="rain")

# Maintenance node is conditional on rain, so we use a ConditionalProbabilityTable
maintenance = Node(ConditionalProbabilityTable([
["none", "yes", 0.4],
["none", "no", 0.6],
["light", "yes", 0.2],
["light", "no", 0.8],
["heavy", "yes", 0.1],
["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
["none", "yes", "on time", 0.8],
["none", "yes", "delayed", 0.2],
["none", "no", "on time", 0.9],
["none", "no", "delayed", 0.1],
["light", "yes", "on time", 0.6],
["light", "yes", "delayed", 0.4],
["light", "no", "on time", 0.7],
["light", "no", "delayed", 0.3],
["heavy", "yes", "on time", 0.4],
["heavy", "yes", "delayed", 0.6],
["heavy", "no", "on time", 0.5],
["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
["on time", "attend", 0.9],
["on time", "miss", 0.1],
["delayed", "attend", 0.6],
["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")

# Now create a Bayesian Network and add states

model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()
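Because this network is small, the quantity the sampling program estimates can also be computed exactly by enumerating every combination of Rain and Maintenance, which gives a reference value to compare against. The sketch below is plain Python rather than pomegranate; the dictionary names are hypothetical and simply copy the probabilities from the model file above.

```python
from itertools import product

rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}
maintenance = {  # P(maintenance | rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
train = {  # P(train | rain, maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
appointment = {  # P(appointment | train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

# Joint probability P(appointment, train = delayed), summed over rain and maintenance
joint = {"attend": 0.0, "miss": 0.0}
for r, m in product(rain, ["yes", "no"]):
    p_delayed = rain[r] * maintenance[r][m] * train[(r, m)]["delayed"]
    for a in joint:
        joint[a] += p_delayed * appointment["delayed"][a]

# Condition on the evidence by dividing by P(train = delayed)
p_train_delayed = sum(joint.values())
posterior = {a: p / p_train_delayed for a, p in joint.items()}
print(round(p_train_delayed, 3))  # 0.213
print({a: round(p, 3) for a, p in posterior.items()})  # {'attend': 0.6, 'miss': 0.4}
```

With these numbers, about 21% of the generated samples should survive the rejection step, and roughly 60% of those should be "attend".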
Next, in a second file, we import pomegranate, Counter from the collections module, and the model from the model file where we created the Bayesian network:


import pomegranate

from collections import Counter

from model import model

def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample

# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 10000
data = []
for i in range(N):
    sample = generate_sample()
    # Keep only samples that match the evidence (train is delayed)
    if sample["train"] == "delayed":
        data.append(sample["appointment"])
print(Counter(data))

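After running this code in PyCharm, it prints a Counter with the number of "attend" and "miss" outcomes among the samples in which the train was delayed. Since the samples are random, the exact counts change from run to run.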
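The printed Counter holds raw counts; dividing each count by the total turns it into an estimate of P(Appointment | Train = delayed). The helper name to_probabilities is hypothetical, and the example counts are made up for illustration, not results from the program.

```python
from collections import Counter


def to_probabilities(counts):
    """Turn raw counts (such as the Counter printed above) into proportions."""
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical counts for illustration only
example = Counter(["attend", "attend", "attend", "miss"])
print(to_probabilities(example))  # {'attend': 0.75, 'miss': 0.25}
```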
Knowing how and what to sample can be very useful. If you wish to learn about different sampling methods and how to strategically pick data points to look at, this is a good place to start! Here we solved a sampling problem using rejection sampling on a Bayesian network.
