Probability Puzzles

Problem Statement

Facebook has a content team that labels pieces of content on the platform as spam or not spam. 90% of the raters are diligent and will label 20% of the content as spam and 80% as non-spam. The remaining 10% are non-diligent raters and will label 0% of the content as spam and 100% as non-spam. Assume the pieces of content are labeled independently of one another, for every rater. Given that a rater has labeled 4 pieces of content as non-spam, what is the probability that they are a diligent rater?

Solution

First of all, let's understand what events we have here and what their probabilities are. If we pick a rater at random, we get a diligent one with probability 0.9 and a non-diligent one with probability 0.1. Then, depending on the type of rater, the event "classify content as spam" has a different probability. The probabilities for both kinds of events are summarised in the table below.

Rater type	P(rater type)	P(spam)	P(non-spam)
Diligent	0.9	0.2	0.8
Non-diligent	0.1	0	1

We're asked to find a conditional probability P(A | B), where

event A - "a diligent rater was selected"
event B - "4 pieces of content marked as non-spam"

We can solve it using Bayes' theorem: $$P(A|B) = {P(B|A) * P(A) \over P(B)}$$ To apply it, we need to calculate three probabilities:

P(B | A) = P("4 pieces of content marked as non-spam, given a diligent rater") = $0.8^4$
P(A) = P("a diligent rater was selected") = $0.9$
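P(B) = P("4 pieces of content marked as non-spam") = $0.9 * (0.8)^4 + 0.1 * 1$, by the law of total probability over the two rater types (a non-diligent rater marks all 4 pieces as non-spam with probability $1^4 = 1$)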

Combining all these elements, we get our answer: $$\pmb{P(A | B) = {0.8^4 * 0.9 \over 0.9 * (0.8)^4 + 0.1 * 1} \approx 0.787}$$ Not convinced? Try the simulation below!
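
Before simulating, the arithmetic can be double-checked with exact fractions. This is only a minimal sketch: the constant names and the helper `posterior_diligent` are made up for illustration, and the numbers are taken straight from the problem statement.

```python
from fractions import Fraction

# Values taken from the problem statement (names are illustrative only).
P_DILIGENT = Fraction(9, 10)              # prior P(A)
P_NON_SPAM_DILIGENT = Fraction(4, 5)      # diligent rater labels an item non-spam
P_NON_SPAM_NON_DILIGENT = Fraction(1)     # non-diligent rater always labels non-spam


def posterior_diligent(n_non_spam: int) -> Fraction:
    """P(diligent | n_non_spam independent non-spam labels) via Bayes' theorem."""
    likelihood_diligent = P_NON_SPAM_DILIGENT ** n_non_spam          # P(B | A)
    likelihood_non_diligent = P_NON_SPAM_NON_DILIGENT ** n_non_spam  # P(B | not A)
    # Law of total probability over the two rater types gives P(B).
    evidence = (P_DILIGENT * likelihood_diligent
                + (1 - P_DILIGENT) * likelihood_non_diligent)
    return P_DILIGENT * likelihood_diligent / evidence


posterior = posterior_diligent(4)
print(posterior, float(posterior))
```

Working with `Fraction` avoids floating-point rounding, so the result can be compared against the hand calculation exactly. Passing a different `n_non_spam` shows how the posterior moves away from the 0.9 prior as more non-spam labels are observed.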

Simulation

Note that in this simulation we first choose a rater at random and then simulate 4 content assessments for that rater. A trial is included in the calculation of the final probability only if the rater produces 4 non-spam labels - this is how we simulate the conditional probability we need.

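import numpy as np

nrounds = 1000  # number of accepted trials (raters with 4 non-spam labels)
four_non_spam_cases = 0
diligent_raters = 0

# Keep sampling raters until nrounds of them satisfy the condition (event B).
while four_non_spam_cases < nrounds:
    # Pick a rater: 1 = diligent (probability 0.9), 0 = non-diligent.
    is_diligent = np.random.binomial(1, 0.9)

    # A diligent rater marks each item as spam with probability 0.2;
    # a non-diligent rater never marks anything as spam.
    spam_count = 0
    for _ in range(4):
        if is_diligent:
            spam_count += np.random.binomial(1, 0.2)

    # Only trials with 4 non-spam labels count towards the conditional probability.
    if spam_count == 0:
        four_non_spam_cases += 1
        diligent_raters += is_diligent

simulated_probability = diligent_raters / nrounds
print(simulated_probability)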
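
The same experiment can also be written in vectorized form. The sketch below is an alternative, not the original method: instead of looping until a fixed number of accepted trials is reached, it samples a fixed number of raters up front and then filters on the condition. The function name `simulate_posterior` and its parameters (`n_raters`, `n_items`, `seed`) are assumptions made for this example.

```python
import numpy as np


def simulate_posterior(n_raters: int = 200_000, n_items: int = 4, seed: int = 0) -> float:
    """Estimate P(diligent | all n_items labeled non-spam) by simulation."""
    rng = np.random.default_rng(seed)
    # True where the sampled rater is diligent (probability 0.9).
    is_diligent = rng.random(n_raters) < 0.9
    # Per-item spam labels: a diligent rater flags spam with probability 0.2,
    # a non-diligent rater never does.
    flags_spam = (rng.random((n_raters, n_items)) < 0.2) & is_diligent[:, None]
    # Condition on event B: keep only raters who produced no spam labels.
    all_non_spam = ~flags_spam.any(axis=1)
    # Among those raters, the share of diligent ones estimates P(A | B).
    return float(is_diligent[all_non_spam].mean())


estimate = simulate_posterior()
print(f"simulated P(diligent | 4 non-spam) = {estimate:.3f}")
```

With a few hundred thousand sampled raters the estimate should land close to the analytical answer. Fixing the seed makes the run reproducible.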