Problem Statement
Facebook has a content team that labels pieces of content on the platform as spam or not spam. 90% of the raters are diligent and will label 20% of the content as spam and 80% as non-spam. The remaining 10% are non-diligent raters and will label 0% of the content as spam and 100% as non-spam. Assume the pieces of content are labeled independently of one another, for every rater. Given that a rater has labeled 4 pieces of content as non-spam, what is the probability that they are a diligent rater?
Solution
First, let's identify the events involved and their probabilities. If we pick a rater at random, we get a diligent one with probability 0.9 and a non-diligent one with probability 0.1. Then, depending on the type of rater, the event "classify content as spam" has a different probability. These probabilities are summarised in the table below.
| Rater type | P(rater type) | P(spam) | P(non-spam) |
|---|---|---|---|
| Diligent | 0.9 | 0.2 | 0.8 |
| Non-diligent | 0.1 | 0 | 1 |
We're asked to find the conditional probability P(A | B), where
- event A - "a diligent rater was selected"
- event B - "4 pieces of content marked as non-spam"
Bayes' theorem expresses it through three quantities:
- P(B | A) = P("4 pieces of content marked as non-spam, given a diligent rater") = \(0.8^4\)
- P(A) = P("a diligent rater was selected") = \(0.9\)
- P(B) = P("4 pieces of content marked as non-spam") = \(0.9 \cdot 0.8^4 + 0.1 \cdot 1^4\), by the law of total probability over both rater types
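Substituting these quantities into Bayes' theorem gives the answer. A short worked computation, rounded to four decimal places:

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.8^4 \cdot 0.9}{0.9 \cdot 0.8^4 + 0.1 \cdot 1} = \frac{0.36864}{0.46864} \approx 0.7866
\]

So after observing 4 non-spam labels, the probability that the rater is diligent drops from the prior 0.9 to roughly 0.79.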
Simulation
Note that in this simulation we first choose a rater at random, then simulate 4 content assessments for that rater.
A trial is included in the final probability estimate only if the rater labels all 4 pieces as non-spam. This filtering is how we simulate the conditional probability we need.
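A minimal sketch of such a simulation, using only Python's standard `random` module. The function name, the number of trials, and the seed are illustrative assumptions rather than part of the original solution.

```python
import random


def simulate_diligent_given_non_spam(n_trials=200_000, n_items=4, seed=0):
    """Estimate P(diligent | all n_items labeled non-spam) by Monte Carlo."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    kept = 0           # trials where every item was labeled non-spam
    kept_diligent = 0  # ... of which the rater was diligent
    for _ in range(n_trials):
        # Step 1: pick a rater; diligent with probability 0.9.
        diligent = rng.random() < 0.9
        p_spam = 0.2 if diligent else 0.0
        # Step 2: simulate independent labels; a label is non-spam
        # with probability 1 - p_spam.
        all_non_spam = all(rng.random() >= p_spam for _ in range(n_items))
        # Step 3: keep only trials that satisfy the conditioning event B.
        if all_non_spam:
            kept += 1
            kept_diligent += diligent
    return kept_diligent / kept


estimate = simulate_diligent_given_non_spam()
exact = 0.9 * 0.8**4 / (0.9 * 0.8**4 + 0.1)
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

With a few hundred thousand trials the simulated estimate should land close to the analytical value from Bayes' theorem. Discarding the trials that do not produce 4 non-spam labels is what turns the raw frequency into a conditional one.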