Thursday, 12 March 2009

The human footprint solved


In this post I will give you what I think is the answer to the question in my previous post (The human footprint).


We wondered how to calculate the average frequency of human sexual behavior (in sexually active adults) given a set of answers to the question "how long has it been since the last time you had sex?" Here is an attempt to find the solution.


To simplify things a bit, we assume that the event (sexual intercourse) occurs perfectly periodically for each individual, with period I (the interval). Another simplification is that the period is an integer (a discrete stochastic variable) and that the minimal period equals one day. The only data we have are the number of days (D) since the event last occurred. What we want to estimate is the expected value of the interval, E(I). If N is the size of the data set (the number of respondents), then the estimated value of the average interval equals:

$$E(I) = \sum_{i=1}^{\infty} i \cdot P(I = i)$$

Bayes states that:

$$P(D = d \mid I = i) = \frac{P(D = d \wedge I = i)}{P(I = i)}$$

We are able to estimate P(D = d | I = i). Here is how. For a person who has sex every day (I = 1), the answers will be split fifty-fifty between "just today" (D = 0) and "that was yesterday" (D = 1). Even though we cannot know the value of I for any individual, we do know that for every level of I each possible answer d = 0, 1, ..., i is equally likely, so P(D = d | I = i) = 1/(i+1) for d ≤ i and 0 otherwise.


This yields the following table:








Distribution of P(D = d | I = i)

                          D (number of days since last event)
I (interval
between events)     0      1      2      3      4     ...
1                  1/2    1/2     0      0      0     ...
2                  1/3    1/3    1/3     0      0     ...
3                  1/4    1/4    1/4    1/4     0     ...
...
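
The table can also be generated rather than typed out. Below is a minimal sketch in Python, assuming the model described above (each answer d = 0, ..., i is equally likely for a person with interval i). The function names `p_d_given_i` and `build_table` are made up for this illustration.

```python
from fractions import Fraction


def p_d_given_i(d, i):
    """P(D = d | I = i) under the post's model: uniform over d = 0..i."""
    if i < 1 or d < 0:
        raise ValueError("need i >= 1 and d >= 0")
    return Fraction(1, i + 1) if d <= i else Fraction(0)


def build_table(max_i, max_d):
    """Rows are i = 1..max_i, columns are d = 0..max_d."""
    return [
        [p_d_given_i(d, i) for d in range(max_d + 1)]
        for i in range(1, max_i + 1)
    ]


if __name__ == "__main__":
    for i, row in enumerate(build_table(3, 4), start=1):
        print(i, " ".join(str(p) for p in row))
```

Each row sums to 1 as long as the table is wide enough to include column d = i, which is a quick way to check that the rows really are conditional distributions.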



To estimate P(I = i) we have to know P(D = d ∧ I = i). At first I had hoped to use the above table for this, but we cannot use it directly. We would like to estimate P(D = d ∧ I = i) from our given set of n_d and P(D = d | I = i), but we cannot use the table here because if we allow I to be unbounded then:

$$\sum_{i \ge \max(d,1)} P(D = d \mid I = i) = \sum_{i \ge \max(d,1)} \frac{1}{i+1} = \infty$$

P(D = d ∧ I = i) remains unknown. Note, however, that for a given level of i all P(D = d ∧ I = i) with d ≤ i are equal, so for example P(D = 0 ∧ I = i) = P(D = 1 ∧ I = i) = P(D = 2 ∧ I = i) when i ≥ 2. This means that the actual number of respondents in cell (d, i) should be about equal to the numbers in the other non-empty cells in the same row (with the same level of I). I will call the estimated number in the non-empty cells of the first row n(d,1).


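Now we are getting somewhere. What we do know is n_d, the number of people in the sample who answered D = d. We would expect n_0 ≈ n_1. The important thing is that the number in each non-empty cell of the first row is about n_1 − n_2, the number in each non-empty cell of the second row is about n_2 − n_3, et cetera.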


We can now estimate P(I = 1) ≈ 2 (n_1 − n_2) / N, because the first row has two non-empty cells.


In general, a (crude) estimate of n(d,i), the number of respondents in a non-empty cell (d, i), would be

$$n(d,i) \approx n_i - n_{i+1}.$$
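
A short sketch of why the difference of neighbouring counts, n_i − n_{i+1}, is a natural estimate of the number in one cell of row i. It assumes the model above, where a row with interval i spreads its respondents evenly over the i + 1 cells d = 0, ..., i. The notation E[n_d] for the expected count is introduced here only for illustration.

$$E[n_d] = N \sum_{i \ge d} \frac{P(I = i)}{i+1} \qquad (d \ge 1)$$

$$E[n_d] - E[n_{d+1}] = N \, \frac{P(I = d)}{d+1}$$

Every row with i > d contributes equally to columns d and d + 1, so those terms cancel and only row i = d is left. The right-hand side is the expected number of respondents in any single non-empty cell of that row.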


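Now we know enough. Row i has i + 1 non-empty cells, each holding about n_i − n_{i+1} respondents, so

$$P(I = i) \approx \frac{(i+1)\,(n_i - n_{i+1})}{N}$$

thus

$$E(I) \approx \frac{1}{N} \sum_{i \ge 1} i\,(i+1)\,(n_i - n_{i+1})$$
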

This estimate is very crude indeed. At this stage I have not tried to integrate the information in n_{i+2}, n_{i+3}, and so forth into the estimate of n(d,i).
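
To make the recipe concrete, here is a minimal sketch in Python of the crude estimator, assuming that the number in a non-empty cell of row i is estimated by n_i − n_{i+1} and that row i has i + 1 such cells. The function name `estimate_mean_interval` and the toy data are invented for this illustration. No attempt is made to handle the negative differences that noisy real data can produce.

```python
from collections import Counter


def estimate_mean_interval(answers):
    """Crude estimate of E(I) from answers D (days since the last event)."""
    total = len(answers)
    if total == 0:
        raise ValueError("need at least one answer")
    n = Counter(answers)  # n[d] = respondents who answered D = d (0 if none)
    weighted_sum = 0
    for i in range(1, max(n) + 1):
        cell = n[i] - n[i + 1]    # crude estimate of one cell in row i
        people = (i + 1) * cell   # row i has i + 1 non-empty cells
        weighted_sum += i * people
    return weighted_sum / total


if __name__ == "__main__":
    # Idealised toy sample: 200 people with I = 1 and 300 people with I = 2,
    # each group spread evenly over its possible answers.
    sample = [0] * 100 + [1] * 100 + [0, 1, 2] * 100
    print(estimate_mean_interval(sample))
```

On this idealised sample the estimate matches the true mean interval, (200 · 1 + 300 · 2) / 500. With real answers the counts fluctuate, and the differences n_i − n_{i+1} can be noisy or even negative.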
