Sunday, January 06, 2013

Self-learner project: Friendship Paradox

Your friends have more friends than you.  It't not you, it's the friendship paradox.  Your friends are not random.  They are more likely than you to be social butterflies or they wouldn't have met you (or maintain ties with you).

One of the professors teaching MITx: 6.00x Introduction to Computer Science and Programming stated that his goal was for us (the students in the class) to learn enough to break down problems and write python programs to solve them.

In the introductory chapter of my PhD thesis, I wrote that one of the reasons for modeling is to help explain the results of physical experiments.  Sometimes, modern experiments (not to mention, nature!) are so complex, it can be hard to interpret the data.  Through modeling, we can break problems down into simpler components and then add effects in until we match the observed data.  Then we can explain what actually likely happened.

Anyway, I wrote a python program to clarify the friendship paradox because it also explains why I, with a BMI of 22.5, am often the fattest person in Pilates or spin class.  Steven Strogatz explained:
For example, imagine going to the gym. When you look around, does it seem that just about everybody there is in better shape than you are? Well, you’re probably right. But that’s inevitable and nothing to feel ashamed of. If you’re an average gym member, that’s exactly what you should expect to see, because the people sweating and grunting around you are not average. They’re the types who spend time at the gym, which is why you’re seeing them there in the first place. The couch potatoes are snoozing at home where you can’t count them. In other words, your sample of the gym’s membership is not representative. It’s biased toward gym rats.
So back to the friends problem.  I started by thinking about the shape of the distribution of the number of friends one might have.
  • The distribution should be self-similar; the shape of the distribution of the friends of friends should resemble the shape of distributions of friends of friends of friends.  
  • The distribution must be positive; you can't have less than 0 friends.
  • The distribution should have a long right tail
A gamma distribution of shape = 2-3 looks about right.
My intuition was not bad.  In 2003, Bronius Grigelionis wrote a paper about the self-similarity of the Euler Gamma Function.  Gamma distributions are useful in modeling computer networks--why not people networks?

I sorted people by number of friends, from hermits on the left to social butterflies on the right. The number of friends for each person is in blue.  The average number of friends of friends for each person is in red.  Note that, for the first 400 or so of the 500 people, their friends have more friends than they do.

I had a heck of a time assigning friends to the social butterflies in a computationally efficient manner.  If you sort people by numFriends (taken from the Gamma distribution) from hermits to butterflies, than go up the sequential list, assigning them more friends from those remaining until they reach numFriends and having those people friend them back, you will eventually reach a problem.

There are simply not enough people leftover to fill up the friend lists for those social butterflies.  In that case, I had them friend everyone to their right and then randomly select friends from those on the left that had not initially selected them.  I assumed that those on the left friended them back out of politeness, which added slightly to the total friend count.

The distribution of number of friends for a sample population of 500 people:
The initial number of friends taken from the Gamma distribution is shown in green.  The distribution of adjusted friend counts is shown in blue.

The commands to run the program:
numf = pickNumFriendsRecur(500, shape=3, scale=5 stretch=8)
flist = assignFriends(numf)
fof = howMany(flist)

The output:
max(numf) > nsubj:  501 500
trying reducing scale to:  4.9
avg numf after iteration:  115.732
Target average friends:  115.732
Adjusted average assigned friends:  121.724
Average friends of friends:  151.410319998

I encourage you (and your kids!) to view and download the sample code from pastebin and play around with it.

2 comments:

  1. You are such a wonderful Renaissance woman. Thanks for explaining why DH and I cal ourselves "country mice" amongst many of our friends while a few of them are in awe of our social life.

    ReplyDelete
  2. Very interesting! This makes total sense, of course, but I had never thought about it that way in the friendship context. It definitely seems like most of my friends have more friends than me by a long shot.

    In dating, it took me a while to realize this phenomenon, but I eventually did figure it out. Sometimes (often?) when I really connected with a guy in a way that is rare for me, I wouldn't hear from him again. I thought that was a crazy waste. How could he pass up that one-in-hundred connection? But then I realized--they are probably the kind of people who connect like that with *everyone.* It is rare for me, but for them it was just another ho hum date. I didn't figure out a solution, but it was some comfort to realize that, for the most part, it wasn't that I had somehow blown it; it had nothing to do with my behavior at all.

    (And I am just now realizing that the reverse happened frequently--I wouldn't understand why guys with whom *I* had a ho hum date would pursue me so vigorously; to them, or at least some of them, it felt like that one-in-a-hundred connection.)

    ReplyDelete