melannen ([personal profile] melannen) wrote 2018-12-12 12:07 am

Let's Read A Scientific Paper: Is Social Media Bad For You?

Specifically, this one:

Melissa G. Hunt, Rachel Marx, Courtney Lipson, and Jordyn Young (2018). No More FOMO: Limiting Social Media Decreases Loneliness and Depression. Journal of Social and Clinical Psychology. Free preprint on ResearchGate.

For quite a while I've had a recurring desire to start a blog that was just me reading whatever scientific study was hitting the mainstream media that week, summarizing/analyzing the paper from a layperson's perspective, and then pointing out what the media coverage should have been saying instead. I tried doing this on a tumblr sideblog for a while; what I always forget is that it takes a long time to do. Even if I pick a simple paper that's available open access or as a preprint, isn't super long or technical, don't look up any of the citations, and just accept any math I can't figure out, it still takes more free time than I seem to consistently have these days (I say, as I look down my long trail of excessively long journal entries from this week...)

But this paper - about how social media is very bad for your psychological health - hit the media a few weeks ago, and the coverage made me so mad I either had to write this up or stew over it all night silently instead, and it seemed like a topic y'all would be interested in, so here we go, let's do this!

Abstract:
Introduction:
Given the breadth of correlational research linking social media use to worse well-being, we undertook an experimental study to investigate the potential causal role that social media plays in this relationship.
Method:
After a week of baseline monitoring, 143 undergraduates at the University of Pennsylvania were randomly assigned to either limit Facebook, Instagram and Snapchat use to 10 minutes, per platform, per day, or to use social media as usual for three weeks.
Results:
The limited use group showed significant reductions in loneliness and depression over three weeks compared to the control group. Both groups showed significant decreases in anxiety and fear of missing out over baseline, suggesting a benefit of increased self-monitoring.
Discussion:
Our findings strongly suggest that limiting social media use to approximately 30 minutes per day may lead to significant improvement in well-being.


Okay, let's go through the actual paper point by point, looking at what they're actually doing here. You may want to pull up the paper I have linked above and read along; it's pretty readable as these go, but you should be able to follow along without that.

  • The experiment was done on 143 disproportionately female undergraduates who were taking psychology classes at an urban Ivy League university, who were iPhone owners and active users of Facebook, Snapchat, and Instagram, and who chose this social media study out of a list of other possible psychology studies to participate in to earn course credit. It was not a study of “humans in general”. This is such a problem in psychology studies that it’s become almost a joke, but it doesn’t stop researchers from continuing to do it, and most of the other studies they’re citing did the same thing. I can think of half a dozen ways offhand why a study of psychological health and social media use of self-selected current undergraduates at an Ivy League university might not generalize well to other human beings; I bet you can think of even more.

  • They didn’t test “social media use”, they tested use of the Facebook, Instagram, and Snapchat iPhone apps. This is a very specific subset of social media, one that, in some cases, we know for a fact has a profit motive that has caused it to specifically (if not intentionally) design its technology to make people feel worse. It’s also specifically use of the mobile apps, which would generally apply to use when someone is away from their computer. I can think of half a dozen ways offhand why a study of mobile use of Snapchat, Instagram, and Facebook might not generalize to all social media use; I bet you can think of even more.

  • They explicitly point out (good for them!) that nearly all previous studies have been correlational – that is, “less happy people use Facebook more” – which doesn’t make it clear whether the unhappiness causes the Facebook use or vice versa. They list two previous studies that do try to establish causation, but they have issues with both of those studies. I’m not going to dig in enough to read those studies too, but based just on what’s in this paper, I have issues with the FOMO authors’ analyses of them.
    • Verduyn et al., 2015 apparently studied passive vs. active engagement on Facebook (reading vs. posting) and found that passive was worse, an observation that backs up my anecdata, but I’m not sure what that says about social media use in general. (I’d particularly like to see passive social media use compared to a) other passive media consumption and b) sitting in a room where other people are talking and you aren’t allowed to join in.) The FOMO authors’ objection, however, is “that the longer one spends on social media, the more one will be engaging with it in a passive way.” Have they. Have they never lost an entire night to a comment thread before? Do they just… not understand what active use of social media is? Sure, I can lose an hour scrolling Tumblr or Wonkette comments, but I can lose a day in having a discussion or writing a post or taking photos. I'm not even going to think about the amount of time I've spent on this post and I haven't even posted it yet. (Maybe they don’t think “content creation” counts as social media use?)
    • Tromholt, 2016 apparently randomly assigned people to completely abstain from Facebook for a week. I agree with most of what they say about this, but then I get to “many users have grown so attached to social media that a long-term intervention requiring complete abstention would be unrealistic.” Dude. People have done scientific studies requiring abstention from food; I think recruiting people to give up Facebook for a few months is doable. Especially given the stats you quoted in this same paragraph that half of US adults don’t even use it regularly anyway. Either you’re saying that giving up Facebook would mess up people’s lives so badly that you can’t ethically do it – in which case you’ve already answered your research question about whether it’s a net good – or you’re saying that you don’t think you could get Ivy League undergraduates to do it for course credit, which is another thing entirely.

  • They chose the "well-being constructs" they were testing based on ones that had been shown in other studies to correlate with social media use. On one hand, this is good - it means that they're trying to double-check the results of previous studies. On the other hand, this is bad - the "well-being constructs" they're using are subjective self-reported questionnaires; by choosing specific questionnaires that have already been shown to correlate well with social media use, but changing the rest of the experiment, they're basically just double-checking those specific questionnaires. Rather than testing actual well-being, by filtering for questionnaires that they already know get the results they're looking for, they may just be testing whether a certain style of questionnaire tends to get certain results from social media users. (Note: self-reported questionnaires in psychology are. not the most rigorously scientifically supported things, in general. They are sometimes the best we've got, but that's often not saying much.)

  • They tested well-being using seven different factors, all of which were measured using standard questionnaires. I’ll leave most questions about the specific questionnaires to people who know their current psychology better than I do, but an important thing to note is that self-reported questionnaires are very vulnerable to biased reporting – that is, the study participants think they know what result they should be getting, so without even realizing they’re doing it, they slightly round up their answers toward that result. This is why blinding is important – trying to make sure the participants don’t know what result they should be getting, in order to reduce the effect of this. (This study is not even slightly blinded.)

  • They tracked whether the students really did limit their usage by having students send them screenshots of their phone’s battery usage by app. This means any usage not via the app on their own phone wasn’t tracked – which, to their credit, they point out as a weakness of the study. They also had a lot of trouble getting the screenshots sent consistently every day, so halfway through the study they changed the rules to weekly, which isn’t great in terms of the study design, and also meant they lost days’ worth of data whenever anybody’s phone battery ran out and cleared the records.

  • In the first half of the study, they had students do the depression inventory less often than the others, because “it was assumed that depression would not fluctuate much on a week-by-week basis.” Is that… is that a standard assumption in psychology? Because I remember being an undergrad; my perceived level of depression fluctuated wildly based on things like how many hours ago I’d last remembered to make some ramen – and the study authors changed their mind on this halfway through the study and started including the depression inventory a lot more often, so clearly they too realized they were wrong. It worries me that they had such a limited understanding of depression in their study population going into a study about depression that they changed their mind on something this basic halfway through the study.

  • “We found that baseline depression, loneliness, anxiety, perceived social support, self-esteem, and well-being did not actually correlate with baseline social media use in the week following completing the questionnaires.” – that is, people who used social media more before the study started weren't actually less happy. Which is interesting, because it seems to contradict those previous studies which all claimed it did. The paper proposes that this is because the correlation is not with actual social media use, but with perceived social media use – that is, unhappy people think they use social media more, when in fact they don’t – and that self-reported vs. objectively tested usage doesn’t actually match up all that well. This seems to me like the actual most interesting result in the paper, especially given that their well-being data was all self-reported and might be subject to the exact same bias, but they don’t dwell on it much.

  • “Baseline Fear of Missing Out, however, did predict more actual social media use.” Okay. You guys. Fear of Missing Out. This is such a buzzwordy thing. It was invented by ad execs in the late 90s to convince people to buy stuff, and has since become something that is used basically entirely for social media scaremongering. Which is to say, Fear of Missing Out has lately been defined, in large part, as “that thing that young people who use a lot of social media experience”, so of course it correlates with social media use. I’m not saying it’s not a valid psych concept, but I’d like to see a lot more about it outside the context of social media use and/or kids these days before I stake very much on it, and see more of it in the context of other, less-buzzwordy psychological factors.

  • The students were divided into a control group and an experimental group. They don’t seem to provide the numbers for exactly how many students were in each group, but with two groups and a total of about 140, that means the number of students who limited their use was only around 70. Because of the changes halfway through the study, the number who limited their use and had their depression scores fully recorded would have been only around 30. This is the level where three or four outliers can drastically change the significance of your results, even if they are far less dramatic, as outliers go, than Spiders Georg (and they may not even be obvious as outliers - if a couple of kids in the control group caught the flu and none did in the experimental group, for example, that's within the range of chance for those kinds of numbers, but might have a big effect on your results beyond what your statistics are going to capture.)


  • They found quite high significance for “loneliness”, which convinced me to look at the actual test they used for loneliness, which was the revised UCLA Loneliness Scale (Russell, Peplau, & Cutrona, 1980). What struck me, when I looked up the questionnaire, is that it asks things like “I have a lot in common with the people around me” or “there are people I can talk to”. Those are the kinds of questions that are not designed to acknowledge the existence of long-distance text-based friendships, and given that these are college undergrads away from home, they are likely to have a lot of long-distance friendships, and higher social media use probably corresponds to putting more effort into those long-distance friendships. (There’s a discussion to be had about the quality of internet vs. in-person friendship, but this questionnaire, which was designed when email was barely a thing, is not the way to have it.) A difference between interacting more with local vs. long-distance friends, leading to different answers on a questionnaire about people “around me”, could explain the entire difference in the loneliness scale. It wouldn't even require the students to feel like their online friendships were lonelier - because the questionnaire automatically assumes that fewer in-person interactions means more loneliness.

  • Mind you, they didn’t make it easy to figure out what the actual difference in loneliness score was. They give the p-values and state that it’s “significant”, but “significant” in statistics doesn’t mean “large”, it just means “probably not an accident”. It looks from the poorly-labeled graph like there was an (average?) drop of about 6 points out of 80 possible, which is the equivalent of answering about one and a half of the questions differently – certainly within the range of what might have happened if even a couple of the questions were biased against online interactions. The only other actual data we get for loneliness is “F (1,111) = 6.896” without any context, and I will confess to not knowing what that means without any further explanation. (If you do want to decode it, there’s a rough sketch just below.)
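
For the curious, here's a rough sketch of what you could do with that F-statistic. This is my own illustration, not anything from the paper, and it assumes the F value comes from a standard ANOVA-style comparison with 1 and 111 degrees of freedom:

```python
# My own back-of-envelope conversion, not from the paper: turning the reported
# F-statistic for loneliness into a p-value, assuming it comes from a standard
# ANOVA-style comparison with 1 and 111 degrees of freedom.
from scipy import stats

f_value, df_between, df_within = 6.896, 1, 111
p_value = stats.f.sf(f_value, df_between, df_within)  # P(F >= 6.896) if nothing is going on
print(f"p = {p_value:.4f}")  # comes out right around the .01 they report for loneliness
```

If that assumption holds, the F-statistic and the p = .01 are telling the same story; it still doesn't tell us how big the drop actually was.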

  • For depression, they split the results into two groups – “high baseline depression” and “low baseline depression”. They don’t at any point say how many people were in each of these groups. But remember we’re starting with about 70, so if it split evenly, it would be at most 35 in each; it’s unlikely it was split evenly, so now we’re looking at even smaller numbers in each group. Also, remember they changed the way they measured depression halfway through the study, so it’s at most 15 in each group by that point. At that point, just one kid gets the flu (physical illness tends to increase depression scores) and your numbers are off. And at no point do they discuss how they factored the change in methodology into their statistical analysis.

  • There is a graph for depression. I hate this graph. It appears to directly contradict what is said in the text. If it doesn't directly contradict the text, it is an incredibly bad graph that manages to give the impression it does (well, it's an incredibly bad graph either way.) So I have no idea whether to go by what the text says or what the graph says.

  • Either way, they do both agree that even the control group had a significant decrease in their depression scores, and the experiment group improved more. That seems to imply that simply taking part in the study reduced depression levels. This makes sense to me – for my personal depression, feeling like I’m accomplishing something and providing a net good to humanity is one of the best ways to improve my mood a little, as was (when I was an undergrad) successfully finishing homework, so getting a little hit of “I did the study!” every day for a month would probably reduce depression scores (not cure it, just reduce the scores a tiny bit), and in the experimental group, knowing that I had succeeded at meeting the study conditions of limiting my use would have helped even more. And given the other result that perceived social media use is more closely correlated to well-being than actual social media use, it could be that just being mindful of the time spent on social media improved mood, regardless of actual changes in use, because people weren’t overestimating time spent and beating themselves up for it. It would be interesting to see if there was a difference here between the people who reported every day and those who reported every week, or between people who were always successful in meeting the conditions and those who weren’t – but of course they don’t break that down at all. Probably because at that point the number of people in each group would be so small it would be impossible to treat it like good data.

  • They also saw a decline in FOMO and anxiety over the course of the study, but with no significant difference between control and experimental groups, which further backs up the possibility that just participating in the study increased well-being, or the possibility that the results are due to bias in self-reporting because the study wasn’t blinded (or due to taking the same tests repeatedly changing the way the students responded to the tests), or reversion to the mean (where people are more likely to sign up for a study like this if they’re doing worse than normal, and then get a little closer to normal over the course of the study, because most things do eventually get a little closer to normal.)

  • You’ll note that the paper’s title mentions FOMO but FOMO was not one of the things that showed a significant result for experimental vs. control. See what I meant about it being a buzzword?

  • The other three tests showed no significant difference either between experimental and control groups or between baseline and post-experiment. It seems to me that "social support, self-esteem, and well-being are completely unaffected by social media use" is almost a more interesting result, but for some reason, that part didn't show up in their abstract or press release. Wonder why.



  • Is it time to talk about p-values? I think it is. Almost all of their results are reported as p-values. If you think of experiments as attempts to figure out if something interesting is happening, the p-value is basically how most scientists score whether the interesting-looking thing could just be coincidence.

  • P-values are relied on for this probably a lot more than they should be, for lots of complicated reasons - the word "Bayesian" gets thrown around a lot - but statistics are not my favorite thing, so let's just go with p-values being the standard for now. To explain p-values (there's a little arithmetic sketch after this list):

    • Flip a coin once, it comes up heads. This is not very interesting. Flip it twice, it’s heads both times, it’s a little bit interesting, but not very; that could just be coincidence. Flip it ten times and it’s heads every time, and either there’s something up with the coin, something up with the way you’re flipping it, or you’re a character in some kind of postmodern theater production and may actually be dead, and regardless, you should probably try to figure out what’s going on.
    • The lower the p-value, the more heads in a row you’ve flipped. A p-value of .05 means you’ve flipped about four or five heads in a row – there’s about a one in twenty, or 5%, or .05, chance of that happening purely by coincidence. .05 is what people usually use as the cut-off for interesting (aka significant). A p-value of .001 means you’ve flipped 10 heads in a row, and you’re in Tom Stoppard territory - something is definitely up.
    • With a coin flip, we have a pretty good idea of what the results would be if nothing interesting was happening (we would get 50% heads if we tried enough times, and get closer to 50% the more often we tried.) With most things in actual science, we don't really know what the probability of something happening by chance really is - some of the numbers plugged into the statistics are always going to be best guesses (otherwise, we wouldn't need to do the science.) For something like a particle accelerator, most of those best guesses are going to be really good best guesses, based on a lot of previous science, and engineering tests, and very hard numbers. For something involving people, not so much.
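
Here's the coin-flip arithmetic made concrete, as promised above. This is my own illustration, nothing from the paper; the chance of flipping n heads in a row with a fair coin is 0.5 to the power of n, which is exactly the "how surprising would this be if nothing were going on" question a p-value tries to answer:

```python
# My own illustration of the coin-flip analogy, not anything from the paper:
# the chance of getting n heads in a row from a fair coin is 0.5 ** n.
for n_heads in (2, 5, 7, 10):
    p = 0.5 ** n_heads
    print(f"{n_heads:2d} heads in a row: p = {p:.4f}")

# Roughly:
#  2 heads in a row: p = 0.2500   (coincidence, who cares)
#  5 heads in a row: p = 0.0312   (just under the usual .05 cut-off)
#  7 heads in a row: p = 0.0078   (around the .01 they report for loneliness)
# 10 heads in a row: p = 0.0010   (Tom Stoppard territory)
```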

  • For loneliness, where there was a 'significant' difference between control and experiment - their p-value was .01, which is somewhere around seven coin-flips coming up heads in a row. For FOMO and anxiety – where the change was around the same in the control group and the experimental group – they get p-values around 9-10 heads in a row. These are pretty interesting results - something is probably happening, even if we disagree about what it actually is.

  • For depression, they only give the p-value as “<.05” and not the actual value, which seems kind of sketchy, and usually means it was somewhere around .04999. That’s back to about four or five coin flips in a row, or a one-in-twenty chance of it happening purely by coincidence.

  • Remember, they tested seven different things, and at least some of those things they cut up into different categories, so they were basically doing more than one test on them. We know they tested change for high depression groups and low depression groups, in both the experimental and control groups, and tested both the experimental and control groups for six other characteristics, so that’s up to 16 tests, and we don’t know if they tried to divide up any of the other characteristics, didn’t see anything interesting, and so didn’t report it. But already we’re up to the 1-in-20 chance coming up in one test out of 16 – suddenly, a 1-in-20 chance doesn’t sound that interesting; there’s quick arithmetic on that just below. (See this xkcd comic if I lost you on that one.)
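
To put a number on that 16-tests point: this is my own back-of-envelope arithmetic, not the paper's, and it treats the tests as independent, which they aren't really, but it gives the flavor.

```python
# My own back-of-envelope arithmetic, not the paper's: if you run 16 tests and
# nothing real is going on at all, how likely is it that at least one of them
# comes up "significant" at p < .05 purely by chance?
# (Treats the tests as independent, which the paper's aren't, but it gives the flavor.)
n_tests = 16
alpha = 0.05
p_at_least_one_fluke = 1 - (1 - alpha) ** n_tests
print(f"{p_at_least_one_fluke:.0%}")  # about 56% - better than even odds of one fluke "hit"
```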

  • This is why you need to share more data than just p-values. And why you need to replicate by having someone else perform the tests and see if they get the same results - the same 1-in-20 chance coming up twice in a row is suddenly a lot harder to write off. (This kind of psychology study is notorious for not getting the same results twice in a row.)

  • They wanted to get final follow-up data from the participants several months later. However, only about 20% of them responded, because they’d already earned their course credit and the semester was over (one of many reasons why using undergrads is not always the best choice.) Also, looking at the numbers elsewhere in the paper, it looks like for most of the tests they were only using data from about 110-120 of the 143 undergraduates who signed up. Why? What happened to the other 20-30? Did they drop out of the study, or was their data unusable for some reason, or did they reject it as outliers? Dunno, it’s not in the paper.

  • ALL THAT SAID - and it was a lot - this is not a particularly bad paper, as papers in this field go. It's probably significantly better than most - it's better than the last few I've read. The fact that I could shoot it full of holes is not so much on the scientists involved as on the field as a whole - on the pressure to publish something, on the fact that papers like this do get published regularly, on the fact that most of the things they did that I objected to are things that most psychology papers do.

  • Also, the free version of this paper is a pre-print version, which means the full version that's published in the journal might be slightly improved (not very much improved, though, probably, because this version was on the journal website for a while.) Some preprints are actual rough drafts, and I wouldn't pick them apart this hard - but this one had a press release to go with it, and if you're putting out a press release, you're saying your paper is ready for prime time.

  • The title and all the press coverage - including in the science blogosphere, which used to know better but has been getting worse and worse - vastly overstated the results and the extent to which the results could be generalized to all people and all social media. They always do this. It's terrible. It builds mistrust and misunderstanding of science, reinforces bad practices in the lab, and creates a space for actual fake science to come in. It needs to stop. I don't know how to stop it.

  • Actual interesting results from this paper that I would have put in an article about it if I was a science journalist:
    • People who are unhappy probably think they are wasting time on social media more than they actually are. (Or, perhaps, happy people think they're using it less.)
    • It's possible that the reason studies like this get positive results is that press coverage of studies like this convinces people that if they're unhappy they must be using more social media which means that when they participate in studies like this they report that they use more social media if they're unhappy which means the studies get press coverage and the cycle repeats, because science doesn't happen in an ivory tower.
    • Tracking your social media use - or possibly just any sort of small, achievable daily task that feels productive, such as tracking your social media use for a study - seems to make people less unhappy.
    • Reducing Facebook and Instagram and Snapchat use on mobile devices among mostly-female Ivy League college undergraduates appears to improve their mental health (which, given that Facebook at least was invented in order to make Ivy League college girls feel like shit about themselves, is not a surprising result regardless of your opinion of social media in general!) Given the mental health problems being seen among Ivy League college undergrads, this is probably a very useful thing to explore further even if it's kept to that limited scope.


  • ...so that is what I do whenever I see a "new scientific study" being reported in the media. It is probably something you could learn to do as well! Even if you don't want to make a hobby of it as I have, it's useful to remember that pretty much any scientific study that is supposed to be giving "amazing new results" is really just somebody saying "I tried a thing and this result looks kind of interesting and maybe worth following up but I dunno really," only with stats, and with grant money on the line.

    I leave this study apparently claiming that women think with their wombs for someone else to analyze. :P

[personal profile] ratcreature 2018-12-12 09:06 am (UTC)
I don't think it even has anything to do with interaction increasing. They told people to essentially fast a bit from an indulgence (the widely held assumption is that it is "good" not to be too attached to screens, which these college kids would have been told since childhood via the screen time limits imposed by their parents) and gave them a feedback method to perform their commitment and virtuousness (rather than having to chronicle their continued indulgence like the control group).

I think a control group reporting they ate green vegetables once a day or drank water instead of a soft drink might have shown the same positive mental health effect, and the study may not say anything about social media as such at all.

[personal profile] recessional 2018-12-12 08:19 pm (UTC)
Do not get me started on the bs around "screen time"; I will scream and scream and scream.

(It particularly gets me because there are actually specific, actionable dangers and potential dangers associated with overstimulation and distress caused by continual access to certain things that increase markedly in situations where there is unmediated access to things like smartphones or tablets etc etc etc but DOING ANYTHING ABOUT THESE REQUIRES ACTUALLY ENGAGING WITH MEANINGFUL ASSESSMENTS OF THESE THINGS AS TOOLS AND NOT WEIRD MORAL-CLASS-MARKER BS ABOUT "SCREEN TIME" THAT OBFUSCATES AND MAKES BULLSHIT OF EVERYTHING . . . .)

(I may have a Thing here.)

[personal profile] nicki 2018-12-13 01:43 am (UTC)
Could be other things. My assumption is based on the very very specific situation their group of participants was sourced from.