Short Attention Span Summary
With the rise of free open access medical education, how can we tell a good blog post from a bad one? One answer is gestalt: a subjective, qualitative impression. But can we rely on one person’s gestalt to judge whether a blog post is high quality? The authors of this study found that individual ratings of 20 blog posts correlated poorly within each level of training (medical student, resident, and attending). But when ratings were averaged across a large group, the mean gestalt quality scores correlated very well among medical students, residents, and attending physicians, and at least 42 ratings were needed for the mean to be reliable. Think of it like ratings on Yelp or Amazon. Fans or haters will rate very high or very low, but the crowd tends to find a mean rating you can trust. It’s the same with medical blogs, but you need a crowd of at least 42 to get a reliable rating.
A group of at least 42 people can reliably rate the quality of a blog post. Individual ratings of gestalt quality were not a reliable measure. Ironically, I couldn’t find any other FOAM sites talking about this paper, but the METRIQ Study website is pretty slick.
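To make the crowd idea concrete, here is a minimal simulation sketch. It is not the authors’ analysis and uses no data from the study. Every number in it is an assumption made for illustration: 20 hypothetical posts with a made-up "true" quality, raters who each add random noise (standard deviation 1.5, chosen arbitrarily), and scores clipped to a 7-point scale. It compares how well two individual raters agree with how well two group means agree, using Pearson’s correlation.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_group(true_quality, n_raters, noise_sd, rng):
    """Return ratings[rater][post]: true quality plus rater noise, on a 1-7 scale."""
    ratings = []
    for _ in range(n_raters):
        row = []
        for q in true_quality:
            score = round(q + rng.gauss(0, noise_sd))
            row.append(min(7, max(1, score)))  # clip to the 7-point scale
        ratings.append(row)
    return ratings


def post_means(ratings):
    """Mean rating of each post across all raters in a group."""
    return [mean(col) for col in zip(*ratings)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical "true" quality for 20 posts (assumption, not study data).
true_quality = [rng.uniform(2, 6) for _ in range(20)]
# Group sizes mirror the abstract; the noise level is an arbitrary assumption.
students = simulate_group(true_quality, 121, 1.5, rng)
attendings = simulate_group(true_quality, 100, 1.5, rng)

r_means = pearson(post_means(students), post_means(attendings))
r_single = pearson(students[0], attendings[0])
print(f"group means: r={r_means:.2f}; two individual raters: r={r_single:.2f}")
```

Under these assumptions the two group means track each other closely while any two individual raters agree far less, which is the same pattern the paper reports. The exact values depend entirely on the made-up noise level, so treat them as illustration only.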
Ann Emerg Med. 2017 Mar 2. pii: S0196-0644(16)31662-6. doi: 10.1016/j.annemergmed.2016.12.025. [Epub ahead of print]
Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study.
Thoma B1, Sebok-Syer SS2, Krishnan K3, Siemens M4, Trueger NS5, Colmers-Gray I6, Woods R4, Petrusa E7, Chan T8; METRIQ Study Collaborators.
1 Department of Emergency Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada; Health Professions Education Program, Massachusetts General Hospital Institute of Health Professions, Boston, MA.
2 Centre for Education Research & Innovation, University of Western Ontario, London, Ontario, Canada.
3 Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.
4 Department of Emergency Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
5 Department of Emergency Medicine, Northwestern University, Chicago, IL.
6 Department of Emergency Medicine, University of Alberta, Edmonton, Alberta, Canada.
7 Health Professions Education Program, Massachusetts General Hospital Institute of Health Professions, Boston, MA.
8 Division of Emergency Medicine, Department of Medicine, McMaster University, Hamilton, Ontario, Canada.
Open educational resources such as blogs are increasingly used for medical education. Gestalt is generally the evaluation method used for these resources; however, little information has been published on it. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs.
We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson’s correlations. Single and average measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt and a decision study calculated the number of raters required to reliably (>0.8) estimate quality.
One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were fair to poor (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8).
The mean gestalt quality ratings of blog posts between medical students, residents, and attending physicians correlate strongly, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment.
Copyright © 2017 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.