The Influence of Changing Marginals on Measures of Inequality in Scholarly Citations: Evidence of Bias and a Resampling Correction

Scholars have debated whether changes in digital environments have led to greater concentration or dispersal of scientific citations, but this debate has paid little attention to how other changes in the publication environment may impact the commonly used measures of inequality. We demonstrate using Monte Carlo experiments that a variety of inequality measures – including the Gini coefficient, the Herfindahl-Hirschman index, and the percentage of papers ever cited – are substantially biased downwards by increases in the total number of papers and citations. We propose and validate a resampling-based correction for this “marginals bias,” and apply this correction to empirical data on scholarly citation distributions using Web of Science data covering four broad scientific fields (Health; Humanities; Mathematics and Computer Sciences; and Social Sciences) during 1996–2014. We find that in each field the bulk of the apparent decline in citation inequality in recent years is an artifact of marginals bias, as are most apparent inter-field differences in citation inequality. Researchers using inequality measures to compare citation distributions and other distributions with many cases at or near the zero-bound should interpret these metrics carefully and account for the influence of changing marginals.

Although the structure of citations to scholarly articles has been studied since de Solla Price's (1965) seminal work, this line of research has recently been reinvigorated as a result of publication digitization and new forms of communication and search. These technological developments have led some optimists to claim that increased access to previous research will enhance exposure to new ideas and stimulate scientific discovery. However, others have worried that algorithmically driven tools will concentrate scientists' attention on a small number of "star" articles, leading to more derivative and less ground-breaking research. 1 Which tendency dominates will have important implications for the future of scientific advancement (Hamilton 1990; Eysenbach 2006; Evans 2008; Evans and Reimer 2009; Larivière et al. 2009; Barabási et al. 2012), largely because of the well-recognized advantages of epistemic diversity on innovation (Zollman 2010; Larsen et al. 2019; Weatherall and O'Connor 2019; Hofstra et al. 2020). The empirical evidence put forth thus far in studies of the distribution of scientific citations is contradictory. Focusing on the impact of the rise of online journal access, one study found evidence of increasingly concentrated citations (Evans 2008), whereas other analyses of aggregate trends over time revealed more diversified citations.
[Figure. Horizontal axis in each of four panels: 1996 to 2014. Notes: See section S1 of the online supplement for the composition of the four broad categories shown above. All curves are smoothing splines with a span of 0.5. One exceptionally highly cited article in mathematics and computer science is omitted.]
We contribute to this discussion by focusing directly on an unrecognized limitation in various inequality measures, including the Gini coefficient and the percentage of ever-cited articles, that are commonly used to gauge the level of concentration. Our specific concern is that when the unit to be distributed is indivisible (as are citations) and on roughly the same order of magnitude as the number of targets (as are citations and publications), inequality measures are highly sensitive to changes in the input marginals. We investigate this problem in the context of scientific citations and demonstrate that marked and uneven growth in the number of publications and citations affects measures of inequality and confounds year-over-year and between-field comparisons.
As Figure 1 shows, in each of four broad disciplines, the number of articles published and citations to these articles has increased since 1996, in some cases dramatically. 2 Furthermore, the growth in the two quantities is not proportional, with the number of citations generally increasing more rapidly than the number of publications. This dramatic growth in publications and citations has caught the attention of others who study scientific knowledge production, most notably Wallace et al. (2009), who report that most of the decline in uncitedness between 1900 and 2006 is a result of the increase in subsequent publications (and total citations made by those publications). General discussion of the expansion in publications appears in studies of inflation in journal impact factors and article-based citation measures (Althouse et al. 2009; Petersen et al. 2019), the aging of the scientific literature (Larivière et al. 2008; Parolo et al. 2015), and the growing myopia of science (Pan et al. 2018).
However, there has been no investigation of how these changes in the volume of publication and citation might bias interpretation of the specific measures of inequality typically used to capture how citations are distributed across the scientific literature. Because fully capturing the shape of a distribution with a single number is impossible, many different approaches to measuring inequality have been proposed. One simple approach is to calculate the share of one value or entry in the total distribution, such as the number of articles never cited (Fleder and Hosanagar 2009; Wallace et al. 2009; Zentner et al. 2013); another approach is to summarize the shape of the distribution with respect to its total deviation from a uniform distribution. The Gini coefficient (Salganik et al. 2006; Brynjolfsson et al. 2011; Varga 2019) and the Herfindahl-Hirschman index (HHI) (Evans 2008) are well-known examples of this latter approach. Each measure of inequality has limitations, most conspicuously that differently shaped distributions may generate the same value (Atkinson 1975) and the possibility of bias in small samples (Deltas 2003). Other, less appreciated problems plague their use in studies of scholarly citations: citations to articles are not divisible; the total number of citations is sometimes less than the number of citable articles; and in most fields, large fractions of articles are never cited, thus mixing large numbers of zeros into citation distributions (Wallace et al. 2009; Brynjolfsson et al. 2011). Moreover, changes in the marginal number of articles and citations cause the severity of these problems in citation distributions to vary, which renders comparisons across time and across disciplines difficult. Ignoring these issues, scholars who study population-level citation behavior nevertheless use such inequality measures to draw substantive conclusions about changes over time (Huang et al. 2012; Ranasinghe et al. 2015; Yoon et al. 2017).
And yet, if the aim is to understand whether individual scholars' citing behaviors are changing in ways that aggregate to a different macro-level citation structure, we must be confident that changes in measures of the citation distribution reflect changes in individual decisions rather than other contextual shifts. Because the numbers of published articles and citations have been steadily increasing (Bornmann and Mutz 2015; Pan et al. 2018; Petersen et al. 2019), the overall volume of articles published and citations made can be treated as largely exogenous with respect to an individual scholar's choice of specific articles to cite. In the case of the structure of scientific citation, dramatic changes in the number of articles published and citations made will lead to substantial year-over-year changes in the theoretically possible levels of concentration or dispersion in citations. A simple example illustrates. If there were 1,000 articles published in a given year and only 500 citations made to those articles, the theoretical maximum in the percentage of articles cited at least once is 50 percent, whereas if there were a total of 1,000 or more citations made to those same 1,000 articles, the theoretical maximum of the percentage of articles cited rises to 100 percent. Similar, but more subtle versions of this problem arise for other measures of inequality. Taken together, these problems suggest that comparisons based on standard measures of inequality may be inadequate or even misleading when the marginals of the distributions of articles and citations change substantially over time. 3
Using data from the Web of Science, we first demonstrate that interyear comparisons of common measures of citation inequality are likely to be biased using a series of Monte Carlo experiments on hypothetical populations of articles. These experiments are constructed to hold patterns of inequality fixed across fields and periods while allowing the total number of articles and citations to follow their empirically observed trends over years and fields. These results reveal that marginal change in publications and citations itself is sufficient to produce dramatic temporal change in inequality measures. Next, we develop a bias-correction for inequality in the presence of changing marginals and show that this correction appears to completely remove the substantial bias created by trends toward higher total publications and citations. Then, we apply this correction to inequality measures of the observed population of citations. Our adjustment reveals that, irrespective of field, the large majority of the apparent decline in citation inequality in recent years is an artifact of bias induced by changing marginals. Rather than declining, citation inequality in the Web of Science database appears to be largely stable over recent decades. Finally, we apply the same correction method to reduce marginals bias when making comparisons between broad fields. After adjustment, most interfield differences in citation inequality are also revealed to be an artifact resulting from differences in the size of fields.
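The 1,000-article example given earlier in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name `max_share_cited` is hypothetical and does not come from the article or its software.

```python
def max_share_cited(n_articles: int, n_citations: int) -> float:
    """Upper bound on the share of articles cited at least once.

    Citations are indivisible, so at most min(n_articles, n_citations)
    distinct articles can receive one.
    """
    if n_articles <= 0:
        raise ValueError("n_articles must be positive")
    if n_citations < 0:
        raise ValueError("n_citations must be non-negative")
    return min(n_articles, n_citations) / n_articles


# The two cases from the text: 500 and then 1,000 citations to 1,000 articles.
print(max_share_cited(1000, 500))
print(max_share_cited(1000, 1000))
```

The bound depends only on the two marginal totals, which is why it shifts from year to year even if citing behavior does not.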

Citation Data and Inequality Measures
We analyze publication and citation data for four broad disciplinary fields that were the focus of Larivière et al. (2009) (health, humanities, mathematics and the computer sciences, and the social sciences) using Web of Science data provided by Clarivate Analytics. 4 We categorize the four broad disciplinary fields following the National Science Foundation's taxonomy of disciplines created by the Integrated Postsecondary Education Data System survey. (See section S1 of the online supplement for further details of categorization.) Within each broad set of fields, we include research articles published in English-language journals between 1996 and 2014 and exclude editorial comments, books, and other nonresearch articles. Because of uneven coverage during much of the twentieth century, we limit our analyses to articles published between 1996 and 2014. 5 We drop one unusually well-cited 2004 article in mathematics and computer science 6 in an effort to understand the general temporal pattern in inequality measures. (See section S2.4 of the online supplement for results that include this outlier.) Generally following Larivière et al.'s (2009) approach, we construct a data structure that includes articles published between 1996 and 2014 and citations toward those articles using a series of two-year moving windows from 1996 to 2016. 7 For example, for all articles published in the social sciences in 2014, we identify citations to these articles from other articles published in the social sciences until 2016. Table 1 reports the total number of articles and citations in each broad discipline.
Using these data, we focus on four yearly, field-specific measures of citation inequality: the Gini coefficient; the proportion of articles published in a given year that received at least one citation; the proportion of articles needed to account for 20 percent and 80 percent of the total citations received by articles published in a given year; and the HHI. 8
[Table notes: Compiled from the Web of Science (Clarivate Analytics). See section S1 of the online supplement for the composition of the four broad categories listed above. One exceptionally highly cited article in mathematics and computer science is omitted.]
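As a point of reference, the four families of measures named above can be computed from a list of per-article citation counts as in the following sketch. The article's own definitions sit in a footnote that is not reproduced here, so the code assumes standard textbook definitions; all function names are hypothetical.

```python
from typing import Sequence


def _total(counts: Sequence[int]) -> int:
    """Total citations; the measures below are undefined when it is zero."""
    total = sum(counts)
    if not counts or total <= 0:
        raise ValueError("need at least one article and one citation")
    return total


def gini(counts: Sequence[int]) -> float:
    """Gini coefficient: 0 for equal counts, (n - 1) / n if one article has all."""
    total = _total(counts)
    n = len(counts)
    # Rank-weighted sum over counts sorted in ascending order (ranks 1..n).
    weighted = sum(rank * c for rank, c in enumerate(sorted(counts), start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def share_ever_cited(counts: Sequence[int]) -> float:
    """Proportion of articles with at least one citation."""
    if not counts:
        raise ValueError("need at least one article")
    return sum(1 for c in counts if c > 0) / len(counts)


def share_of_articles_for(counts: Sequence[int], fraction: float) -> float:
    """Smallest share of articles, taken from the most cited downward,
    whose citations add up to at least `fraction` of all citations."""
    target = fraction * _total(counts)
    running = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= target:
            return rank / len(counts)
    return 1.0


def hhi(counts: Sequence[int]) -> float:
    """Herfindahl-Hirschman index: sum of squared citation shares."""
    total = _total(counts)
    return sum((c / total) ** 2 for c in counts)


example = [0, 0, 0, 1, 1, 2, 6]
print(gini(example), share_ever_cited(example),
      share_of_articles_for(example, 0.2), share_of_articles_for(example, 0.8),
      hhi(example))
```

Every function takes the same input, so any of them can be passed as the summary step of a simulation or resampling routine.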

Monte Carlo Evidence of Bias in Measures of Citation Inequality
Our core claim is that much of the apparent decrease in citation concentration is not the result of changes in the underlying pattern of inequality in citations but instead an artifact of increases in the total number of articles published per field each year as well as growing numbers of total citations sent to those articles from subsequent publications. To demonstrate the theoretical plausibility of this claim, we perform a series of Monte Carlo simulations of four separate time series of hypothetical articles and incoming citations to those articles. In these experiments, we impose a counterfactual, fixed pattern of inequality while varying the total number of articles and citations based on the observed quantities from each field and year of the real-world data. Put simply, our simulations assume that the total number of articles and citations increases as in the real world, but that the distribution of citations follows a simple, fixed "power law-like" pattern that does not vary over time or fields. If the Gini coefficient and other commonly employed measures of inequality were truly unaffected by marginals, they would find the same degree of citation concentration across these experiments. Instead, we show that inequality measures can be dramatically biased when comparing citation distributions with varying total articles and citations.
Formally, denote the $i$th article published in field $j$ and year $t$ as $p_{ijt}$. Call the set of all such articles $P_{jt}$, which includes $|P_{jt}|$ total articles. Next, denote as $n_{jtk}$ the number of future articles (over some chosen window of years) citing exactly $k$ members of $P_{jt}$. (Observe that the sum of $k\,n_{jtk}$ over all $k \in \{1, 2, 3, \ldots\}$ is also the total number of citations, $N_{jt}$, made to articles in $P_{jt}$ over the chosen window.) In the following, we focus on the potential distortion of inequality measures caused by variation in the marginals of these article and citation distributions across time and fields, specifically, variation in $|P_{jt}|$ and $n_{jtk}$.
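The relation between the bibliography counts and the citation total noted in the parenthetical above can be written out explicitly. The numerical values in the second line are made up purely for illustration and are not taken from the Web of Science data.

```latex
% Total citations to articles in P_{jt}, built from bibliography sizes
N_{jt} = \sum_{k \ge 1} k \, n_{jtk}

% Hypothetical illustration: n_{jt1} = 5, n_{jt2} = 3, n_{jt3} = 1, and n_{jtk} = 0 for k > 3
N_{jt} = (1)(5) + (2)(3) + (3)(1) = 14
```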
Drawing on these marginal quantities from the observed distributions of articles and citations, but no other real-world information, our Monte Carlo simulation consists of the following four steps:
Step 1: Define the set of hypothetical citable articles published in year $t$. For each year $t$ and field $j$, create a set of hypothetical articles $\phi_{ijt} \in \Phi_{jt}$, distinct from the empirically observed set of published articles $p_{ijt} \in P_{jt}$. Let $|\Phi_{jt}| = |P_{jt}|$, so that the number of hypothetical articles matches the empirically observed count for that field and year.
Step 2: Define the aggregate number of citations sent back to articles published in year $t$. Let $\nu_{jtk}$ indicate the number of hypothetical future articles citing exactly $k$ articles in $\Phi_{jt}$. Then set $\nu_{jtk} = n_{jtk}$ so that the total number of hypothetical citations matches the total citations actually received by articles published in year $t$ and field $j$. (This degree of specificity is required because each future article must send a discrete number of citations.)
Step 3: Define a time- and field-invariant pattern of inequality in the distribution of incoming citations. For simplicity, we assume that articles come in four ranked categories: superstar articles (the top 1 percent of articles published in a field-year), star articles (the next 9 percent of articles), solid articles (the next 20 percent of articles), and weak articles (the bottom 70 percent). When a future article sends an additional citation to an article published in year $t$, we assume that citation is $r$ times more likely to land on a given superstar article than on a given star article. Likewise, that citation is $r$ times more likely to be sent to a particular star article than to a particular solid article, and $r$ times more likely to cite a given solid article than a given weak article. In our simulations, we set $r = 4$, which implies that when a future article adds a citation to an article in $\Phi_{jt}$, it is $r^3 = 64$ times more likely to send that citation to a particular superstar article than to a particular weak article. 9
Step 4: Simulate citations to articles published in year $t$ by articles published in later years. For each field $j$ and year $t$, simulate a single hypothetical future article's bibliography by sampling without replacement $k$ articles from $\Phi_{jt}$ using the probabilities defined in Step 3. We repeat this exercise $\nu_{jtk}$ times for each $k \in \{1, 2, 3, \ldots\}$ to build up the complete set of citations to articles published in field $j$ and year $t$.
We then count the number of times each article in $\Phi_{jt}$ has been cited to create a simulated citation distribution. Finally, we summarize this distribution using each of our measures of inequality. (Step 4 should be repeated several times, averaging each measure of inequality across runs. We found even 10 simulations were sufficient to reduce Monte Carlo error to negligible levels.)
The only thing that varies across simulations for different fields and years is the marginal number of articles and citations to articles; we have held constant the underlying structure of inequality in how likely a specific article is to receive a citation. Therefore, if the Gini coefficient (for example) is truly unaffected by field-specific or year-over-year changes in the marginals, we should observe the same Gini coefficient regardless of which field and year of marginals we use in the simulation.
We illustrate the logic of our Monte Carlo experiments using the example of the social sciences (Figure 2). As in other broad disciplines, the number of articles published in the social sciences, and the number of citations sent to those articles, have generally increased each year, with a particularly rapid rise in the first decade of the twenty-first century (left panel of Figure 2). To demonstrate the logic of marginals bias, our Monte Carlo experiments simulate a set of articles published, and citations to these articles, over a period of years. The pattern of inequality for incoming citations to these articles is fixed across years, but the total number of articles published and citations sent is set to match exactly the marginal quantities observed in the social sciences (middle panel of Figure 2). If the Gini coefficient were immune to marginals bias, these results (marked Simulated with Fixed Inequality) would be a perfectly flat line. Instead, the rising marginals of social science publications and citations produce a strong tendency to mistakenly infer declining citation inequality over time, even though the actual level of inequality in these simulations does not change. (As we shall see, this pattern also holds for other disciplines and even other inequality measures.) This result implies that Gini coefficients measured across years and fields with varying marginals are not directly comparable.
Although the simulation results reflect an assumed pattern of citation inequality, it is worth noting how remarkably they resemble, both in terms of average levels by field and changes over time, the actual Gini coefficients obtained from the Web of Science data (shown in the right panel of Figure 2 as Empirically Observed), a pattern that will hold across disciplines and inequality measures. This suggests two hypotheses: first, that the "power law-like" model of citations we adopt in our simulations is a plausible simple model of actual citation behavior; and second, that variation in total articles and total citations may have created the illusion of declining inequality over time when no such trend actually exists.
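The four simulation steps described in this section can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the rounding of tier sizes, and the toy marginals in the demo are all hypothetical. Sampling without replacement within one bibliography is done by redrawing duplicates, which is equivalent to successive weighted draws from the articles not yet cited.

```python
import random
from itertools import accumulate


def tier_weights(n_articles, r=4.0):
    """Step 3: per-article weights for superstar (top 1%), star (next 9%),
    solid (next 20%), and weak (rest) articles. Rounding is an assumption."""
    n_super = round(0.01 * n_articles)
    n_star = round(0.09 * n_articles)
    n_solid = round(0.20 * n_articles)
    n_weak = n_articles - n_super - n_star - n_solid
    return ([r ** 3] * n_super + [r ** 2] * n_star
            + [r] * n_solid + [1.0] * n_weak)


def simulate_counts(n_articles, bibliographies, r=4.0, rng=None):
    """Steps 1, 2, and 4. `bibliographies` maps k to nu_k, the number of
    future articles that cite exactly k of the hypothetical articles."""
    rng = rng or random.Random()
    cum = list(accumulate(tier_weights(n_articles, r)))
    ids = range(n_articles)
    counts = [0] * n_articles
    for k, nu_k in bibliographies.items():
        if k > n_articles:
            raise ValueError("a bibliography cannot cite more articles than exist")
        for _ in range(nu_k):
            cited = set()
            while len(cited) < k:  # redraw until k distinct articles are cited
                cited.add(rng.choices(ids, cum_weights=cum, k=1)[0])
            for i in cited:
                counts[i] += 1
    return counts


def gini(counts):
    """Gini coefficient of a list of non-negative counts (textbook formula)."""
    n, total = len(counts), sum(counts)
    weighted = sum(rank * c for rank, c in enumerate(sorted(counts), start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Same fixed inequality structure, two sets of toy marginals:
# 1,000 articles receiving 500 citations versus 20,000 citations.
rng = random.Random(2024)
small = simulate_counts(1000, {1: 300, 2: 100}, rng=rng)
large = simulate_counts(1000, {1: 12000, 2: 4000}, rng=rng)
print("Gini with 500 citations:   ", round(gini(small), 3))
print("Gini with 20,000 citations:", round(gini(large), 3))
```

In this toy setting the Gini coefficient is lower under the larger citation total even though the tier weights are identical in both runs, which is the pattern the section attributes to marginals bias. A fuller version would average each measure over several runs, as the text recommends.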

A Resampling Correction for Bias in Measures of Citation Inequality
Our Monte Carlo experiments suggest the Gini coefficient and other common inequality measures are unreliable guides for comparisons across time and fields and thus should be avoided. However, if the "marginals bias" can be corrected, we think these tools can still be used. To do this, we introduce a resampling correction and an R package, ineqReSample, which allows users to correct inequality metrics computed on their own data. 10 The key idea behind our correction is to choose a base year, for which we observe the total number of articles published and the total number of citations to those articles that follow. For each subsequent year, we resample the articles published in that year and the citations to them to have the same marginals as observed in the base year, thus preserving the underlying time-varying structure of citation inequality but in samples drawn with fixed total numbers of articles and citations.
Inequality measures computed based on resampled citations should be comparable relative to the base year for each field. This suggests that our adjusted measures could be employed in an analogous fashion to other metrics that need adjustment to a base year, such as seasonality or inflation adjustments in economic research (though we emphasize the causes of marginals bias are distinct from the processes underlying inflation and seasonal variation in economic data).
In the simplest case, the numbers of articles and citations are at their minima in an initial reference year. Adjusting inequality measures in subsequent years to be comparable to the initial year involves four steps:
Step 1: Sample to match the original total number of articles. For each year $t > 1$, sample without replacement $|P_{j,1}|$ citable articles from $P_{jt}$; call this subsample of articles $Q_{jt}$. 11
Step 2: Sample incoming citations to match the original number of total citations. From all the cites to articles in $Q_{jt}$, sample without replacement $N_{j,1}$ citations. 12
Step 3: Compute comparable measures of inequality using the sampled citations to the sampled articles. These might include Gini coefficients, percentage of articles ever cited, quantile-based measures, the HHI, and other metrics.
Step 4: Repeat steps 1-3 and average the results to reduce sampling error. Even a small number of simulations is sufficient to reduce sampling error to negligible levels, though more should be used if the total number of articles and citations is low.
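The four adjustment steps above can be sketched as follows. This is an illustrative reading of the procedure, not the interface of the ineqReSample package: the function and argument names are hypothetical, citations are represented simply as a list with one cited-article id per citation, and the sketch raises an error when the subsampled articles hold fewer citations than the base-year total, a case whose handling is not described in this section.

```python
import random


def gini(counts):
    """Gini coefficient of a list of non-negative counts (textbook formula)."""
    n, total = len(counts), sum(counts)
    weighted = sum(rank * c for rank, c in enumerate(sorted(counts), start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def resampled_measure(article_ids, cited_ids, base_n_articles,
                      base_n_citations, measure=gini, n_reps=10, rng=None):
    """Average `measure` over resamples drawn at the base-year marginals.

    article_ids: list of ids for all articles published in year t.
    cited_ids:   list with one entry (the cited article's id) per citation.
    """
    rng = rng or random.Random()
    values = []
    for _ in range(n_reps):
        # Step 1: subsample articles down to the base-year article count.
        kept = rng.sample(article_ids, base_n_articles)
        kept_set = set(kept)
        # Step 2: subsample citations that point at the kept articles.
        pool = [a for a in cited_ids if a in kept_set]
        if len(pool) < base_n_citations:
            raise ValueError("too few citations to match the base-year total")
        drawn = rng.sample(pool, base_n_citations)
        # Step 3: rebuild per-article counts (zeros included) and summarize.
        counts = dict.fromkeys(kept, 0)
        for a in drawn:
            counts[a] += 1
        values.append(measure(list(counts.values())))
    # Step 4: average across repetitions to reduce sampling error.
    return sum(values) / len(values)


# Toy usage: 2,000 articles in year t, resampled to base-year marginals of
# 1,000 articles and 500 citations. The citation list is synthetic.
rng = random.Random(7)
articles = list(range(2000))
citations = [rng.randrange(2000) for _ in range(6000)]
print(round(resampled_measure(articles, citations, 1000, 500, rng=rng), 3))
```

Because `measure` is passed in as a function, the same routine can return an adjusted percentage ever cited, quantile share, or HHI by swapping in a different summary.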
We demonstrate the accuracy of our resampling correction by first applying it to our simulation results, where we know the only potential explanation for varying Gini coefficients across time is change in the total number of articles and citations. The line marked Corrected for Marginals Bias in the middle panel of Figure 2 shows that the resampling-corrected Gini detects no change in the level of inequality over time. Thus our Monte Carlo experiments show that this procedure successfully removes all of the bias introduced by changing marginals in the social sciences. (The same holds for each broad discipline and measure of inequality considered herein.) Our resampling-based adjustment has rendered the Gini coefficient comparable across years with varying marginals, revealing a common underlying pattern of inequality.
We next apply this adjustment to the empirical citation data for the social sciences (marked Adjusted for Varying Marginals), as shown in the right panel of Figure 2. We expect unadjusted Gini coefficients to be noncomparable because of rising marginal counts of articles and citations, with a bias toward reporting declining inequality even if there is little or no actual reduction in the concentration of citations. Our adjustment shows this concern is warranted: the large majority of the ostensible reduction in the Gini coefficient appears to be an artifact of increasing marginals. Adjusting for these varying marginals reveals only a small reduction in Gini overall and essentially no change in citation inequality after 2005.

Adjusted Measures of Citation Inequality by Field and Indicator
In the remainder of the article, we report Monte Carlo results for each field and inequality measure and explore what happens when real-world citation data from each of the four broad disciplines are adjusted for marginals bias.

Gini Coefficient
We now expand our Monte Carlo simulation of the Gini coefficient across fields as well as years. The lines marked Sim in the top half of Figure 3 show the Gini coefficient of the citation distribution in simulations that assume a fixed pattern of inequality over time and fields but the same marginals as in the articles observed in Web of Science for that field and year. These simulations demonstrate that increasing marginals are sufficient to produce the illusion of declining year-to-year Gini coefficients, even if patterns of inequality remain constant. Moreover, for each field, the simulations track fairly closely with real-world data (marked Obs in the lower half of Figure 3), suggesting that the real-world decline in the Gini coefficient may be an artifact of changing marginals and not an indication of greater diffusion of citations. Once the simulated Gini coefficients are adjusted for changing marginals, they show no change over time in any field (see the lines marked Cor in the top half of Figure 3). Although the fields themselves still appear to have different levels of inequality after correcting for marginals, this is only because we have adjusted each time series of Gini coefficients to be comparable to the base year for that field.
Creating interfield comparable measures would require us to impose the same marginals on all fields in the resampling correction. 13
We now apply this approach to the actual empirical citation data for each field. The lower panel of Figure 3 shows two versions of the Gini coefficient calculated by field and year using the Web of Science data: an uncorrected version (marked Obs) potentially biased by changing marginals, and an adjusted version (marked Adj) that renders the Gini coefficients comparable (across years within the same field only) by resampling articles and citations to match the totals in the first year of each field's time series. Without adjustment, as in the prior literature, there appears to be a trend toward lower concentration of citations in most fields, with the greatest change in the first decade of the twenty-first century. However, adjusting for marginals reveals that this reduction in inequality is mostly a mirage. In the humanities, for example, the Gini coefficient appears not to have changed at all once the dramatic increase in citations over this period is accounted for. Likewise, the Gini for the social sciences and for mathematics and the computer sciences appears to have fallen only slightly, with the vast majority of the apparent decrease merely an artifact of growth in articles and citations. Only in health, where the number of articles and citations to articles were already very high in 1996, does the apparent decrease in concentration appear genuine, though it is worth noting that inequality in health publications appears to be essentially constant after 2005.

Proportion of Ever-Cited Articles
The percentage of articles ever cited is both the simplest measure of citation concentration and the measure most likely to be affected by marginals bias. The logic is straightforward: if any given article has a fixed nonzero probability of being cited by each subsequent article, the probability of having at least one citation will increase as the total number of future articles and citations increases.
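The logic in the paragraph above can be made explicit with a one-line formula. The sketch assumes, purely for illustration, that each of $M$ subsequent articles cites a given article independently with the same probability $p$; the article itself does not state this independence assumption, and the numerical values are hypothetical.

```latex
% Probability that a given article is cited at least once by M subsequent articles,
% each citing it independently with probability p, where 0 < p < 1
\Pr(\text{cited at least once}) = 1 - (1 - p)^{M}

% Hypothetical illustration with p = 0.001:
%   M = 500:   1 - 0.999^{500}  \approx 0.39
%   M = 2000:  1 - 0.999^{2000} \approx 0.86
```

Since $(1 - p)^{M}$ shrinks as $M$ grows, the expected share of articles ever cited rises with the number of citing articles even when $p$ is held fixed.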
Here, we examine whether the share of articles cited within two years of publication is subject to marginals bias using both Monte Carlo simulation and the Web of Science corpus. The unadjusted observed articles ever cited (marked Obs in the lower half of Figure 4) are quite similar to earlier estimates from Larivière et al. (2009) and show differences across broad disciplines in the percentage of articles ever cited as well as generally upward trends in articles ever cited (i.e., declining concentration in citations). 14 However, our Monte Carlo experiments reveal that the percentage ever cited is the inequality measure most affected by marginals bias. The lines marked Sim in the top half of Figure 4 show the percentage of articles receiving any citations in simulations that assume a fixed pattern of inequality over time and fields, but the same marginals as in the articles observed in Web of Science. The simulations not only provide an eerily close match with the real-world data, they also show that increasing marginals are sufficient to produce rising percentages of articles ever cited, even if patterns of inequality remain constant. This suggests the real-world increase in the percentage of articles receiving citations may be an artifact of changing marginals, and not an indication of greater diffusion and diversity of citations. When we adjust the observed percentage ever cited for marginals bias [...]
But to what extent are these apparent trends affected by changing total articles published and cited? Because of the well-known robustness of quantile measures of distributions, we expect these metrics to be less affected by marginals bias. Moreover, to the extent marginals bias is driven by the articles at or near the lower zero-bound of citations, we expect bias to be especially small for quantiles that mainly capture concentration at the top of the citation distribution, such as the percentage of articles accounting for 20 percent of all citations.
The top half of Figure 5 presents our Monte Carlo results, which suggest that the degree of marginals bias should be small for the broad disciplines of health, the social sciences, and mathematics and the computer sciences. In these fields, total articles and citations are substantial enough (and the top 20 percent of citations likely concentrated enough) that the presence of varying numbers of articles near the zero-bound is unlikely to substantially bias this metric. Humanities, on the other hand, appears to be subject to considerable bias even in quantile measures of inequality as a result of its small and rapidly shifting total citation count.
The bottom half of Figure 5 confirms these intuitions: the results for health, the social sciences, and mathematics and the computer sciences are largely unaffected by adjustment. However, the appearance of growing equality in the humanities after 2008 proves to be an illusion: adjusting for margins, the percentage of humanities articles accounting for 20 percent of citations has barely shifted since 1996. Overall, then, once adjusted for margins, there is no evidence in any broad discipline for declining inequality in this metric in the most recent decade of available data.
Turning to our second quantile-based measure, the percentage of articles accounting for 80 percent of citations over a two-year window, we find a pattern more similar to that of the Gini coefficient. Our Monte Carlo results (top panel of Figure 6) suggest there may be substantial marginals bias in this measure for the social sciences, mathematics and computer sciences, and humanities, with only health (with its much larger number of total articles and citations) largely immune.

This fits the intuition that even quantile-based measures can suffer from marginals bias if they focus on parts of the citation distribution that are likely to be strongly influenced by the proportion of articles at or near the lower zero-bound on citations.
[Figure panel B: Observed inequality and an adjustment for time-varying marginals. Axis annotation: Greater Inequality.]
Looking at the Web of Science corpus, we find that the unadjusted percentage of articles receiving 80 percent of citations rises in all fields, though mostly in the earlier years of our data. However, adjusting for marginals eliminates virtually all of the reduction in inequality. Once the changing total number of articles and citations is accounted for, it appears once again that only citations to pre-2006 health articles show evidence of a trend to greater equality. In other fields, particularly the humanities and mathematics and computer sciences, the adjusted percentage of articles accruing 80 percent of citations is essentially unchanging over time.

Herfindahl-Hirschman index
Finally, we apply the same analysis to the HHI. Computing the unadjusted HHI from the observed data from Web of Science suggests declining concentration in all broad disciplines except health, in which HHI is mostly constant with a slight increase since 2008 (see lines marked Obs in the lower half of Figure 7), matching the findings of Larivière et al. However, our Monte Carlo experiments suggest HHI for all four fields may be subject to a substantial degree of marginals bias (see the lines marked Sim in the top half of Figure 7). Applying our adjustment to HHI for the observed data reveals all of the apparent reduction in concentration to be an artifact of increasing total publications and citations over time. The adjusted HHI is essentially constant over time for the humanities, the social sciences, and the mathematics and the computer sciences. And in health, we find evidence that inequality has actually increased since 2007, once changing marginals are taken into account.
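The mechanical dependence of HHI on the number of articles can be made concrete with a short sketch. It computes HHI as the sum of squared citation shares, following the definition given in note 8. The function name `hhi` and the toy citation counts are ours, chosen only for illustration.

```python
def hhi(counts):
    """Herfindahl-Hirschman index: the sum of squared shares of the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("HHI is undefined when the total count is zero")
    return sum((c / total) ** 2 for c in counts)

# Citations spread perfectly evenly: HHI equals 1/n, so it falls as the
# number of articles grows even though the distribution is equally "equal".
even_10 = hhi([5] * 10)
even_100 = hhi([5] * 100)

# All citations on a single article: HHI reaches its maximum of 1.
concentrated = hhi([50] + [0] * 9)
```

Because the evenly spread cases differ only in the number of articles, any gap between them reflects the marginals rather than a change in how citations are shared.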

Adjustment to Fixed Marginals Across Fields and Time
In the preceding section, for each publication year after 1996, we resampled articles and citations to have the same totals as in 1996 by field. This strategy allowed us to trace within-field changes in citation inequality without being misled by marginals bias. We can also accurately note whether inequality is changing in similar ways across fields. In short, adjusting each field to its own set of reference margins allowed us to address our primary research questions. However, interfield comparisons of the average level of inequality predominant in each field are still susceptible to marginals bias unless we adjust the total articles and citations to a common set of margins across fields. In other words, if we wish to assess which fields tend to be more concentrated or diffuse in their citations on average across time, we will need to make further adjustments for varying marginals across fields.
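The idea of resampling a field-year to a set of reference marginals can be sketched as follows. This is a minimal illustration under our own assumptions, not necessarily the procedure used in this article or in the ineqReSample package: articles are drawn without replacement down to the reference article total, and the reference citation total is then reallocated among them with probability proportional to their observed citation counts. All function names and the toy data are hypothetical.

```python
import random

def pct_ever_cited(counts):
    """Percentage of articles with at least one citation."""
    return 100 * sum(c > 0 for c in counts) / len(counts)

def resample_to_marginals(counts, n_articles, n_citations, rng):
    """Return one resampled list of citation counts with fixed totals.

    Assumption (ours): articles are drawn without replacement, then
    citations are reallocated among the drawn articles with probability
    proportional to their observed citation counts.
    """
    if n_articles > len(counts):
        raise ValueError("reference article total exceeds observed articles")
    sampled = rng.sample(counts, n_articles)
    if sum(sampled) == 0:
        raise ValueError("sampled articles carry no citations to reallocate")
    resampled = [0] * n_articles
    for i in rng.choices(range(n_articles), weights=sampled, k=n_citations):
        resampled[i] += 1
    return resampled

def adjusted_metric(counts, n_articles, n_citations, metric, reps=200, seed=0):
    """Average a metric over repeated resamples to the reference marginals."""
    rng = random.Random(seed)
    values = [
        metric(resample_to_marginals(counts, n_articles, n_citations, rng))
        for _ in range(reps)
    ]
    return sum(values) / reps

# Hypothetical field-year: 100 articles receiving 240 citations in total.
observed = [0] * 30 + [1] * 40 + [3] * 20 + [14] * 10
# Hypothetical reference marginals: 50 articles and 40 citations.
adjusted = adjusted_metric(observed, 50, 40, pct_ever_cited, reps=50)
```

Passing the metric as a function keeps the resampling step separate from the choice of inequality measure, so the same routine could in principle be reused for the Gini coefficient, HHI, or a quantile-based measure.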
To allow such interfield comparisons for health, the social sciences, and mathematics and computer science, the results reported in this section resample each field-year of published articles and citations to those articles to have the same total counts (30,000 articles and 30,000 citations) regardless of field or year. 16 We refer to metrics computed from these marginals as "fully adjusted." We exclude the humanities (which had far fewer articles and citations, especially in the earlier years) from the fully adjusted comparison to avoid using uncomfortably small marginals, particularly for citations. Throughout this section, we use the same 2-year citation window.
Figure 7 (notes, continued): The lines marked Cor remove the marginals bias in HHI using a resampling correction. The lines marked Obs in the bottom panel show HHI over fields and time using the empirical data from Web of Science; these results are subject to marginals bias from differences in total articles and citations by field and year. Lines marked Adj adjust for marginals bias in the empirical data by resampling to the marginals in 1996 by field. Corrections and adjustments are omitted for the humanities in 1997-2002. All curves are smoothing splines with a span of 0.5. One exceptionally highly cited article in mathematics and computer science is omitted.
In Figures 8 and 9, we report all five metrics of inequality under fully adjusted marginals. Overall, full adjustment reveals that most interfield differences in inequality levels are due to different marginals between fields. For example, the results reported in Figures 3, 4, 6, and 7 suggest that on most metrics, the health field seemed to have less inequality overall than other fields when margins are adjusted to field-specific reference years. However, this apparent difference is just another example of marginals bias. After we resample all three broad fields to have the same marginals, health and the social sciences have similar levels of concentration and similar trends when measured by the Gini coefficient (Figure 8), the percentage of articles ever cited (Figure 8), and the percentage of articles accounting for 80 percent of citations (Figure 9). On the same three metrics, we find that citation concentration in math and computer sciences is slightly higher than the other two broad fields regardless of year. However, comparing the fully adjusted HHI (Figure 8) and the percentage of articles accounting for 20 percent of citations (Figure 9), we find inequality in health may even be slightly higher than in the social sciences, whereas mathematics and the computer sciences appear more similar to health. These differences across metrics likely reflect concentration at different points in the distribution. As HHI and the percentage of articles accounting for 20 percent of citations are more sensitive to concentration at the top of the distribution than our other metrics, we infer that citations in mathematics and the computer sciences as well as health may be slightly more concentrated at the top of the distribution than citations in the social sciences. Looking across the whole distribution, mathematics and the computer sciences may be somewhat more concentrated than either health or the social sciences.
Finally, we see hints that citation concentration at the top of the distribution (as shown by HHI and the percentage of articles accounting for 20 percent of citations) is rising in recent years in mathematics and the computer sciences. However, all of these differences are very small; the key finding is that citation inequality is very similar not only over time but across fields as well. Thus, the results from interfield comparison suggest that full adjustment for varying marginals is essential for meaningful comparison of citation concentration across fields.

Discussion and Conclusion
In this study, we identify the existence of marginals bias that affects inequality measures used to study scholarly citations. We then propose a resampling correction method that removes the bias. After adjusting measures of inequality to account for increasing marginals, we find minimal change over time in the distribution of citations in most fields. Moreover, when we fully adjust marginals to give all fields the same number of articles and citations, there is little interfield difference in citation inequality. This substantive finding is revealed only after adjusting for the substantial changes in the number of articles published and citations made during the period we study. Failing to adjust for these changing marginals when using a variety of metrics (including the Gini coefficient, percentage of articles with any citation, various quantile measures, and HHI) has led some previous authors to conclude that there has been a decrease in the level of inequality in citations, and that scientific attention has become more diffuse (Huang et al. 2012; Ranasinghe et al. 2015; Yoon et al. 2017). We believe this conclusion is incorrect, as are many of the conclusions based on comparisons of inequality across time and between groups (Evans 2008; Diem and Wolter 2013; Varga 2019). Moreover, we suspect marginals bias may affect other inequality measures not directly addressed in this article. For example, a small amount of Monte Carlo experimentation suggests the Theil index is also subject to substantial marginals bias, which our adjustment appears to correct.
Figure 8: Gini coefficient, percentage of articles with any citations, and Herfindahl-Hirschman Index for citations within two years of publication, 1996-2014: empirical results with adjustment to fixed margins across fields. Notes: All lines report results using empirical data from Web of Science. Lines marked Obs are subject to marginals bias from differences in total articles and citations by field and year. Lines marked Adj adjust for marginals bias in the empirical data by resampling to a total of 30,000 articles published per year and 30,000 citations sent back to those articles over the following two years, regardless of field. All curves are smoothing splines with a span of 0.5. One exceptionally highly cited article in mathematics and computer science is omitted.
Figure 9: Percentage of articles accounting for 80 percent and 20 percent of all citations within two years of publication, 1996-2014: empirical results with adjustment to fixed margins across fields. Notes: All lines report results using empirical data from Web of Science. Lines marked Obs are subject to marginals bias from differences in total articles and citations by field and year. Lines marked Adj adjust for marginals bias in the empirical data by resampling to a total of 30,000 articles published per year and 30,000 citations sent back to those articles over the following two years, regardless of field. All curves are smoothing splines with a span of 0.5. One outlier in mathematics and computer science is omitted.
Monte Carlo experiments presented in this article and its online supplement suggest that although increases in the number of publications and citations lead to downward bias in inequality measures, the magnitude of the marginals bias effect varies. What explains this variation? We believe the most likely explanation is the coarseness of discrete measures, especially near the lower zero-bound for citations. As the total number of articles and citations rises, a smaller proportion of articles are likely to fall at or near the lower zero-bound, and citation counts in general are likely to be more informative. This fits with the smaller downward bias that appears when the marginals from the health field are used in simulations: the health field in general had the greatest number of articles and citations as well as the smallest proportion of uncited articles. Similarly, the 6-year citation window, which accumulates more citations and reduces the share of articles receiving zero citations, is less vulnerable to this bias than the 2-year window. This logic also suggests that measures of inequality that are more sensitive to the extent of uncited or rarely cited articles (most obviously the percentage ever cited, but also Gini and HHI) will be more affected by varying marginals. In contrast, more robust measures of inequality based on quantiles (such as the percentage of articles receiving m% of citations) should be less sensitive, particularly when they measure regions of the distribution that contain articles far from the lower zero-bound of citations.
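The zero-bound mechanism described above can be illustrated with a small Monte Carlo sketch. It is not the simulation design used in this article: we assume a fixed set of hypothetical Zipf-like latent citation propensities, hold the number of articles constant, and vary only the total number of citations, which are allocated multinomially. In this setup the measured Gini coefficient should fall as citations become more plentiful, even though the latent propensities never change.

```python
import random

def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly equal)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("Gini is undefined for empty or all-zero data")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def simulate_counts(n_articles, n_citations, rng):
    """Allocate citations multinomially under fixed latent propensities."""
    # Hypothetical Zipf-like propensities: the k-th article has weight 1/k.
    weights = [1 / k for k in range(1, n_articles + 1)]
    counts = [0] * n_articles
    for i in rng.choices(range(n_articles), weights=weights, k=n_citations):
        counts[i] += 1
    return counts

rng = random.Random(42)
results = []
for n_citations in (500, 5000, 50000):
    counts = simulate_counts(1000, n_citations, rng)
    pct_uncited = 100 * sum(c == 0 for c in counts) / len(counts)
    results.append((n_citations, gini(counts), pct_uncited))

for n_citations, g, pct_uncited in results:
    print(f"{n_citations:>6} citations: Gini = {g:.3f}, uncited = {pct_uncited:.1f}%")
```

With few citations per article, most articles sit at zero and the counts are a coarse reflection of the underlying propensities; as the total rises, the share of uncited articles shrinks and the measured Gini moves toward the value implied by the propensities themselves.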
Our results comparing adjusted inequality measures again highlight the fact that different measures of concentration and inequality capture different aspects of distributions (Piketty 2014). For example, although it is empirically rare, it is theoretically possible for a distribution to be both highly concentrated and have a long tail. This is in fact what we observe in the health field. As measured by HHI and the percentage of articles needed to account for 20 percent of citations, inequality in health citations has increased since the mid-2000s. Yet over the same period, the percentage of health articles ever cited and the Gini coefficient for health citations show a weak pattern of falling concentration. These differences between inequality measures imply that concentrated scientific attention on a small number of very highly cited articles may go hand in hand with a longer tail in the citation distribution. Thus, even after adjusting for marginals bias, scholars should carefully select inequality measures depending on what aspect of inequality is of most interest, or consider using a variety of measures to capture subtle differences in the pattern of concentration. For example, if concentration of citations to a very few highly cited articles is suspected, HHI or the percentage of articles needed to account for 20 percent of citations (or an even smaller percentage) may be helpful. However, if the purpose of analysis is to measure a long tail, either the proportion of ever-cited articles or the percentage of articles needed to account for 80 percent of citations (or some other large percentage) would be most effective. The Gini coefficient essentially averages these tendencies and therefore is less useful for investigating the specific nature of inequality.
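The quantile-based measures discussed above can be sketched in a few lines. The function below returns the smallest percentage of articles, counted from the most cited downward, whose citations reach a given percentage of the total. The function name, the handling of ties, and the toy data are our assumptions; the article's own implementation may differ in such details.

```python
def pct_articles_for_share(counts, percent):
    """Smallest percentage of articles (most cited first) whose citations
    account for at least `percent` percent of all citations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("measure is undefined when there are no citations")
    if not 0 < percent <= 100:
        raise ValueError("percent must lie in (0, 100]")
    running = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        # Compare scaled counts so integer inputs avoid rounding error.
        if running * 100 >= percent * total:
            return 100 * k / len(counts)

# Hypothetical field-year: 10 articles receiving 100 citations in total.
example = [50, 20, 10, 10, 5, 5, 0, 0, 0, 0]
top_20 = pct_articles_for_share(example, 20)  # sensitive to the very top
top_80 = pct_articles_for_share(example, 80)  # sensitive to the long tail
```

A smaller value indicates that fewer articles are needed to reach the threshold, that is, greater concentration. Using a low threshold such as 20 percent isolates the most cited articles, whereas a high threshold such as 80 percent reaches further into the tail of rarely cited articles.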
Our conclusion challenges previous studies claiming that the scope of science has either narrowed (Evans 2008) or broadened. Instead, we found that the level of concentration in citation inequality has remained relatively stable. On the one hand, this stability could reflect a lack of fundamental change. Although that would be consistent with our results, it is not the only possible explanation consistent with the evidence. If citation inequality is the product of several components, it could also be the case that stability is the result of well-balanced opposing forces. We consider two candidate forces: one social, and the other technological.
First, although we identify a method that effectively adjusts for the growth of publication and citation counts, we recognize that the increased volume of scientific articles itself is the result of important changes in the incentives, norms, and practices concerning the production and consumption of science. From the perspective of a producer, the current generation of young scientists is under greater pressure to publish and be cited than prior generations (Warren 2019) and faces an overreliance on production metrics (Fire and Guestrin 2019). From the perspective of a consumer of knowledge, scholars must adapt to the environment by allocating their limited time and energy to digesting the ever-growing volume of prior research (Parolo et al. 2015; Pan et al. 2018). Ultimately, the rising pressure to publish could result in an increase in the fraction of low impact publications, a social force that could lead to greater concentration in scholarly citations. 17
Second, there have been dramatic changes in the digital environments in which scholars search, read, and organize literature, in particular technological innovations that, in principle, make it easier for researchers to keep up with a growing literature without devoting more time and effort to the task. If true, this could result in them citing a broader set of articles. Thus one possible explanation for the lack of change in the level of inequality in citation distribution is that scientists are using technological change to compensate for social change in the production of scientific articles. But even if this is the case and the currently stable level of inequality is based on a balance of opposite effects, nothing guarantees these forces will remain balanced, especially if tighter academic labor markets accelerate scientific publication rates in the coming years.
However, it is also possible to speculate that technology might encourage greater concentration in scholarly attention in response to increasing pressure to publish, particularly in fields that move quickly, such as computer sciences. Fast-moving fields frequently involve mass production of research results or strict conference deadlines, either of which may limit scholars' ability to read broadly. Our interfield analysis in Figures 8 and 9 supports this conjecture by revealing that the mathematics and the computer sciences field has a slightly higher level of inequality than health and the social sciences. The analysis also shows that the increase in concentration of citations toward the top of the citation distribution began around 2008, suggesting that computer scientists' early adoption of digital search tools, in combination with field-specific deadline pressures, may have contributed to the concentration of academic interest toward a narrow set of highly cited articles.
Although the empirical context for this study concerns scholarly citations, the methodological problem we identify extends to any context in which inequality measures are applied to indivisible count distributions containing many zeros. This pattern occurs when gatekeepers distribute scarce rewards across a large population; for instance, in the awarding of grants to investigators, and offers of admissions or jobs to candidates. In these examples, there are so few rewards per subject that comparisons of inequality measures are vulnerable to the biases we identify in this article. In a similar vein, we expect to find evidence of this bias in rapidly expanding markets for songs, movies, or books, especially if the volume of consumption is relatively stable. As we demonstrate, adjustment is particularly important in contexts in which the target of behavior is discrete (as in citations or purchases) and many targets are rarely or never selected. To facilitate use of this method, we have created an open source R package, ineqReSample, that adjusts inequality measures with the resampling correction. More details on the package can be found at https://github.com/lanukim/ineqReSample.
Of particular interest for future research is the impact that information retrieval technology (e.g., search engines and recommender systems) is having on what is found, read, and cited in the scientific literature. Is technology narrowing or expanding our collective view of the literature? And what impact is this having on collective sense-making and, ultimately, on the success of science? In order to address these questions and related policy questions, we need measures that are unbiased, comparable over time and across fields, and reliably interpretable. We hope that our results of revealing and correcting marginals bias will help advance research around these important questions.
Notes
1 Studies investigating whether algorithmically driven online portals concentrate or broaden exposure are not limited to scientific citation behavior but also include consumer decisions in online clothing markets (Brynjolfsson et al. 2011), video rentals (Zentner et al. 2013), and music consumption (Salganik et al. 2006
7 We performed additional robustness checks using four-year and six-year citation windows, the results of which are provided in sections S2.2 and S2.3 of the online supplement. Temporal trends found in the longer citation windows are largely consistent with our findings in the main text.
8 HHI is a commonly used measure of market concentration computed by summing the squared market share of each firm. In our context, the market share is the citation count that one article receives divided by the total citation count. Usually, when the HHI is smaller, it means the market is more decentralized; however, HHI also tends to decrease as the number of participants rises. For example, when 10 companies equally share a market, HHI is $0.1^2 \times 10 = 0.1$, but when 100 companies equally share, HHI is $0.01^2 \times 100 = 0.01$. As this illustration shows, ceteris paribus, HHI will decrease if total publication counts increase.
8 and 9, we remove marginals bias that hinders interfield comparison by adjusting marginals to have the same number of articles and citations to all fields except humanities. We exclude humanities from these comparisons because its smallest marginals (for the early years of the humanities) are so much lower than other fields as to make cross-field comparison particularly difficult.
14 When we compare our results with those of Larivière et al., we focus on the years 1996-2005 and the two fields (social sciences and humanities) that most closely mirror Larivière et al.'s analyses.
15 The impact of increased publications and longer reference lists in newer publications on the proportion of ever-cited articles has also been found by Wallace et al. (2009).
16 The choice to set both articles and citations to the same number (30,000) is a coincidence driven by the minima of the observed distributions of articles and citations across these fields and years. It would be perfectly reasonable to set the total number of articles to a different common marginal than the total number of citations, so long as each marginal was kept the same across fields and years.
17 One could imagine the opposite direction as well. As more articles are published and as more subcommunities form in the literature, there may be a decrease in citation concentration. However, we think the argument for greater concentration is more plausible, given the likelihood of the Matthew effect in science (Merton 1968). We encourage future research to sort these alternative hypotheses out, taking care to adjust inequality measures for changing marginals.