Why Government-Provided Data May Actually Be Bad For The Economy

Government data are usually described as a public good. A dataset collected once can be used by many researchers for many projects. Making data available lowers the cost of research and can also make government policy more transparent. But that argument misses what cheap data does to career incentives. When government supplies the raw material for publishable research, academia starts to look more attractive to people who might otherwise take their skills elsewhere.

A useful dataset can do more than answer a clever research question; it can make an academic career more viable. The right data might give a dissertation enough traction to become a whole sequence of papers, and for a young researcher with strong quantitative skills, that can make academia look safer than a job on a product team. In those cases, taxpayers have done more than pay for data collection. They have helped tilt a career choice toward academia.

Government data can become a public bad at the margin. It lowers the private cost of academic production while the social value of the marginal paper is often trivial. The worker who might have built software, improved logistics, or evaluated private investments instead produces another study designed for referees. While that may look like knowledge creation, much of the time it moves talent into a market that rewards citations and prestige instead of profit and customers.

Incentives Matter

The scale of the data subsidy is no longer small. Data.gov reports 361,525 datasets in its catalog. IPUMS reports 2.6 billion records, more than 2,500 datasets, and a user community above 340,000. In 2025 alone, IPUMS says it delivered 868 terabytes of data and filled more than 1,300 data requests each day. A large research infrastructure now exists for people who turn these records into scholarly output.

The academic reward system has increasingly favored this kind of work. The American Economic Association says its journals publish papers only when data and code are clearly documented and access is not exclusive to the authors. That rule is sensible as a transparency measure. It also signals that the dominant scholarly product is now a data product. A researcher who can assemble data and code has a better shot at publication than a researcher armed mainly with theory or practical experience.

Research itself is changing as a result. Economists Prashant Garg and Thiemo Fetzer analyzed more than 44,000 working papers from the National Bureau of Economic Research and the Centre for Economic Policy Research from 1980 to 2023. They found that the share of causal claims per paper rose from 7.7 percent in 1990 to 31.7 percent in 2020. Economics is the clearest case, but the pattern is wider. Political science, sociology, psychology, public policy, business, and economic history increasingly reward the person who can use data to claim identification.

That is evidence of a major change in talent allocation. Economics teaches that people respond to prices. When government lowers the price of producing publishable empirical work and journals raise the return to that work, more capable data analysts will enter or remain in academia than would have done so otherwise.

Industry Values Researchers More

One indication that talent is being drawn away from higher-valued market uses is the pay gap. The National Center for Science and Engineering Statistics reports expected 2024 salaries for new doctorate recipients with definite commitments. In social sciences, the median industry salary was $129,000, compared with $75,000 in academia. In mathematics and statistics, the comparison was $150,000 to $68,000. In computer and information sciences, it was $180,000 to $100,000.

Granted, pay is an imperfect measure of social value. Markets make mistakes and universities produce some public benefits. Still, wages are useful evidence. Firms pay when workers are expected to create value for customers and investors. When the same analytical talent earns much less in academia, academia is buying labor at a discount, perhaps because it offers nonmarket benefits such as status and autonomy.

Another factor is that academia offers access to the body of peer-reviewed scientific research, as well as easier access to research inputs such as public data. Economist Scott Stern made a related point two decades ago, finding that scientists accepted substantial wage discounts for jobs that allowed publication and scientific freedom. That can be good for science when the work is genuinely helpful. It is less desirable when the subsidized activity is yet another clever regression on data that few people outside the citation mill economy will ever use.

Academia Is a Poor Allocator of Talent

Academia does not allocate talent especially well once people enter it. Hunter Wapman and coauthors studied 295,089 tenure-track faculty at Ph.D.-granting universities from 2011 to 2020. They found that 80 percent of domestically trained faculty came from only 20.4 percent of universities. The top five training institutions alone produced 13.8 percent of faculty. The typical professor worked at a university 18 percent lower in prestige than the institution where the doctorate was earned. Self-hires were 9.1 percent of faculty.

Academia is a system based on hierarchy. Prestigious departments generate more future professors mostly because they are prestigious. The system then tells itself that the outcome reflects merit. A country that subsidizes this pipeline with ever richer public data should expect more talented people to line up in the queue for status rather than test their skills in the marketplace.

Talent misallocation also stems from family background. Allison Morgan and coauthors surveyed 7,204 tenure-track faculty across eight disciplines and found that 22.2 percent had a parent with a Ph.D. Faculty were 12 to 25 times more likely than the general adult population to have a Ph.D. parent. Their childhood ZIP codes had median incomes 23.6 percent higher than the national ZIP code median. A system that overrepresents the children of academics is unlikely to be finding all of the highest-value talent.

The tenure system does not cure the problem either. Theodore Masters-Waage and coauthors studied 1,571 promotion and tenure cases at five universities and found that underrepresented minority faculty received 7 percent more negative votes and were 44 percent less likely to receive unanimous positive votes. The penalty was concentrated among candidates with lower h-index values. The message is grim. When signals are noisy, academia falls back on bias, pedigree, and crude measures of impact like citation counts.

More Papers Do Not Mean More Progress

A country also needs to ask what it gets for the extra academic labor. Nicholas Bloom and coauthors argue that ideas are getting harder to find. They point to semiconductors, where by 2014 the number of researchers needed to maintain Moore’s law-style progress was more than 18 times larger than in the early 1970s. More research labor is being poured into the system for less progress at the margin.

A broader study by Park, Leahey and Funk examined 45 million papers and 3.9 million patents over six decades and found that both papers and patents have become less disruptive. While much good research goes on, the marginal output of the knowledge industry is clearly diminishing. In empirical social science, the danger is not just low value but negative value. A weak causal claim can justify a new regulation, a subsidy, or a tax increase. When data help produce papers that are then used to expand government control over the economy, the cost goes beyond salaries. It includes the bad policy that gains credibility from the tables and reports researchers supply, which lend it the aura of hard science.

The Case for Fewer Public Datasets

Census data, budget data, crime data, health data, and regulatory data can be necessary for oversight. Citizens should be able to see what government is doing. But there is a difference between transparency for the public and research fuel for academic production.

Data releases should be treated as subsidies with opportunity costs. Recent federal guidance points in this direction. The Trump administration has proposed that federal agencies evaluate data collection and dissemination activities more carefully, emphasizing whether information products serve a clear public purpose and can justify their costs. That is a sensible starting point. The same logic should apply not only to data collection but also to the creation and release of large research datasets. Agencies should ask whether a dataset advances transparency, accountability, or a concrete operational need, or whether its primary effect is simply to lower the cost of producing another round of academic publications.

Large research datasets should face sunset review. Fewer public sector datasets should be constructed in the first place. Where data are produced, agencies should impose user fees to better align private incentives with social costs. Agencies should also track who uses the data, what gets produced, and whether the work changes any decisions outside the academy. Where the main output is more publication in an already crowded literature, a public benefit should not be assumed.

AI Supercharges Overproduction of Research

Artificial intelligence is likely to reshape the landscape again. Large language models can already clean messy administrative files, write and debug code, and produce a serviceable literature review in an afternoon. For academia, some of that disruption will be beneficial. It will remove much of the drudgery that once justified another graduate student and shorten the timeline from question to paper. A dissertation that once required years of painstaking data construction may soon take only weeks. If so, talented people will have more freedom to direct their efforts toward work that creates value elsewhere in the economy.

But AI cuts both ways. When a model can pair newly released data with an empirical strategy in seconds, the supply of publishable findings expands dramatically. Research effort becomes less constrained by time, coding ability, or data cleaning, and more constrained by access to raw material. If the result is an ever-growing stream of papers that primarily generate academic prestige rather than socially useful knowledge, then data itself becomes the relevant policy margin. In an age of nearly frictionless analysis, limiting the creation and release of public datasets may be one of the few practical ways to constrain the subsidized production of academic research.

Starve the Beast

“Starve the beast” is a fiscal slogan associated with limiting government growth by restricting the revenue available to it. Here, the beast is not government spending but the publicly subsidized production of academic research. If AI makes it possible to turn almost any dataset into a stream of papers, then controlling the supply of new datasets becomes one way to impose efficiency on the system. Starving the system of data is a powerful tool because it works at the source.

Free government data looks harmless because the lost alternative is invisible. The public sees the dataset, the publication, or the conference panel. It does not see the analyst who never joined a firm, the product that never got built, or the private investment that never got evaluated. A country that subsidizes publishable low-value findings should not be surprised when many smart people spend their lives producing them.

The next debate over government data should ask what each dataset is actually for. Does it help the nation achieve a genuine public purpose, or does it mainly manufacture academic careers? Where the answer is the latter, data provision becomes raw material for a public bad.
