Introduction
Shared Research Resources (SRRs),1 also known as Core Facilities or Research Service Centers, are an integral part of providing research resources such as access to advanced equipment, innovative research technologies, software solutions, and expert personnel. Additionally, SRRs offer hands-on research training to undergraduate and graduate students, postdoctoral fellows, faculty, and other personnel. SRRs provide these products and services using the latest technologies across a variety of platforms, at reasonable cost to users or free of charge depending on the sources of support, thanks to economies of scale2 and the quick experimental turnaround times enabled by the experience of the personnel running these facilities. In addition, SRR personnel maintain and calibrate equipment, provide scientific project consultation, interpret results, and facilitate research collaboration, as they possess the expertise, education, and years of experience in their specific areas. Shared Research Resources fulfill several roles needed for the development and advancement of innovative science and reproducible research conducted by research enterprises in the US and in the international scientific community.3
SRRs are typically thought of as those housed in R1 academic institutions, such as public and private universities, or in non-profit research organizations. However, in addition to the core facilities located in educational and non-profit institutions, national laboratories such as the National High Magnetic Field Laboratory (https://nationalmaglab.org) and Brookhaven National Laboratory (https://www.bnl.gov), as well as regional research hubs such as the New York Structural Biology Center (https://www.nysbc.org) and the Pacific Northwest Center for Cryo-EM (https://pncc.labworks.org), also operate several SRRs. Like the SRRs located in academic and non-profit institutions, these regional and national SRRs provide in-person and remote services that cater to large cross sections of users throughout the country4 and to the international scientific community.5
Providing the latest research resources and training to a significant user base leads to many exciting opportunities for discovery, novel solutions, and collaboration. Regional and national facilities can obtain federal, state, and local funding for purchasing state-of-the-art equipment and the latest technologies while attracting and retaining high-caliber scientists. In addition, these facilities are home to instruments that set world records in terms of power and utility. At the same time, as the number of users increases, new challenges emerge. These include managing and working with users from various research disciplines with differing levels of training, handling and archiving vast amounts of data, scheduling experiments, and maintaining and calibrating equipment under heavy and constant usage. In this session, speakers from three different SRRs that provide resources to both biological and non-biological research teams will share their experience in running these scientific core facilities. They will also discuss how they manage some of the challenges associated with running such large operations.
I. Some challenges and solutions in operating the New York Structural Biology Center
The New York Structural Biology Center (NYSBC) houses top-end instrumentation and expertise that serves regional and national users, providing access to innovative integrated structural biology equipment and training. NYSBC’s capabilities include X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other supporting technologies. The center houses several NIH-funded national facilities, with users from across the United States who send samples or visit for data collection and training.
Before joining NYSBC, I directed a set of core facilities that primarily served the local community of a large biomedical research campus. In those facilities, users were familiar faces; facility staff could provide personalized service, and throughput was low. Centers such as NYSBC face challenges that local-serving facilities do not typically encounter. These include a user base that is diverse in expertise, expectations, and preparation. Some users may be experts, while others have no experience but need to access the technology. Regional facilities must also be efficient in throughput and handle more complex logistics. These demands create an inherent tension: the diverse needs of the user base are best met with personalized service, while efficiency means staff do not have the time to provide a “boutique experience.” These challenges are not unique to NYSBC and are likely to be faced by any regional or national facility, but they can be partially met by adjusting operations in specific ways:
Standardize access, input, and output. The higher volume that comes from more users can overwhelm a facility if each user’s expectations and experimental input/sample are different. This is true both scientifically and logistically. Therefore, it is beneficial to establish standard protocols for how the user must prepare samples, how samples will be handled and processed internally, and how data will be delivered to the user. This model necessarily makes each user’s experience less tailored, but it increases efficiency and the overall quality of service.
Get users to a minimal level of knowledge. With increased volume, facility staff have less time to devote to the intellectual aspects of a project; thus, the responsibility for understanding the nature of the experiment, identifying important variables to consider, and interpreting results must fall back on the user. It is beneficial, therefore, to have structured training or educational materials available to bring users to a minimum level of theoretical and practical knowledge. This training could take the form of short, organized sessions or referrals to the online materials that established centers often have available.
Consider the need for some specialized staff. Local facilities may only require 1-2 staff members to handle the full range of responsibilities, including scheduling, ordering/budgeting, instrument operation and maintenance, training, and other related tasks. As demands increase, staff can become overwhelmed and poorly suited to the required tasks. Specialized staff to handle logistics, administration, finances, training, maintenance, etc., becomes necessary to keep operations smooth while not degrading scientific quality. This is a challenge when financial resources are limited but is critical; a hidden source of decreased facility performance is when the time of skilled scientific staff is diverted into administrative tasks.
Enable some staff research and collaborations. A regional or national facility will benefit from standardized procedures, but within a more factory-like operation, it is essential to find ways for scientific staff to use their skills in meaningful and creative research. This can take the form of collaborations with select users when the project requires a deeper intellectual or experimental investment. Staff may also be engaged in self-initiated research projects that utilize the facilities’ capabilities; in some cases, this can lead to new grant proposals. This has several benefits for the staff, facility, and users. First, staff can utilize these research projects to stay technically up to date and develop new methods and protocols, which helps the facility remain at the forefront of science. Second, this guards against staff boredom and underutilization of their scientific skills, which benefits engagement and retention efforts.
In short, expansion of a local-serving research facility to one that is considered regional or national presents exciting opportunities and the ability to have a greater impact on a larger scientific community. It also presents new challenges that extend beyond size and scale, demanding novel approaches and careful consideration and planning for the inevitable changes that follow.
II. Data management challenges faced by an NSF major facility
The National High Magnetic Field Laboratory, or MagLab, is a major research facility primarily funded by the National Science Foundation (NSF) with the mission of providing the highest magnetic fields for research for users from a broad variety of research disciplines. It is home to seven user facilities split across three locations and host institutions. The MagLab has been in operation since 1994, and in that time, it has served tens of thousands of users and deployed new record-breaking instruments that have enabled countless scientific breakthroughs.6 Concurrently, innovations have emerged in the realm of data management standards and practices. Unfortunately, unlike its magnets, the MagLab’s approach to user data management has not kept up with the times.
Historically, the MagLab has taken a laissez-faire approach to data acquisition and management, which put almost complete control in the hands of a user project’s principal investigators (PIs). While the details differ across the seven user facilities, the responsibilities of data management and, where appropriate, ensuring public accessibility have fallen to PIs with minimal involvement from the MagLab’s user support scientists. This differs from other NSF major facilities, such as the IceCube Neutrino Observatory or the National Ecological Observatory Network, where centralized storage and processing are often necessary to provide a science-ready data product to the user.7 By contrast, at the MagLab, it has been routine for users to acquire data using their own acquisition systems and software or to be provided with the raw data acquired by an instrument, after which the MagLab has no further involvement. This approach is incompatible with expectations for data management expressed in the NSF’s recent policy documents. These policies apply to all NSF-funded researchers and necessarily include all users of NSF-funded research facilities.
The NSF’s Public Access Plan 2.0 (PAP),8 released in 2023 in response to a 2022 memo from the White House Office of Science and Technology Policy (known as the Nelson memo),9 describes how the NSF will implement the memo’s requirements regarding peer-reviewed scholarly publications and their associated scientific data. This includes requiring that publications be made available immediately in the NSF Public Access Repository (NSF-PAR) and that research data be made publicly available in an appropriate repository at the time of publication. At least part of the motivation for emphasizing immediate and public accessibility of research data is the desire to expose these resources for use in training artificial intelligence models, an initiative that has become a major federal government priority over the last two years. Furthermore, a newly revised draft of the NSF’s Research Infrastructure Guide (RIG) for major facilities contains a revised section calling for a cyberinfrastructure (CI) plan that addresses requirements related to open data principles, including the FAIR principles that ensure findability, accessibility, interoperability, and reusability.10 Taken together, these documents place greater responsibility for ensuring proper data management and dissemination on NSF major facilities, which they in turn must pass on to their users. The MagLab faces various challenges in meeting these new requirements.
First, the MagLab must address the fragmented and inconsistent data management practices across its user facilities. Currently, data and metadata are not routinely collected in ways that implement the FAIR principles. Each user facility utilizes different instrumentation, including custom and vendor-built software, which is used independently or in combination to generate a broad diversity of data types. Understanding and utilizing the data is often dependent on the expertise of the user or facility support staff due to the lack of metadata that would provide the necessary context. While a simple columnar, plain-text data file format is common, there are also vendor-specific and proprietary file formats in play, some of which can only be generated or read using software that is no longer maintained by its developers, some of whom no longer exist. Specific MagLab user facilities also face unique challenges. For example, in the MagLab’s Ion Cyclotron Resonance (ICR) user facility, the sizes of datasets collected by users often exceed those of other facilities by several orders of magnitude, placing a greater strain on the MagLab’s local data storage infrastructure and that of external repositories such as the Open Science Framework, which the lab uses to enable public accessibility.
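One low-cost step toward FAIR-aligned practice for simple columnar data is to pair each plain-text data file with a machine-readable metadata sidecar that records the context the file itself lacks. The sketch below illustrates the idea in Python; the required field names and the sidecar convention are assumptions for illustration, not an existing MagLab format or community standard:

```python
import json
from pathlib import Path

def write_metadata_sidecar(data_file, metadata):
    """Write a JSON metadata sidecar (<data_file>.json) next to a
    columnar data file.

    The required fields below are illustrative assumptions; a real
    implementation would follow a discipline-specific metadata schema.
    """
    required = {"instrument", "project_id", "columns", "units", "acquired"}
    missing = required - metadata.keys()
    if missing:
        raise ValueError(f"missing required metadata fields: {sorted(missing)}")
    sidecar = Path(str(data_file) + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar

# Example: metadata for a hypothetical two-column magnetotransport dataset.
meta = {
    "instrument": "example-magnet-cell-1",   # hypothetical identifier
    "project_id": "P-2025-0001",             # hypothetical user project ID
    "columns": ["field_T", "resistance_ohm"],
    "units": ["tesla", "ohm"],
    "acquired": "2025-01-15T10:30:00Z",
}
```

Because the sidecar is plain JSON keyed to the data file's name, it can be indexed later without parsing the instrument data itself, which is what makes the files findable and reusable by someone other than the original experimenter.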
An additional challenge is the lack of appropriate CI to facilitate the desired data management practices. Unlike many other major facilities, the MagLab does not currently have a data management system that catalogs the acquired data, allowing it to be associated with a particular user project and easily located. The files that the lab retains are managed in an ad hoc manner, typically by in-house personnel hosting a user, to varying degrees of effectiveness. Based on the examples of other NSF- and DOE-funded major facilities, many of which have multiple full-time positions dedicated to data management, addressing this problem would require a significant investment. Unfortunately, obtaining the financial resources for new CI for data management has never been a primary goal of funding requests. At the time of writing, the outlook for NSF funding in the upcoming fiscal year suggests that flat funding of the agency is the best-case scenario, making it unlikely that additional funds requested for new CI would be granted. Major facilities, such as the MagLab, may be able to address this challenge to some extent by triaging the desired features of their data management solutions and implementing them to the minimum extent required by funding policies. The MagLab is working to form collaborations with other NSF- and DOE-funded user facilities so that common solutions can be developed and adopted, decreasing the cost of implementation for all involved.
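The core of the missing CI described above is a catalog that associates acquired files with user projects so they can be located later. A minimal sketch of such a catalog, using only SQLite from the Python standard library, is shown below; the schema and field names are assumptions for illustration, not a description of any system the MagLab currently operates:

```python
import sqlite3

class DataCatalog:
    """Minimal sketch of a file catalog keyed by user project.

    Schema and fields are illustrative assumptions only.
    """

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS files (
                   path TEXT PRIMARY KEY,
                   project_id TEXT NOT NULL,
                   facility TEXT NOT NULL,
                   acquired TEXT NOT NULL
               )"""
        )

    def register(self, path, project_id, facility, acquired):
        # Idempotent: re-registering the same path updates its record.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, project_id, facility, acquired),
            )

    def files_for_project(self, project_id):
        # Return all registered paths for a project, oldest first.
        cur = self.conn.execute(
            "SELECT path FROM files WHERE project_id = ? ORDER BY acquired",
            (project_id,),
        )
        return [row[0] for row in cur]
```

Even a catalog this simple would be an improvement over ad hoc per-host file management, since it makes "which files belong to project X" a query rather than a question for whichever staff member hosted the user.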
A further confounding factor is the difficulty of implementing cultural changes at the MagLab, which has a research culture that has become well-established and entrenched after over thirty years of operation. The various disciplines of research conducted at the lab have radically different attitudes and practices around data sharing. While some embrace the value of organizing data and metadata and making them available in a public repository, for others, openly sharing the raw data underlying the research and its associated metadata is far less common. Although new NSF requirements will make it mandatory for users to share their raw data, it is challenging for the MagLab to follow up with and ensure compliance from all users. Fortunately, many users are beginning to comply with open sharing of research data of their own accord, either because they see the value in it or because their own funders, institutions, or publishers also require it.
However, the goal of encouraging open data sharing is hindered by the lack of a definite timeline and method for implementing the NSF’s Public Access Plan 2.0. As shown by snapshots from the Internet Archive’s Wayback Machine, as late as July 8th, 2025, the NSF’s Public Access Initiative website11 featured a graphic with a tentative timeline implying that the NSF would implement its Public Access Plan 2.0 by January 1st, 2025, a date that was already in the past at that time. As of the time of writing, this graphic has been removed from the NSF site, and no alternate implementation date has been offered. The Nelson memo mandated the implementation of a new public access policy by December 31st, 2025. If the NSF intends to officially implement and enforce its new policy, then time is running short, as implementation will require a revision of the Proposal & Award Policies & Procedures Guide, the official document governing the administration of NSF awards.12 Of course, given the potential for significant budget cuts,13 the abolition of the long-standing division structure,14 and the loss of the building that formerly served as the agency’s headquarters,15 this may not be a high priority for the agency. In the meantime, NSF major facilities must continue to prepare to implement and enforce these new data management practices to the best of their ability using the currently available resources.
III. High Throughput Scientific Discovery at the Micro-Scale: Laboratory Miniaturization and Beamline Integration
The CBMS Macromolecular Crystallography (MX) beamlines, FMX16 and AMX,17 are advanced facilities focused on accelerating high-throughput scientific discovery through macromolecular crystallography. These mature resources are supported through the P30 funding mechanism, which combines funding from the National Institutes of Health (NIH) and the Department of Energy (DOE), with contributions from each agency at about equal levels. Traditionally, these beamlines have provided robust, user-driven data collection services for structural biology researchers relying on crystallography methods. Recent demand shifts have introduced both challenges and opportunities: a spike resulting from the upgrade of the Advanced Photon Source (APS) superimposed on a gradual decline due to competing structural methods.
Rather than serving solely as a data collection point, the beamlines now offer upstream support for sample preparation and downstream support for data interpretation. Sample preparation support uses Acoustic Droplet Ejection (ADE) for fragment screening (to identify ligands) and for combinatorial crystallization (to identify crystallization conditions). The fragment screening program produces streams of structures and data to support user efforts in structural biology and drug discovery.18 The combinatorial crystallization facility has yielded new science in fields such as drug resistance19 and bioenergy.20
We increased accessibility through automation. Using raster scanning and automated data collection scripts, the system can identify crystals and collect data based on user-defined parameters provided in spreadsheet format. This allows researchers to perform complex experiments with minimal manual intervention, increasing throughput and efficiency. A complementary access mode has also been introduced through a “stand-by” data collection service. In this model, users send their prepared samples to the beamline without reserving specific time slots. This flexible, asynchronous model is designed to accommodate a broader user base that does not require real-time experimental oversight.
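The spreadsheet-driven workflow described above amounts to translating user-defined rows into machine-dispatchable collection jobs. A hedged sketch of that translation step is shown below in Python; the column names and validation are hypothetical stand-ins, not the actual CBMS MX spreadsheet format:

```python
import csv
import io

def parse_collection_plan(csv_text):
    """Parse a user-supplied collection plan (a CSV export of a
    spreadsheet) into a list of job dicts for automated collection.

    Column names are hypothetical, chosen only to illustrate the
    spreadsheet-to-job translation; the real beamline format differs.
    """
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        jobs.append({
            "sample": row["sample"],
            "puck": row["puck"],
            "position": int(row["position"]),          # slot in the puck
            "oscillation_deg": float(row["oscillation_deg"]),
            "exposure_s": float(row["exposure_s"]),
        })
    return jobs

# Example plan with two samples, as a user might export it.
plan = """sample,puck,position,oscillation_deg,exposure_s
lysozyme_1,A,1,0.2,0.01
thermolysin_2,A,2,0.1,0.02
"""
jobs = parse_collection_plan(plan)
```

Keeping the user-facing format as a plain spreadsheet lowers the barrier for the broader, less specialized user base the "stand-by" service targets, while the parsed job list is what the automated raster-scanning and collection scripts would actually consume.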
CBMS MX also provides an integrated suite of data analysis and interpretation tools. Automated data reduction pipelines, including fastDP and autoProc, enable rapid processing of diffraction data. Structure solutions are supported through automated workflows using DIMPLE, and data clustering tools facilitate the analysis of large numbers of similar yet non-identical datasets. Additionally, users receive detailed static reports that summarize key metrics from each data collection session, facilitating swift evaluation and informed decision-making.
In combination, these developments represent a significant evolution in the CBMS MX model—from a traditional, shift-based crystallography resource to an integrated, miniaturized, and highly automated scientific platform. These innovations are designed to meet the challenges of a changing structural biology landscape and to ensure that crystallography remains a robust and accessible tool for high-throughput bioscience discovery.
Conclusions
Shared Research Resources at regional centers and national laboratories not only provide services to a large cross-section of users, but they also tend to offer certain unique opportunities that may not be available at institutional-level SRRs. For example, to study conformational changes that happen in biological macromolecules on sub-nanosecond time frames, scientists need X-ray diffraction produced by X-ray Free Electron Lasers (XFELs), which are available only at national laboratories.21 However, like traditional SRRs, these cores also face challenges in dedicating substantial amounts of time to individualized training for all users due to the sheer number of projects they handle. In addition, they face challenges in making data immediately available to all users and in archiving and maintaining data under FAIR principles. Despite these challenges, traditional, regional, and national SRRs play a crucial role in advancing science worldwide.
Acknowledgement
One of the authors (TS) would like to acknowledge the experience gained by managing the X-Ray Crystallography Facility (RRID:SCR_017922).