Data descriptors

The dataset for extending EMNIST evaluation

Julian Szymański, Kacper Skarżyński, Błażej Szutenberg, Klaudia Ratkowska, Szymon Drywa
DOI: 10.1038/s41597-025-06291-z

This paper describes a dataset for deeper evaluation of machine learning models for handwritten character recognition. We built a dataset that, combined with the existing NIST databases, enables additional analysis of models trained on these data. The paper summarizes the most popular publicly available machine learning models trained on the EMNIST-letters dataset. We discuss issues with how state-of-the-art results are evaluated, namely by comparing the accuracy achieved on a test set built in a cross-validation setting. We propose an additional evaluation on new, independently constructed data, unaffiliated with the NIST database authors. The dataset and source code are available in the Gdansk University of Technology repository, MOST Wiedzy.

Scientific Data

Gold standard, multi-genre dataset for named entity recognition and linking

Szymon Olewniczak, Julian Szymański
DOI: 10.1038/s41597-025-05274-4

In our study, we introduce a new dataset designed for the evaluation of entity-linking systems. Entity Linking (EL) involves identifying specific segments in a text, so-called mentions, and linking them to relevant entries in an external Knowledge Base (KB). EL is a challenging task with numerous complexities, making it vital to have access to high-quality data for testing. Our dataset is unique in that it encompasses texts from various domains, in contrast to most current datasets, which focus on a single domain such as newspaper news. Furthermore, we have annotated each identified text segment with its corresponding entity type, enhancing the dataset’s usefulness and reliability. The dataset uses Wikipedia as its Knowledge Base, which is the prevalent choice for general-domain entity-linking systems. The dataset is available to download from https://doi.org/10.34808/f3q9-9k64.
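As an illustration of the task described above, the following minimal Python sketch shows one way a mention annotated with an entity type and a Wikipedia entry could be represented and scored. The class name, field names, entity-type labels, and the example sentence are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A text segment linked to a KB entry (hypothetical layout, not the dataset's schema)."""
    start: int          # character offset where the mention begins
    end: int            # character offset one past the last character
    entity_type: str    # e.g. "PERSON", "LOCATION" (assumed label set)
    kb_title: str       # Wikipedia page title used as the KB identifier

    def surface(self, text: str) -> str:
        """Return the mention's surface form from the source text."""
        return text[self.start:self.end]


def linking_accuracy(gold: list[Mention], predicted: list[Mention]) -> float:
    """Fraction of gold mentions reproduced exactly (span, type, and KB entry)."""
    if not gold:
        return 0.0
    predicted_set = set(predicted)
    return sum(1 for m in gold if m in predicted_set) / len(gold)


text = "Marie Curie was born in Warsaw."
gold = [
    Mention(0, 11, "PERSON", "Marie_Curie"),
    Mention(24, 30, "LOCATION", "Warsaw"),
]
# A hypothetical system output that links only the first mention correctly.
predicted = [
    Mention(0, 11, "PERSON", "Marie_Curie"),
    Mention(24, 30, "LOCATION", "Warsaw,_Indiana"),
]

score = linking_accuracy(gold, predicted)
```

Exact matching on span, type, and KB entry is only one possible scoring choice; entity-linking evaluations often also report separate scores for mention detection and disambiguation.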

Scientific Data

Effects of raw and thermally processed spent coffee grounds on Miscanthus × giganteus plantation: Data description

Nicole Nawrot, Jacek Kluska
DOI: 10.1016/j.dib.2025.112432

Cultivating the Miscanthus × giganteus (M × g) energy crop on marginal soil supports phytoattenuation and provides high-energy biomass for biofuel production. Improving nutrient-poor soil with low-cost recovered organic amendments, such as spent coffee grounds (SCG) and SCG-derived biochar (BC), offers sustainable benefits. This data article presents the findings from a medium-term greenhouse experiment at the Gdansk University of Technology assessing M × g cultivation on marginal soil amended with SCG and BC. The pot-scale experiment examined the medium-term effect on M × g biomass growth, photosynthesis parameters, and root tissue development, as well as the final elemental composition. Soil pH and elemental composition were also determined. As global coffee consumption increases, large quantities of SCG are generated and often landfilled. Their beneficial reuse aligns with circular economy principles and Sustainable Development Goals (SDGs 7 and 13), providing both a short-term nutrient source and a means of improving soil quality and resilience. The article compiles five datasets detailing: (1) M × g growth parameters, tissue development, and photosynthetic indices; (2) nutrient and caffeine leaching behaviour; and (3) elemental composition of plants and soils following exposure. These datasets, available in the Bridge of Knowledge repository of the Gdansk University of Technology, provide a resource for environmental researchers, soil and plant scientists, biochar specialists, and decision-makers working to restore marginal soil usability. This study promotes sustainable land management by demonstrating how organic wastes and biochar can be combined to improve crop performance, sequester carbon, and reduce nutrient losses while minimizing external fertilizer inputs.

Data in Brief

Analysis of the Structure of the Register of Immovable Monuments: Dataset Series

Marta Kuźma
DOI: 10.34808/dd2025/pds/aup-1

This article describes the dataset series titled “Analysis of the Structure of the Register of Immovable Monuments”, which analyses data published by the National Heritage Institute in Poland. The series focuses on selected years — 2003, 2016, 2024, and 2025 — based on the availability of official records. It includes detailed measurements and classifications of immovable monuments according to various categories: by voivodeship, primary function, date of construction, ownership status, construction materials (e.g., timber-framed), and other specific characteristics of the collection, such as large-scale complexes. The primary objective of the series is to evaluate changes in the condition and structure of immovable monuments in Poland over time, using preserved registry data as a basis for longitudinal analysis.

MOST Wiedzy

Polish State Forests Database (2009–2019): A Temporal Forest Inventory with Calculated Biodiversity and Productivity Metrics

Piotr Krajewski
DOI: 10.34808/dd2025/pds/for-2

This dataset presents a comprehensive compilation of Polish State Forest inventory data spanning 2009–2019, encompassing approximately 7 million hectares of managed forest land. The database integrates measurements from annual forest inventories published in Aktualizacja reports by Polish State Forests (Państwowe Gospodarstwo Leśne Lasy Państwowe) with calculated ecological metrics derived through AI-assisted computational methods under human oversight. The compilation includes 1,260 primary records organized across nine analytical domains: basic metrics, species-specific measurements, biodiversity indices, forest composition, growth and productivity indicators, age structure analysis, and temporal trends. The data encompass ten dominant tree species (Pine, Spruce, Fir, Oak, Beech, Birch, Alder, Aspen, Hornbeam, and Poplar) representing major ecological and economic groups in Polish forests. Each record contains measurements of forest area (hectares), timber volume (cubic meters), age class distributions, and derived metrics following established forestry science methodologies. The database employs a transparent data provenance system distinguishing between original measurements and calculated derivatives. All calculated metrics use documented formulas from peer-reviewed literature, including the Shannon diversity index, Simpson index, Pielou evenness, mean annual increment, and current annual increment. Computational processing employed Claude AI (Anthropic, claude-sonnet-4-20250514) for metric calculations and database organization, with all outputs verified through human oversight and validation procedures. The dataset supports temporal analysis of forest dynamics, biodiversity assessment, sustainable management evaluation, and comparative studies within European forest monitoring systems.
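For readers unfamiliar with the indices named above, the following Python sketch computes them from species areas using their textbook definitions. It is an illustration only: the function names and the example areas are hypothetical, the Simpson index is taken in its Gini-Simpson form (1 − Σp²), which is one of several variants in use, and nothing here is confirmed to match the formulas or values used in the database.

```python
import math


def diversity_indices(area_by_species: dict[str, float]) -> tuple[float, float, float]:
    """Return (Shannon H', Gini-Simpson 1 - sum(p^2), Pielou J') from species areas."""
    total = sum(area_by_species.values())
    # Proportional share of each species; species with zero area are ignored.
    shares = [a / total for a in area_by_species.values() if a > 0]
    shannon = -sum(p * math.log(p) for p in shares)
    simpson = 1.0 - sum(p * p for p in shares)
    # Pielou evenness normalizes H' by its maximum, ln(number of species).
    pielou = shannon / math.log(len(shares)) if len(shares) > 1 else 0.0
    return shannon, simpson, pielou


def mean_annual_increment(volume_m3_per_ha: float, stand_age_years: float) -> float:
    """Mean annual increment: standing volume divided by stand age."""
    return volume_m3_per_ha / stand_age_years


# Hypothetical areas in hectares, chosen only to exercise the functions.
example_areas = {"Pine": 500.0, "Oak": 200.0, "Beech": 200.0, "Birch": 100.0}
h, d, j = diversity_indices(example_areas)
mai = mean_annual_increment(volume_m3_per_ha=280.0, stand_age_years=70.0)
```

With perfectly even shares the Shannon index reaches ln(S) for S species and Pielou evenness equals 1, which gives a quick sanity check when validating calculated records against original measurements.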

MOST Wiedzy

Electrical Conductivity Relaxation for the Analysis of STF Perovskite Properties

Aleksander Mroziński
DOI: 10.34808/dd2025/pds/me-3

The development of electrodes used in ceramic fuel cells is crucial for reducing operating temperatures and, consequently, operating costs. One of the key parameters influencing the electrochemical performance of solid oxide fuel cells (SOFC) is the rate of oxygen ion adsorption and incorporation or transfer by the oxygen electrode. Among the most promising SOFC oxygen electrode materials are those with mixed ionic and electronic conductivity. One of the techniques for determining the chemical diffusion coefficient (D*) and the chemical oxygen surface exchange coefficient (k*) in mixed ionic and electronic conductors is the relaxation technique. This descriptor describes in detail how the published dataset of electrical conductivity relaxation (ECR) measurements for the SrxTi0.30Fe0.70O3-d (STF70-x) electrode material was prepared. The described results are important because the influence of non-stoichiometry in the strontium sublattice on D* and k* has not been previously presented in the literature. The technique is reliable and relatively simple to apply, and such measurements should be performed for most studied materials to better understand them.
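To illustrate how a relaxation curve yields an exchange coefficient, the sketch below treats only the simplified surface-exchange-limited case for a plane sheet, where the normalized conductivity follows g(t) = 1 − exp(−k·t/a) with a the half-thickness. This is an assumption made for brevity: it ignores diffusion, so it cannot recover D*, and it is not the fitting procedure used for the published dataset. All function names and numbers are hypothetical.

```python
import math


def normalized_conductivity(sigma: float, sigma_start: float, sigma_end: float) -> float:
    """Scale a conductivity reading to 0..1 between the initial and final equilibrium values."""
    return (sigma - sigma_start) / (sigma_end - sigma_start)


def fit_k_surface_limited(times_s: list[float], g_values: list[float],
                          half_thickness_cm: float) -> float:
    """Estimate k (cm/s) assuming g(t) = 1 - exp(-k * t / a).

    Linearizes to ln(1 - g) = -(k / a) * t and fits the slope through the
    origin by least squares. Points with g >= 1 are skipped (log undefined).
    """
    num = 0.0
    den = 0.0
    for t, g in zip(times_s, g_values):
        if t > 0 and g < 1.0:
            num += t * math.log(1.0 - g)
            den += t * t
    slope = num / den
    return -slope * half_thickness_cm


# Synthetic, noise-free curve generated from a known k to check the fit.
true_k = 1.0e-4   # cm/s, hypothetical
a = 0.05          # cm, hypothetical sample half-thickness
times = [float(t) for t in range(0, 2000, 100)]
curve = [1.0 - math.exp(-true_k * t / a) for t in times]
fitted_k = fit_k_surface_limited(times, curve, a)
```

For samples where diffusion also limits the response, the full solution of the diffusion equation with a surface-exchange boundary condition is needed to extract both D* and k*.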

MOST Wiedzy

Comparison of research data collection processes: Marketing orientation in medical facilities and the assumptions of the tax system in the SME sector in Poland

Piotr Kasprzak, Wioletta Kukier
DOI: 10.34808/dd2026/pds/econ-4

Collecting research data is a fundamental step in the scientific process, providing the basis for empirically verifying hypotheses and developing knowledge. This article aims to compare the differences and similarities in the data collection process using two distinct disciplines: the marketing orientation of healthcare facilities (non-profit organizations) and the assumptions of the tax system in Poland in the context of SMEs. The tax research, illustrated by the doctoral thesis, was designed as a case study due to sample limitations resulting from the COVID-19 pandemic. It utilized a pilot study, a survey (22 questions, Likert scale), and in-depth interviews, initially planning random sampling but ultimately collecting responses from 274 companies. Marketing research, on the other hand, requires a multi-method approach (mixed methods), combining a Systematic Literature Review (SLR) with quantitative research (surveys, primarily 7-point Likert scales, questions about multidimensional effectiveness). Key methodological differences stem from the distinct nature of the variables involved: in marketing, these are behavioral and subjective constructs (e.g., loyalty), measured using psychometric scales, whereas in taxation, objective and financial constructs predominate (e.g., tax relief amount, accounting profit), leading to the predominance of econometric analysis and archival data. The article emphasizes that, despite these differences, both processes strive to reach conclusions based on reliable empirical material, albeit with varying degrees of extrapolation of results to the population.

MOST Wiedzy

