{"success":true,"database":"eegdash","data":{"_id":"6953f4249276ef1ee07a3313","dataset_id":"ds004078","associated_paper_doi":null,"authors":["Shaonan Wang","Xiaohan Zhang","Jiajun Zhang","Chengqing Zong"],"bids_version":"1.6.0","contact_info":["Xiaohan Zhang"],"contributing_labs":null,"data_processed":true,"dataset_doi":"doi:10.18112/openneuro.ds004078.v1.0.4","datatypes":["meg"],"demographics":{"subjects_count":12,"ages":[25,30,26,28,26,25,27,23,24,25,23,24],"age_min":23,"age_max":30,"age_mean":25.5,"species":null,"sex_distribution":{"m":8,"f":4},"handedness_distribution":{"r":12}},"experimental_modalities":null,"external_links":{"source_url":"https://openneuro.org/datasets/ds004078","osf_url":null,"github_url":null,"paper_url":null},"funding":[],"ingestion_fingerprint":"68458c4c3f8b17cfc1c2f1a405f6468ec88543a21f94eb8c76821eb98252be59","license":"CC0","n_contributing_labs":null,"name":"A synchronized multimodal neuroimaging dataset to study brain language processing","readme":"#### Overview\nThis synchronized multimodal neuroimaging dataset for studying brain language processing (SMN4Lang) contains:\n1. fMRI and MEG data collected on the same 12 participants while they were listening to 6 hours of naturalistic stories;\n2. high-resolution structural (T1, T2), diffusion MRI and resting-state fMRI data for each participant;\n3. rich linguistic annotations for the stimuli, including word frequencies, part-of-speech tags, syntactic tree structures, time-aligned characters and words, and various types of word and character embeddings.\nMore details about the dataset are described as follows.\n#### Participants\nAll 12 participants were recruited from universities in Beijing, of whom 4 were female and 8 were male, with an age range of 23-30 years. 
They completed both fMRI and MEG visits (fMRI experiments first, then MEG, with a gap of at least 1 month between them). All participants were right-handed adults with Mandarin Chinese as native language who reported having normal hearing and no history of neurological disorders. They were paid and gave written informed consent. The study was conducted under the approval of the Institutional Review Board of Peking University.\n#### Experimental Procedures\nBefore each scanning session, participants first completed a brief information survey and an informed consent form. During both fMRI and MEG scanning, participants were instructed to listen and pay attention to the story stimulus, remain still, and answer questions on the screen after each audio was finished. Stimulus presentation was implemented using Psychtoolbox-3. Specifically, at the beginning of each run, an instruction reading \"Waiting for the scanning\" appeared on the screen, followed by an 8-second blank. Then the instruction became \"This audio is about to start, please listen carefully\", which lasted for 2.65 seconds before the audio played; during audio playback, a centrally located fixation cross was presented; finally, two questions about the story were presented, each with four answers to choose from, with the response time controlled by the participants. Auditory story stimuli were delivered via S14 insert earphones for fMRI studies (with headphones or foam padding placed over the earphones to reduce scanner noise) and Elekta matching insert earphones for MEG studies.\nThe fMRI recording was split into 7 visits, each lasting 1.5 hours: the T1, T2, and resting-state MRI were collected on the first visit; fMRI with listening tasks was collected across visits 1 to 6; and the diffusion MRI was collected on the last visit. During MRI scanning, including T1, T2, diffusion, and resting-state, participants were instructed to lie relaxed and still in the scanner. 
The MEG recording was split into 6 visits, each lasting 1.5 hours.\n#### Stimuli\nStimuli are 60 story audios, each 4 to 7 minutes long, covering various topics such as education and culture. All audios were downloaded from the Renmin Daily Review website and are read by the same male broadcaster. The corresponding texts were also downloaded from the Renmin Daily Review website, and their errors were manually corrected to ensure the audio and texts are aligned.\n#### Annotations\nRich annotations of the audios and texts are provided in the derivatives/annotations folder, including:\n1. Speech-to-text alignment: The onset and offset times of each character and word in the audio are provided in the \"stimuli/time_align\" folder. Note that 10.65 seconds were added to the onset and offset times to align them with the fMRI images, because the fMRI scan started 10.65 seconds before the audio played.\n2. Frequency: Character and word frequencies in the \"stimuli/frequency\" folder were calculated from the Xinhua news corpus and then log-transformed.\n3. Textual embeddings: Text embeddings computed by different pre-trained language models (including Word2Vec, BERT, and GPT2) are provided in the \"stimuli/embeddings\" folder. Character-level and word-level embeddings computed by the Word2Vec and BERT models, and word-level embeddings computed by the GPT2 model, are provided.\n4. Syntactic annotations: The POS tag of each word, the constituent tree structure, and the dependency tree structure are provided in the \"stimuli/syntactic_annotations\" folder. The POS tags were annotated by experts following the criteria of the Peking Chinese Treebank. The constituent tree structure was manually annotated by linguistics students following the PKU Chinese Treebank criteria using the TreeEditor tool, and all results were double-checked by different experts. 
The dependency tree structure was transformed from the constituent tree using the Stanford CoreNLP toolkit.\n#### Preprocessing\nThe MRI data, including the structural, functional, resting-state, and diffusion images, were preprocessed using the HCP \"minimal preprocessing pipelines\".\nThe MEG data were first preprocessed using the temporal Signal Space Separation (tSSS) method, and bad channels were excluded. Independent component analysis (ICA) was then applied in the MNE software to remove ocular artefacts.\n#### Usage Notes\nFor the MEG data of sub-08_run-16 and sub-09_run-7, the stimuli-starting triggers were not recorded due to technical problems. The first trigger in these two runs is therefore the stimulus-ending trigger, and the starting time can be computed by subtracting the stimulus duration from the time point of the first trigger.","recording_modality":["meg"],"senior_author":"Chengqing Zong","sessions":[],"size_bytes":677637752173,"source":"openneuro","study_design":null,"study_domain":null,"tasks":["RDR"],"timestamps":{"digested_at":"2026-04-22T12:26:02.614378+00:00","dataset_created_at":"2022-03-20T08:43:39.462Z","dataset_modified_at":"2023-10-16T02:05:05.000Z"},"total_files":720,"storage":{"backend":"s3","base":"s3://openneuro.org/ds004078","raw_key":"dataset_description.json","dep_keys":["CHANGES","README","participants.json","participants.tsv"]},"tagger_meta":{"config_hash":"4a051be509a0e3d0","metadata_hash":"2ca9e3d0efa083e0","model":"openai/gpt-5.2","tagged_at":"2026-01-20T10:26:42.745825+00:00"},"tags":{"pathology":["Healthy"],"modality":["Auditory"],"type":["Other"],"confidence":{"pathology":0.8,"modality":0.9,"type":0.8},"reasoning":{"few_shot_analysis":"Most similar few-shot example by stimulus channel is the dataset titled \"Subcortical responses to music and speech are alike while cortical responses diverge\", which is labeled Modality=Auditory and Type=Perception; it establishes the convention that continuous listening paradigms with 
sound stimuli map to Auditory modality. Another relevant convention example is \"EEG, pupillometry... digit span task\", where auditory presentation is labeled Modality=Auditory and Type=Memory, showing that Type depends on the primary cognitive construct (working memory vs other). For this dataset, the task is naturalistic story listening for language processing rather than memory span or basic auditory perception, which pushes Type toward Other (language) rather than Memory/Perception.","metadata_analysis":"Key population facts: (1) \"All participants were right-handed adults with Mandarin Chinese as native language who reported having normal hearing and no history of neurological disorders.\" (2) \"All 12 participants were recruited from universities in Beijing\" with age range \"23-30\".\nKey task/stimulus facts: (1) participants were \"listening to 6 hours of naturalistic stories\" and \"instructed to listen and pay attention to the story stimulus\". (2) \"Auditory story stimuli were delivered via S14 insert earphones for fMRI studies ... and Elekta matching insert earphones for MEG studies.\" (3) visual elements exist but are secondary: \"answer questions on the screen after each audio was finished\" and \"a centrally located fixation cross was presented\".\nKey study aim facts: (1) dataset is \"for studying brain language processing (SMN4Lang)\". (2) includes \"rich linguistic annotations for the stimuli, including word frequencies, part-of-speech tags, syntactic tree structures... embeddings.\"","paper_abstract_analysis":"No useful paper information.","evidence_alignment_check":"Pathology: Metadata says a normative cohort (\"normal hearing and no history of neurological disorders\"). Few-shot convention: such participants are labeled Healthy. ALIGN.\nModality: Metadata says primary stimulus is auditory stories (\"listening to... stories\"; \"Auditory story stimuli were delivered...\"). 
Few-shot convention (music/speech listening example) maps listening paradigms to Auditory. ALIGN.\nType: Metadata says purpose is language processing (\"studying brain language processing\"; extensive linguistic annotations). Few-shot suggests possible mappings: listening can be Perception (music/speech example) or Memory (digit span example) depending on goal. Here the goal is neither basic auditory perception nor an explicit memory construct; thus selecting Type=Other (language) follows the 'Type reflects research purpose' rule. PARTIAL CONFLICT with the generic 'listening -> Perception' tendency, but metadata-specific purpose wins.","decision_summary":"Pathology top-2: Healthy vs Unknown. Healthy supported by: \"normal hearing\" and \"no history of neurological disorders\"; no clinical recruitment stated. Winner: Healthy (ALIGN with few-shot Healthy conventions). Confidence=0.8 (2 explicit quotes).\nModality top-2: Auditory vs Multisensory. Auditory supported by: \"listening to... stories\" and \"Auditory story stimuli were delivered via... earphones\"; visual questions/fixation are secondary/ancillary. Winner: Auditory. Confidence=0.9 (3+ explicit stimulus quotes + strong few-shot analog to auditory listening).\nType top-2: Other vs Perception. Other supported by: \"studying brain language processing\" and presence of \"rich linguistic annotations... syntactic tree structures... embeddings\" indicating language comprehension/processing as the construct. Perception supported only weakly by the fact of listening. Winner: Other (metadata goal is language processing rather than generic perception). 
Confidence=0.8 (2+ explicit purpose/annotation quotes; few-shot analog is weaker)."}},"nemar_citation_count":4,"computed_title":"A synchronized multimodal neuroimaging dataset to study brain language processing","nchans_counts":[{"val":328,"count":720}],"sfreq_counts":[{"val":1000.0,"count":720}],"stats_computed_at":"2026-04-21T23:17:03.729259+00:00","total_duration_s":245131.73500000002,"author_year":"Wang2022_StudyBRAIN","canonical_name":null}}