{"success":true,"database":"eegdash","data":{"_id":"69d16e04897a7725c66f4c55","dataset_id":"ds007602","associated_paper_doi":"10.1088/1741-2552/ae54d0","authors":["Motoshige Sato","Masakazu Inoue","Kenichi Tomeoka","Ilya Horiguchi","Eri Hatakeyama","Yuya Kita","Atsushi Yamamoto","Ippei Fujisawa","Shuntaro Sasai"],"bids_version":"1.9.0","contact_info":null,"contributing_labs":null,"data_processed":false,"dataset_doi":"doi:10.18112/openneuro.ds007602.v1.0.1","datatypes":["eeg"],"demographics":{"subjects_count":3,"ages":[],"age_min":null,"age_max":null,"age_mean":null,"species":null,"sex_distribution":{"m":4},"handedness_distribution":{"r":4}},"experimental_modalities":null,"external_links":{"paper_url":"https://doi.org/10.1088/1741-2552/ae54d0"},"funding":["JST, Moonshot R&D Grant Number JPMJMS2012"],"ingestion_fingerprint":"927c8a4c5e6e294c9fe58b52b840cd14a42c06d7debae33f136b8e701e42c9ee","license":"CC0","n_contributing_labs":null,"name":"EEG-Speech Brain Decoding Dataset","readme":"# EEG-Speech Brain Decoding Dataset\n## Overview\nThis dataset contains EEG recordings and audio data.\n## Sessions\nSessions are labeled by recording date in YYYYMMDD format.\n- Example: `ses-20240401` = recorded on April 1, 2024\nMultiple recordings on the same day are distinguished by run numbers:\n- `run-N`: Nth recording of the day\n## Tasks\n- **speechopen**: Overt speech production task\n  - Participants vocalize visually presented text\n## File Format Notes\n### EEG Data\nRaw EEG data is stored:\n- **Path**: `sub-*/ses-*/eeg/*_eeg.edf`\n- **Note**: EDF format is not officially part of BIDS-EEG specification\n- Files are excluded in `.bidsignore` but documented here for reference\n- Future releases may include EDF conversions for full BIDS compliance\n### Behavioral Data (Audio)\nVocal recordings are stored in `beh/` directories:\n- **Path**: `sub-*/ses-*/beh/*_recording-vocal_beh.wav`\n- **Note**: Not officially part of BIDS-EEG spec, but included for analysis convenience\n- Excluded in `.bidsignore`\n## Directory Structure\n```\ndataset_root/\n├── README                          (this file)\n├── CHANGES                         (version history)\n├── dataset_description.json        (dataset metadata)\n├── participants.tsv                (participant information)\n├── participants.json               (participant column descriptions)\n├── task-speechopen_eeg.json        (task-level EEG metadata)\n├── task-speechopen_events.json     (events column descriptions)\n├── .bidsignore                     (files to ignore in validation)\n│\n├── code/                           (analysis and preprocessing code)\n│   ├── preprocessing/              (EEG and audio preprocessing)\n│   ├── training/                   (model training scripts)\n│   ├── evaluation/                 (evaluation metrics)\n│   └── bids/                       (BIDS conversion scripts)\n│\n├── sub-01/                         (participant data)\n│   └── ses-YYYYMMDD/              (session by date)\n│       ├── eeg/                    (EEG recordings)\n│       └── beh/                    (behavioral/audio data)\n│\n└── derivatives/                    (processed data)\n    └── pipeline-standard/          (standard preprocessing)\n```","recording_modality":["eeg"],"senior_author":null,"sessions":["20230829","20230830","20230831","20230901","20230904","20240821","20240822","20240829","20240902","20240906","20250522","20250523","20250526","20250527","20250528"],"size_bytes":53310633094,"source":"openneuro","storage":{"backend":"s3","base":"s3://openneuro.org/ds007602","raw_key":"dataset_description.json","dep_keys":["CHANGES","README","participants.json","participants.tsv","task-speechopen_acq-pangolin_eeg.json"]},"study_design":null,"study_domain":null,"tasks":["speechopen"],"timestamps":{"digested_at":"2026-05-31T16:31:38.951840+00:00","dataset_created_at":null,"dataset_modified_at":null},"total_files":113,"computed_title":"EEG-Speech Brain Decoding Dataset","nchans_counts":[{"val":134,"count":113}],"sfreq_counts":[{"val":1200.0,"count":113}],"stats_computed_at":"2026-05-31T19:34:32.603735+00:00","total_duration_s":159071.0,"tagger_meta":{"config_hash":"3557b68bca409f28","metadata_hash":"153532cb5c2c0710","model":"openai/gpt-5.2","tagged_at":"2026-04-07T09:32:40.872789+00:00"},"tags":{"pathology":["Healthy"],"modality":["Visual"],"type":["Motor"],"confidence":{"pathology":0.7,"modality":0.8,"type":0.7},"reasoning":{"few_shot_analysis":"Closest few-shot by task purpose is the \"EEG Motor Movement/Imagery Dataset\" example (labeled Modality=Visual, Type=Motor): it uses a visually presented cue and the research focus is movement execution/imagery. This guides mapping overt speech production (a motor act) to Type=Motor, with Visual modality because the prompt is shown on-screen. For modality conventions, the schizophrenia visual discrimination example (Modality=Visual, Type=Perception) reinforces that when stimuli are displayed on a screen, modality is labeled Visual even if responses include actions/clicks.","metadata_analysis":"Key task/stimulus facts from metadata:\n- Task is overt speech production: \"**speechopen**: Overt speech production task\".\n- Stimulus is visual text: \"Participants vocalize **visually presented text**\".\n- Dataset includes audio recordings but they are behavioral recordings of the vocal output: \"This dataset contains EEG recordings and **audio data**\" and \"Vocal recordings are stored in `beh/` directories\".\n- No clinical recruitment language is present; participants section only lists counts/sex/handedness: \"Subjects: 3; Sex: {'m': 4}; Handedness: {'r': 4}\".","paper_abstract_analysis":"No useful paper information.","evidence_alignment_check":"Pathology:\n1) Metadata says: no diagnosis/clinical recruitment mentioned; only \"Subjects: 3\" with sex/handedness.\n2) Few-shot pattern suggests: absent clinical terms typically maps to Healthy cohorts.\n3) ALIGN (no conflict).\n\nModality:\n1) Metadata says: \"Participants vocalize **visually presented text**\".\n2) Few-shot pattern suggests: screen-based stimuli -> Visual modality (as in the motor imagery and visual discrimination examples).\n3) ALIGN.\n\nType:\n1) Metadata says: \"**Overt speech production task**\" and \"Participants **vocalize**...\" indicating movement/articulation is central.\n2) Few-shot pattern suggests: when the primary experimental focus is movement execution (even if visually cued), Type=Motor (motor imagery/movement example).\n3) ALIGN (speech is a specialized motor behavior; no better dedicated 'Language' type exists in allowed labels).","decision_summary":"Top-2 candidates and selection:\n\nPathology:\n- Healthy: Supported by lack of any stated clinical diagnosis/recruitment and minimal demographics only (\"Subjects: 3...\").\n- Unknown: Also plausible because \"Healthy\" is not explicitly stated.\nWinner: Healthy (dataset description does not indicate any disorder-based recruitment).\nEvidence alignment: aligned.\n\nModality:\n- Visual: Explicit \"visually presented text\".\n- Multisensory: Dataset includes \"audio data\" (vocal recordings), but these appear to be recorded responses rather than presented stimuli.\nWinner: Visual (stimulus/input channel is clearly visual text; audio is output/behavioral recording).\nEvidence alignment: aligned.\n\nType:\n- Motor: \"Overt speech production\" / \"vocalize\" implies articulatory motor activity is the core task.\n- Other: Could be considered speech/language decoding rather than generic motor control, but no dedicated language label exists.\nWinner: Motor (best match to allowed labels given overt movement production is central).\nEvidence alignment: aligned.\n\nConfidence justification:\n- Pathology 0.7: inferred from absence of clinical terms (no explicit 'healthy').\n- Modality 0.8: directly supported by the explicit phrase \"visually presented text\" plus strong few-shot convention.\n- Type 0.7: directly supported by \"Overt speech production\"/\"vocalize\" but some ambiguity between Motor vs Other (speech/language decoding framing)."}},"canonical_name":null,"name_confidence":0.86,"name_meta":{"suggested_at":"2026-04-14T10:18:35.343Z","model":"openai/gpt-5.2 + openai/gpt-5.4-mini + deterministic_fallback"},"name_source":"author_year","author_year":"Sato2026_Speech","bad_channels_info":null,"acknowledgements":"We thank Yasuo Kabe, Sensho Nobe, and Akito Yoshida for valuable discussions on the research direction and data collection strategy during the early stages of this project. We are grateful to Mayumi Shimizu for her support in data collection and administrative coordination, and to Ryu Miyata for administrative support. We thank Yukihito Yomogida for his continued administrative support, including the preparation of ethics committee documentation and other regulatory procedures essential to this study. We also thank Ryota Kanai for his guidance as program manager of the grant program and for discussions on the data collection strategy, and Kai Arulkumaran for helpful discussions on the data collection approach.","ethics_approvals":["Shiba Palace Clinic Ethics Review Committee","Declaration of Helsinki"],"references_and_links":["https://arxiv.org/abs/2407.07595","https://www.isca-archive.org/interspeech_2025/inoue25b_interspeech.pdf","https://iopscience.iop.org/article/10.1088/1741-2552/ae54d0/meta"],"associated_paper_meta":{"channel":"text/normalized-doi","confidence":"high","author_overlap":0,"is_oa":true,"oa_status":"hybrid","source":"paper_resolver","method":"normalization"}}}