Are you thinking of donating a corpus of speech/language data to FluencyBank?

We typically accept only transcribed files. If you have only raw audio or video, we would need to discuss the best ways to get your data into the linked CHAT transcript format that FluencyBank uses. However, we can convert virtually any other transcript format to CHAT, including SALT. If your data contain personal identifiers, we can work with you to remove them.

We can host your data as fully open access or behind a password, depending upon the level of consent that you obtained from participants, the type of media (we can extract audio from video), and how identifiable the transcripts would be.

Corpora can be longitudinal, cross-sectional, mixed, or illustrative.

In a longitudinal corpus, one or more speakers are transcribed in a series of interactions over time.
In a cross-sectional corpus, there are typically groups of different speakers (perhaps divided by age, by diagnostic group [e.g., individuals who do/do not stutter], or by both).
A number of well-known corpora in fluency, such as the Illinois International Stuttering Project, are both cross-sectional and longitudinal. These corpora track individual children over time to identify potential factors in persistence and recovery.
Finally, some corpora in our teaching section illustrate specific behaviors, such as features of stuttering and typical fluency, response to delayed auditory feedback (DAF), discussions about affective/cognitive components of stuttering, and sample therapies.

Each contributed corpus should have a documentation file that we will use to construct a home page for the corpus. This file should contain the information that is indispensable for the proper interpretation of the data by other researchers or users. This should include the following:

Donor information. We post pictures of contributors and their contact information on the home page for all corpora. Please see individual corpora already in FluencyBank or CHILDES for examples.
Literature citation.
Restrictions. If the data are being contributed to TalkBank, contributors can set particular restrictions on the use of their data. For example, researchers may ask that they be sent copies of articles that make use of their data. Many researchers have chosen to set no limitations at all on the use of their data.
Warnings. This documentation file should also warn other researchers about limitations on the use of the data. For example, if an investigator paid no attention to correct transcription of speech errors, this should be noted.
Pseudonyms. The readme file should also include information on whether informants gave informed consent for the use of their data and whether pseudonyms have been used to preserve informant anonymity. In general, real names should be replaced by pseudonyms. Anonymization is not necessary when the subject of the transcriptions is the researcher's own child, as long as the child grants permission for the use of the data.
Project description. There should be detailed information on the history, motivation, and procedures of the project. How was funding obtained? What were the goals of the project? How were data collected? What was the sampling procedure? How was transcription done? What was ignored in transcription? Were transcribers trained? Was reliability checked? Was coding done? What codes were used? Was the material computerized? How?
Codes. If there are project-specific codes, these should be described.
Demographic data. Wherever possible, demographic, dialectological, and psychometric data should be provided for each informant. Particularly for research data, there should be information on topics such as age, gender, schooling, social class, occupation, and so forth.
Situational descriptions. The readme file should include descriptions of the contexts of the recordings, such as the task or the nature of the activities being recorded. Additional specific situational information should be included in the @Situation and @Comment fields in each file, as appropriate. For example, in fluency, we’d certainly like to distinguish between conversation, monologue, narrative, experimental tasks, etc.
For data contributed specifically for teaching purposes, it helps us if you group the files and label each one, both in its filename and in its @Comment header, with the concept you think it best illustrates. This helps us to organize activities around your contribution. Your own exercises using these files are also welcome.
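
Before sending files, contributors may want to confirm that every transcript carries the @Situation and @Comment fields mentioned above. The following is a minimal sketch of such a check, not an official TalkBank tool. It assumes transcripts are UTF-8 text files with the .cha extension and that each header begins a line with its name followed by a colon; the function names missing_headers and check_corpus are hypothetical.

```python
from pathlib import Path

# Header fields named in the guidelines above; adjust as needed (assumption).
REQUIRED_HEADERS = ("@Situation", "@Comment")


def missing_headers(chat_text, required=REQUIRED_HEADERS):
    """Return the required header names that never begin a line in chat_text."""
    present = set()
    for line in chat_text.splitlines():
        for name in required:
            if line.startswith(name + ":"):
                present.add(name)
    return [name for name in required if name not in present]


def check_corpus(folder):
    """Map each .cha file under folder to the headers it lacks.

    Only files with at least one missing header appear in the result.
    """
    report = {}
    for path in sorted(Path(folder).rglob("*.cha")):
        gaps = missing_headers(path.read_text(encoding="utf-8"))
        if gaps:
            report[str(path)] = gaps
    return report
```

Calling check_corpus on the top-level folder of a corpus returns a dictionary of files that still need a header. The check only confirms that a field is present; it does not judge whether its content is informative.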

When these data are complete, please contact Brian MacWhinney (macw@cmu.edu) and Nan Bernstein Ratner (nratner@umd.edu) for instructions on how to transfer data through the WeTransfer system, as described at https://talkbank.org/share/contrib.html. THANK YOU for supporting FluencyBank.