Developers: Bill Grundy, Jared Bernstein, Elizabeth Rosenfeld, Amir Najmi, Psi Mankoski.
email: info@entropic.com
The database comprises about 5000 utterance files. These files include about 125 utterances from each of 40 different speakers, 20 male and 20 female. The recordings were all made with a high-quality, head-mounted microphone (Shure SM10A) in an office environment, and the utterances were digitized in 16-bit samples at 16 kHz.
Each of the 13,000 sentences is identified by its own number, sss1 through sss13000. The set of sentences was divided into 13 distinct subsets of 1000 sentences each, and each successive speaker read from the next subset of 1000 sentences, rotating through the 13 subsets. For each speaker, the first 125 acceptable sentences are included in the Latino40 database. For many speakers it was necessary to reject 10 or 15 sentences, and for one speaker as many as 150, in order to find 125 acceptable ones. The following is a sample of 20 sentences from subset 3 that includes the longest sentence (sss3816) among the entire 13,000:
The forty speakers are identified as follows:
id    age  subset  origin                     verifier  note
af01  30   5       Santa Cruz, Argentina
af13  27   12      Buenos Aires, Argentina
af14  43   5       Buenos Aires, Argentina
am19  41   10      Buenos Aires, Argentina
am26  28   4       Buenos Aires, Argentina
bm21  30   2       Havana, Cuba
bm22  55   4       Havana, Cuba
cf11  52   2       Cali, Colombia
cf30  34   5       Bogota, Colombia
cm02  23   11      Bogota, Colombia
cm05  40   7       Bogota, Colombia
cm07  37   8       Bogota, Colombia
gf10  30   13      Quetzaltenango, Guatemala
gf18  27   1       Guatemala City, Guatemala            poor reading
gf20  34   12      San Marcos, Guatemala
gf38  30   8       Guatemala, Guatemala
gm06  29   10      San Marcos, Guatemala                119 sentences; poor reading
gm17  18   12      Guatemala City, Guatemala
hf28  43   9       Valparaiso, Chile
hf39  39   11      Vina del Mar, Chile
hm12  59   9       Santiago, Chile
mf27  28   9       D. F., Mexico
mm32  32   9       Durango, Mexico
nf34  23   6       Granada, Nicaragua                   slow reading
nf35  29   10      Managua, Nicaragua
nm15  54   6       Managua, Nicaragua
nm23  44   7       Managua, Nicaragua
pf31  39   13      Lima, Peru
pf33  37   2       Lima, Peru                           slow reading
pf37  23   10      Cusco, Peru
pf40  40   3       Lima, Peru                           uvular /rr/
pm03  36   3       Lima, Peru
pm16  57   3       Lima, Peru
pm24  31   4       Lima, Peru                           poor reading
rf29  59   7       San Jose, Costa Rica
rf36  35   11      San Jose, Costa Rica                 poor reading
sf09  46   8       San Salvador, El Salvador
sm04  24   6       San Salvador, El Salvador            poor reading
vf08  28   11      Valencia, Venezuela
vm25  33   5       Caracas, Venezuela
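The speaker ids in the table appear to follow a pattern: one letter for the country of origin, one letter for gender (f or m), and a two-digit speaker number. That reading is inferred from the table itself rather than stated anywhere in this document, so the following Python sketch should be taken as an assumption. The names parse_speaker_id, COUNTRY, GENDER and ID_PATTERN are hypothetical and are not part of the database distribution.

```python
import re

# Country letters as inferred from the speaker table (an assumption,
# not an official key shipped with the database).
COUNTRY = {
    "a": "Argentina", "b": "Cuba", "c": "Colombia", "g": "Guatemala",
    "h": "Chile", "m": "Mexico", "n": "Nicaragua", "p": "Peru",
    "r": "Costa Rica", "s": "El Salvador", "v": "Venezuela",
}
GENDER = {"f": "female", "m": "male"}

# One country letter, one gender letter, two digits, e.g. "af01".
ID_PATTERN = re.compile(r"^([a-z])([fm])(\d{2})$")


def parse_speaker_id(speaker_id):
    """Split an id such as 'af01' into country, gender and speaker number."""
    match = ID_PATTERN.match(speaker_id)
    if match is None:
        raise ValueError(f"unexpected speaker id: {speaker_id!r}")
    country_letter, gender_letter, number = match.groups()
    return {
        "country": COUNTRY.get(country_letter, "unknown"),
        "gender": GENDER[gender_letter],
        "number": int(number),
    }


print(parse_speaker_id("af01"))
print(parse_speaker_id("pm24"))
```

A country letter that is not in the inferred key is reported as "unknown" rather than rejected, since the key may be incomplete.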
The speakers wore a Shure SM10A unidirectional head-worn dynamic microphone and controlled the recording session at their own pace, using a recording program designed for the purpose. Recording was controlled principally through a "record" button that displayed the text of the Spanish sentence and initiated recording. The recording of a sentence was typically ended by pushing a "record next" button, which terminated the recording of the current sentence and then initiated the display and recording of the next sentence. Speakers also had access to a full set of other controls that let them play back and re-record earlier sentences if they wished, and move about in the database they were constructing.
After an initial period during which an Entropic supervisor monitored the speaker's reading and recording control, speakers were left to monitor their own reading and recordings.
Speech signals went from the Shure microphone through a Rane MS-1 preamplifier into the 'line input' jack on the SGI Indy workstation. The gains of the Rane preamplifier and the SGI system were set and checked once toward the beginning of the recording session and were then left fixed at that level.
Except for the carpeted floor, most surfaces in the room were hard and smooth. For example, subjects sat at a table with a laminated hardwood surface; there was a large whiteboard immediately to the subject's right, and the wall behind the computer console was entirely glass.
Physical dimensions: 3.9 m x 2.9 m (floor to ceiling 2.7 m)
                  door     <-------- glass wall ------------>
     ----------| | | | | |------------------------------------
     |       |   |         ________           |              |
     |       |   |        |        |          |              |
     |       |   | table  |  SGI   |          |              |
     |cabinet|   |        | console|          |              |
     |       |   |         --------           |              |
     |       |   |                            |              |
     |       |   ==============================              |
     |       |                                               |
     |_______|              subject                          |
     |                      seated                           |
     |                                                       |
     _                                                       |
     _                                              _________|
door _                                             |         |
     _         --------------                      |  file   |
     _         | bookshelf  |                      | cabinet |
     |--------------------------------------------------------
File headers are formatted as in the following example (as printed by SPHERE "h_read"):
database_id latino40
database_version 1.0
sample_rate 16000
sample_n_bytes 2
sample_sig_bits 16
sample_coding pcm,embedded-shorten-v1.09
channel_count 1
microphone Shure SM-10a
prompt_type printed
recording_site ERL Palo Alto
native_language spanish
geographic_origin Santa Cruz, Argentina
age 30
gender Female
sample_count 76801
prompt_text No habiendo objeciones, así quedó acordado.
sample_max 14030
sample_min -13585
sample_byte_format 10
sample_checksum 64953
speaker_Id af01
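The sample_count and sample_rate fields together give the length of an utterance. The following Python sketch works this out for the example header, assuming the header text is available with one "name value" pair per line. The parse_header function and the HEADER string are hypothetical illustrations; this is not the SPHERE library's own interface, and only a few of the fields are copied here.

```python
def parse_header(text):
    """Turn 'name value' lines into a dict; values are kept as strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The field name is everything up to the first space.
        name, _, value = line.partition(" ")
        fields[name] = value.strip()
    return fields


# A few fields copied from the example header for speaker af01.
HEADER = """\
sample_rate 16000
sample_n_bytes 2
channel_count 1
sample_count 76801
speaker_Id af01
"""

fields = parse_header(HEADER)
sample_rate = int(fields["sample_rate"])
sample_count = int(fields["sample_count"])

# Duration in seconds is the number of samples divided by samples per second.
duration_s = sample_count / sample_rate

# Size of the waveform once decoded to plain PCM (the files themselves
# are stored with "shorten" compression, per the sample_coding field).
pcm_bytes = sample_count * int(fields["sample_n_bytes"]) * int(fields["channel_count"])

print(f"{fields['speaker_Id']}: {duration_s:.2f} s, {pcm_bytes} bytes of uncompressed PCM")
```

For the example header this gives an utterance of about 4.80 seconds, or 153602 bytes once decompressed.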