RichInStyle.com CSS2 tutorial - Aural style sheets

Contents

Aural style sheets

Aural units

Volume

Speak

Speech-rate

Pause-before, pause-after

Pause

Cue-before, cue-after

Cue

Play-during

Azimuth

Elevation

Voice-family

Pitch

Pitch-range

Stress

Richness

Speak-punctuation

Speak-numeral

Speak-header

Aural style sheets

Aural style sheets are vital for accessibility. They allow blind or partially sighted users to use your page and buy your product. It is therefore important that you consider them carefully, both with respect to CSS and with respect to more general markup issues (e.g., using appropriate elements for their content). For example, a piece of code that is easily readable to a sighted user might not be to a blind one: is speak-punctuation, etc., set appropriately?

Aural units

Aural style sheets use three kinds of unit. Angles are specified in deg (degrees), grad (gradians), or rad (radians). For example, 100deg.

Frequencies are specified in Hz (hertz) or kHz (kilohertz). For example, 100Hz.

Times are specified in ms (milliseconds) or s (seconds). For example, 100ms.
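
As a rough illustration of the three unit types, the following sketch uses each of them once. The selectors and the particular values are arbitrary choices made for this example only.

```css
/* Illustrative selectors and values; only the unit syntax matters here. */
H1     { azimuth: 30deg; }       /* angle in degrees */
H2     { azimuth: 0.5rad; }      /* angle in radians */
EM     { pitch: 180Hz; }         /* frequency in hertz */
STRONG { pitch: 0.2kHz; }        /* 0.2kHz is the same as 200Hz */
P      { pause-before: 250ms; }  /* time in milliseconds */
DIV    { pause-after: 1.5s; }    /* 1.5s is the same as 1500ms */
```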

Volume

This takes any number between 0 and 100, where 0 is the minimum audible level, and 100 the maximum comfortable level. Volume is inherited, and may be specified as a percentage of the inherited value.

Also valid are the keywords silent (no sound at all, which is not the same as 0), x-soft (=0), soft (=25), medium (=50, the initial value), loud (=75), and x-loud (=100). For example, P {volume: 0} or P {volume: 50%}.
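
A minimal sketch of the different ways of writing a volume. The class name aside is hypothetical, and the percentage comment assumes SMALL inherits the medium volume set on BODY.

```css
BODY   { volume: medium; }  /* keyword, same as 50 */
STRONG { volume: loud; }    /* same as 75 */
SMALL  { volume: 50%; }     /* half the inherited value: 25 when 50 is inherited */
.aside { volume: silent; }  /* no sound, but the element still takes up time */
```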

Speak

Valid keywords are none, normal, and spell-out (speak one letter at a time). The value none means don't speak the element. It differs from volume: silent in that a silent element is still 'spoken' and takes up the time that speaking it would take, whereas an element with speak: none is skipped completely and takes no time. Speak is inherited. This means that, unlike display: none, which suppresses the element's descendants as well, speak: none can be overridden by a declaration on a descendant.
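
The override behavior might be sketched as follows. The class names decoration and caption are hypothetical and exist only for this example.

```css
DIV.decoration              { speak: none; }      /* skipped entirely, takes no time */
DIV.decoration SPAN.caption { speak: normal; }    /* a descendant can switch speech back on */
ACRONYM                     { speak: spell-out; } /* read one letter at a time */
```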

Speech-rate

This specifies the speaking rate in words per minute, either as a number or as a keyword: x-slow (=80 w.p.m.), slow (=120 w.p.m.), medium (=180 to 200 w.p.m.), fast (=300 w.p.m.), x-fast (=500 w.p.m.), faster (40 w.p.m. faster than the inherited rate), or slower (40 w.p.m. slower than the inherited rate). It is inherited and is initially medium. E.g., speech-rate: 80.

Pause-before, pause-after

These can be specified as a time or as a percentage of the time it takes to speak one word at the current speech-rate. For example, speech-rate: 240 means 240 w.p.m., so one word takes 0.25s, and pause-before: 10% = 0.025s = 25ms. Percentages are recommended because speech-rate can vary widely, and a percentage pause scales with it.

These properties refer to the pause before or after speaking an element. They are not inherited.

Pause

If one value is specified for this shorthand, it applies to both pause-before and pause-after. If two values are specified, the first applies to pause-before and the second to pause-after.
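
The following sketch works through the percentage arithmetic and the one-value and two-value forms of the shorthand. The selectors are arbitrary, and the millisecond figures in the comments assume the speech-rate shown in the same rule.

```css
/* At 240 words per minute, one word takes 60s / 240 = 0.25s = 250ms. */
BLOCKQUOTE { speech-rate: 240; pause: 10% 20%; }  /* 25ms before, 50ms after */

H1 { pause: 500ms; }                              /* one value: 500ms before and after */
H2 { pause-before: 300ms; pause-after: 100ms; }   /* same as pause: 300ms 100ms */
```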

Cue-before, cue-after

These specify the URL of an audio file to play before (cue-before) or after (cue-after) speaking the element. They can also be set to the keyword none (the initial value). E.g., cue-before: url(dingdong.wav). They are not inherited.

Cue

If one value is specified for this shorthand, it applies to both cue-before and cue-after. If two values are specified, the first applies to cue-before and the second to cue-after. E.g., cue: url(ding.wav) url(dong.wav).
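
A short sketch of the individual properties next to the shorthand. The file name fanfare.wav and the choice of selectors are assumptions made for illustration.

```css
A  { cue-before: url(ding.wav); cue-after: none; }  /* sound before only */
H1 { cue: url(fanfare.wav); }                       /* same sound before and after */
LI { cue: url(ding.wav) url(dong.wav); }            /* ding before, dong after */
```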

Play-during

This specifies a background sound to be played simultaneously with an element. It is not inherited.

In addition to a URL, the keywords mix, repeat, or both may be specified. Mix causes the parent's background sound to continue playing, mixed with the element's own. If mix is not specified, the element's background sound replaces the parent's. Repeat overrides the default behavior of playing the sound only once and repeats it for as long as the element is being spoken. E.g., play-during: url(hello.mid) mix repeat.

As an alternative, auto (the initial value, which causes any parent background sound to continue) or none (which silences any parent background sound while the element is spoken and resumes it afterwards) can be specified.
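
One way these values might combine in a nested structure is sketched below. The sound files and class names are hypothetical.

```css
BLOCKQUOTE.sad { play-during: url(violins.aiff) repeat; } /* loops while the quote is spoken */
BLOCKQUOTE Q   { play-during: url(harp.wav) mix; }        /* harp plays over the parent's violins */
SPAN.quiet     { play-during: none; }                     /* parent sound is silenced, then resumes */
```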

Azimuth

This is specified as an angle between -360deg and 360deg, which gives the voice's position in the sound stage. 0deg means directly ahead in the centre of the sound stage, 90deg to the right, 180deg directly behind, and 270deg (equivalent to -90deg) to the left.

Alternatively, a keyword may be specified as one of left-side (=270deg), far-left (=300deg), left (=320deg), center-left (=340deg), center (=0deg), center-right (=20deg), right (=40deg), far-right (=60deg), or right-side (=90deg). In addition, a relative keyword can be specified as leftwards (subtract 20deg from the current angle) or rightwards (add 20deg to the current angle). Azimuth is inherited. E.g., azimuth: center-left.

In addition to specifying a keyword, behind may be specified. This has the effect of changing the angle: left-side (=270deg), far-left (=240deg), left (=220deg), center-left (=200deg), center (=180deg), center-right (=160deg), right (=140deg), far-right (=120deg), right-side (=90deg). E.g., azimuth: center-left behind.
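
The keyword, behind, and relative forms can be compared side by side in a sketch like this one. The class names are hypothetical, and the angles in the comments are taken from the keyword lists above.

```css
H1            { azimuth: center; }           /* 0deg, directly ahead */
.speaker-a    { azimuth: far-left; }         /* 300deg, i.e. -60deg */
.speaker-b    { azimuth: far-left behind; }  /* 240deg */
.speaker-a EM { azimuth: rightwards; }       /* inherited 300deg + 20deg = 320deg */
```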

Elevation

This is inherited, and takes an angle between -90deg and 90deg. 0deg is level with the listener, 90deg is directly overhead, and -90deg is directly below. Alternatively, a keyword may be specified: below (-90deg), level (0deg), above (90deg), higher (add 10deg to the current elevation), or lower (subtract 10deg). For example, elevation: 90deg.
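
A small sketch of absolute and relative elevations. The class names are hypothetical.

```css
.announcer   { elevation: above; }   /* 90deg, directly overhead */
.footnote    { elevation: -30deg; }  /* below the listener */
.footnote EM { elevation: higher; }  /* inherited -30deg + 10deg = -20deg */
```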

Voice-family

This is a comma-delimited list of voices in order of preference, possibly including a generic voice family (male, female, or child). E.g., voice-family: "Julia Roberts", actress, female. Note that, as with font names, voice names that include white space must be in quotes.
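
Voice fallback lists might look like the following sketch. The specific voice names (announcer, romeo, juliet, "Big Ben") are assumptions, since the voices actually available depend on the speech synthesizer.

```css
H1            { voice-family: "Big Ben", announcer, male; }  /* quoted because of the space */
P.part.romeo  { voice-family: romeo, male; }                 /* falls back to any male voice */
P.part.juliet { voice-family: juliet, female; }
```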

Pitch

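This can be specified as a frequency or as one of the keywords x-low, low, medium (the initial value), high, or x-high. The keywords do not map to fixed frequencies, because these depend on the voice family. The average pitch is around 120Hz for a male voice and around 210Hz for a female voice. Pitch is inherited.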

Pitch-range

This indicates the amount of inflection. Valid values are any number between 0 and 100, where 0 is a flat monotone and 50 (the initial value) is normal inflection. Pitch-range is inherited.

Stress

This indicates the amount of stress given to stressed parts of sentences. Valid values are numbers between 0 and 100, where 50 is the initial value. Stress is inherited.

Richness

This indicates the brightness of the voice. Valid values are numbers between 0 and 100, where 50 is the initial value. Higher values give a voice greater 'carry', while lower values make it softer. Richness is inherited.
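
Pitch, pitch-range, stress, and richness are usually set together to characterize a voice. The sketch below shows three hypothetical voice styles, and the class names and numbers are illustrative only.

```css
.narrator { pitch: 120Hz; pitch-range: 50; stress: 50; richness: 50; }  /* neutral settings */
.excited  { pitch: high;  pitch-range: 80; stress: 70; richness: 90; }  /* animated, carrying voice */
.robot    { pitch: low;   pitch-range: 0; }                             /* 0 = monotone */
```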

Speak-punctuation

This can be set to code (i.e., speak the punctuation marks literally) or none (initial). It is inherited.

Speak-numeral

This can be set to digits (speak 123 as one two three) or continuous (the initial value: speak 123 as one hundred and twenty-three). It is inherited.
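
A sketch of where these two properties might be useful, assuming hypothetical phone and price classes in the markup.

```css
CODE   { speak-punctuation: code; }   /* punctuation such as ; and { is read aloud */
.phone { speak-numeral: digits; }     /* 123 is read as "one two three" */
.price { speak-numeral: continuous; } /* 123 is read as "one hundred and twenty-three" */
```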

Speak-header

This can be set to once (initial) or always. It is inherited. Once means the header is spoken only when it differs from the header of the previous cell. Always means the header is spoken before every relevant cell. This only works if the browser knows which header a cell belongs to. Thus if you have a TH cell, you should either give it a scope attribute, or give it an id and give each cell that pertains to that header a headers attribute with the same value. For example:

<TR>
  <TH scope="col">Regulations</TH>
  <TH scope="col">Directives</TH>
</TR>
<TR>
  <TD>Are directly effective</TD>
  <TD>Require specific enaction</TD>
</TR>

If TH {speak-header: always} is set, then the header would be spoken before each cell in that column.
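
Because speak-header is inherited, another option is to set it once on the table. The sketch below assumes this approach and a hypothetical class named compact. It is wrapped in an aural media block so that it only applies to speech output.

```css
@media aural {
  TABLE         { speak-header: always; }  /* header repeated before every cell */
  TABLE.compact { speak-header: once; }    /* header spoken only when it changes */
}
```

With always in effect, the example table would be read roughly as "Regulations: Are directly effective. Directives: Require specific enaction."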

The next section deals with language styles.