While "it's all learned" may be a fine rule of thumb, it's not true that the senses are interchangeable based on how one is trained.
Different senses require different numbers of "dimensions" to represent them. Pitch is one-dimensional, color is two-dimensional, and taste is four- or five-dimensional. You can represent pitch with a single number (frequency), representing how fast the eardrum vibrates (from 20 Hz to 20,000 Hz). You can represent color, setting brightness aside, with a pair of numbers (such as hue and saturation, which describe where on a graph a color falls). To represent taste, you need to represent how strongly each of four (or five) types of taste buds is stimulated. There's no way that a human could learn to experience color (2-D) as pitch (1-D) or taste (4-D or 5-D) as color (2-D).
It's true that both pitch and color can be measured by machines on a linear scale. The ear, in fact, responds directly to the frequency of a sound wave and perceives pitch along a single line: given any three notes, we hear each one as higher than, lower than, or between the other two. The eye, however, does not respond to light frequency directly. We know that blue light has a higher frequency than yellow, which has a higher frequency than red, but we don't perceive color that way. We don't perceive yellow as closer to blue than red is, the way we perceive C as closer to D than B is.
Here's another example. Pitch includes the experience of octaves (one pitch that's twice or half the frequency of another). Since audible pitches range from 20 Hz to 20,000 Hz, there's room for about ten octaves. The visible spectrum, by contrast, ranges from about 400 million million Hz to 750 million million Hz, which is less than a single doubling. So even if the cells in the retina could vibrate that fast, they would not be able to register an octave (one stimulus with twice the frequency of another).
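The octave arithmetic can be checked with a few lines of Python. This is only a rough sketch: the function name `octaves_spanned` is made up for illustration, and the range limits are the round figures quoted above, not precise physiological measurements. The number of octaves in a range is the base-2 logarithm of the ratio between its highest and lowest frequency.

```python
import math


def octaves_spanned(low_hz: float, high_hz: float) -> float:
    """Return how many frequency doublings fit between low_hz and high_hz."""
    return math.log2(high_hz / low_hz)


# Round figures taken from the text above (assumed, not measured here).
hearing = octaves_spanned(20, 20_000)       # audible range
vision = octaves_spanned(400e12, 750e12)    # visible range, in Hz

print(f"hearing: {hearing:.2f} octaves")
print(f"vision:  {vision:.2f} octaves")
```

With these figures, hearing comes out just under ten octaves and vision just under one, which is why no visible color can be "the same note, an octave up" from another.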
There's no way that humans could be taught from birth to experience sound waves as color and light as sound. The senses don't map one to the other that way. "Red" and "sweet" could not be reversed.