The Difference Between Finding and Confirming
There are two very different approaches to playing a phrase.
In one, the hand follows the notes and discovers what to do on the way. The fingers are searching. The arm asks questions. The sound is found as it happens.
In the other, the sound picture has already been decided. The hand arrives at the keys to confirm what was already heard.
The difference is not small. It is the difference between practice that compounds and practice that does not.
Ask the diagnostic question on any phrase you are working on: does my hand know what it is looking for before it gets there? If the answer is yes, you are confirming. If the answer is no — if the hand only finds out what the sound is at the moment of contact — you are searching. Most of what feels like "I just need more repetitions" is actually searching mode trying to repeat its way into clarity. It does not get there.
Why Searching Mode Accumulates Hours Without Clarity
Students who play in the searching mode — the "see what'll happen" mode — can accumulate practice hours without accumulating much clarity.
The notes may be correct in pitch. The rhythm may be close enough. But the color is uncertain. The weight is searched for. The timing is approximate. The phrase holds together loosely, and a second performance of it sounds noticeably different from the first.
The reason is simple: the body had no target. You cannot aim the arm, the fingers, or the touch at something that has not been specified. Without a target, the hand defaults to whatever it does by habit — which is usually a generic, middle-of-the-road sound, neither warm nor bright, neither shaped nor flat. It is the sound of a competent body with no instructions.
This is also why hours stack up without compounding. Each repetition reinforces the same generic default, because that is what is actually being practiced. The hand gets better at the default, not at the music. To make practice compound, the body needs something specific to aim at, and that something has to come from the ear.
What the Inner Sound Image Actually Is
The inner sound image is the specific, internally heard version of the phrase that the body uses as its target. It is what you hear in your head before the fingers move.
It is not meditation. It is not visualization in a mystical sense. It is concrete preparation — the ear deciding, in detail, what the phrase is going to sound like, so that the arm, the fingers, the touch, the timing, and the weight are all organized by what has already been heard.
The image has to be specific to be useful. A vague sense of "I want this to sound nice" is not an inner sound image. It is a wish. The image becomes useful at the point where it commits to particular qualities — this tone color, this arm weight, this phrase shape, this peak, this release. Specificity is what turns the image from a feeling into a target.
When the image is specific, the body has something to follow. When it is not, the fingers default and the practice does not compound.
The Three Dimensions: Tone Color, Arm Weight, Phrase Shape
A useful inner sound image is specific on three dimensions at once. Each one organizes a different part of the body.
Tone color. Is the note bright, round, dark, delicate? Close or distant? A warm singing tone and a transparent silvery one are produced by entirely different combinations of arm weight and finger touch. Decide which one before you play. This is also where playing a warm, soft tone with concentrated fingers becomes a deliberate choice rather than an accident. Soft playing is the clearest case where the image decides the result, because there is no margin for the hand to default.
Arm weight. Heavy and grounded, light and floating, bouncy, or settled? The arm is the part of the body that delivers the chosen color to the key. A bright tone with a heavy arm sounds entirely different from a bright tone with a light arm. The image has to decide both. The deeper mechanism behind this is the coordinated system of arm, wrist, and fingers — the inner sound image is what tells that system what to coordinate towards.
Phrase shape. Where does the phrase peak? Where does it release? Which note carries the weight of the line, and which notes lead into and out of it? A phrase without a decided shape sounds flat even when every note is correct. The image specifies the arc, and once the arc is specified, a phrase that sounds shaped rather than flat becomes something you can actually aim at.
When all three dimensions are specific, the hand has a complete target — and the practice hours have something to compound toward.
Singing Internally Is the Practical Tool
The question is how to make the inner sound image specific. The most reliable tool is internal singing.
Not humming. Not vague mental playback. Singing the phrase internally with enough detail that the three dimensions become specific — this tone color, this arm weight, this shape — before the fingers move.
Singing works because it forces commitment. A vague mental playback can stay vague indefinitely. Singing — even silently, inside the head — has to choose a pitch, a length, a stress, a release. The act of singing the phrase to yourself, properly, exposes every place where the image was still general. When you reach a note and realize you do not know what color it should be, the singing stops working. That is the diagnostic. You go back, decide the color, and sing it again.
This is also the moment where slow practice with fast finger movement becomes the natural partner. Slow tempo is the tempo at which the image can actually be specified and tested. At speed, there is no room to ask the diagnostic questions; at slow tempo, image and execution can both be examined together.
Try This: A Phrase, Sung First, Then Played
Take a short phrase you know — four to eight bars is plenty.
- Close your eyes. Sing the phrase internally until the tone color is clear. Ask: what color is this opening note? Bright? Round? Close or distant?
- Play one note to test. Did the hand produce the color you heard, or the color it defaults to? If it was the default, adjust the arm weight, adjust the touch, and sing the note again before you play it.
- Sing the next note. Decide its color in relation to the first. Then play, and check.
- Sing the whole phrase internally, including the shape — where the peak sits, where the release happens.
- Play the full phrase to confirm the image. The phrase will not suddenly be perfect. But you will notice something different: the hands are not searching. They are responding to a target that was already set.
The diagnostic question runs through every step: did the hand produce what the ear specified, or did the hand default? When the answer is "default," go back one step and re-specify. The image gets sharper through this loop, and the hand follows.
How This Connects to the Rest of the Practice Architecture
The inner sound image sits upstream of almost everything else in the practice.
It is upstream of preparation. You cannot prepare the movement before the note when you do not know what sound the movement is preparing for. Preparation is a physical act, but it is a physical act in service of a specific sound, and the sound has to be decided first.
It is upstream of listening. You can only listen for the gap between what you intended and what you played when there is something intended to compare against. Without the image, listening has no reference point and collapses into vague self-judgment.
It is upstream of speed. The controlled tempo of the Stack Method only stays under control when each tempo is a layer where the inner sound image gets confirmed. At every floor of the stack, the question is the same: is the sound still the sound I heard, or did the tempo erode it? Without the image, "under control" has nothing to measure against.
These three — preparation, listening, controlled tempo — are the physical and procedural disciplines. The inner sound image is the decision they all serve. Train it, and the rest of the practice architecture has something to organize around.
In Short
Most practice is about repetition. Effective practice uses the inner sound image as the organizing force.
The hand does not invent the sound. The hand realizes the sound that the ear has already decided. When the image is specific in tone color, arm weight, and phrase shape, practice hours compound. When it is not, they accumulate without clarity.
Where This Is Built Step by Step
Training the inner sound image is not a separate exercise. It is a habit woven through every piece, from the first note of the first phrase to the most complex passage at tempo. Each piece in the Piano Fantasy Academy is approached the same way: decide the sound first, then let the body confirm it.
You can keep experimenting on your own, or follow a clear path that builds this inner-ear discipline step by step.