How do I optimize transforming vertices to screen space?

I think I should be able to optimize this (two vector-matrix multiplies per iteration):

Vector3[] vertices = renderer.vertices;
for (int i = 0; i < vertices.Length; i++)
    vertices[i] = camera.WorldToScreenPoint(transform.TransformPoint(vertices[i]));

Into something like this (one vector-matrix multiply per iteration):

Vector3[] vertices = renderer.vertices;
var m = renderer.transform.localToWorldMatrix * camera.worldToCameraMatrix
    * camera.projectionMatrix * somethingElsePerhaps;
for (int i = 0; i < vertices.Length; i++)
    vertices[i] = vertices[i] * m;

How do I do it?

Solution for posterity:

// Once per frame
var worldToProjectionMatrix = camera.projectionMatrix * camera.worldToCameraMatrix;
var projectionToScreenMatrix = Matrix4x4.TRS(
    new Vector3(camera.pixelWidth * 0.5f, camera.pixelHeight * 0.5f, 0),
    Quaternion.identity,
    new Vector3(camera.pixelWidth * 0.5f, camera.pixelHeight * 0.5f, 1));
var worldToScreenMatrix = projectionToScreenMatrix * worldToProjectionMatrix;

// Once per object per frame
var localToScreenMatrix = worldToScreenMatrix * renderer.transform.localToWorldMatrix;
Vector3[] vertices = renderer.GetComponent<MeshFilter>().sharedMesh.vertices;
for (int i = 0; i < vertices.Length; i++)
    vertices[i] = localToScreenMatrix.MultiplyPoint(vertices[i]);
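
A note on why the order differs from the guess in the question: Unity's Matrix4x4 treats points as column vectors, so a product of matrices is applied right to left. The local-to-world matrix therefore sits on the far right, and the screen scaling on the far left. Below is a rough sketch for checking the combined matrix against the built-in call. The component name ScreenSpaceCheck and the cam field are made up for illustration, and it assumes the camera renders to the full screen (pixelRect origin at 0,0). Only x and y are compared: as far as I know, WorldToScreenPoint returns z as the distance from the camera in world units, while the matrix version leaves the projected depth in z.

```csharp
using UnityEngine;

// Hypothetical helper (not from the original post): attach to an object with a
// MeshFilter to compare the combined-matrix result with WorldToScreenPoint.
public class ScreenSpaceCheck : MonoBehaviour
{
    // Assumed to render to the full screen (pixelRect origin at 0,0).
    public Camera cam;

    void Update()
    {
        if (cam == null) return;
        var meshFilter = GetComponent<MeshFilter>();
        if (meshFilter == null || meshFilter.sharedMesh == null) return;

        // Once per frame: world -> projection -> pixels.
        float halfW = cam.pixelWidth * 0.5f;
        float halfH = cam.pixelHeight * 0.5f;
        Matrix4x4 projectionToScreen = Matrix4x4.TRS(
            new Vector3(halfW, halfH, 0f),
            Quaternion.identity,
            new Vector3(halfW, halfH, 1f));
        Matrix4x4 worldToScreen =
            projectionToScreen * cam.projectionMatrix * cam.worldToCameraMatrix;

        // Once per object: applied right to left, so local -> world is rightmost.
        Matrix4x4 localToScreen = worldToScreen * transform.localToWorldMatrix;

        Vector3[] vertices = meshFilter.sharedMesh.vertices; // returns a copy
        float maxError = 0f;
        for (int i = 0; i < vertices.Length; i++)
        {
            // MultiplyPoint performs the perspective divide by w.
            Vector3 fast = localToScreen.MultiplyPoint(vertices[i]);
            Vector3 reference =
                cam.WorldToScreenPoint(transform.TransformPoint(vertices[i]));

            // Compare x and y only (Vector3 converts implicitly to Vector2).
            maxError = Mathf.Max(maxError, Vector2.Distance(fast, reference));
        }
        Debug.Log("Max x/y difference in pixels: " + maxError);
    }
}
```

If the logged difference stays near zero for vertices in front of the camera, the single-matrix path should be safe to use in place of the two calls. Vertices behind the camera or exactly on the camera plane may behave differently and are worth checking separately.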