Performance issues with mesh processing

In order to create a certain effect, I need to calculate the pose of a set of skinned meshes, since Unity doesn’t expose that data in version 3. I was able to get everything working, but I’m suffering from some major performance issues. Since I don’t have Unity Pro, I did some ad-hoc timing of functions using Time.realtimeSinceStartup and narrowed down my bottleneck to this section of code:

private function CalculatedPosedMesh(skinnedMesh : SkinnedMeshRenderer) : Mesh
{
	// Build a new mesh
	var baseMesh = skinnedMesh.sharedMesh;
	var mesh = new Mesh();

	var baseMeshVertices = baseMesh.vertices;
	var baseMeshVerticesLength = baseMeshVertices.Length;
	var baseMeshBoneWeights = baseMesh.boneWeights;
	var baseMeshBindPoses = baseMesh.bindposes;
	var newVert : Vector3[] = new Vector3[baseMeshVertices.Length];
	
	var i : int = 0;
	for(i = 0; i < baseMeshVerticesLength; i++)
	{
		//Only the first bone is being factored in right now in order to cut down on calculations. 

		// Apply bone weights
		newVert[i] = GetBoneInfluence(skinnedMesh, baseMesh, baseMeshBoneWeights[i].boneIndex0, 1.0, baseMeshVertices[i]);

		// Transform the vertex to be in local space for convenience.
		newVert[i] -= transform.position;
	}

	mesh.vertices = newVert;
	mesh.uv = baseMesh.uv;
	mesh.triangles = baseMesh.triangles;

	mesh.RecalculateBounds();
	mesh.RecalculateNormals();

	return mesh;
}

private function GetBoneInfluence(skinnedMesh : SkinnedMeshRenderer, baseMesh : Mesh, boneIndex : int, boneWeight : float, vertex : Vector3) : Vector3
{
	// Transform the mesh vertex first so that it’s local in bone space, and then transform the
	// local coordinates to world coordinates using the current bone transform.
	var localVertexPosition : Vector3 = baseMesh.bindposes[boneIndex].MultiplyPoint3x4( vertex );

	return skinnedMesh.bones[boneIndex].transform.localToWorldMatrix.MultiplyPoint3x4( localVertexPosition ) * boneWeight;
}

This code works just fine, and the effect looks excellent, but it causes 0.2-1.0 second hitches each time I call it on a 3k-vertex mesh, which is not acceptable. I was able to optimize it down to what it is now by caching a lot of the return values from the mesh class (the property accessors are apparently doing a lot of work behind the scenes), but I’ve run out of ideas in that regard. If I can bring the calculation cost down consistently to half a second or less, I think I can mask the rest of it with multithreading or coroutines.
Can anyone spot redundant or unnecessary operations here that I might have missed, or another way to approach the problem? Any help would be appreciated!

As was said in the comments, runtime mesh modification is an expensive operation.

Besides that, move these out of the loop:

baseMesh.bindposes[boneIndex]
skinnedMesh.bones[boneIndex]

Don’t ever do baseMesh.bindposes[index] or mesh.vertices[index], etc. in the loop. Every time one of these properties is accessed, Unity copies the C++ data into C# (the whole array! And in the vertices case it also has to convert the data from the combined vertex stream into a Vector3 array). Simply do this before the loop:

Matrix4x4[] bindposes = baseMesh.bindposes;

This way the array is copied only once, not once per vertex. Then use bindposes[index] in the loop.
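
As a rough sketch of that advice applied to the code in the question (in C#, and only factoring in the first bone, as the original does): every array-returning property is read once before the loop and only the cached arrays are indexed inside it. The class and method names here are made up for illustration, and I'm assuming skinnedMesh.bones hands back a fresh array on each access just like the Mesh properties do.

```csharp
using UnityEngine;

// Hypothetical helper class; the name is for illustration only.
public static class PosedMeshUtil
{
    // Returns the posed vertices in world space, using only the first bone of each vertex.
    public static Vector3[] PoseVertices(SkinnedMeshRenderer skinnedMesh)
    {
        Mesh baseMesh = skinnedMesh.sharedMesh;

        // Each of these property reads may copy the whole array,
        // so do them exactly once, outside the loop.
        Vector3[] vertices = baseMesh.vertices;
        BoneWeight[] weights = baseMesh.boneWeights;
        Matrix4x4[] bindposes = baseMesh.bindposes;
        Transform[] bones = skinnedMesh.bones;

        Vector3[] posed = new Vector3[vertices.Length];
        for (int i = 0; i < vertices.Length; i++)
        {
            int bone = weights[i].boneIndex0;

            // Mesh space -> bone space -> world space, as in the original GetBoneInfluence.
            Vector3 local = bindposes[bone].MultiplyPoint3x4(vertices[i]);
            posed[i] = bones[bone].localToWorldMatrix.MultiplyPoint3x4(local);
        }
        return posed;
    }
}
```

The caller can still subtract transform.position from each result afterwards, as the question's code does.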

IIRC Unity caches the result of localToWorldMatrix, but simply building an array of the bone matrices before the main loop and reusing it should help a bit too.
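
One way to do that caching, sketched in C# under the assumption that the mesh has exactly one bind pose per bone (bones.Length == bindposes.Length): combine each bone's current localToWorldMatrix with its bind pose once per bone, so the per-vertex work drops to a single matrix multiply. The class and method names are hypothetical.

```csharp
using UnityEngine;

// Hypothetical helper class; the name is for illustration only.
public static class SkinMatrixCache
{
    // Builds one combined matrix per bone: bind pose first, then the bone's current transform.
    // Assumes bones and bindposes have the same length and ordering.
    public static Matrix4x4[] BuildSkinMatrices(Transform[] bones, Matrix4x4[] bindposes)
    {
        Matrix4x4[] skin = new Matrix4x4[bones.Length];
        for (int b = 0; b < bones.Length; b++)
        {
            // Applying (localToWorld * bindpose) to a point gives the same result as
            // applying the bind pose and then localToWorld, since both are affine transforms.
            skin[b] = bones[b].localToWorldMatrix * bindposes[b];
        }
        return skin;
    }
}
```

Inside the vertex loop this would be used as skin[boneIndex].MultiplyPoint3x4(vertex), with the bones and bindposes arrays themselves read from the renderer and mesh once beforehand. The matrices go stale as soon as the bones move, so they need rebuilding each time the pose is recalculated.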