Jashan has an extensive post about this on Photon's forums. (I don't know why he hasn't linked it in this question; if you like this answer, go upvote a random post of his. :)
He basically goes into detail on Lucas' option 2, using Unity on the server: Photon handles all the networking and passes all collision data to a headless Unity server, which determines the results.
This is a difficult problem. There are several ways to deal with it, all with pros and cons:
1. Make the game logic depend not on complicated 3D environments and physics, but on something simpler, like a 2D tilemap. The game logic can then run both on the clients and on the server. In a lot of cases, you can actually get away with this.
2. Find a way to use Unity on the server. You could build a server app in Unity and publish it as a standalone player. The downside is that this is nontrivial to integrate with existing backend solutions.
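To illustrate the first option, here is a minimal sketch, assuming a grid-based game where the whole world is a small text tilemap. The map layout, the `Pos` type and the `apply_move` / `server_validates` helpers are hypothetical names for illustration only. The point is that the move rule depends only on the map and the player's input, so the client can run it for prediction and the server can rerun the exact same code to check what the client claims.

```python
from dataclasses import dataclass

# Hypothetical shared tile map: '#' is a wall, '.' is walkable floor.
# The same module would be imported by both the client and the server.
TILEMAP = [
    "#######",
    "#.....#",
    "#..#..#",
    "#.....#",
    "#######",
]

# Input direction -> (dx, dy) step on the grid.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass(frozen=True)
class Pos:
    x: int  # column index
    y: int  # row index


def is_walkable(tilemap: list[str], pos: Pos) -> bool:
    """True if pos lies inside the map and is not a wall tile."""
    if pos.y < 0 or pos.y >= len(tilemap):
        return False
    row = tilemap[pos.y]
    if pos.x < 0 or pos.x >= len(row):
        return False
    return row[pos.x] != "#"


def apply_move(tilemap: list[str], pos: Pos, direction: str) -> Pos:
    """Deterministically apply one move; an illegal move leaves the player in place."""
    dx, dy = DIRECTIONS[direction]
    target = Pos(pos.x + dx, pos.y + dy)
    return target if is_walkable(tilemap, target) else pos


def server_validates(tilemap: list[str], pos: Pos, direction: str, client_claimed: Pos) -> bool:
    """Server-side check: rerun the shared rule and compare with the client's claim."""
    return apply_move(tilemap, pos, direction) == client_claimed
```

For example, `apply_move(TILEMAP, Pos(1, 1), "right")` yields `Pos(2, 1)`, while moving "up" from `Pos(1, 1)` hits the border wall and leaves the player where they were. No physics engine is needed on either side.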
I usually prefer option number one, and try to design the game so that the logic doesn't have to rely on physics, and ideally not on 3D either. For an MMO where people walk around and need to collide with each other, you could get away with 2D positions and a 2D circle-based collision system.
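A 2D circle-based collision system of the kind described above can be very small. The sketch below assumes each player is a circle with a centre and a radius; the `Circle` type and the `overlapping` / `resolve` helpers are hypothetical names. Two circles overlap when the distance between their centres is less than the sum of their radii (compared in squared form to avoid a square root), and overlapping players are pushed apart equally along the line joining their centres.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    x: float  # centre x
    y: float  # centre y
    r: float  # radius


def overlapping(a: Circle, b: Circle) -> bool:
    """True if the two circles intersect (touching exactly does not count)."""
    dx = b.x - a.x
    dy = b.y - a.y
    rsum = a.r + b.r
    return dx * dx + dy * dy < rsum * rsum


def resolve(a: Circle, b: Circle) -> None:
    """Push two overlapping circles apart, each moving half of the overlap."""
    dx = b.x - a.x
    dy = b.y - a.y
    dist = math.hypot(dx, dy)
    overlap = a.r + b.r - dist
    if overlap <= 0:
        return  # not overlapping, nothing to do
    if dist == 0.0:
        # Centres coincide: pick an arbitrary axis to avoid dividing by zero.
        nx, ny = 1.0, 0.0
    else:
        nx, ny = dx / dist, dy / dist  # unit vector from a towards b
    half = overlap / 2
    a.x -= nx * half
    a.y -= ny * half
    b.x += nx * half
    b.y += ny * half
```

Because this is plain arithmetic on 2D positions, the same code can run on the clients and on a non-Unity server. A real game would also need a broad phase (for example a spatial grid) so that it does not test every pair of players.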
I've recently found out about uLink. uLink is a networking solution written entirely in C#, and it is meant to run inside Unity, so it has access to the physics state of the 3D world. This solves the problem of server back-ends not knowing about the 3D world.
From what I understand, it replaces Unity's built-in networking library (which uses RakNet) and is a much better alternative, in that it can easily handle ~100 players in one game session without any problem. You'd just have to run the Unity server in batch mode so it doesn't try to render the ~100 3D models of the players!
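Running a Unity server build in batch mode is done with command-line arguments to the standalone player. A minimal sketch of launching such a server from a script is below. `-batchmode`, `-nographics` and `-logFile` come from Unity's standalone player command-line documentation (check that your Unity version supports them), while the executable and log file names are hypothetical, and nothing here is specific to uLink's own configuration.

```python
import subprocess
from pathlib import Path


def headless_server_command(executable: Path, log_file: Path) -> list[str]:
    """Build the argument list for a headless Unity server process."""
    return [
        str(executable),
        "-batchmode",   # run without a window or user input
        "-nographics",  # in batch mode, do not initialize a graphics device
        "-logFile", str(log_file),  # write the player log to this path
    ]


def launch_headless_server(executable: Path, log_file: Path) -> subprocess.Popen:
    """Start the server build as a child process and return its handle."""
    return subprocess.Popen(headless_server_command(executable, log_file))
```

Usage might look like `launch_headless_server(Path("./GameServer.x86_64"), Path("server.log"))`, where `GameServer.x86_64` stands in for whatever your server build is called.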