Create your own scripted tiles, as described in the documentation.
Add a prefab to a basic tile; the prefab gets instantiated when you run the game.
Manually check and manipulate tiles via script. The Tilemap has a Grid component that you can use to convert world coordinates to grid coordinates, which you can then use to query the tiles on a layer and act on the result.
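To make the third option concrete, here is a minimal, engine-neutral sketch in Python of the coordinate conversion and lookup. It is not Unity code: `world_to_cell`, `tile_at`, and the dictionary-based `layer` are hypothetical names made up for illustration, and the grid is assumed to be a plain rectangular one. In Unity itself the equivalent calls would be along the lines of `Grid.WorldToCell` followed by `Tilemap.GetTile`.

```python
import math

def world_to_cell(x, y, origin=(0.0, 0.0), cell_size=(1.0, 1.0)):
    """Map a world-space point to integer grid coordinates.

    Uses floor rather than int() so that points left of or below
    the origin land in the correct negative cell.
    """
    return (math.floor((x - origin[0]) / cell_size[0]),
            math.floor((y - origin[1]) / cell_size[1]))

# Hypothetical layer: a sparse mapping from cell to tile name.
layer = {(2, 1): "door", (3, 1): "wall"}

def tile_at(layer, x, y, **grid):
    """Return the tile under a world-space point, or None if the cell is empty."""
    return layer.get(world_to_cell(x, y, **grid))

# Branch on whatever tile sits under a world position.
if tile_at(layer, 2.5, 1.2) == "door":
    pass  # e.g. open the door, play a sound
```

The point is only that the lookup is cheap arithmetic plus one query, so game logic can react to tiles without the tiles themselves carrying any behaviour.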
As I said, the system is pretty open-ended, so there are several ways to do interactive tiles. We will document best practices eventually, but it is too early for that now.
I’d say one of the core reasons the tilemap exists in the first place is that not every tile needs to be a GameObject. That not only improves performance but also makes level editing cleaner and more convenient.
The more interactive and complex your tile is, the more you want to use an actual GameObject. That is the most convenient way in Unity to describe complex things. (Note that the tile itself can provide this GameObject.)
On the other hand, the more instances of a certain tile you have, the less you want them to be actual GameObjects, for performance reasons. For example, in a 100x100 level where every tile is destructible, you are looking at 10k GameObjects. Unity can do that (depending on the components), but at some point it starts to break down.
So for destructible terrain, I would use a single component (script) that handles the destructible terrain logic for the entire level. The actual terrain tiles are just dummies.
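As a rough illustration of that "one manager, dumb tiles" idea, here is an engine-neutral Python sketch. Everything in it is hypothetical: the `DestructibleTerrain` class, the `"dirt"` tile name, and the hit-point rule are assumptions for the example, not anything the tilemap system provides. In Unity the manager would be a single MonoBehaviour that reads and clears tiles on the Tilemap.

```python
class DestructibleTerrain:
    """One object owns all destruction logic; tiles are plain data."""

    def __init__(self, width, height, hit_points=3):
        # Every cell starts as the same dummy tile.
        self.tiles = {(x, y): "dirt" for x in range(width) for y in range(height)}
        self.default_hp = hit_points
        # Only cells that have taken damage get an entry,
        # so untouched tiles cost nothing extra.
        self.damaged = {}

    def damage(self, cell, amount=1):
        """Apply damage to a cell. Returns True if the tile was destroyed."""
        if cell not in self.tiles:
            return False
        hp = self.damaged.get(cell, self.default_hp) - amount
        if hp > 0:
            self.damaged[cell] = hp
            return False
        # Tile is gone: remove it and forget its damage state.
        del self.tiles[cell]
        self.damaged.pop(cell, None)
        return True

terrain = DestructibleTerrain(100, 100)  # 10k tiles, one manager object
```

Per-tile state is stored only for tiles that have actually been hit, which is what keeps this cheap compared to 10k separate objects.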
For something like an animated door with sound and scripts, on the other hand, I would just use a tile that instantiates a prefab, or simply overlay a GameObject with a SpriteRenderer on top.
We haven’t really focused on performance yet (premature optimization is evil, right?), so I’m shooting blind here. My conservative guess is that you probably can’t pull off a smooth Terraria-sized world (8400 x 2400, about 20M tiles) with the basic single-tilemap setup. You might need to build your own system that streams multiple small tilemaps side by side near the camera, or something like that.
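To show what such a streaming setup might look like, here is a small engine-neutral Python sketch of the bookkeeping only. The names `chunks_near` and `update_loaded`, the 64-tile chunk size, and the one-chunk radius are all assumptions picked for the example, and actually creating or destroying the small tilemaps is left out.

```python
def chunks_near(camera_cell, chunk_size=64, radius=1):
    """Return the set of chunk coordinates within `radius` chunks of the camera."""
    # Floor division maps a cell to its chunk, including negative cells.
    cx = camera_cell[0] // chunk_size
    cy = camera_cell[1] // chunk_size
    return {(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}

def update_loaded(loaded, camera_cell, chunk_size=64, radius=1):
    """Work out which chunks to create and which to drop as the camera moves.

    Returns (new_loaded, to_load, to_unload).
    """
    wanted = chunks_near(camera_cell, chunk_size, radius)
    return wanted, wanted - loaded, loaded - wanted

# Start with nothing loaded, then move the camera one chunk to the right.
loaded, to_load, to_unload = update_loaded(set(), (100, 70))
loaded, to_load, to_unload = update_loaded(loaded, (130, 70))
```

With these numbers only 9 chunks of 64 x 64 tiles (36,864 tiles) are resident at any time, instead of all 20,160,000. Whether that is actually fast enough is exactly the open question here.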
If you test bigger procedural maps yourself, I’d be very interested to know where we stand on performance right now.