I am trying to make something so, so I have a String, and let’s say I have 20 objects in the array. I want to select 8 of the strings without repeats randomly. I had an Idea, but there would be a possibility of repetition.
So I could do
RandomeFromList(MinNumber, MaxNumber, How many I want to select);
While the Shuffle and pick example (below) will work I think it is too restrictive to be part of a library. Solutions don’t all have to be “librariable” but it is nice when a solution solves the immediate task and at least some variations of that task.
I’ve got it written but don’t quite have the time free to test it properly but you can create array extension methods to handle this quite easily. A shuffle method (with generics) will shuffle an array of most types and a pick method will return the first n elements. So your syntax becomes var result = testarray.Shuffle().Pick(8);
If you want ordering you add an Order method (or methods) that can order them in whatever orders you choose.
Again, using such an approach would work with strings, ints, floats or what have you.
Generally, I think the shuffle method is the fastest and easiest way to get this done and for such small sets it’s perfectly suited to it.
If you end up with much larger sets (like millions of elements) then I find there are better solutions. One way I used in the past was to iterate through the list and randomly decide if each element visited is selected. The chance of being selected is a factor of the number of elements left to visit and the number of elements still needed for your select. This way the chance of selection approaches 1 as you near the end of the list. It also means that you don’t need to perform any kind of sorting or touch each element more than exactly once.
I believe it is fair to predict that just about nobody does it that way. You can check yourself but surely shuffling the source values and simply pulling them one at a time is the most common way. It is certainly a very efficient method.
Let’s see if this works for you. I’m doing a lot of VRChat work these days and only have a subset of C# (or I would probably opt for using Linq here). Adding Ascending and Descending sort methods can be added (VRChat doesn’t expose what I need to do it).
using System;
using UERandom = UnityEngine.Random;
namespace Leylan.ArrayExtensions
{
public static class LeylanArrayExtensions
{
public static T[] Shuffle<T>(this T[] array)
{
var max = array.Length;
while (max > 1)
{
var k = UERandom.Range(0, --max);
T value = array[k];
array[k] = array[max];
array[max] = value;
}
return array;
}
public static T[] Take<T>(this T[] array, Int32 max)
{
T[] result = new T[max];
var cnt = 0;
while (cnt < max)
{
result[cnt] = array[cnt++];
}
return result;
}
}
}
The concern there is that you usually want to preserve all the elements in the original collection.
To avoid modifying the original list, you would need to make a copy of it.
Obviously performance isn’t always a concern, but shuffling a collection (in a way that doesn’t allocate) and running through N number of elements can avoid all allocations.
Yes true, needing to use a copy of the list is a downside to my method.
Though also good to note that if you want to not only preserve the elements but also the order of the list, you’d have the same issue with the shuffle method too.
This is another one of those discussions about code that cannot have a definitively correct answer.
The reason is that code itself is not the problem, data is the problem, and code is the solution. Depending on the input data and the desired output, the optimal transformation in between will vary.
Several factors influence the best approach, some of them will be:
Should our collection be immutable? If so, is it better to create a copy and mutate that copy, or should we avoid copying altogether? The answer depends on context, whether we prioritize readability, performance, ease of debugging, etc.
What is the size of our collection?
What is the size and nature of the elements, how many random number we need? Are they value types or reference types?
What type of collection are we dealing with, and how is it represented in memory? Where in memory is it stored?
Some examples include:
If we have a large collection (e.g., 1,000 elements) but only need to select two or three, picking a random index and retrying if duplicates occur may be more efficient, as the probability of a collision is only 1/1,000.
If we don’t mind mutating the collection, we could swap the first randomly chosen element with the last, then pick another random element from the remaining (excluding the last one), swap it with the second-to-last, and repeat for as many elements as needed.
The type of collection also affects performance. For example, removing an element from a linked list is far more efficient than from an array-based list, which requires shifting elements.
If our collection contains expensive-to-move elements, we might use an auxiliary array of integers as an index table, shuffling that instead of directly modifying the main collection.
If multi-threading is a factor, we must consider cache lines. Collections that store elements in contiguous memory may suffer from performance degradation, making them less suitable unless the algorithm is designed to mitigate this issue.
Ultimately, discussing solutions without defining the problem is meaningless. Since code is merely a means of transforming data, different people may propose different “best” solutions based on their own understanding of the problem and its constraints.
This topic falls under pure implementation algorithms. Without a precise definition of the problem, specifically, the data we have and the data we want after processing, we cannot objectively evaluate a solution.
Consequently, there is no universally “better” or “worse” way to implement it and is meaningless discussing a solution to a problem that is not defined as everyone will have a different problem in his mind.
I don’t think it is necessarily about correct answers. The use case was defined by the original poster. I think any method that returns the answer is probably good enough. I have a tendency to write something I can use in several cases rather than just the one I have at the moment.
I’m somewhat averse to removing items from a list if removing items isn’t the goal.
Of course it is about correct answers :), but I get what you mean; it isn’t about the best possible solution as long as the solution is within the requirements.
But it is actually hard to give clear requirements. Look at the answers so far, the OP hasn’t mentioned if mutating the array is desirable and so far the answers are about either removing elements or shuffling the array, which may be undesirable.
We can have solutions that remove elements, or solutions that don’t but shuffle the elements, or solutions that don’t mutate the array but return references to the resulting elements, or solutions that return the indexes instead of the array elements, and while I believe that probably he doesn’t mind shuffling his array, as long as we don’t agree on what we want the resulting data to be and what will happen to the original data, every solution will be different and all will be relatively correct, but not for all the cases.