How can I go about shortening a string?

For example, if I have this in a string:

><>0>2><>0>1><Banners 1>0>0><

How can I shorten it to something more like:

f98sS(*D(#@@&#SD13

3 Answers


Either use an existing compression library (zlib, SharpZipLib, …) or implement your own. Depending on your programming experience, you could implement LZ77 and Huffman encoding yourself. I once implemented a simple Huffman encoding class myself.

You have to understand, though, that strings which are too short can come out even longer when “compressed”. LZ77 is only efficient when you have long text in which several longer phrases repeat. Your example string does not contain many of those. Huffman coding, on the other hand, is very efficient when only a few distinct characters are used and there is a lot of single-character repetition. Huffman encoding requires a code tree built from the character frequencies. The best compression results come from creating a specialised tree for the data you want to compress. However, such a tree needs to be stored as well, which can make the overall result larger. With a lot of data, though, this usually pays off quickly.
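To get a rough feeling for how well a per-character code such as Huffman could do on a given string, you can estimate its entropy from the character frequencies. The following is only a minimal sketch; the class and method names (`EntropyEstimate`, `BitsPerChar`, `MinPayloadBits`) are made up for illustration and are not part of my Huffman class.

```csharp
using System;
using System.Linq;

public static class EntropyEstimate
{
    // Shannon entropy in bits per character, based only on how often
    // each single character occurs in the text.
    public static double BitsPerChar(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;
        return text.GroupBy(c => c)
                   .Select(g => (double)g.Count() / text.Length)
                   .Sum(p => p * Math.Log(1.0 / p, 2));
    }

    // Approximate lower bound (in bits) for the encoded payload of a
    // per-character code. The stored tree is NOT included in this number.
    public static double MinPayloadBits(string text)
    {
        return BitsPerChar(text) * text.Length;
    }
}
```

Calling `EntropyEstimate.MinPayloadBits("><>0>2><>0>1><Banners 1>0>0><")` and dividing by 8 gives an estimate of the payload bytes; comparing that plus the tree size with the original length shows whether compressing is likely to be worth it.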

Note that Huffman encoding produces binary data, which can’t be directly represented as a string in any meaningful way, because arbitrary byte values include many control characters that aren’t printable. If you only want to store the data or send it over a network, you can simply use a byte array. However, if it needs to be represented as printable text, you might have to base64 encode the byte array. Base64 encoding, however, increases the size of the data to about 4/3 of the original.
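The 4/3 growth is easy to check with the built-in `Convert.ToBase64String` and `Convert.FromBase64String`. This is just a small sketch; the wrapper class `Base64Size` and its method names are hypothetical helpers for illustration.

```csharp
using System;

public static class Base64Size
{
    // Number of characters Convert.ToBase64String produces for byteCount bytes:
    // every group of 3 input bytes becomes 4 output characters, and a
    // final partial group is padded up to 4 characters with '='.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Binary data -> printable text.
    public static string ToPrintable(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Printable text -> the original binary data.
    public static byte[] FromPrintable(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

So 3000 bytes of binary data turn into 4000 characters of text, and the round trip through `FromPrintable` gives back the original bytes.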

I just cleaned up my old Huffman class and added two special methods to Huffman-encode a string into a base64 encoded string. As I said, since the resulting size might be larger, it first checks whether compressing actually makes sense and only compresses the text when it does. To signal this, it places a “mark” ($ or §) at the beginning so the decoding method knows how to handle the rest.
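The marker idea can be sketched roughly like this. Note that this is not my Huffman class: it swaps in the framework's `DeflateStream` as the compressor so the example is self-contained. The class name `MarkedString` and the choice of `$` for "compressed" and `§` for "plain" are assumptions made for the sketch.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MarkedString
{
    // Assumption for this sketch: '$' = compressed payload, '§' = plain text.
    private const char CompressedMark = '$';
    private const char PlainMark = '§';

    public static string Pack(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        var buffer = new MemoryStream();
        using (var deflate = new DeflateStream(buffer, CompressionLevel.Optimal))
        {
            deflate.Write(raw, 0, raw.Length);
        }
        // ToArray still works after the stream has been closed.
        string encoded = Convert.ToBase64String(buffer.ToArray());

        // Only keep the compressed form when it is actually shorter.
        return encoded.Length < text.Length
            ? CompressedMark + encoded
            : PlainMark + text;
    }

    public static string Unpack(string packed)
    {
        // Expects a string produced by Pack (at least the mark character).
        if (packed[0] == PlainMark) return packed.Substring(1);

        byte[] compressed = Convert.FromBase64String(packed.Substring(1));
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

With this scheme the packed string is never more than one character longer than the input, and `Unpack(Pack(s))` should give back `s` either way.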

When the class compresses its own source code file, which has a size of 15581 bytes, the resulting base64 string is just 11129 bytes. Encoded as binary data, it’s only 8344 bytes.

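string x = myString;
if (x.Length > maxLength)
{
    // Keep only the first maxLength characters.
    string y = string.Empty;
    for (int i = 0; i < maxLength; i++)
    {
        y += x[i];
    }
    x = y;
}
or
if (x.Contains("<"))
{
    // Strings are immutable, so assign the result of Replace back to x.
    x = x.Replace("<", "");
}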
Also check out String Formatting Examples (Pretty Format a Number as File Size in C#, PHP and Delphi | Azulia Designs).

There are plenty of tools to help with string manipulation and construction in .NET! You can use String.Substring() and String.IndexOf():

string name = "I only want the name inside of the brackets [Tyler]";
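int start = name.IndexOf('[') + 1; // first character after '['
string shortName = name.Substring(start, name.IndexOf(']') - start); // Substring takes (startIndex, length)
// shortName = "Tyler"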

You can also just shorten a string like this using Substring if you don’t need anything in particular:

string str = "12345678910";
string shortStr = str.Substring(0, 5); // "short" is a reserved keyword in C#
// shortStr = "12345"
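
One thing to watch out for: Substring throws an ArgumentOutOfRangeException when the requested length runs past the end of the string, so a plain `Substring(0, n)` fails on input shorter than `n`. A small guard avoids that. This is just a sketch; `StringShortener.Truncate` is a hypothetical helper name, not a framework method.

```csharp
using System;

public static class StringShortener
{
    // Returns at most maxLength characters from the start of value.
    // Input that is already short enough is returned unchanged.
    public static string Truncate(string value, int maxLength)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (maxLength < 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        return value.Length <= maxLength
            ? value
            : value.Substring(0, maxLength);
    }
}
```

For example, `StringShortener.Truncate("12345678910", 5)` gives "12345", while `StringShortener.Truncate("abc", 5)` simply gives back "abc" instead of throwing.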