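Given a Java char array, we can use the following Java function to convert it to a raw UTF-8 byte array. Each char is encoded as 1, 2 or 3 bytes depending on its value: values up to 0x7F take 1 byte, values from 0x80 to 0x7FF take 2 bytes, and values above 0x7FF take 3 bytes. Note that a Java char is a UTF-16 code unit, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) need 4 bytes in UTF-8 and are not handled by this function.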
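To make the three cases concrete, here is a minimal sketch that asks the JDK's own encoder for the bytes of one char from each range. The class name `Utf8Widths` and the helper `hex` are made up for this illustration and are not part of the original function.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Hypothetical helper: hex dump of the UTF-8 bytes the JDK produces for one char.
    public static String hex(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes print as two hex digits
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex('A'));      // U+0041, 1 byte:  41
        System.out.println(hex('\u00E9')); // U+00E9, 2 bytes: C3 A9
        System.out.println(hex('\u20AC')); // U+20AC, 3 bytes: E2 82 AC
    }
}
```

The leading bits of the first byte (0xxxxxxx, 110xxxxx, 1110xxxx) tell a decoder how many bytes the character occupies, and every continuation byte starts with 10.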
public static byte[] char2Byte(char[] a) {
    int len = 0;
    // obtain the length of the byte array
    for (char c : a) {
        if (c > 0x7FF) {
            len += 3;
        } else if (c > 0x7F) {
            len += 2;
        } else {
            len++;
        }
    }

    // fill the byte array with UTF-8 characters
    var result = new byte[len];
    int i = 0;
    for (char c : a) {
        if (c > 0x7FF) {
            result[i++] = (byte) (((c >> 12) & 0x0F) | 0xE0);
            result[i++] = (byte) (((c >> 6) & 0x3F) | 0x80);
            result[i++] = (byte) ((c & 0x3F) | 0x80);
        } else if (c > 0x7F) {
            result[i++] = (byte) (((c >> 6) & 0x1F) | 0xC0);
            result[i++] = (byte) ((c & 0x3F) | 0x80);
        } else {
            result[i++] = (byte) (c & 0x7F);
        }
    }
    return result;
}
The function makes two passes over the char array. The first pass computes the total length of the result, so the byte array can be allocated once at exactly the right size. The second pass fills the byte array with the corresponding UTF-8 bytes.
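The same two-pass idea can be extended to surrogate pairs. Below is a rough sketch, not the original function: the class name `Utf8Encoder` and the helper `utf8Length` are assumptions for this example. It walks the array by code point using `Character.codePointAt` and `Character.charCount`, and adds a 4-byte case for code points above 0xFFFF. An unpaired surrogate is still written as 3 bytes here, which differs from the JDK encoder (it substitutes a replacement byte), so the comparison below only uses well-formed input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Encoder {
    // Hypothetical helper: number of UTF-8 bytes needed for one code point.
    public static int utf8Length(int cp) {
        if (cp > 0xFFFF) return 4;
        if (cp > 0x7FF) return 3;
        if (cp > 0x7F) return 2;
        return 1;
    }

    public static byte[] encode(char[] a) {
        // First pass: compute the total length of the byte array
        int len = 0;
        for (int k = 0; k < a.length; ) {
            int cp = Character.codePointAt(a, k);
            len += utf8Length(cp);
            k += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }

        // Second pass: fill the byte array
        byte[] result = new byte[len];
        int i = 0;
        for (int k = 0; k < a.length; ) {
            int cp = Character.codePointAt(a, k);
            k += Character.charCount(cp);
            if (cp > 0xFFFF) {
                result[i++] = (byte) (((cp >> 18) & 0x07) | 0xF0);
                result[i++] = (byte) (((cp >> 12) & 0x3F) | 0x80);
                result[i++] = (byte) (((cp >> 6) & 0x3F) | 0x80);
                result[i++] = (byte) ((cp & 0x3F) | 0x80);
            } else if (cp > 0x7FF) {
                result[i++] = (byte) (((cp >> 12) & 0x0F) | 0xE0);
                result[i++] = (byte) (((cp >> 6) & 0x3F) | 0x80);
                result[i++] = (byte) ((cp & 0x3F) | 0x80);
            } else if (cp > 0x7F) {
                result[i++] = (byte) (((cp >> 6) & 0x1F) | 0xC0);
                result[i++] = (byte) ((cp & 0x3F) | 0x80);
            } else {
                result[i++] = (byte) cp;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 'A', U+00E9, U+20AC, and U+1F600 (stored as a surrogate pair)
        String s = "A\u00E9\u20AC\uD83D\uDE00";
        byte[] expected = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(s.toCharArray()), expected)); // prints true
    }
}
```

Comparing against `String.getBytes(StandardCharsets.UTF_8)` is a cheap way to sanity-check a hand-written encoder on well-formed strings.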
–EOF (The Ultimate Computing & Technology Blog) —