Convert UTF-8 Char Array to Raw Byte Array in Java


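Given a char array, we can use the following Java function to convert it to a raw byte array in UTF-8. A Java char is a UTF-16 code unit, and the function encodes each char as 1, 2 or 3 bytes depending on its value: 1 byte for U+0000 to U+007F, 2 bytes for U+0080 to U+07FF, and 3 bytes for U+0800 to U+FFFF. Characters outside the Basic Multilingual Plane, which Java stores as surrogate pairs, would need a single 4-byte UTF-8 sequence; this function does not combine surrogate pairs and instead encodes each surrogate separately as 3 bytes.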
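
The byte-length rule can be sketched as a small standalone helper. The class name Utf8Length and the method utf8Length are hypothetical, chosen only for this illustration; the thresholds mirror the ones used in char2Byte, so surrogate chars also fall into the 3-byte case here.

```java
public class Utf8Length {
    // Number of UTF-8 bytes char2Byte emits for a single char:
    // U+0000..U+007F -> 1, U+0080..U+07FF -> 2, U+0800..U+FFFF -> 3.
    // Assumption: surrogates are counted as 3 bytes each, like char2Byte does.
    public static int utf8Length(char c) {
        if (c <= 0x7F) {
            return 1;
        } else if (c <= 0x7FF) {
            return 2;
        }
        return 3;
    }

    public static void main(String[] args) {
        // Boundary values of each length class
        char[] samples = {'A', (char) 0x7F, (char) 0x80, (char) 0x7FF, (char) 0x800, (char) 0xFFFF};
        for (char c : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", (int) c, utf8Length(c));
        }
    }
}
```

Running main prints 1, 1, 2, 2, 3, 3 bytes for U+0041, U+007F, U+0080, U+07FF, U+0800 and U+FFFF respectively, which shows where each length class starts and ends.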

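public static byte[] char2Byte(char[] a) {
    int len = 0;
    // first pass: compute the length of the UTF-8 byte array
    for (char c : a) {
      if (c > 0x7FF) {
        len += 3;  // U+0800..U+FFFF: 3 bytes
      } else if (c > 0x7F) {
        len += 2;  // U+0080..U+07FF: 2 bytes
      } else {
        len++;     // U+0000..U+007F: 1 byte (ASCII)
      }
    }
    // second pass: fill the byte array with the UTF-8 encoding of each char
    // (surrogates are encoded one by one as 3 bytes each, so supplementary
    // characters do not become a single 4-byte sequence)
    var result = new byte[len];
    int i = 0;
    for (char c : a) {
      if (c > 0x7FF) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        result[i++] = (byte) (((c >> 12) & 0x0F) | 0xE0);
        result[i++] = (byte) (((c >> 6) & 0x3F) | 0x80);
        result[i++] = (byte) ((c & 0x3F) | 0x80);
      } else if (c > 0x7F) {
        // 110xxxxx 10xxxxxx
        result[i++] = (byte) (((c >> 6) & 0x1F) | 0xC0);
        result[i++] = (byte) ((c & 0x3F) | 0x80);
      } else {
        // 0xxxxxxx
        result[i++] = (byte) (c & 0x7F);
      }
    }
    return result;
}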

The function makes two passes over the input: the first pass iterates over the char array to compute the total length of the resulting byte array, and the second pass fills that array with the corresponding UTF-8 bytes.
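
To sanity-check char2Byte, one option is to compare its output with the JDK's own encoder, String.getBytes(StandardCharsets.UTF_8). The sketch below uses a hypothetical wrapper class Char2ByteDemo holding a compact copy of char2Byte so that it compiles on its own. The two outputs agree for chars in the Basic Multilingual Plane that are not surrogates; for a surrogate pair such as U+1F600 they differ, because the JDK emits one 4-byte sequence while char2Byte emits two 3-byte sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Char2ByteDemo {
    // Compact copy of char2Byte so this sketch is self-contained.
    public static byte[] char2Byte(char[] a) {
        int len = 0;
        for (char c : a) {
            len += (c > 0x7FF) ? 3 : (c > 0x7F) ? 2 : 1;
        }
        byte[] result = new byte[len];
        int i = 0;
        for (char c : a) {
            if (c > 0x7FF) {
                result[i++] = (byte) (((c >> 12) & 0x0F) | 0xE0);
                result[i++] = (byte) (((c >> 6) & 0x3F) | 0x80);
                result[i++] = (byte) ((c & 0x3F) | 0x80);
            } else if (c > 0x7F) {
                result[i++] = (byte) (((c >> 6) & 0x1F) | 0xC0);
                result[i++] = (byte) ((c & 0x3F) | 0x80);
            } else {
                result[i++] = (byte) (c & 0x7F);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A, e-acute (U+00E9) and the euro sign (U+20AC): one char per length class
        String s = "A\u00E9\u20AC";
        byte[] ours = char2Byte(s.toCharArray());
        byte[] jdk = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ours, jdk)); // prints true
        for (byte b : ours) {
            System.out.printf("%02X ", b & 0xFF); // prints 41 C3 A9 E2 82 AC
        }
        System.out.println();
    }
}
```

For example, the euro sign U+20AC splits into the bit groups 0010, 000010 and 101100, which the 3-byte branch turns into E2 82 AC.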

–EOF (The Ultimate Computing & Technology Blog) —
