Convert UTF Encoding to UCS in PHP


UCS-2 is a character encoding in which every character is represented by a fixed-length 16-bit (2-byte) code unit. It is used as a fallback on many GSM networks when a message cannot be encoded using GSM-7, or when a language requires more than 128 characters to be rendered. (Source: The Basics of UCS-2 Encoding and SMS Messages)
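To make the fixed-length property concrete, here is a minimal sketch that prints the UCS-2 bytes of a few characters. The helper name ucs2_hex is hypothetical (it is not part of the original function), and the sketch assumes the mbstring extension is available and picks big-endian byte order (UCS-2BE) so the hex reads the same as the code point.

```php
<?php
// Hypothetical helper for illustration: encode a UTF-8 string as
// big-endian UCS-2 and return the raw bytes as a hex string.
function ucs2_hex(string $utf8): string {
    return bin2hex(mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8'));
}

// Every character below occupies exactly 2 bytes (4 hex digits).
echo ucs2_hex("A"), "\n";        // 0041
echo ucs2_hex("\u{E9}"), "\n";   // 00e9 (e with acute accent)
echo ucs2_hex("\u{20AC}"), "\n"; // 20ac (euro sign)
```

Whatever the character, the encoded length is always two bytes, which is what distinguishes UCS-2 from a variable-width encoding.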

UTF-8 is a variable-width encoding scheme for Unicode code points. Variable-width means that each code point is represented using 1, 2, 3, or 4 bytes, depending on its value. A 1-byte sequence is identified by a 0 in its first (most significant) bit. For example, the letter A has the Unicode code point U+0041 and is encoded as the single byte 0x41.
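The variable width is easy to observe in PHP, because strlen() counts bytes rather than characters. The following is a small sketch; the helper names utf8_byte_length and is_single_byte are hypothetical, and mb_ord() assumes PHP 7.2 or later with the mbstring extension.

```php
<?php
// Hypothetical helper: how many bytes one character occupies in UTF-8.
function utf8_byte_length(string $char): int {
    return strlen($char); // strlen() counts bytes, not characters
}

// Hypothetical helper: true when the first byte has its high bit set to 0,
// which marks a 1-byte UTF-8 sequence.
function is_single_byte(string $char): bool {
    return (ord($char[0]) & 0x80) === 0;
}

foreach (["A", "\u{E9}", "\u{20AC}", "\u{1F600}"] as $ch) {
    printf(
        "U+%04X: %d byte(s), hex %s\n",
        mb_ord($ch, 'UTF-8'),
        utf8_byte_length($ch),
        bin2hex($ch)
    );
}
// U+0041: 1 byte(s), hex 41
// U+00E9: 2 byte(s), hex c3a9
// U+20AC: 3 byte(s), hex e282ac
// U+1F600: 4 byte(s), hex f09f9880
```

Note that the last code point, U+1F600, lies above U+FFFF and therefore has no UCS-2 representation at all, since UCS-2 only covers code points that fit in 16 bits.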

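The following PHP function, UTF2UCS, lowercases a UTF-8 string, leaves word characters (letters, digits, and underscore) as they are, and replaces every other character with the hexadecimal form of its UCS-2 bytes. This kind of text re-encoding is often needed when you migrate a database.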

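<?php
if (!function_exists('UTF2UCS')) {
  /**
   * Lowercases $str, copies word characters (as matched by \w) unchanged,
   * and replaces every other character with the hex of its UCS-2 bytes.
   * If $s is truthy, each hex group is prefixed with "+".
   */
  function UTF2UCS($str, $s = "") {
      $str = strtolower($str);
      $char = 'UTF-8';
      $arr = array();
      $out = "";
      $c = mb_strlen($str, $char);
      $t = false; // true while the previous character was a word character

      // Split the string into characters (not bytes)
      for ($i = 0; $i < $c; ++$i) {
          $arr[] = mb_substr($str, $i, 1, $char);
      }

      foreach ($arr as $v) {
          if (preg_match('/\w/', $v)) {
              // Word character: copy it through unchanged
              $out .= $v;
              $t = true;
          } else {
              if ($t) $out .= " "; // separate the preceding word from the hex group
              if ($s) $out .= "+";
              // "UCS-2LE" pins the byte order; with plain "UCS-2" the order
              // depends on the iconv implementation and platform
              $out .= bin2hex(iconv("UTF-8", "UCS-2LE", $v)) . " ";
              $t = false;
          }
      }
      return $out;
  }
}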

For example:

echo UTF2UCS("Hello, World!");

This prints (each non-word character is shown as its two UCS-2 bytes in hex, low byte first):

hello 2c00 2000 world 2100
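
To read such output back, the hex groups can be decoded again. The function below is only a sketch of a possible inverse: UCS2UTF is a hypothetical name that does not appear in the original post, it assumes little-endian byte order (matching the output above) and the mbstring extension, and it cannot recover the original letter case because UTF2UCS lowercases its input.

```php
<?php
// Hypothetical inverse of UTF2UCS, for illustration only.
// Assumption: hex groups are 4 lowercase hex digits in little-endian order,
// optionally prefixed with "+" (the form produced when $s is truthy).
function UCS2UTF(string $encoded): string {
    $out = '';
    foreach (preg_split('/\s+/', trim($encoded)) as $token) {
        if (preg_match('/^\+?([0-9a-f]{4})$/', $token, $m)) {
            // Hex group: turn the two bytes back into a UTF-8 character
            $out .= mb_convert_encoding(hex2bin($m[1]), 'UTF-8', 'UCS-2LE');
        } else {
            // Anything else is a run of word characters copied as-is
            $out .= $token;
        }
    }
    return $out;
}

echo UCS2UTF("hello 2c00 2000 world 2100"); // hello, world!
```

One caveat of this format: a plain word that happens to consist of four hex digits (for example "beef" or "2024") is indistinguishable from a hex group. Calling UTF2UCS with a truthy second argument prefixes every hex group with "+", and a decoder that accepted only the prefixed form would avoid the ambiguity.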

--EOF (The Ultimate Computing & Technology Blog) --
