UTF-16: Difference between revisions

Content deleted Content added
→‎Code points U+10000 to U+10FFFF: Typo fix, grammar fix, added links, LÊ TIẾN TIẾP
Tags: Mobile edit Mobile app edit
m Reverted to revision 6831316 by JohnThorne (talk).
Tag: Undo
Line 1:
'''UTF-16''' (16-[[bit]] [[Unicode]] Transformation Format) is a Unicode [[pengkodean karakter|character encoding]] capable of encoding 1,112,064<ref><math>2^{16} - 2 \times 2^{10} + 2^{10} \times 2^{10}</math>, where <math>2^{16}</math> is the size of the BMP, <math>- 2 \times 2^{10}</math> removes the surrogate range U+D800–U+DFFF, and <math>2^{10} \times 2^{10}</math> is the number of code points in the supplementary planes.</ref> numbers (called ''[[code point]]s'') in the Unicode code space from 0 to 0x10FFFF. It is a ''variable-width encoding'' because each code point is encoded with one or two 16-bit ''code units''.
 
'''UCS-2''' (2-[[byte]] [[Universal Character Set]]) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.<ref>{{cite web | url=http://www.unicode.org/faq//utf_bom.html|title=Questions about encoding forms| accessdate=12 November 2010}}</ref> It produces a fixed-length format by simply using the code point as the 16-bit code unit, and it yields exactly the same result as UTF-16 for about 97% (63,488 of the 65,536) of the code points in the range 0–0xFFFF, including every character that had been assigned a value at that time.
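
The counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and variable names are illustrative and are not taken from the Unicode standard.

```python
# Sizes used in the footnote formula (names are illustrative).
BMP_SIZE = 2**16                # code points U+0000..U+FFFF
SURROGATES = 2 * 2**10          # U+D800..U+DFFF, reserved for surrogate pairs
SUPPLEMENTARY = 2**10 * 2**10   # U+10000..U+10FFFF, reached via surrogate pairs

# Code points that UTF-16 can encode.
encodable = BMP_SIZE - SURROGATES + SUPPLEMENTARY

# Code points in 0..0xFFFF for which UCS-2 and UTF-16 agree.
ucs2_same_as_utf16 = BMP_SIZE - SURROGATES
share = ucs2_same_as_utf16 / BMP_SIZE

print(encodable)            # 1112064
print(ucs2_same_as_utf16)   # 63488
print(f"{share:.1%}")       # 96.9%
```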
Line 13:
The first plane (code points U+0000 to U+FFFF) contains the most frequently used characters and is called the [[Basic Multilingual Plane]] or ''BMP''. Both UTF-16 and UCS-2 encode code points in this range as single 16-bit code units that are numerically equal to the corresponding code points. The code points in the BMP are the ''only'' code points that can be represented in UCS-2.
-->
=== Code points U+10000 to U+10FFFF ===
<!--
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called a ''surrogate pair'', by the following scheme:
Line 44:
Since the ranges for the lead surrogates, trail surrogates, and valid BMP characters are disjoint, searches are simplified: it is not possible for part of one character to match a different part of another character. It also means that UTF-16 is ''self-synchronizing'': the start of the next character following a given code unit can be found by examining only that one code unit. [[UTF-8]] shares these advantages, but many earlier encoding schemes did not allow unambiguous searching and could only be synchronized by re-parsing from the start of the string.
 
Because the most commonly used characters are all in the Basic Multilingual Plane, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software (e.g. CVE-2008-2938, CVE-2012-2135).<ref>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-2938 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-2135</ref>
-->
 
=== Code points U+D800 to U+DFFF ===
<!--
Line 151 ⟶ 150:
+ 11 0010 0001
= 1101 1111 0010 0001
= 0xDF21 // second code unit of UTF-16 encoding
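
The arithmetic above can be reproduced programmatically. This is a minimal sketch, assuming the usual surrogate-pair scheme (subtract 0x10000, split the result into two 10-bit halves, then add 0xD800 to the high half and 0xDC00 to the low half). The function name `encode_surrogate_pair` and the sample code point U+64321 are illustrative choices; the sample was picked because its low ten bits are 11 0010 0001.

```python
def encode_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Return the (lead, trail) UTF-16 code units for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    lead = 0xD800 + (offset >> 10)       # high ten bits
    trail = 0xDC00 + (offset & 0x3FF)    # low ten bits
    return lead, trail


lead, trail = encode_surrogate_pair(0x64321)
print(hex(lead), hex(trail))  # 0xd950 0xdf21
```

The second value matches the 0xDF21 obtained by hand above. Python's built-in codec gives the same bytes for `chr(0x64321).encode("utf-16-be")`.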
 
== See also ==
* [[Pengkodean karakter|Character encoding]]
* [[Unicode]]
Line 160 ⟶ 161:
 
== External links ==
* [http://www.unicode.org/faq/utf_bom.html#utf16-4 A very short algorithm for determining the surrogate pair for any codepoint]
* [http://www.unicode.org/notes/tn12/ Unicode Technical Note #12: UTF-16 for Processing]
* [http://www.unicode.org/faq/basic_q.html#14 Unicode FAQ: What is the difference between UCS-2 and UTF-16?]
* [http://www.unicode.org/charts/charindex.html Unicode Character Name Index]
* RFC 2781: UTF-16, an encoding of ISO 10646
* [http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#charAt(int) java.lang.String documentation, discussing surrogate handling]
 
{{Unicode navigation}}
Line 172 ⟶ 173:
{{DEFAULTSORT:Utf-16 Ucs-2}}
 
[[Kategori:Tipografi]]
[[Kategori:Pengkodean karakter]]
[[Kategori:Unicode]]
[[Kategori:Komputer]]