UTF-8: Difference between revisions
RXerself (talk | contribs)
Rejected the last 2 text changes (by 114.125.118.25) and restored revision 11931421 by HsfBot
'''UTF-8''' ('''[[:en:Universal Character Set|Universal Character Set (UCS)]] Transformation Format{{mdash}}8-bit'''<ref>{{Cite book|publisher=[[Unicode Consortium|The Unicode Consortium]]|title=The Unicode Standard|url=http://www.unicode.org/versions/Unicode6.0.0/|edition=6.0|location=Mountain View, California, USA|isbn=978-1-936213-01-6|chapter=Chapter 2. General Structure}}</ref>) is a [[:en:variable-width encoding|variable-width]] character encoding that can represent every character in the [[Unicode]] character set. It was designed for [[:en:backward compatibility|backward compatibility]] with [[ASCII]] and to avoid the complications of [[:en:endianness|endianness]] and [[:en:byte order mark|byte order marks]] in [[UTF-16]] and [[:en:UTF-32|UTF-32]].
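
The contrast with UTF-16 can be seen by encoding the same short string with Python's standard codecs. This is a minimal sketch, assuming Python 3.8 or later (where `bytes.hex` accepts a separator); the sample string is arbitrary. UTF-8 yields exactly one byte sequence, whereas UTF-16 requires a choice of byte order, and Python's generic `utf-16` codec signals that choice by prepending a byte order mark in the machine's native order. The ASCII letter keeps its single-byte ASCII value in UTF-8.

```python
# Encode the same text in UTF-8 and in three UTF-16 variants.
text = "A€"  # 'A' is U+0041 (ASCII); '€' is U+20AC

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # explicit little-endian, no BOM
utf16_be = text.encode("utf-16-be")  # explicit big-endian, no BOM
utf16 = text.encode("utf-16")        # native byte order, BOM prepended

print(utf8.hex(" "))       # 41 e2 82 ac  (the only possible byte sequence)
print(utf16_le.hex(" "))   # 41 00 ac 20
print(utf16_be.hex(" "))   # 00 41 20 ac
print(utf16[:2].hex(" "))  # ff fe or fe ff, depending on the machine

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")
```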
 
UTF-8 has become the dominant [[pengkodean karakter|character encoding]] for the [[World Wide Web]], accounting for more than half of all Web pages.<ref>{{Cite web|url=http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html|title=Unicode nearing 50% of the web|first=Mark |last=Davis|date=28 January 2010|work=Official Google Blog|publisher=[[Google]]|accessdate=5 December 2010}}</ref><ref name="BuiltWith">{{cite web
| publisher = W3Techs
| accessdate = March 30, 2010
}}</ref> The [[Internet Engineering Task Force]] (IETF) requires all [[Internet]] [[protokol (komputer)|protocols]] to identify the [[Pengkodean karakter|encoding]] used for character data, and requires the supported character encodings to include UTF-8.<ref name="rfc2277">{{Cite journal |first=H. |last=Alvestrand |title=RFC 2277 |contribution=IETF Policy on Character Sets and Languages |publisher=[[Internet Engineering Task Force]] |year=1998}}</ref> The [[:en:Internet Mail Consortium|Internet Mail Consortium]] (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8.<ref name="IMC">{{cite web
| url = http://www.imc.org/mail-i18n.html
| title = Using International Characters in Internet Mail
| date = August 1, 1998
| accessdate = November 8, 2007
}}</ref> UTF-8 is also increasingly used as the default character encoding in [[sistem operasi|operating systems]], [[bahasa pemrograman|programming languages]], [[application programming interface|APIs]], and [[aplikasi perangkat lunak|software applications]].
 
<!--
UTF-8 encodes each of the 1,112,064 [[code point]]s in the Unicode character set using one to four 8-bit [[byte]]s (termed "[[octet (computing)|octets]]" in the Unicode Standard). Code points with lower numerical values (i.e. earlier code positions in the Unicode character set, which tend to occur more frequently) are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with [[ASCII]], are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.
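
The bit layout behind the one-to-four-byte scheme can be sketched as a small Python function. The name `utf8_encode_code_point` is hypothetical and chosen for illustration only; real programs should rely on the built-in codec (in Python, `str.encode("utf-8")`), which the sketch uses as a cross-check. The function rejects the 2,048 surrogate code points U+D800 to U+DFFF, which is why 1,112,064 of the 1,114,112 positions up to U+10FFFF are encodable.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value as 1 to 4 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                      # 0xxxxxxx (same byte as ASCII)
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's codec; the byte count grows with the code point.
for ch in "Aé€😀":
    encoded = utf8_encode_code_point(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))
```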
 
The official [[Internet Assigned Numbers Authority|IANA]] code for the UTF-8 character encoding is <code>UTF-8</code>.<ref>{{cite web |url=http://www.iana.org/assignments/character-sets |title=CHARACTER SETS |publisher=Internet Assigned Numbers Authority |date=November 4, 2010 |accessdate=5 December 2010}}</ref>
 
==History==
By early 1992 the search was on for a good byte-stream encoding of multi-byte character sets. The draft [[Universal Character Set|ISO 10646]] standard contained a non-required [[Addendum|annex]] called [[UTF-1]] that provided a byte-stream encoding of its 32-bit code points. This encoding was not satisfactory on performance grounds, but did introduce the notion that bytes in the range of 0–127 continue representing the ASCII characters in UTF, thereby providing backward compatibility with ASCII.
 
In July 1992, the [[X/Open]] committee XoJIG was looking for a better encoding. Dave Prosser of [[Unix System Laboratories]] submitted a proposal for one that had faster implementation characteristics and introduced the improvement that 7-bit ASCII characters would ''only'' represent themselves; all multibyte sequences would include only bytes where the high bit was set. This original proposal, FSS-UTF (File System Safe UCS Transformation Format), was similar in concept to UTF-8, but lacked the crucial property of self-synchronization.<ref name=pikeviacambridge>{{cite web|url=http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt|title=UTF-8 history|first=Rob|last=Pike|date=30 Apr 2003 | accessdate=September 7, 2012}}</ref><ref>{{cite web|url=https://plus.google.com/u/0/osts101960720994009339267/posts/Rz1udTvtiMg|title=UTF-8 turned 20 years old yesterday|first=Rob |last=Pike|date=September 6, 2012 | accessdate=September 7, 2012}}</ref>
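
The high-bit property that FSS-UTF introduced survives in UTF-8 as standardized, and it is what makes byte-oriented tools safe: bytes below 0x80, including <code>/</code> (0x2F) and NUL, never occur inside a multibyte sequence. The Python sketch below checks this exhaustively for UTF-8 (not for the original FSS-UTF layout, which differed in detail) and then splits an arbitrary example path on the <code>/</code> byte. The helper name `high_bit_property_holds` is hypothetical, and the exhaustive loop may take a second or two.

```python
def high_bit_property_holds() -> bool:
    """Hypothetical helper: True if no non-ASCII scalar value encodes to a byte < 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates have no UTF-8 encoding
            continue
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


holds = high_bit_property_holds()
print(holds)  # True

# Consequently a byte-level split on b"/" never cuts a character in half.
path = "/home/José/文書/notes.txt"
parts = [p.decode("utf-8") for p in path.encode("utf-8").split(b"/")]
print(parts)  # ['', 'home', 'José', '文書', 'notes.txt']
```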
 
In August 1992, this proposal was circulated by an [[IBM]] X/Open representative to interested parties. [[Ken Thompson]] of the [[Plan 9 from Bell Labs|Plan 9]] [[operating system]] group at [[Bell Labs]] then made a small but crucial modification to the encoding, making it very slightly less bit-efficient than the previous proposal but allowing it to be [[Self-synchronizing code|self-synchronizing]], meaning that it was no longer necessary to read from the beginning of the string to find code point boundaries. Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with [[Rob Pike]]. Over the following days, Pike and Thompson implemented it and updated [[Plan 9 from Bell Labs|Plan 9]] to use it throughout, and then communicated their success back to X/Open.<ref name=pikeviacambridge/>
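
Self-synchronization follows from the fact that, in UTF-8 as standardized, every continuation byte has the form <code>10xxxxxx</code> and no lead byte does. A reader dropped at an arbitrary byte offset can therefore step back over at most three continuation bytes to reach the start of the current code point. A minimal Python sketch, assuming the input is valid UTF-8; the function name `code_point_start` is hypothetical:

```python
def code_point_start(data: bytes, i: int) -> int:
    """Hypothetical helper: index of the first byte of the code point that
    contains data[i], assuming data is valid UTF-8."""
    # Continuation bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "aé€😀".encode("utf-8")  # 1 + 2 + 3 + 4 = 10 bytes
starts = sorted({code_point_start(data, i) for i in range(len(data))})
print(starts)  # [0, 1, 3, 6]

# Resume decoding from an arbitrary offset inside the last character.
print(data[code_point_start(data, 8):].decode("utf-8"))  # 😀
```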