UTF-8: Difference between revisions

LaninBot
m Cosmetic punctuation changes
Bellacyntia
'''UTF-8''' ('''[[:en:Universal Character Set|Universal Character Set (UCS)]] Transformation Format{{mdash}}8-bit'''<ref>{{Cite book|publisher=[[Unicode Consortium|The Unicode Consortium]]|title=The Unicode Standard|url=http://www.unicode.org/versions/Unicode6.0.0/|edition=6.0|location=Mountain View, California, USA|isbn=978-1-936213-01-6|chapter=Chapter 2. General Structure}}</ref>) is a [[:en:variable-width encoding|variable-width]] character encoding that can represent every character in the [[Unicode]] character set. It was designed for [[:en:backward compatibility|backward compatibility]] with [[ASCII]] and to avoid the complications of [[:en:endianness|endianness]] and [[:en:byte order mark|byte order marks]] found in [[UTF-16]] and [[:en:UTF-32|UTF-32]].
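The following Python sketch illustrates these properties using only the built-in `str.encode`. The sample characters are arbitrary choices made for illustration. Code points with larger values take more bytes, from one to four. ASCII text encodes to exactly the same bytes in UTF-8. UTF-8 output carries no byte order mark, whereas Python's generic `utf-16` codec prepends one in the platform's native byte order.

```python
def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 using Python's built-in codec."""
    return text.encode("utf-8")

# Sample characters (chosen for illustration): Latin A, e-acute, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Larger code points need more bytes: 1, 2, 3 and 4 respectively.
for ch in samples:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# Backward compatibility: ASCII text is byte-for-byte identical in UTF-8.
ascii_text = "Hello"
print(utf8_bytes(ascii_text) == ascii_text.encode("ascii"))  # True

# No byte order mark in UTF-8; Python's "utf-16" codec adds a 2-byte BOM.
print(len(utf8_bytes("A")), len("A".encode("utf-16")))  # 1 4
```

The `bytes.hex` call with a separator argument requires Python 3.8 or later.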
 
UTF-8 has become the dominant [[pengkodean karakter|character encoding]] for the [[World Wide Web]], used by more than half of all Web pages.<ref>{{Cite web|url=http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html|title=Unicode nearing 50% of the web|first=Mark |last=Davis|date=28 January 2010|work=Official Google Blog|publisher=[[Google]]|accessdate=5 December 2010}}</ref><ref name="BuiltWith">{{cite web
| url = http://trends.builtwith.com/encoding/UTF-8
| title = UTF-8 Usage Statistics
| publisher = BuiltWith
| accessdate = March 28, 2011
}}</ref><ref name="W3Techs">{{cite web
| url = http://w3techs.com/technologies/overview/character_encoding/all
| title = Usage of character encodings for websites
| publisher = W3Techs
| accessdate = March 30, 2010
}}</ref> The [[Internet Engineering Task Force]] (IETF) requires all [[Internet]] [[protokol (komputer)|protocols]] to identify the [[Pengkodean karakter|encoding]] used for character data, and requires the supported character encodings to include UTF-8.<ref name="rfc2277">{{Cite journal |first=H. |last=Alvestrand |title=RFC 2277 |contribution=IETF Policy on Character Sets and Languages |publisher=[[Internet Engineering Task Force]] |year=1998}}</ref> The [[:en:Internet Mail Consortium|Internet Mail Consortium]] (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8.<ref name="IMC">{{cite web
| url = http://www.imc.org/mail-i18n.html
| title = Using International Characters in Internet Mail
| publisher = Internet Mail Consortium
| date = August 1, 1998
| accessdate = November 8, 2007
}}</ref> UTF-8 is also increasingly used as the default character encoding in [[sistem operasi|operating systems]], [[bahasa pemrograman|programming languages]], [[application programming interface|APIs]], and [[aplikasi perangkat lunak|software applications]].
 
<!--
UTF-8 encodes each of the 1,112,064 [[code point]]s in the Unicode character set using one to four 8-bit [[byte]]s (termed "[[octet (computing)|octets]]" in the Unicode Standard). Code points with lower numerical values (i.e. earlier code positions in the Unicode character set, which tend to occur more frequently) are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with [[ASCII]], are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.
 
The official [[Internet Assigned Numbers Authority|IANA]] code for the UTF-8 character encoding is <code>UTF-8</code>.<ref>{{cite web |url=http://www.iana.org/assignments/character-sets |title=CHARACTER SETS |publisher=Internet Assigned Numbers Authority |date=November 4, 2010 |accessdate=5 December 2010}}</ref>
 
==History==
By early 1992 the search was on for a good byte-stream encoding of multi-byte character sets. The draft [[Universal Character Set|ISO 10646]] standard contained a non-required [[Addendum|annex]] called [[UTF-1]] that provided a byte-stream encoding of its 32-bit code points. This encoding was not satisfactory on performance grounds, but did introduce the notion that bytes in the range of 0–127 continue representing the ASCII characters in UTF, thereby providing backward compatibility with ASCII.
 
In July 1992, the [[X/Open]] committee XoJIG was looking for a better encoding. Dave Prosser of [[Unix System Laboratories]] submitted a proposal for one that had faster implementation characteristics and introduced the improvement that 7-bit ASCII characters would ''only'' represent themselves; all multibyte sequences would include only bytes where the high bit was set. This original proposal, FSS-UTF (File System Safe UCS Transformation Format), was similar in concept to UTF-8, but lacked the crucial property of self-synchronization.<ref name=pikeviacambridge>{{cite web|url=http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt|title=UTF-8 history|first=Rob|last=Pike|date=30 Apr 2003 | accessdate=September 7, 2012}}</ref><ref>{{cite web|url=https://plus.google.com/u/0/101960720994009339267/posts/Rz1udTvtiMg|title=UTF-8 turned 20 years old yesterday|first=Rob |last=Pike|date=September 6, 2012 | accessdate=September 7, 2012}}</ref>
 
In August 1992, this proposal was circulated by an [[IBM]] X/Open representative to interested parties. [[Ken Thompson]] of the [[Plan 9 from Bell Labs|Plan 9]] [[operating system]] group at [[Bell Labs]] then made a small but crucial modification to the encoding, making it very slightly less bit-efficient than the previous proposal but allowing it to be [[Self-synchronizing code|self-synchronizing]], meaning that it was no longer necessary to read from the beginning of the string to find code point boundaries. Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with [[Rob Pike]]. The following days, Pike and Thompson implemented it and updated [[Plan 9 from Bell Labs|Plan 9]] to use it throughout, and then communicated their success back to X/Open.<ref name=pikeviacambridge/>
 
UTF-8 was first officially presented at the [[USENIX]] conference in [[San Diego]], from January 25–29, 1993.
 
In November 2003 UTF-8 was restricted by RFC 3629 to four bytes in order to match the constraints of the [[UTF-16]] character encoding.
 
Google reported that in 2008 UTF-8 (misleadingly labelled "Unicode") became the most common encoding for HTML files.<ref name=markdavis>{{cite web|url=http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html|title=Moving to Unicode 5.1|first=Mark|last=Davis|date=5 May 2008 | accessdate=2013-03-01}}</ref><ref name=davidgoodger>{{cite web|url=http://www.artima.com/weblogs/viewpost.jsp?thread=230157|title=Unicode misinformation|first=David|last=Goodger|date=6 May 2008|accessdate=2013-03-01}}</ref>
-->
 
== Description ==
The following table shows the design of UTF-8: the scheme originally proposed by Dave Prosser and later modified by Ken Thompson (each <code>x</code> is replaced by a bit of the code point):