The difference between a Unicode string and a byte string is that a Unicode string is an abstract design (a sequence of code points), whereas a byte string is that design’s realization in memory or on disk.
Every character has a code point according to the Unicode standard. However, if you want to write a Unicode string to disk, you first have to encode it into bytes, because bytes are the only thing that can be serialized to disk.
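As a rough illustration of the code point versus byte distinction, here is a minimal sketch. It assumes Python 3, where str is the Unicode string type and bytes is the byte string type; the sample word is arbitrary.

```python
# Python 3 assumed: str holds code points, bytes holds encoded data.
text = "caf\u00e9"  # four characters, the last one is U+00E9

# The abstract view: one code point per character.
code_points = [ord(ch) for ch in text]  # [99, 97, 102, 233]

# The concrete view: the same text under two different encodings.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (5 bytes)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9' (4 bytes)

# Decoding with the matching encoding recovers the original string.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

The same four code points become five bytes in UTF-8 but four in Latin-1, which is why the encoding has to be known when reading the bytes back.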
When you open a file for writing:
    with open(fp, 'w') as f:
        f.write(...)
If the data you are writing has already been encoded (e.g., as UTF-8 or Latin-1), the write method simply writes that byte sequence to disk. However, if it has not been encoded yet (that is, it is still a Unicode string), write encodes it first using the default encoding, which in Python 2 is ASCII, and raises a UnicodeEncodeError if the string contains non-ASCII characters.
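The two cases can be sketched as follows. This is only an approximation written for Python 3: the explicit encode("ascii") call stands in for the implicit ASCII encoding described above, and the scratch file path is a hypothetical name created just for the example.

```python
import os
import tempfile

text = "caf\u00e9"

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# Case 1: the data is already encoded, so the bytes go to disk unchanged.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()

# Case 2: stand-in for an implicit ASCII encoding of un-encoded text.
# ASCII cannot represent U+00E9, so the encoding step fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Encoding explicitly before writing avoids depending on whatever default the interpreter or platform happens to use.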
When you use the codecs module, codecs.open lets you specify an encoding when opening a file. Thus, you can pass in a Unicode string without encoding it yourself. The write method will encode the Unicode string into a byte sequence before writing it to disk.
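A minimal sketch of that approach, assuming UTF-8 as the target encoding and a hypothetical scratch file path chosen for the example:

```python
import codecs
import os
import tempfile

text = "caf\u00e9"

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The file object encodes on write, so a Unicode string is passed as-is.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading the raw bytes shows what actually landed on disk.
with open(path, "rb") as f:
    on_disk = f.read()

# Reading through codecs.open decodes back to a Unicode string.
with codecs.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

In Python 3 the built-in open accepts the same encoding argument, so the codecs module is mainly needed for code that also has to run on Python 2.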