Understanding SMS Encoding

GSM-7 vs UCS-2: Why it matters and how to keep your messages efficient in YakChat

When you send a text message (SMS), your content is converted into a data format that mobile networks can transmit. What you may not realise is that the type of characters you use affects how your message is encoded, how many segments it uses, and how much it costs.

SMS messages are typically encoded using either GSM-7 or UCS-2. Knowing the difference between these two encoding formats—and how YakChat helps you manage them—can help you avoid unnecessary costs and message splitting.

What is GSM-7?

GSM-7 is the most common encoding format for SMS. It's highly efficient and used for standard English characters.

Key Features:

  1. Uses 7 bits per character

  1. Supports up to 160 characters per message segment

  1. Includes:

    1. Upper and lower case letters (A–Z, a–z)

    1. Numbers (0–9)

    1. Common punctuation and symbols such as ?, !, @, &, :, and +

GSM-7 is ideal for most plain text messages. If you stay within its character set, you can maximize your message length and minimize costs.

What is UCS-2?

UCS-2 is a more expansive encoding format that supports a wider range of characters, including emojis and non-Latin scripts.

Key Features:

  1. Uses 16 bits per character (more than double GSM-7's 7 bits)

  1. Supports up to 70 characters per message segment

  1. Required for:

    1. Emojis

    1. Most accented characters (e.g., á, ê, õ). A few, such as é, ü, and ñ, are part of the standard GSM-7 alphabet.

    1. Non-Latin alphabets (e.g., Arabic, Chinese, Cyrillic)

    1. Smart punctuation (e.g., curly quotes)
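
The 160 and 70 character limits can be seen as the same payload divided by different character sizes. The sketch below is a rough illustration, not a description of how YakChat works: it assumes the standard 140-byte SMS payload and a 6-byte reassembly header on each part of a multi-part message, neither of which is stated in this article, and `chars_per_segment` is a hypothetical helper name.

```python
# Assumption (not stated in this article): one SMS carries 140 bytes of user
# data, and each part of a multi-part message gives up 6 of those bytes to a
# header used to reassemble the parts.
PAYLOAD_BYTES = 140


def chars_per_segment(bits_per_char: int, header_bytes: int = 0) -> int:
    """Return how many whole characters fit in the payload left after the header."""
    usable_bits = (PAYLOAD_BYTES - header_bytes) * 8
    return usable_bits // bits_per_char


print(chars_per_segment(7))   # GSM-7, single segment: 160
print(chars_per_segment(16))  # UCS-2, single segment: 70
```

Under the same assumptions, passing `header_bytes=6` gives 153 for GSM-7 and 67 for UCS-2, which matches the multi-segment limits listed later in this article.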


The Switch to UCS-2 Is Automatic

You don’t need to choose the encoding type—it’s determined automatically based on the characters you type. If your message contains even one character that isn’t supported by GSM-7, YakChat (and the underlying mobile network) will automatically switch the message to UCS-2 encoding.

This change affects the entire message, not just the section with the special character, and it reduces your character limit from 160 to 70 per segment.

Example:

Message: Thanks for your order ❤️

Although it looks short, the heart symbol is not supported by GSM-7. This triggers UCS-2 encoding, and now the message is limited to 70 characters per segment instead of 160.
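
The automatic switch can be approximated with a simple membership check. The sketch below is an illustration only, not YakChat's implementation: `detect_encoding` is a hypothetical name, and the character tables are transcribed from the standard GSM 7-bit default alphabet, which an individual carrier or provider may vary.

```python
# Illustrative sketch only; not YakChat's actual logic.
# Assumption: the standard GSM 7-bit default alphabet, no national variants.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters are still GSM-7 (they are sent with an escape code).
GSM7_EXTENDED = "^{}\\[~]|€\f"
GSM7_CHARS = frozenset(GSM7_BASIC + GSM7_EXTENDED)


def detect_encoding(message: str) -> str:
    """Return "GSM-7" if every character is in the GSM-7 set, otherwise "UCS-2"."""
    return "GSM-7" if all(ch in GSM7_CHARS for ch in message) else "UCS-2"


print(detect_encoding("Thanks for your order"))              # GSM-7
print(detect_encoding("Thanks for your order \u2764\ufe0f"))  # UCS-2 (heart emoji)
```

A single unsupported character anywhere in the text is enough to change the result for the whole message, which mirrors the behaviour described above.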


Why Encoding Matters

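Encoding determines how many characters fit in each segment, and therefore how many segments your message uses. If your message is too long for one segment, it will be broken into multiple parts (segments), and each segment is billed separately. Each part of a split message holds slightly fewer characters than a standalone message, because some of its space carries the information needed to reassemble the parts.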

Segment Limits by Encoding:

| Encoding | Max Characters | When Split Into Multiple Segments |
|----------|----------------|-----------------------------------|
| GSM-7    | 160            | 153 characters per segment        |
| UCS-2    | 70             | 67 characters per segment         |

Being aware of encoding helps you write more efficient messages and control messaging costs.
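
The segment limits above translate into a short calculation. This is a minimal sketch under stated assumptions, not YakChat's billing logic: `count_segments` is a hypothetical helper, and `length` is assumed to be the character count as the segment counter reports it.

```python
import math

# (single-segment limit, per-segment limit once the message is split),
# taken from the segment limits listed in this article.
LIMITS = {
    "GSM-7": (160, 153),
    "UCS-2": (70, 67),
}


def count_segments(length: int, encoding: str) -> int:
    """Return how many segments a message of `length` characters uses."""
    single_limit, split_limit = LIMITS[encoding]
    if length == 0:
        return 0  # nothing to send
    if length <= single_limit:
        return 1
    # Once split, every segment holds the smaller per-segment limit.
    return math.ceil(length / split_limit)


print(count_segments(160, "GSM-7"))  # 1
print(count_segments(161, "GSM-7"))  # 2
print(count_segments(100, "UCS-2"))  # 2
```

The last line shows why encoding matters for cost: a 100-character message fits in one GSM-7 segment but needs two segments once it is encoded as UCS-2.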

Tips to Stay Within Limits in YakChat

  1. Avoid emojis or extended characters unless necessary. These automatically trigger UCS-2 encoding.

  1. Use straight quotation marks and standard punctuation. Avoid smart quotes, long dashes, and symbols copied from rich text sources.

  1. Avoid copy-pasting text from sources such as Word documents, emails, or websites, as it may contain hidden formatting or unsupported characters.

  1. Check the segment counter in the lower right-hand corner of the YakChat message tray. This displays how many segments your message will use and your current character count (e.g., 1 Segment | 64/160).

  1. Use the “Shorten” option in the AI tools menu (paper icon with sparkle) to automatically condense your message while keeping it clear and readable. This is especially helpful if your message is approaching the segment limit.
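
If you prepare message text outside YakChat, for example in a script that builds templates, the punctuation tips above can be applied before pasting. The following is a hedged sketch: `to_plain_punctuation` is a hypothetical helper, not a YakChat feature, and the mapping covers only a small, illustrative set of characters.

```python
# Hypothetical helper; the mapping is a small illustrative subset, not exhaustive.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # single-character ellipsis
    "\u00a0": " ",    # non-breaking space
})


def to_plain_punctuation(text: str) -> str:
    """Replace common smart punctuation with plain equivalents."""
    return text.translate(REPLACEMENTS)


print(to_plain_punctuation("\u201cHi\u201d \u2013 it\u2019s ready\u2026"))
# "Hi" - it's ready...
```

Characters that are not in the mapping, such as emojis, pass through unchanged, so the segment counter remains the final check.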

YakChat provides these tools so you can confidently manage your messages and avoid unnecessary splits or encoding changes.