Urdu on Computers

(Mac, Windows, and Linux)

by

Kamal Abdali


DOWNLOADS (for the impatient)

Urdu Keyboards For MacOS, Windows, and Linux

Download the file appropriate for your machine. Unzipping it will produce a folder. Open this folder and follow the instructions in the readme file.

Here are some additional keyboard layouts. They are described in Additional Keyboard Layouts.

Source codes for generating the above seven keyboard layouts are all downloadable from links on this GitHub page.

One-page Keyboard Maps

TeX source files for generating the above maps are downloadable from the keyboard layout links on this GitHub page.

Inpage To Unicode Text Converter

Download the file appropriate for your machine. Unzipping it will produce a folder. Open this folder and follow the instructions in the readme file.

Source codes for building the above converter applications are downloadable from links on this GitHub page.

Fonts

TUTORIALS



Your computer (running MacOS, Windows, or Linux) is capable of editing and word processing in Urdu. In a few simple steps, you can enable your machine to handle Urdu documents.

The first few paragraphs describe the procedures for keyboard installation, keyboard activation, and font installation. The remaining sections about using Urdu on computers are mostly illustrated for Macs but with obvious adaptations they also apply to Windows and Linux machines. Separate instructions are provided for the latter two systems where a different treatment is needed.

Starting with MacOS 10.7 (Lion), the Macintosh supports Urdu natively. Similarly, starting with version 8, Windows supports Urdu natively. If you are satisfied with their built-in Urdu keyboards, then skip the UrduQWERTY keyboard installation sections. But if you aim for perfection, then read on!

The information on this Web site can also be used to compose Persian (Farsi) documents. The keyboard, fonts, and explanations below apply equally to Persian, but the explanations are illustrated with Urdu text. The keyboard and fonts also suffice for Punjabi (Shahmukhi or Pakistani style, written in the Arabic script), Arabic, and Ottoman Turkish.

The main difference among the operating systems, as far as Urdu typing is concerned, is in how Urdu symbols are input. To type the various Urdu symbols, you press a key either by itself (plain input) or together with a modifier. On the Mac, the modifier is either 1) Shift or 2) Option or 3) Option and Shift pressed together. For Windows and Linux, the modifier is either 1) Shift or 2) Right-ALT or 3) Right-ALT and Right-Shift pressed together. The other difference is that to edit and manipulate files you, of course, need to run different applications under different operating systems.

Please email your enquiries, comments, criticisms, and suggestions to me at kabdali@gmail.com

(Back to CONTENTS)


Installing the Urdu keyboard layout

[Note that we are talking about an Urdu keyboard layout, not an Urdu keyboard. This layout will enable you to type Urdu characters using the standard English keyboard that came with your computer.]
  1. Download the zip archive file for the keyboard layout appropriate for your OS. Specifically, click here (UrduQWERTY-v7mac.zip) for MacOS; here (UrduQWERTY-v7win.zip) for Windows; or here (UrduQWERTY-v7linux.zip) for Linux.
  2. Double-click on the downloaded zip file and save the resulting folder. The folder's name will be UrduQWERTY-v7mac for MacOS, UrduQWERTY-v7win for Windows, or UrduQWERTY-v7linux for Linux.
  3. One of the files in this folder is a pdf file with the name UrduQWERTYkeyboardMac.pdf for MacOS, or UrduQWERTYkeyboardWinLnx.pdf for Windows and Linux. It is a printable one-page map showing the assignment of keys in the Urdu QWERTY keyboard layout to Urdu letters and symbols. It is suggested that you keep a printed copy of it for reference while you're getting used to these key assignments. The one-page keyboard maps are also downloadable from here for MacOS, and here for Windows and Linux.
  4. Another file in the saved folder is a text file with instructions for installing the keyboard layout and activating it. Its name is readme-mac.txt for MacOS, readme-win.txt for Windows, and readme-linux.txt for Linux. Please follow the instructions in this file carefully.

(Back to CONTENTS)


Switching Between keyboard layouts

Each keyboard layout is identified by a name and an icon. Our keyboard layout's name is Urdu-QWERTY in MacOS and Urdu (Pakistan) in Windows and Linux. Its icon looks somewhat like the national flag of Pakistan.

If two or more keyboard layouts are installed and activated on a system, then the computer screen shows the active layout by displaying its name and its associated icon, at top right in the menu bar in MacOS and near the right end of the taskbar in Windows and Linux. The default keyboard icon is likely to be a US flag for English. To switch to another keyboard layout, you have to do this:

- For MacOS, either 1) press the Command and space keys together, and repeat this until the desired layout shows up on the menu bar, or 2) click on the keyboard layout icon at the right on the menu bar and, from the menu that appears underneath it, select the desired layout.

- For Windows and Linux, press the Left-Shift and Left-Alt keys together (or, if there is a Windows key on your keyboard, press the Windows and space keys together), and repeat this until the desired layout shows up on the taskbar.

(Back to CONTENTS)


Installing Urdu Fonts

Your computer comes with dozens of fonts especially developed for Arabic and Persian languages, and most of them also include the Urdu letters not found in those languages. Moreover, many fonts bundled with the operating system have as their goal the support of a vast repertoire of Unicode characters, including Arabic. Examples are Times New Roman, Arial, Arial Unicode MS, and Courier New. The shapes of letters in some of these fonts do not always please Urdu readers. So suggested below are some additional fonts that you might consider installing.

But first, here is how fonts are installed. Most fonts that you download from the Web come packaged into zip files. Unzipping such a file produces a folder that contains the proper font files and some additional files related to them. Most font files are of type .ttf or .otf. To install fonts in MacOS, copy the files of type .ttf and .otf (and of any other types that you recognize to be fonts) into the folder /Library/Fonts. To install fonts in Windows or Linux, the simplest method is to double-click on the files of type .ttf and .otf (and of any other types that you recognize to be fonts) and select the Install button in the box that opens.
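The copy step described above for MacOS can also be scripted. What follows is only a minimal sketch in Python, not part of the downloads on this site: the function name install_fonts is made up for illustration, and the script simply copies every .ttf and .otf file it finds under an unzipped folder into a fonts folder that you name.

```python
import shutil
from pathlib import Path

# File types treated as fonts, as in the paragraph above.
FONT_SUFFIXES = {".ttf", ".otf"}


def install_fonts(source_dir, fonts_dir):
    """Copy every .ttf/.otf file found under source_dir into fonts_dir.

    Returns the names of the files that were copied.
    """
    source = Path(source_dir)
    target = Path(fonts_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        # Compare suffixes case-insensitively so that "FONT.TTF" also matches.
        if path.is_file() and path.suffix.lower() in FONT_SUFFIXES:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

For example, install_fonts("Zar", "/Library/Fonts") would copy the fonts from an unzipped folder named Zar. Writing to /Library/Fonts may require administrator rights; the per-user folder ~/Library/Fonts is an alternative target.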

Nastaleeq Fonts

Very popular in this style are the Jameel Noori series of fonts. You can download them from here. Double clicking on the downloaded zip file will create a folder that will include the two font files Jameel Noori Nastaleeq.ttf and Jameel Noori Nastaleeq Kasheeda.ttf. Install these two.

Awami Nastaliq, developed by SIL International, is another recommended font. It handles "non-dictionary words" and diacritics better than most other nastaleeq fonts. To check its description and download it, visit AWAMI.

Naskh Fonts

A free and very attractive set of fonts is XB Zar distributed by the Iran Mac Users Group (IRMUG). Developed originally for the Mac, these fonts work equally well in Windows and Linux computers. To obtain them:

  1. Visit IRMUG's X Series 2 page by clicking on this link.
  2. On the page that opens, scroll down to the section Download fonts, and then click on the link given to download the XB Zar font.
  3. Save the downloaded file as Zar.zip.
  4. Click on the downloaded file to get a folder named Zar.
  5. Install the fonts contained in this folder.

Scheherazade New and Lateef are two other free, very high quality Naskh fonts for preparing pleasant-looking Urdu documents. Developed by SIL International, they can be downloaded by visiting this link.

More font choices are discussed in the sections More Nastaleeq Fonts and More Naskh Fonts further below. I also recommend taking a look at Knut Vikor's excellent page The Arabic Macintosh that has a detailed discussion of various Arabic fonts with interesting information about them and links to obtain them.

Your computer is now ready for handling documents in Urdu!

(Back to CONTENTS)


The Urdu QWERTY Unicode Keyboard

The Urdu-QWERTY keyboard layout that you have downloaded has been designed to closely resemble the phonetic keyboard of InPage, a popular commercial desktop publishing application for Urdu that runs under Windows.

The advantage of a QWERTY (also called phonetic) keyboard is that keys are assigned to letters based on letter sounds; e.g., the key "b" for the Urdu letter "bay", "p" for "pay", "k" for "kaaf", "g" for "gaaf", and so on. Such an assignment helps you remember most of the keys. As we do not have enough keys on the standard computer keyboards to assign to all Urdu characters, we need to use shifted keys for some (e.g., "shift-k" for "khay", "shift-g" for "ghain", etc.).

In addition to being phonetic, this is also a Unicode keyboard layout. Whatever you type is converted to its Unicode representation which is the modern universal character encoding used in computers for multi-lingual texts.
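To see what "Unicode representation" means in practice, you can ask a program to list the code point behind each character you typed. The following is a small illustrative sketch in Python using only its standard unicodedata module; the function name describe is made up for this example.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# The letters bay, pay, kaaf, and gaaf (typed with the keys b, p, k, g),
# written here as escapes so the code points are unambiguous.
for ch, code, name in describe("\u0628\u067E\u06A9\u06AF"):
    print(code, name)
```

Any application that understands Unicode will see the same code points, regardless of the font used to display them.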

In MacOS, the Input menu (Keyboard menu) has an item Show Keyboard Viewer. If you select this item, then the system will display a picture of the keyboard on the screen. By default, the picture is very small. But you can make it larger or smaller like any window by pulling the handle at its right-bottom corner with your mouse. In this picture you can see what character each key corresponds to. The characters will change appropriately if you press the shift key or option key (or another modifier key) or select a different keyboard from the Input menu. Windows and Linux do not provide a similar feature for quickly displaying the picture of the keyboard layout.

For reference, below are larger pictures of the Urdu-QWERTY Keyboard showing the characters corresponding to the keys in plain, shift, option, and option-shift modes. (The light marks shown on the left corner of keys identify the keys on a Western keyboard.) For a pdf file with printable keyboard pictures, click here. (This file was updated on 2023-03-17.) You can then print the keyboard pictures for reference.


Urdu-QWERTY Keyboard: Symbols generated without any modifier key (Shift, Option, Caps Lock) pressed

uq7macplain.jpg


Urdu-QWERTY Keyboard: Symbols generated with Shift pressed

uq7macshift.jpg


Urdu-QWERTY Keyboard: Symbols generated with Option pressed

uq7macoption.jpg


Urdu-QWERTY Keyboard: Symbols generated with both Option and Shift pressed

u7macoptionshift.jpg

With one exception, the above pictures contain all the information that there is to give about the Urdu-QWERTY key assignments. The exception is this: The same digit keys can generate digits in three different shapes: (1) Western shapes of digits when no modifier key (SHIFT, OPTION, CAPS LOCK) is pressed; (2) Urdu shapes of digits when CAPS LOCK is pressed and SHIFT or OPTION are not pressed; and (3) Arabic shapes (suitable for Naskh fonts) when OPTION is pressed.

The CAPS LOCK key has no effect on other keys. So if you like your digits to be displayed in their Urdu, and not the Western, shapes, then you can just leave CAPS LOCK depressed, and release it only to type a symbol which requires both SHIFT and OPTION keys to be pressed.
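If a text was typed with one digit shape and you want another, the digits can be converted afterwards. The sketch below is in Python and rests on an assumption that this page does not state explicitly: that the "Urdu shapes" are the Unicode Extended Arabic-Indic digits (U+06F0 to U+06F9) and the "Arabic shapes" are the Arabic-Indic digits (U+0660 to U+0669). The names used are illustrative only.

```python
WESTERN = "0123456789"
# Assumption: "Urdu shapes" = Extended Arabic-Indic digits U+06F0..U+06F9.
URDU = "".join(chr(0x06F0 + i) for i in range(10))
# Assumption: "Arabic shapes" = Arabic-Indic digits U+0660..U+0669.
ARABIC = "".join(chr(0x0660 + i) for i in range(10))


def convert_digits(text, target):
    """Replace every digit in text (any of the three shapes) by the
    digit of the same value in the target set."""
    table = {}
    for digits in (WESTERN, URDU, ARABIC):
        for value, digit in enumerate(digits):
            table[ord(digit)] = target[value]
    return text.translate(table)


print(convert_digits("2023-03-17", URDU))
```

Characters other than digits, such as the hyphens in the date above, are left untouched.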


Even though we have tried to make the keyboard layout as phonetic as possible, the mismatch between the Urdu alphabet and the available keys on a Western keyboard has forced us to make some unintuitive mapping between letters and keys. But with a little practice you should be able to type most letters from memory.

For quick reference here are some tables of useful key bindings.


BINDINGS

Keys for homophone (similarly sounding) letters

Similar sounding letters

Some unobvious key bindings

Uncommon bindings

Another useful key is Shift-" (i.e., the double quote key) that generates the dash-like Kasheeda character. This character is mostly used to stretch horizontal components of letters. (This is described in more detail in the section The Kasheeda Feature.)

(Back to CONTENTS)

Similar Letters in Urdu, Persian, and Arabic

The following table lists the letters which are similar in Urdu, Persian, and Arabic, but are treated as different Unicode symbols. You should carefully choose the key bindings appropriate to the language of the text being typed.

| Unicode name | Shape | Also called | Languages | Initial | Medial | Final | Isolated | Key |
|---|---|---|---|---|---|---|---|---|
| Keheh | ک | Kaf | Urdu, Persian | کـ | ـکـ | ـک | ک | k |
| Kaf | ك | | Arabic | كـ | ـكـ | ـك | ك | option k |
| Heh | ه | | Persian, Arabic | هـ | ـهـ | ـه | ه | option o |
| Teh Marbuta | ة | | Persian, Arabic | | | ـة | ة | option shift O |
| Heh Goal | ہ | ChoTi Heh | Urdu | ہـ | ـہـ | ـہ | ه | o |
| Teh Marbuta Goal | ۃ | | Urdu | | | ـۃ | ة | shift O |
| Heh Dochashmee | ھ | | Urdu | | ـھـ | ـھ | | h |
| Farsi Yeh | ی | | Urdu, Persian | یـ | ـیـ | ـی | ی | i |
| Yeh | ي | | Arabic | يـ | ـيـ | ـي | ي | option i |
| Alef Maksura | ى | Yeh KhaRa Zabar | Arabic | | | ـىٰ | | option u |

Note, in particular, that the key "o" is bound to the Urdu letter "Goal Heh" (also called "ChoTi Hay"), and the key "Option-o" is bound to the Arabic/Persian letter "Heh". Both letters have the same shape when standing alone i.e., in the isolated position. But their shapes in initial, medial, and final positions are different. Worse, in the medial position, they can be confused with the Urdu letter "Dochashmi Hay" (key "h"). You have to make sure to use Goal Hay and Dochashmi Hay with Urdu fonts, and Arabic Hay with Arabic and Persian fonts; otherwise the letter might not be rendered properly.

In Urdu and Persian, the letter "yeh" has two dots underneath in initial and medial positions, but none in final and isolated positions. In Arabic, this letter has those dots in all positions. You should use the key "i" when typing in Urdu or Persian, and "Option-i" when typing in Arabic.

The letter "alef maksura" (key Option-u) is always dotless. In Urdu, it appears only in Arabic words, and is treated the same as the "ChoTi yeh", i.e., "Farsi yeh" (key "i"). It is traditional to decorate this letter with a "khara zabar" (key Shift-I) above it.
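Because these look-alike letters are distinct Unicode symbols, a text typed with the wrong variant can be checked, and if desired corrected, by a program. Here is a minimal sketch in Python. The table and function names are made up for illustration, and the blanket replacement is an assumption that suits only text known to be entirely Urdu; quoted Arabic or Persian passages should be reviewed by hand.

```python
import unicodedata

# Arabic letter -> Urdu counterpart, by Unicode code point.
ARABIC_TO_URDU = {
    "\u0643": "\u06A9",  # Kaf         -> Keheh
    "\u064A": "\u06CC",  # Yeh         -> Farsi Yeh
    "\u0647": "\u06C1",  # Heh         -> Heh Goal
    "\u0629": "\u06C3",  # Teh Marbuta -> Teh Marbuta Goal
}


def find_arabic_variants(text):
    """List (position, Unicode name) of letters that have an Urdu counterpart."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in ARABIC_TO_URDU
    ]


def to_urdu_letters(text):
    """Replace the Arabic variants by their Urdu counterparts."""
    return text.translate(str.maketrans(ARABIC_TO_URDU))
```

Alef Maksura is deliberately left out of the table, since it legitimately occurs in Arabic words used in Urdu.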


Some Old Letter Shapes

Urdu letters didn't always have the shapes they have today. The UrduQWERTY keyboard lets you type the old, four-dotted forms ٿ , ڐ , and ڙ , respectively, of the letters whose modern forms are ٹ , ڈ , and ڑ . Those obsolete forms have been included in the layout to facilitate text searches in important digitized old documents and to enable reproduction of extracts from their text for quotation and other research needs.
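When searching digitized old documents, it can help to treat the old and modern forms as equivalent. The following Python sketch folds the three old forms into their modern counterparts before comparing. The code points are my reading of the characters shown above (U+067F, U+0690, U+0699 for the old forms; U+0679, U+0688, U+0691 for the modern ones), and the function names are illustrative only.

```python
# Old four-dotted form -> modern form, by Unicode code point.
OLD_TO_MODERN = str.maketrans({
    "\u067F": "\u0679",  # old four-dotted Tay  -> modern Tay
    "\u0690": "\u0688",  # old four-dotted Daal -> modern Daal
    "\u0699": "\u0691",  # old four-dotted Ray  -> modern Ray
})


def fold_old_forms(text):
    """Replace the old four-dotted letters by their modern forms."""
    return text.translate(OLD_TO_MODERN)


def contains(document, query):
    """Search that ignores the difference between old and modern forms."""
    return fold_old_forms(query) in fold_old_forms(document)
```

The original document is not modified; the folding is applied only to temporary copies used for the comparison.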

Some diacritical marks and special combinations

(This picture was updated on 2015-10-12.)
Diacritics and combinations


(Back to CONTENTS)

Preparing Urdu Documents

Email Messages

An email message is usually a very simple document. You can try writing your messages in Urdu to test your Urdu setup. Your messages should be readable by most of your correspondents as almost all computers now have built-in fonts that correctly display Urdu characters. An occasional character in these messages might be undecipherable, and might be replaced with its Unicode icon (or some gibberish, in the worst case).

Both Gmail and YahooMail systems work admirably when the Urdu input is turned on and message composition is in Rich Text format with right-to-left text direction. With Hotmail, mixing right-to-left and left-to-right text in the same line seems problematic, as that seems to interfere with the correct sequencing of words. In each case, the cursor behavior is a bit erratic; the cursor sometimes shows up at the right end of the line instead of being at left next to the last word typed. But you can ignore all that since your text is still set correctly. The cursor behavior will likely be fixed by Apple and email system producers anyway.

It helps to set the Web browser preference for Default Character Encoding to Unicode (UTF-8). If you use the Firefox Web browser, then in its preferences set the Default Font to XB Zar, Size 18, all Fonts for Arabic to XB Zar, Monospace Font Size to 16, and other sizes to 18. The Safari Web browser does not allow language by language font control, so just set Standard Font to XB Zar 18. If your computer does not have the XB Zar font installed, then use the Times New Roman font that is built-in on most machines these days.

Google has developed an input method based on transliteration of text typed using letters of the Latin (i.e., Roman or English) alphabet. This method, called Google Transliteration Input Method Editor (IME), is available for Urdu. It lets you enter Urdu words using Latin characters phonetically. Google Transliteration IME will convert the text, based on its sound, to Urdu characters. The conversion is quite liberal so the correct Urdu word will result from most of its reasonable phonetic Roman spellings. By the same token, the same Roman text can lead to several different Urdu words. In the latter case, you will be able to choose one of the words from a menu. For example, the Roman text "sada" can correspond to the Urdu words سادہ (meaning simple), سدا (always), صدا (sound), and possibly others. So if you type "sada" in Google Transliteration IME for Urdu, the system will display these Urdu words in a menu, from which you can select your intended word.

IME saves you the trouble of installing keyboard layouts for the languages you like to type in. But experimenting with IME is likely to convince you that it is more efficient to type Urdu text directly by using the Urdu-QWERTY keyboard layout, rather than via IME. The mismatch between Roman and Urdu alphabets is so substantial that there are too many phonetic Roman spellings for the same Urdu word, and too many Urdu words result from the same Roman text. So while using IME, you are likely to be wasting much time trying and selecting alternatives.
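The many-to-many mismatch described above can be pictured with a toy lookup table. The Python sketch below is purely illustrative and is not how Google's IME works internally: "sada" and its three Urdu readings are taken from the example above, while the spellings "saada" and "sadaa" are hypothetical additions.

```python
from collections import defaultdict

SIMPLE = "\u0633\u0627\u062F\u06C1"  # the word meaning "simple"
ALWAYS = "\u0633\u062F\u0627"        # the word meaning "always"
SOUND = "\u0635\u062F\u0627"         # the word meaning "sound"

# Toy data: one Roman spelling can map to several Urdu words.
ROMAN_TO_URDU = {
    "sada": [SIMPLE, ALWAYS, SOUND],
    "saada": [SIMPLE],          # hypothetical spelling
    "sadaa": [ALWAYS, SOUND],   # hypothetical spelling
}


def candidates(roman):
    """Urdu words offered in the menu for a Roman spelling."""
    return ROMAN_TO_URDU.get(roman.lower(), [])


def spellings_by_word(table):
    """Invert the table: each Urdu word -> all Roman spellings that reach it."""
    reverse = defaultdict(list)
    for roman, words in table.items():
        for word in words:
            reverse[word].append(roman)
    return dict(reverse)
```

Even in this tiny table every Roman spelling but one needs a menu choice, and every Urdu word can be reached in two ways.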

IME is available as an option in Gmail. It is also available for MacOS as a service so you can use it with applications such as word processors.

Gmail also allows specifying Urdu as the application language. In that case all the menus, titles, warnings, etc., will be translated to Urdu. But you do not need this drastic setting in order just to read and write Urdu email messages. While keeping English as the system working language, you can type Urdu text by selecting the Urdu input. Of course, you can also intersperse texts in Urdu and English by simply switching back and forth between Urdu and English inputs.

(Back to CONTENTS)

Traditional Documents, Editors for Urdu

For the Mac, there are two free office suites, LibreOffice and OpenOffice, offering powerful word processors that can be used to edit Urdu documents. These are multi-platform open-source software applications with functionality very similar to that of the commercial product Microsoft Office. Moreover, Microsoft itself offers a free online version of Word (the word processing component of Microsoft Office) that can be used via a desktop browser. If you want to work with one or more of them, you can find detailed descriptions and installation instructions on their official Web sites: LibreOffice, OpenOffice, and Microsoft Office Online. We will not describe their use.

Google Docs is a word processor with which you work online without having to install any application on your computer. It is free and easy to use, and includes many of the frequently used features found in more elaborate office suites. To try it, visit Google Docs.

Of course, the Mac's built-in application TextEdit is itself an excellent editor for most needs. TextEdit is considered a text editor rather than a word processor. Yet it can be used for composing documents with multilingual text, embedded graphics, tables, and other advanced features typically found only in large, expensive software applications. Its advantage is that it does not need to be installed: It is always there, and is the Mac's default editor for text files.

In Plain Text mode, TextEdit allows only a single font and a single paragraph justification style for the entire document. In Rich Text mode, you can mix various font families, font sizes, font styles (e.g., bold, outlined, shadowed), and justifications (e.g., centered text, or text justified at left or right or both sides). You need to use Rich Text since Plain Text does not work well with Urdu.

To start a new Urdu document, select the menu item File > New, then Format > Text > Writing Direction > Right to Left. Set the Input menu (Top Right) to Urdu-QWERTY. Choose fonts, font styles, size, colors, etc., as is usual with most word processors. Since the default formatting is Plain Text, switch to the Rich Text Format by doing Format > Make Rich Text. Now you can apply a different justification to each paragraph, and a different formatting style to each selection.

It is helpful to also do Format > Font > Show Fonts. This puts on the screen a font palette which is convenient for choosing font family (e.g., XB Zar or Jameel Noori Nastaleeq), size, color, etc. The XB Zar font family also includes typeface variants such as italic, bold, and bold italic. (But see the note in the section More Naskh Fonts about the use of italics.) The system seems to unpredictably switch the font sometimes to Geeza Pro (the "system default font" for the Arabic script). So you need to watch the font palette and, if necessary, change the font back to what you want it to be.

You can configure TextEdit to use your favorite font as the default. To do this, start TextEdit, and on the Menu bar click on TextEdit, then on Preferences, and then on the New Document tab. If the Rich Text radio button is not active, click on it. Click on the Change... button next to Rich text font: . The font dialog will open. Now, select, for example, XB Zar in the family column, and 18 in the size column, then close the dialog. Finally, close Preferences, and quit TextEdit. When you restart TextEdit, it will use the Rich Text format and the XB Zar size 18 font as the default for new documents.

Here is an image of a portion of the Mac screen during the editing of an Urdu document using TextEdit.


TextEdit In Action



(Back to CONTENTS)

More Naskh Fonts

X Series 2 is a set of free, high quality, attractive Naskh fonts that support Urdu and Persian. These come with matched groups of regular, italic, bold, and bold italic characters; some even have outline and shadow variants. The X Series 2 fonts are downloadable from the Iran Mac User Group Wiki site. Particularly nice font families on that site are XB Zar, XB Yas, and XB Niloofar for general editing, XB Titre for headings and titles, and XB Kayhan Navaar, XB Kayhan Pook, and XB Kayhan Sayeh for further special effects.

NOTE: In the regular typeface of Naskh fonts, the "vertical" strokes of Alif, Laam, etc., are actually drawn with a slight tilt to the left. The italic Naskh typefaces of X Series 2 fonts are designed by slanting the same strokes a bit to the right. Some font families in this series also have oblique typefaces in which the strokes are slanted even more to the left than in the regular typeface. But since few Urdu letters contain prominent vertical elements, the italicized (or oblique) text in Urdu does not stand out well. (This is in contrast to the Latin alphabet where nearly every letter has vertical strokes.) The boldface text in Urdu is, of course, quite noticeable.

Dozens of free Naskh fonts can be downloaded from the Internet. But you need to experiment with them to pick the ones that are of good quality and work with the whole Urdu alphabet. Some of them have been adapted from Arabic or Persian, without extending them properly for the additional letters of Urdu. You should check, in particular, whether these fonts properly display all the needed forms of the letters "Noon Ghunna" ں, "Goal Hay" (also called "ChoTi Hay") ہ, "DoChashmi Hay" ھ, and "Bari Yay" ے.


(Back to CONTENTS)

More Nastaleeq Fonts

In Mac OS 10.5 and above, it is possible, with some care, to use Nastaleeq fonts with TextEdit and Bean. Some freely available Nastaleeq fonts are:

A variety of Nastaleeq as well as Naskh fonts are available for download from Urdu Web, Urdu Jahan, Barqi Kitaben Urdu Library, and Deedahwar. Please be warned though that the InPage company alleges that some freely available Nastaleeq fonts are pirated from their work.

NOTE: To install a font file, copy it to /Library/Fonts. On Mac OS 10.6 and above, you can also install a font file by double clicking on it, then clicking on the Install Font button in the dialog box presented to you. Any font installed by the latter method is copied to ~/Library/Fonts where it is available to the active user but not to other users.

Nastaleeq works satisfactorily with the message composer in Google's Gmail. You need to have the rich text style activated and the Right-to-Left text direction turned on. Since Gmail composer's font menu is fixed and there is no Nastaleeq font in it, how do you make the composer use Nastaleeq? The only way that seems to work is to start a message with some Nastaleeq text copied from another Gmail message. Then while you edit this text, Gmail preserves the current text font! But keep in mind that any text copied from elsewhere, e.g., a TextEdit window, won't be rendered in Nastaleeq, so there is no sense starting a message with such text for the purpose of composing a message in Nastaleeq.

Nastaleeq works very well with TeX, discussed below in the section Typesetting Using TeX, LaTeX, XeTeX. TeX allows easy control over font size and interword spacing. For best results, use Open Type (OT) fonts with TeX.

OBSERVATIONS:

  1. A problem with Nafees Nastaleeq under TextEdit is that words sometimes get clipped at top or bottom. Perhaps the culprit is TextEdit's poor management of vertical space. The solution is to provide more generous spacing. For this, click on the Spacing pull-down menu on the formatting bar in the TextEdit window, select the Other... menu item, and adjust the Line height multiple and Inter-line spacing values for a satisfactory display.
  2. IranNastaliq is undoubtedly the most elegant and stylish of all currently available Nastaleeq fonts. But designed for traditional Persian-style Nastaleeq typesetting, it cannot handle the letters particular to Urdu. Specifically, this font does not support the Urdu retroflex letters "Tay" ٹ , "Daal" ڈ , and "Ray" ڑ , and the letter variants "Noon Ghunna" ں , "Dochashmi Hay" ھ , and "BaRi Yay" ے . Actually some of these letters do get displayed, but they are not properly connected. For proper rendering of "Goal Hay" (ہ , also called "ChoTi Hay"), this letter should be typed as Option-o, not o.
  3. For Urdu, Nafees, Alvi Lahori, and Jameel Noori fonts are comparable in the calligraphic quality of shapes, with the latter two being somewhat more pleasing, and the last one being perhaps the most attractive. But the difference in their appearance might be a result of kerning variations and word processor functions.

Here is an image of the TextEdit window during the editing of a document mainly with the Nafees Nastaleeq font (size 48 for the title, size 36 for the author's name, size 22 for the text body).


Nastaleeq Fonts Used In TextEdit



To contrast the Urdu and Persian Nastaleeq styles, here is the image of a document set in the IranNastaliq font. Notice that compared to the Urdu sample, the strokes in the Persian sample are more consistent and uniform, and the latter sample more closely resembles manually calligraphed old manuscripts. Especially pleasing to the eye are the long slanted strokes (called "markaz") of kaaf and gaaf, and the stretched horizontal parts of letters like bay, tay, kaaf, etc.

NOTE: For the Persian letter Hay (ه), type the key Option-o. The Urdu letter Goal Hay (ہ), typed with the key o, will not get properly connected within a word when a Persian font is being used.


Persian Style Nastaleeq Font Used In TextEdit



(Back to CONTENTS)

The Kasheeda Feature

(This section was added on 2012-07-30.)

Worth special mention is the "kasheeda" feature of Iran Nastaliq to extend the horizontal strokes of letters. Essentially, by typing the Kasheeda character (key shift-") one or more times either before or after a letter, that letter can be extended arbitrarily. The judicious application of kasheeda makes the appearance of a document more pleasing by introducing variety in the shape of letters. It is also useful for highlighting titles and headings and for visually balancing the lines in poetry. There are long established calligraphic conventions as to where kasheeda extension is permissible in nastaleeq and where it is not. (See this Persian document.)

The kasheeda feature is not specific to Nastaleeq; it can be applied to Naskh fonts as well. In Naskh fonts, all horizontal strokes and letter joints are positioned at the same level in a line of text. The kasheeda character is a horizontal dash, so it can be added essentially to any letter that has a horizontal component. In Nastaleeq fonts, even the strokes that seem horizontal are not truly horizontal but have subtle slopes and slants. Also, they keep varying in thickness, and have to curve anyway at letter joints. So the kasheeda feature is quite difficult to incorporate in Nastaleeq fonts. Undoubtedly, the handling of kasheeda in Iran Nastaliq is masterly!
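In Unicode terms, applying kasheeda simply means inserting one or more copies of the Kasheeda character (U+0640, named ARABIC TATWEEL in Unicode) next to a letter. The short Python sketch below shows this; the function names are illustrative, and the code makes no attempt to check whether an extension at the chosen place is calligraphically permissible or whether the letter actually joins to its neighbor.

```python
KASHEEDA = "\u0640"  # Unicode name: ARABIC TATWEEL


def stretch(word, index, count=1):
    """Insert `count` kasheeda characters after the letter at `index`."""
    if not 0 <= index < len(word):
        raise IndexError("letter index out of range")
    return word[: index + 1] + KASHEEDA * count + word[index + 1 :]


def strip_kasheeda(text):
    """Remove all kasheeda characters, e.g. before searching or sorting."""
    return text.replace(KASHEEDA, "")
```

For instance, stretch(word, 0, 2) extends the first letter of a word by two kasheeda characters, and strip_kasheeda restores the plain spelling.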

The kasheeda approach of Iran Nastaliq is superior to the one taken by the special "Kasheeda fonts" of Urdu (e.g., Jameel Noori Nastaleeq Kasheeda). Iran Nastaliq allows the user to apply kasheeda extension selectively to any chosen occurrences of any chosen letters. By contrast, the Urdu Kasheeda fonts have built-in kasheeda extensions in certain letters and ligatures. These fonts also violate the Kasheeda conventions sometimes.

The sample below shows the use of Kasheeda in two different nastaleeq fonts. Unfortunately, the Kasheeda feature of Iran Nastaliq does not yet work fully under MacOS, so the sample has been prepared on a Windows machine.


Kasheeda Application in Nastaleeq Fonts


(Back to CONTENTS)

InPage Files and Their Conversion to Unicode Text

(This section was last revised on 2015-10-20.)

InPage is a commercial desktop publishing application for Windows. It is widely used by publishing houses for producing Urdu publications because of its rich feature set, multi-lingual and multi-script capabilities, and robustness. Until recently, it was one of very few applications that could produce high-quality Nastaleeq documents.

Unfortunately, InPage works only on Windows and, moreover, uses proprietary document structure and fonts. Naturally, there is much interest in converting InPage files into alternate, more portable versions that could be processed on multiple computing platforms with multiple applications. So several online tools and programs have become available to convert InPage files to Unicode text files. In fact, InPage itself has a Unicode conversion facility of sorts through its copy and paste Edit menu items.

The conversion programs that I tried turned out to have errors or limitations. Some of them require you to have a running InPage to display the file to be converted. This severely reduces their utility, since what you would really want is a program to simply take an InPage file as input and create an equivalent Unicode text file as output. So I had to write such a program myself. This program is available as a standard application for Windows XP/7/8/10 and a command line application for MacOS and Linux.

Download the converter package appropriate for your machine.

NOTE: Source codes for building the above converter applications are downloadable from links on this GitHub page.

Unzipping the downloaded file produces a folder containing the converter application and a readme file with instructions.
In the case of MacOS or Linux, make the application file executable.
Copy the application file to the directory where you keep your InPage files.

The instructions for using the conversion application are in the readme file. Briefly, to convert an existing InPage file named, say, story.inp into a new Unicode text file named story.txt, do this:

MacOS

  1. Open a Terminal window by double-clicking on Terminal in the /Applications/Utilities directory.
  2. Use the "cd" command to get into the directory of your InPage files.
  3. Type the command

        ./InpToUni-mac  story.inp  story.txt

    and press Enter.
    CAUTION: If your files or the saved application are in different directories, then make sure to use the right path for each file.

Windows XP/7/8/10

  1. Double-click on the InpToUni-win (or InpToUniTxt-win) application to launch it.
  2. You will be asked to choose the existing InPage file.
  3. Go through the folders as usual to locate and select the file (say, story.inp). Then press OK.
  4. You'll be asked to choose the new file where the converted Unicode Text will be written (say, story.txt). Type its name or select an existing txt file, then press OK.
  5. The converted Unicode text file will be created, and the application will quit.

Linux (32- or 64-bit)

  1. Use the "cd" command to get into the directory of your InPage files.
  2. Type whichever of the following commands is appropriate for your operating system:

    •     ./InpToUni-lnx32  story.inp  story.txt
    •     ./InpToUni-lnx64  story.inp  story.txt

    and press Enter.
    CAUTION: If your files or the saved application are in different directories, then make sure to use the right path for each file.
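
If you have many InPage files to convert on MacOS or Linux, the command shown above can be repeated for each file by a short script. The following Python sketch is only an illustration and is not part of the converter package. It assumes that the converter is called with exactly the two arguments shown above (the input file, then the output file); the function names conversion_command and convert_folder are made up for this example.

```python
import subprocess
from pathlib import Path


def conversion_command(inp_name, converter="./InpToUni-lnx64"):
    """Build the command line that converts one InPage file.

    Assumption: the converter is invoked as
        <converter> input.inp output.txt
    which is the form shown in the steps above.
    """
    out_name = str(Path(inp_name).with_suffix(".txt"))  # story.inp -> story.txt
    return [converter, inp_name, out_name]


def convert_folder(folder, converter="./InpToUni-lnx64"):
    """Run the converter on every .inp file in `folder`.

    Stops with an error as soon as one conversion fails.
    """
    for inp in sorted(Path(folder).glob("*.inp")):
        subprocess.run(conversion_command(str(inp), converter), check=True)
```

For example, convert_folder(".", "./InpToUni-mac") would convert every .inp file in the current directory on a Mac, provided the application file has been copied there and made executable.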

Processing Converted Unicode Text Files

The InPage document's formatting properties (justification, font sizes and styles, colors, etc.) are lost during the conversion, which mainly consists of text extraction. So you have to format the raw text file produced by the conversion application yourself.

MacOS: To read/edit the converted file under MacOS:

  1. Open the converted file (e.g., story.txt) in TextEdit.
  2. Switch TextEdit's mode to RichText and the Text Direction to RightToLeft.
  3. Do Edit > Select All to select the whole file in the TextEdit window.
  4. Change to your preferred font, such as Jameel Noori Nastaleeq, XB Zar, etc.
    NOTE: Changing the Text Direction to right-to-left and using a proper font is crucial. Otherwise, some Urdu characters may not be rendered properly.

Windows XP/7/8/10: To read/edit the converted file under Windows:

  1. Open the converted file (e.g., story.txt) in your favorite word processor. We assume it is MS Word.
  2. MS Word will present a dialog. In this dialog, select Text encoding Unicode and Document direction Right-to-Left, and then press "OK".
  3. The file will then be opened, and will be readable although its font may not be to your liking.
  4. Format the file by changing its font, font size, etc. as you please.
  5. Save it as a doc or docx file since that would make its further editing easy.
    NOTE: Changing the Text Direction to right-to-left and using a proper font is crucial. Otherwise, some Urdu characters may not be rendered properly.

Linux: To read/edit the converted file under Linux, follow a procedure similar to the one given above for MacOS.

(Back to CONTENTS)


Typesetting Using TeX, LaTeX, XeTeX

Skip this section if you are not interested in programmatic typesetting.

The term TeX is used here in a generic sense for TeX (a system invented by Donald Knuth) or any of its derivatives, such as LaTeX, AMSTeX, ArabTeX, XeTeX, ArabXeTeX, etc. But when the discussion is about a particular derivative, that system is mentioned by name, e.g., XeTeX.

When you use a word processing system such as MS Word, you take formatting actions yourself, and the system keeps displaying the document as it changes in response to your actions. When you use TeX, you put the document contents (text, images, etc.) and the formatting instructions together in one or more tex files, and the system processes them to produce the desired document. The tex files you prepare constitute a TeX program. The TeX system executes this program to produce the desired document, typically in the form of a PDF file.

TeX has a steep learning curve, but once mastered it allows you to produce very complex, high-quality documents, and provides you very fine control over the look and feel of the document. TeX is widely used for producing scholarly works, and many scientific journals and conferences require that articles be submitted to them in the form of tex files.

Various distributions of TeX are available for MacOS, Windows, and Linux. We highly recommend the MacTeX distribution downloadable from here for MacOS, and the TeX Live distribution downloadable from here for Windows and Linux. We also highly recommend the graphic environments TeXShop (by Richard Koch) for MacOS and TeXWorks (by Jonathan Kew et al) for Windows. These are included in the distributions just mentioned, and are very similar to use (the development of the latter was inspired by the former). So below we illustrate the use of TeX only with the instructions for TeXShop running under MacOS.

The above distributions include most of the components needed for processing Urdu documents. In particular, they include the system called XeTeX (originally designed by Jonathan Kew), which currently offers the best facilities for Urdu.

XeTeX has overcome two limitations that were not satisfactorily addressed by the previously existing derivatives of TeX, and that, in particular, greatly hampered the production of Urdu documents with TeX:

  1. For its input, TeX originally used only the ASCII character set, so other characters had to be encoded as ASCII character combinations. XeTeX incorporates Unicode, thus giving access to nearly all of the world's written languages. For example, Urdu characters can now be typed directly within tex files.
  2. For its output, TeX originally used only a small number of fonts. More fonts could be added but they had to be specified using METAFONT, a companion program that came with TeX. XeTeX allows one to use most of the fonts installed on one's computer. For example, the program ArabTex could previously be used for Urdu, but the output was restricted to a single, Naskh-style font. Now there are dozens of fonts in different styles that can be used in Urdu documents with XeTeX.

So for the TeX approach to typesetting Urdu documents, you have to know LaTeX and XeTeX. We will not provide any details about TeX, LaTeX, and XeTeX beyond some cursory information. You have to learn these on your own! It would make sense for you to start reading about XeTeX only after you have become sufficiently familiar with LaTeX to be able to create some documents with it!

There are hundreds of books, articles, and tutorials about LaTeX. An excellent tutorial is The Not So Short Introduction to LaTeX2ε. A very comprehensive on-line reference is The LaTeX wikibook. TeXShop itself has a number of online books and tutorials under its Help menu. For XeTeX, the essential reference is The XeTeX typesetting system from SIL International where the system was designed. Also a useful reference is the 100+ page online document with much historical background, examples, and practical hints, The XeTeX Companion.

TeXShop makes it very easy to edit and execute TeX programs. When you launch TeXShop, it brings up a window in which you can edit your TeX program, i.e., your tex source file. But you should first set TeXShop's Preferences. The most important preferences are in the Source and Typesetting tabs.

  1. Click on the Source tab to open it. In the Editor block, check the box with Arabic in it. Under Encoding, select the item Unicode (UTF-8). In the Document Font block, select the item XB Zar 16pt or Times New Roman 16pt (this should make the TeX source more readable).
  2. Click on the Typesetting tab to open it. In the Default Command block, check the Command Listed Below radio button, and in the text field below it type "XeLaTeX". (You can change the Typesetting command to LaTeX or other choices on the editing window itself.)
    In the Default Script block, make sure that the Pdftex radio button is turned on.

After the Preferences are taken care of, you are ready to edit your source file. The TeXShop editor is powerful, and yet very simple and intuitive. For typing the Urdu content, you have to use the Urdu-QWERTY keyboard layout, of course. But make sure to switch to the US English keyboard while typing TeX commands and the symbols that go with them, such as \, &, {, }, [, ], !, %, etc. When numbers and delimiter characters are mixed with Urdu alphabetic text, the sequence of symbols sometimes appears wrong. You just have to tolerate such disorder at present. To avoid confusion, it might be helpful to put Urdu and non-Urdu symbols on different lines.

Since preparing TeX files for Urdu documents requires frequent switches between the Urdu keyboard (for text) and the English keyboard (for special characters and TeX commands), you might consider setting up Keyboard shortcuts for that purpose. Refer to the section Keyboard Shortcuts for Changing Keyboards further below.

Note that TeXShop's default in the source window is to display the TeX commands in blue, comments in red, and other text in black. Also, you will notice that TeXShop uses the first character of each new line of text in the source window to determine whether to start displaying the line from the left or from the right end of the window. If the first character of the line belongs to a left-to-right script (e.g., English), then the line is started at left. But if the first character of the line belongs to a right-to-left script (e.g., Urdu or Persian or Arabic), then the line is started at right. Characters like space and certain punctuation symbols are considered belonging to left-to-right scripts, and cause the line to be started at left.
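
The rule just described (the first character of a line decides where the line starts) can be imitated in a few lines of Python. This is only a sketch of the behavior as described above, not TeXShop's actual code; the function name starts_at_right is made up for this example. It relies on the bidirectional class that Unicode assigns to every character: "AL" for Arabic-script letters (which include the Urdu and Persian letters) and "R" for other right-to-left letters.

```python
import unicodedata

# Unicode bidirectional classes of right-to-left letters:
# "AL" = Arabic-script letters (Urdu, Persian, Arabic), "R" = e.g. Hebrew.
RTL_CLASSES = {"R", "AL"}


def starts_at_right(line):
    """Return True if the first character of `line` is a right-to-left letter."""
    if not line:
        return False
    return unicodedata.bidirectional(line[0]) in RTL_CLASSES


print(starts_at_right("\u0627\u0631\u062F\u0648"))    # the word "Urdu" in Urdu: True
print(starts_at_right("\\begin{document}"))           # a TeX command: False
print(starts_at_right(" \u0627\u0631\u062F\u0648"))   # leading space: False
```

The last call shows why a line of Urdu text that begins with a space is displayed from the left: the space is not a right-to-left letter.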

Once the editing of your tex file is complete, you should click on the Typeset button. TeXShop will process your tex file, and will display the resulting PDF file if the program ran successfully. It will also bring up a Console window with progress and error messages.

We now give an example of typesetting an Urdu ghazal using XeTeX. The first image below shows the TeX program, poemRA.tex. XeTeX (actually the program xelatex) executes this file to produce the PDF file poemRA.pdf, shown in the next image.

The TeX program uses the fontspec package to gain access to the fonts installed on the computer. The program uses Vafa Khalighi's bidi package for the text's bidirectionality (i.e., to handle left-to-right and right-to-left scripts). The bulk of the poetry formatting is done by the bidipoem package. The essential TeX instructions for typesetting consist of the first seven lines and the part between \begin{document} and \end{document}. Note how simple the TeX code is in this case; it really amounts to just putting the lines of the poem within the traditionalpoem environment.

IMPORTANT: To make sure that bidipoem justifies the lines of the poem correctly, you need to typeset the document twice (that is, press the Typeset button again after running the program successfully once).
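
If you run xelatex from the command line rather than from TeXShop, the same two passes are needed. The following Python sketch shows one way to automate them. It assumes that the xelatex program is on your command path; the function names typeset_commands and typeset are made up for this example.

```python
import subprocess


def typeset_commands(tex_file, passes=2):
    """Return the xelatex command line, repeated once per pass."""
    return [["xelatex", tex_file] for _ in range(passes)]


def typeset(tex_file, passes=2):
    """Run xelatex `passes` times; stop with an error if a run fails.

    The second pass lets packages such as bidipoem use the
    measurements that were recorded during the first pass.
    """
    for command in typeset_commands(tex_file, passes):
        subprocess.run(command, check=True)
```

Calling typeset("poemRA.tex") then has the same effect as pressing the Typeset button twice.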

To illustrate how TeX makes it easy to add extra flair to the output, the TeX program also puts a decorative border along the page margins. This is done by the block of code in the middle section. The work is done mainly by the fancyhdr package. You need to install on your computer the free font WebOMints GD which can be downloaded from the Internet. The symbols in this font can be used for decorating documents in various ways. Here some of its symbols are being used to assemble the border shown on the output PDF file. Note that in the new font family declaration, we give a name (\w) to the WebOMints GD font, and use the Color option so that all symbols of this font will be in the designated color. The 6 hexadecimal digits represent the code for a shade of turquoise.

If you try to typeset a poem with longer lines, then you might get each of its couplets displayed on two lines, in a different poem style. Also, you might need to play with the parameters (62,-18) in the line beginning with \begin{picture} to display the border correctly.


TeX File Edited in TeXShop source window

PDF File Produced by XeTeX, as displayed by TeXShop


(Back to CONTENTS)

More complex Urdu documents are best produced in TeX by making use of François Charette's package polyglossia. This package is intended to support texts in multiple languages, including Urdu. It runs on top of the XeLaTeX derivative of TeX.

The example below shows an Urdu document typeset with the aid of polyglossia. It illustrates a number of features typically needed in an article or scholarly paper, such as: typesetting of titles and section headings; formatting of lists and tables; footnotes; and automatic numbering of sections, list items, and tables. The document also shows how to insert English text in an Urdu document.

The document style employed for the sample Urdu document is article; this can have sections and references but not such components as tables of contents. For a book length document, you should use the book or memoir document styles. These styles greatly facilitate and automate much of the work needed in the production of: title pages; table of contents; chapters with sections, subsections, subsubsections, etc.; automatic numbering of lists, figures, tables, etc.; bibliographic references; and indices.

The next two images below give the beginning and ending parts of the TeX source file to produce the sample Urdu document. The third image below shows the PDF pages of the Urdu document. (These files were updated on 2014-06-05.)

First Lines of TeX/Polyglossia Source File to Produce Urdu Document

Last Lines of TeX/Polyglossia Source File to Produce Urdu Document

Urdu Document in PDF, Produced Using Polyglossia

Links to obtain the above Urdu document in PDF form as well as the TeX source to generate the document:

The PDF file of the Urdu document is here, and the full TeX source code to produce the Urdu document is here. (It is better to download these files than to try to view them in the Web browser.)

(Back to CONTENTS)

Web Pages

Skip this section if you are not interested in creating Web pages with Urdu content.

Modern web browsers are quite good at interpreting and displaying multi-lingual texts from their Unicode character encodings. Of course, the browser needs to be told that it should expect Unicode material in the web document (usually, an html file) that it is being asked to display. The Unicode encoding most commonly used on the Web, which covers Urdu and Persian letters along with the letters of many other languages, is called UTF-8. So to display Urdu text, you have to specify in your web document that its character set is UTF-8, as explained next.

The particular character set that a web document contains is specified by the meta statement. Near the beginning of your html file you will find some code that looks like this:

   <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">

(This is just an example. Your character set might have a name different from "ISO-8859-1".)
You have to change the character set declaration to "UTF-8", by replacing the above meta statement by:

   <meta content="text/html; charset=UTF-8" http-equiv="content-type">

Any Unicode inserted after this meta statement will be displayed as the character that the code represents. The Unicode for Urdu and Persian can be found in the Unicode Arabic page. A table which gives the standard Unicode as well as its html representation, called html numeric character reference, is given here. A very useful online tool is UTF Converter that lets you quickly convert a string of one or more characters to Unicode in various formats. UTF Converter's author, Mark Davis, has a Web site Macchiato with several other very useful Unicode-related utilities.
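
If you have several html files to update, the change of the character set declaration described above can be made by a small script. The following Python sketch is an illustration only. It handles just the form of the meta statement shown above (charset=NAME inside the content attribute), and the function name declare_utf8 is made up for this example. Remember that the file itself must also be saved in the UTF-8 encoding; the script changes only the declaration.

```python
import re

# Matches "charset=NAME" as it appears inside the content attribute of
# the meta statement shown above. Other ways of writing the declaration
# are deliberately not handled in this sketch.
CHARSET = re.compile(r"(charset\s*=\s*)[\w.:-]+", re.IGNORECASE)


def declare_utf8(html_text):
    """Replace the first character set declaration with UTF-8."""
    return CHARSET.sub(r"\g<1>UTF-8", html_text, count=1)


old = '<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">'
print(declare_utf8(old))
```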

From the table on page 2 of Unicode Arabic page, you can check that the hexadecimal Unicode representations of the Urdu letters Alif, Re, Daal, and Vaao are, respectively, 0627, 0631, 062F, and 0648. Now the html syntax for a hexadecimal code HHHH is &#xHHHH; . So suppose in your html document you insert the following:

   <center>
   <big><big><big>
   &#x0627;&#x0631;&#x062F;&#x0648;
   </big></big></big>
   </center>

The result will be the word "Urdu" (in Urdu), displayed in letters three sizes larger and centered on a line, as follows:

اردو

Typing numerical codes in this way is clearly impractical except for displaying just a few characters. Fortunately, you don't have to enter character codes manually if you use the Urdu-QWERTY keyboard layout. The characters typed on this keyboard are automatically converted to their Unicode version and placed in the input. All you have to do is to switch to Urdu-QWERTY on the Input menu at the point in your html file where you desire to insert Urdu text.
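
If you ever do need the numeric character references for a short string (for example, to paste into a file that cannot be saved as UTF-8), they can be generated rather than looked up in a table. The following Python sketch uses only the standard library; the two function names are made up for this example.

```python
import html


def to_numeric_references(text):
    """Convert each character to its html form &#xHHHH; (hexadecimal)."""
    return "".join(f"&#x{ord(ch):04X};" for ch in text)


def from_numeric_references(references):
    """Convert html character references back to ordinary characters."""
    return html.unescape(references)


urdu = "\u0627\u0631\u062F\u0648"      # Alif, Re, Daal, Vaao
print(to_numeric_references(urdu))     # &#x0627;&#x0631;&#x062F;&#x0648;
```

The printed string is exactly the one used in the example above.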

A caveat is in order here. To prepare html files, you are likely to use some special editor different from TextEdit. We have seen that, in RichText mode, TextEdit processes Urdu letters correctly, displaying the right form of the letter and connecting the letters appropriately. Other editors, especially the so-called programmer's editors often used to prepare html files, may not do all that. For example, your typed Urdu letters might be displayed in their isolated form from left to right in the order of their entry, without being connected together. Or worse, your typed input might appear garbled in even more annoying ways! If you are looking for an excellent, free html editor that handles Unicode and UTF-8 well, and displays Urdu text correctly, try Arachnophilia.

Of course, the readers of your Web page will be able to see the Urdu text correctly only if their system has been configured for multi-lingual processing and has the Urdu fonts installed. In addition, it might be necessary for your readers to set the viewing option of their web browser for "Unicode (UTF-8)" character encoding.

(Back to CONTENTS)

Mathematical and Technical Typing

Some mathematical symbols are so frequently needed in technical typing that they have become standard in Mac's English keyboards. So the Urdu-QWERTY keyboard also provides several of these symbols, via option and option-shift keys as usual. Note that the symbols for summation, integration, root, etc., change their orientation to match the right-to-left text direction.

In Urdu mathematical notation, the dots of the dotted letters are sometimes omitted. So the Urdu-QWERTY keyboard provides the dotless forms
ٮ for ب
ڡ for ف
ٯ for ق
Another practice in Urdu mathematical writings is to sometimes use just the stems of letters, not their full form. Such symbols can be easily generated by adding the "kasheeda" character (ـ) to a letter. For example, the symbol خـ , generated by the key sequence shift-K and shift-" , represents the "imaginary" (in Urdu, خیالی) number i.
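
For those who process such text by program, the symbols mentioned above are ordinary Unicode characters. The following Python sketch lists them with their official Unicode names. The code points are an assumption based on the characters printed above; the keyboard layout files themselves remain the authority on what each key produces.

```python
import unicodedata

# Dotted letter -> dotless form (code points assumed from the text above).
DOTLESS = {
    "\u0628": "\u066E",   # bay  -> dotless bay
    "\u0641": "\u06A1",   # fay  -> dotless fay
    "\u0642": "\u066F",   # qaaf -> dotless qaaf
}
KASHEEDA = "\u0640"       # called TATWEEL in Unicode

for dotted, dotless in DOTLESS.items():
    print(f"U+{ord(dotted):04X} -> U+{ord(dotless):04X}  {unicodedata.name(dotless)}")

# The "imaginary" number symbol: the letter khay followed by a kasheeda.
imaginary_unit = "\u062E" + KASHEEDA
print([f"U+{ord(ch):04X}" for ch in imaginary_unit])
```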

NOTE: The keyboard suffices only for the casual typing of a few mathematical symbols in a general document. To prepare documents with elaborate mathematical content, the ideal approach is to use TeX/LaTeX/XeTeX. See the section Typesetting Using TeX, LaTeX, XeTeX.

Sample of Technical Text

(Back to CONTENTS)


Keyboard Shortcuts for Changing Keyboards

NOTE: For "Changing Keyboards", the standard MacOS terminology is "Changing Input Source" or "Changing Input Method".

While editing certain documents, you need to change Keyboards quite often. For example, you may be working on a dictionary. Or, you might be preparing TeX files for Urdu documents, and need to continually switch keyboards between Urdu (for text) and English (for special characters and TeX commands).

The standard way for changing Keyboards is to select the desired keyboard in the Input menu at the right end of the Apple menu bar at the top. This is cumbersome and annoying when Keyboard changes are very frequent. So you might like to set up a Keyboard Shortcut for it.

MacOS has shortcuts programmed for Keyboard change already. These are: Command-space for toggling between keyboards and Command-shift-space for cycling through the active keyboards. But in MacOS versions 10.5 and above these are disabled by default, because exactly the same shortcuts are enabled for Spotlight searches.

If you prefer, you can disable the Spotlight shortcuts and enable the Keyboard shortcuts. To do this in MacOS 10.5 and above, go into the Finder, and select Apple > System Preferences > Keyboard. In the window that opens, click on the Keyboard Shortcuts tab. Select Spotlight in the left column, and disable the shortcut items that show up in the right column. Then select Keyboard & Text Input in the left column, and enable the input source-related items that appear in the right column.

Of the two Keyboard shortcuts Command-space and Command-shift-space, the former is certainly easier to type. If you have only two keyboards activated (say, English and Urdu), then the two shortcuts are equivalent. (You can quickly see which keyboards are active by clicking on the Input menu. An icon appears underneath it for each active keyboard.) However, if there are more than two active keyboards, then you might like to interchange the shortcuts. You can change a shortcut by double-clicking on it, and typing over it any desired combination of modifier keys (Command, Control, Shift, etc.) and a regular key (space, letter, number, etc.)

(Back to CONTENTS)


Installation Problems

Most of the reported installation difficulties turned out to have a simple reason: during download or extraction, the file extensions got changed. Often a .txt extension was appended to one or more file names.

So first please make sure that your Mac shows extensions in file names. For this, move into Finder (for example, by clicking in a Finder window, or on the Finder icon in the Dock, or at a point on the screen which is not occupied by an application window). Then on the Menu bar (the one with the Apple icon at the left), click on Finder, then on Preferences, then on the Advanced tab. Now look at the Show all file extensions item. If the check box on its left does not have a check mark, then click on it so that a check mark appears there. Finally, close the Advanced window.

Now you can check whether the extensions of the Urdu-QWERTY files are correct. The downloaded file (UrduQWERTY-v6.zip) and the files that your unzipper extracts (UrduQWERTY.keylayout and UrduQWERTY.icns) should have exactly those names. Change their extensions if necessary, ignoring the Finder's complaint that this could render your files dysfunctional.

Another problem some people have encountered is that during editing Urdu letters show up isolated rather than connected together in the normal way. This can happen when the editor being used is different from TextEdit or Bean. For example, at present Microsoft Word does not handle the Naskh and Nastaleeq scripts correctly on the Mac. Even in TextEdit, sometimes Urdu letters appear isolated rather than correctly connected. This is usually due to TextEdit being run in the plain text mode rather than the rich text mode which Urdu editing requires.

To fix this problem, start TextEdit, and on the Menu bar click on TextEdit, then on Preferences, and then on the New Document tab. If the Rich Text radio button is not active, click on it. Now close Preferences, and quit TextEdit. When you restart TextEdit, it will use Rich Text as the default for new documents.

A related problem that has troubled some people is that in their Urdu files some letters don't seem to have correct shapes. For example, the letters "Goal Hay" or "yay" don't connect to the preceding or following letters properly. The culprit in such cases is nearly always the font used. At present only the X Series 2, Scheherazade, Lateef, and Geeza Pro among Naskh fonts, and Nafees and Jameel Noori among Nastaleeq fonts are known to work correctly with the whole Urdu alphabet. Please let me know if you discover (or design) other well-behaved fonts for Urdu.

(Back to CONTENTS)


Orthographic Hints

Diacritical Marks

In Urdu, short vowels (e`raab) are denoted by diacritical marks that are placed above, below, or to the left of the letter involved. Although usually omitted, they are occasionally needed to remove ambiguity or to show the correct pronunciation of a word. In particular, the tashdeed and madd signs and the zer of izaafat combinations are always helpful to the reader of the text.

While composing text, you should type such a mark after typing the letter to which it belongs. The most frequently used marks are: zabar (shift->), zer (shift-<), pesh (shift-P), tashdeed (shift-_), and madd (shift-M). Alif with madd can be typed directly as shift-A. Alif with dozabar can be typed directly as equal sign (=). The "jazm" mark (slash key /), which should print like a tiny "daal", does not have that shape in many fonts. In Naskh fonts it shows up as a little circle, being the "sukun" of Arabic orthography.
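
The reason for typing a mark after its letter is that, in Unicode, a diacritical mark is a "combining" character stored right after the letter it modifies. The following Python sketch illustrates this. The code points are the standard Arabic-script marks and are assumed, not verified, to be the ones that the keyboard produces; the function name add_mark is made up for this example.

```python
import unicodedata

# Diacritical marks by their Urdu names (code points are an assumption).
MARKS = {
    "zabar": "\u064E",      # FATHA
    "zer": "\u0650",        # KASRA
    "pesh": "\u064F",       # DAMMA
    "tashdeed": "\u0651",   # SHADDA
    "jazm": "\u0652",       # SUKUN
    "madd": "\u0653",       # MADDAH ABOVE
}


def add_mark(letter, mark_name):
    """Return the letter followed by the named mark (letter first, then mark)."""
    return letter + MARKS[mark_name]


# "dil" (heart) with a zer under the daal: daal, zer, laam.
dil = add_mark("\u062F", "zer") + "\u0644"
print([f"U+{ord(ch):04X}" for ch in dil])

# Alif followed by madd is equivalent to the single letter "alif with madd".
alif_madd = unicodedata.normalize("NFC", add_mark("\u0627", "madd"))
print(f"U+{ord(alif_madd):04X}")
```

The last two lines show that typing alif and then madd is, after normalization, the same as the single "alif with madd" character.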

A complete list of diacritical marks is given earlier with the keyboard images.

(Back to CONTENTS)

Yay

The I and Y keys correspond, respectively, to the maaroof and majhool forms of "yay", popularly referred to as "ChoTi yay" ی and "baRi yay" ے , respectively. (See the note below about maaroof and majhool sounds.) Thus, "galee" گلی (meaning lane) is to be typed as G, L, I, and "taaray" تارے (meaning stars) is to be typed as T, A, R, Y.

The form entered by Y does not connect to the next letter. So even a majhool "yay" letter that occurs in the middle of a word should be typed as I. For example, "bayTay" بیٹے (meaning sons) has to be typed B, I, shift-T, Y. Even though both "yay" letters occurring in this word are pronounced with the majhool sound, the first one has to be entered as I.

In Arabic, the letter "yay" has two dots underneath. In Urdu, the two dots are shown only if "yay" appears at the beginning or in the middle of a word, but not when it is the final letter of a word or when it stands alone (e.g., in an alphabet table). If needed, the "yay" with two dots ي can be typed as option-i.
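
The rule about the I and Y keys can be sketched in Python as follows. The small key table covers only the keys used in the examples above, and its code points are an assumption about what the keyboard produces. The function names type_keys and fix_medial_yay are made up for this example.

```python
CHOTI_YAY = "\u06CC"   # key I; connects to the next letter
BARI_YAY = "\u06D2"    # key Y; never connects to the next letter

# Only the keys used in the examples above (code points assumed).
KEYS = {
    "G": "\u06AF", "L": "\u0644", "T": "\u062A", "A": "\u0627",
    "R": "\u0631", "B": "\u0628", "shift-T": "\u0679",
    "I": CHOTI_YAY, "Y": BARI_YAY,
    "option-i": "\u064A",   # yay with two dots
}


def type_keys(*keys):
    """Return the text produced by typing the given keys in order."""
    return "".join(KEYS[key] for key in keys)


def fix_medial_yay(word):
    """Replace every baRi yay that is not the last letter by chhoTi yay."""
    if len(word) < 2:
        return word
    return word[:-1].replace(BARI_YAY, CHOTI_YAY) + word[-1]


bayTay = type_keys("B", "I", "shift-T", "Y")
# Typing Y in the middle of the word gives a wrong spelling, which
# fix_medial_yay repairs:
print(fix_medial_yay(type_keys("B", "Y", "shift-T", "Y")) == bayTay)
```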

(Back to CONTENTS)

Noon Ghunna

Noon Ghunna, which appears as the letter Noon but without a dot, is entered as shift-N. Thus "maaN" ماں (mother) is typed as M, A, shift-N. Noon Ghunna adds a nasal quality to the sound of the vowel preceding it.

In the freewheeling, inconsistent way of Urdu orthography, Noon Ghunna is used only at the end of a word. In the middle of a word, even where Noon Ghunna would be appropriate, Urdu just uses the ordinary Noon. Examples: "saaNp" سانپ (snake) has to be entered as S, A, N, P; or "pataNg" پتنگ (kite) has to be entered as P, T, N, G. This inconsistency is forced by the circumstance that in the middle of a word, Noon is written as a shosha with a dot above. Without a dot, such a shosha would be visually quite confusing.

NOTE: In some old books, specially Urdu instructional primers, Noon Ghunna was indicated by a tiny crescent-like mark placed above the Noon. This indicated nasalization both in the middle and at the end of a word. An equivalent sign, ٘ , is still available as shift-7 although its use in Urdu went out of style decades ago. Note that, by contrast, Hindi takes the rational approach of signifying the nasal modification by always placing a special mark above the affected letter.
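
The spelling rule for Noon Ghunna (final position only, ordinary Noon elsewhere) is simple enough to check by program. The following Python sketch replaces any Noon Ghunna that is directly followed by another letter with an ordinary Noon. It is an illustration under simple assumptions: it does not look past diacritical marks, and the function name is made up for this example.

```python
import unicodedata

NOON = "\u0646"          # ordinary Noon
NOON_GHUNNA = "\u06BA"   # Noon Ghunna (no dot)


def medial_noon_ghunna_to_noon(text):
    """Use ordinary Noon wherever Noon Ghunna is followed by a letter."""
    result = []
    for index, ch in enumerate(text):
        following = text[index + 1] if index + 1 < len(text) else ""
        # Unicode general categories starting with "L" are letters.
        if (ch == NOON_GHUNNA and following
                and unicodedata.category(following).startswith("L")):
            ch = NOON
        result.append(ch)
    return "".join(result)


# "saaNp" typed wrongly with shift-N in the middle is corrected,
# while the final Noon Ghunna of "maaN" is left alone.
print(medial_noon_ghunna_to_noon("\u0633\u0627\u06BA\u067E") == "\u0633\u0627\u0646\u067E")
print(medial_noon_ghunna_to_noon("\u0645\u0627\u06BA") == "\u0645\u0627\u06BA")
```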

(Back to CONTENTS)

Hamza

The main forms of this letter are
    1) independent ء , entered as shift-4,
    2) hamza above "alif" أ , entered as the plus key (shift +),
    3) hamza in the middle of a word ئ , entered as U,
    4) hamza above "vaao" ؤ , entered as shift-W, and
    5) hamza above "Goal Hay" ۂ , entered as the hyphen key (-).

The rules relevant to these forms are the following:

1) If hamza is the last letter of a word, use the independent hamza form (shift-4). Examples: "ziaa" ضیاء (meaning light) is entered by typing shift-J, I, A, shift-4; "zakaa" ذكاء (intelligence) is entered by typing shift-Z, K, A, shift-4.

NOTE: The terminal hamza is usually omitted in modern Urdu publications. For example, the above two words are frequently spelled as "ziaa" ضیا and "zakaa" ذكا .

NOTE: Only the words of Arabic origin can have a terminal hamza. It is incorrect to append a hamza to the words derived from other languages. For example, the words "Asia" ایشیا , "Australia" آسٹریلیا , "Angela" اینجلا , or "boo" بو (smell) should not be spelled with a terminal hamza. An exception is made in Urdu when a hamza is needed for the "izafat" combination; e.g., "Asia-e Kuchak" ایشیائے کوچک (Little Asia) or "boo-e gul" بوئے گل (the flower's fragrance). But the hamza used in such combinations is specific to Urdu spelling (However, see the NOTE just below); in Persian, the same combinations contain only the yay, not the hamza. So the above Urdu phrases are spelled in Persian as "Asia-e Kuchak" ایشیای کوچک or "boo-e gul" بوی گل .

NOTE: The traditional spelling of the "izafat" combination involving a "yay" is without a "hamza" even in Urdu. Thus the "correct" spellings of the above Urdu phrases are "Asia-e Kuchak" ایشیاے کوچک and "boo-e gul" بوے گل . The new trend is, however, to add a "hamza" before the "yay".

2) If the letter hamza occurs in the middle of a word, use the key U for it. When typed, it is displayed as a hamza over the letter "yay" ئ. But as soon as the next letter is typed, the yay disappears, and the correct combination of hamza and the next letter is displayed. Examples: "ghaael" گھائل (wounded) entered by typing G, H, A, U, L; "chaae" چائے (tea) entered by typing C, A, U, Y; "na-i" نئی (new) entered by typing N, U, I.

3) However, even in the middle of a word if a hamza precedes a "vaao", and this pair starts an isolated subword, then the two should be typed together as the single "hamza above vaao" key (shift-W). (To start an isolated subword, this pair should come after an alif, vaao, daal, ray, etc.) Example: "gaaoN" گاؤں (village) should be entered by typing G, A, shift-W, shift-N, and not G, A, U, W, shift-N which would result in the wrong shape گائوں !

The isolated subword condition is important. Otherwise just a medial hamza form (key U) is to be used. Example: "ga-u maataa" گئو ماتا (Mother Cow) should be entered by typing G, U, W, space, M, A, T, A; Typing G, shift-W, space, M, A, T, A would result in the wrong shape گؤ ماتا !
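
Rules 2 and 3 for a hamza followed by "vaao" can be summarized in a short Python sketch. The text above names alif, vaao, daal, and ray as letters after which an isolated subword starts; the remaining letters in the set below are an assumption based on the usual list of letters that do not connect to the following letter. The function name hamza_before_vaao is made up for this example.

```python
# Letters that do not connect to the following letter, so that whatever
# comes next starts a new subword. Alif, vaao, daal, and ray come from
# the text above; the other members are an assumption.
NON_CONNECTING = set(
    "\u0627\u0622"          # alif, alif with madd
    "\u062F\u0688\u0630"    # daal, Daal, zaal
    "\u0631\u0691\u0632\u0698"  # ray, Ray, zay, zhay
    "\u0648"                # vaao
)

HAMZA_VAAO = "\u0624"     # single "hamza above vaao" letter (shift-W)
MEDIAL_HAMZA = "\u0626"   # medial hamza (key U)
VAAO = "\u0648"           # vaao (key W)


def hamza_before_vaao(preceding):
    """Return the letters to use for hamza + vaao after `preceding`."""
    if preceding and preceding[-1] in NON_CONNECTING:
        return HAMZA_VAAO            # rule 3: the pair starts a subword
    return MEDIAL_HAMZA + VAAO       # rule 2: ordinary medial hamza


gaaoN = "\u06AF\u0627" + hamza_before_vaao("\u06AF\u0627") + "\u06BA"
ga_u = "\u06AF" + hamza_before_vaao("\u06AF")
print([f"U+{ord(ch):04X}" for ch in gaaoN])
print([f"U+{ord(ch):04X}" for ch in ga_u])
```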

4) The "hamza above Goal Hay" ۂ occurs in "izaafat" combinations derived from Persian, and it is helpful to add a "zer" sign below it. Examples: "sitaara-e shaam" ستارۂِ شام (evening star) should be entered by typing S, T, A, R, -, shift-<, space, X, A, M. Or, "naala-e dil" نالۂِ دِل (heart's cry) should be entered as N, A, L, -, shift-<, space, D, (optionally, shift-<), L.

The form of the letter "Goal Hay" with a hamza above can occur only in the terminal and isolated positions of a word, while the form without a hamza can occur in all positions---initial, medial, terminal or isolated. One should be careful in choosing the correct form of "Goal Hay" in "izaafat" combinations. The form without hamza should be used when the "Goal Hay" ending a word is pronounced as H, as in "tah" تہ (layer or bottom). The form with a hamza above should be used when the Goal Hay ending a word is pronounced as A or E, as in "gila" گلہ (complaint). This point is taken up again in the next subsection.

(Back to CONTENTS)

Hay

"BaRi Hay" ح (humorously called "Halvay Vaali Hay") is entered by typing shift-H. Thus "muhabbat" محبّت (love) is entered by typing M, shift-H, B, (optionally shift-_ for tashdeed), T.

"Dochashmi Hay" ھ is entered by typing unshifted H. In modern Urdu orthography, this letter is used only in combination with some consonant (which precedes it), and its purpose is to modify that consonant's sound to make it an "aspirated letter".

"Goal Hay" ہ , entered by typing the letter O, is pronounced separately by itself rather than being just used to "aspirate" another consonant. For example, the "Hay" sound is pronounced independently in the word "kahaa" كہا (said); so this word is typed with a "Goal Hay", as K, O, A. This is in contrast to the word "khaa" كھا (Eat!) where the "Hay" is used to aspirate the "k" sound; so this word is spelled with a "Dochashmi Hay", as K, H, A.

NOTE: Some recent spelling practices violate the rule of using "Dochashmi Hay" ھ exclusively for aspirated letters. For example, a bizarre new trend is to switch from the rational spelling "muNh" منہ (mouth) to the irrational spelling منھ . The latter spelling makes absolutely no sense. The letter "Noon" ن in this word doesn't have an aspirated "nh" sound, as has, e.g., the second "Noon" in "nan-nhaa" ننھا (tiny), but it is a "Noon Ghunna" meant to nasalize the sound of the short "pesh" vowel operating on the letter "meem" م. In this word, the letter "Goal Hay" ہ is not being affected at all, so there is no justification for modifying it. The urge to reform seems to have been misdirected here to fix what ain't broke. Instead, that urge should have been channeled to popularize the placing of some mark on "Noon" here (and in similar cases) to indicate that it is the "Noon Ghunna" vowel modifier and not the true consonant "Noon". (See the NOTE in the section Noon Ghunna above.)

In the word "majhool" مجہول even though "h" follows "j", no aspiration takes place since the two letters belong to different syllables ("maj-hool") and are pronounced independently. This word should therefore be typed as M, J, O, W, L, and not M, J, H, W, L which would appear incorrectly as مجھول ! In general, "Dochashmi Hay" should not be used in any Urdu word that is derived from Arabic or Persian, since these languages do not have aspirated letters. Aspirated letters can occur only in the words of Indic origin.
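
The difference between the two keys can be made concrete with a small Python sketch. The key map below is a hypothetical excerpt inferred from the typing examples in this section (it is not the keyboard's actual definition file), and type_keys is a name invented for the illustration.

```python
# Hypothetical excerpt of the key map, inferred from the examples above.
KEYMAP = {
    "M": "\u0645",        # meem
    "J": "\u062C",        # jeem
    "W": "\u0648",        # vaao
    "L": "\u0644",        # laam
    "O": "\u06C1",        # Goal Hay      (ARABIC LETTER HEH GOAL)
    "H": "\u06BE",        # Dochashmi Hay (ARABIC LETTER HEH DOACHASHMEE)
    "shift-H": "\u062D",  # BaRi Hay      (ARABIC LETTER HAH)
}


def type_keys(keys: str) -> str:
    """Turn a comma-separated key sequence into the characters it produces."""
    return "".join(KEYMAP[key.strip()] for key in keys.split(","))


majhool = type_keys("M, J, O, W, L")  # correct: Goal Hay
wrong = type_keys("M, J, H, W, L")    # incorrect: Dochashmi Hay
print(majhool, wrong)
```

The two results differ in exactly one character, the third one, which is why the choice of key matters even when the two spellings look similar in some fonts.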

There is an exception to the rule that "Goal Hay" must be pronounced with an "h" sound. At the end of a word, "Goal Hay" is pronounced as an A or E, not as H; for example, the word تكیہ typed as T, K, I, O is pronounced as "takya" (pillow).

An exception to that exception occurs sometimes, and the terminal "Hay" is actually pronounced as H, not A or E. For example, the word "shah" شہ (meaning check [of chess]) is typed X, O. The word "shaah" شاہ (meaning king), typed as X, A, O, is another example where a terminal "Goal Hay" is pronounced with an "h" sound.

However, the oddities of Urdu orthography do not end here. In the words ending in a pronounced Goal Hay which is not isolated but connected to the previous letter, the Hay is often written twice! For example, the word "kah" (meaning say!) is often written as كہہ , entered by K, O, O; or "sah" (meaning bear!, from the verb "sahna") as سہہ , entered by S, O, O; or "faqeeh" (expert of fiq-h [jurisprudence] ) as فقیہہ , entered by F, Q, I, O, O.

The purpose of doubling the Goal Hay is ostensibly to avoid its being wrongly pronounced as A or E. For example, without the extra Goal Hay the above words "kah" كہہ and "sah" سہہ could be easily confused with the words "ke" كہ (that) and "se" سہ (Persian three), respectively, in which the terminal Goal Hay is indeed pronounced as E. But such is clearly not the case with "faqeeh" فقیہہ , where the extra Goal Hay actually introduces the hazard of this word being confused with "faqeeha" (female expert of fiq-h). The reason for writing the Goal Hay twice in this word seems to be just the whim of the scribe rather than any logical need. In general, you will find that the spelling variation of doubling the Goal Hay is practiced unpredictably and rather inconsistently!

NOTE: When using a Persian or Arabic font, use the Option-o key rather than the plain o key for the letter Goal Hay (ہ); otherwise the letter might not be rendered properly.

(Back to CONTENTS)

Punctuation

The end of an Urdu declarative sentence is marked with a small dash rather than a period. But the period key itself generates the dash in the Urdu-QWERTY keyboard. Other punctuation symbols such as question mark, exclamation, comma, semicolon, parentheses, brackets, braces, double and single quotation marks, etc., are entered with the usual keys. Punctuation symbols are appropriately reversed or inverted to match the right-to-left flow of text.

(Back to CONTENTS)

Numbers and Dates

The same digit keys of the Urdu-QWERTY keyboard can be used to type digits in three different shapes: (1) Western digit characters when no modifier key (SHIFT, OPTION, or CAPS LOCK) is pressed; (2) "Eastern Arabic" digit characters when CAPS LOCK is pressed and SHIFT or OPTION are not pressed; and (3) "Traditional Arabic" digit characters when OPTION is pressed.

The Traditional Arabic digit forms are used in Arabic documents. The Eastern Arabic forms are commonly used in Urdu documents with Nastaleeq fonts and in Persian documents with both Naskh and Nastaleeq fonts. An Urdu document produced in a Naskh font looks better when the Traditional rather than Eastern Arabic digits are used.

The CAPS LOCK key has no effect on other keys. So if you like your digits to be displayed in their Urdu, and not the Western, shapes, then you can just leave CAPS LOCK depressed, and release it only to type a symbol which requires both SHIFT and OPTION keys to be pressed.

With Arabic digits, the decimal point sign "٫" and the thousands separator "٬" are, respectively, typed as option-period and option-comma.

Examples: The number 3.1416 is typed as option-3, option-period, option-1, option-4, option-1, option-6, and is displayed as ۳٫۱٤۱٦ . One million is typed as option-1, option-comma, option-0, option-0, option-0, option-comma, option-0, option-0, option-0, and is displayed as ۱٬۰۰۰٬۰۰۰ .
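
For readers who generate such numbers in a program rather than at the keyboard, here is a minimal Python sketch. It relies on the fact that Unicode has two blocks of Arabic-script digits (U+0660 to U+0669 and U+06F0 to U+06F9) and the separators U+066B and U+066C. Which block each keyboard layer actually emits is not assumed here, and the function name localize_number is invented for the example.

```python
WESTERN = "0123456789"
# The two Unicode digit blocks used with the Arabic script.
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + d) for d in range(10))

DECIMAL_SEP = "\u066B"    # ARABIC DECIMAL SEPARATOR
THOUSANDS_SEP = "\u066C"  # ARABIC THOUSANDS SEPARATOR


def localize_number(text: str, digits: str) -> str:
    """Replace Western digits, '.' and ',' with Arabic-script equivalents."""
    table = str.maketrans(WESTERN + ".,", digits + DECIMAL_SEP + THOUSANDS_SEP)
    return text.translate(table)


print(localize_number("3.1416", EXTENDED_ARABIC_INDIC))
print(localize_number("1,000,000", ARABIC_INDIC))
```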

NOTE: Separating groups of digits by thousand, million, billion, etc., is a relatively recent practice. The more traditional separation is by hazaar, laakh (lac), karoR (crore), arab, kharab, etc., for which the separating commas are placed after 3, 5, 7, 9, 11, ... digits from the right.
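
The traditional grouping is easy to get wrong by hand, so here is a short Python sketch of it. The function name group_lakh_crore is invented for the example, and the input is assumed to be a plain string of digits without sign or decimal part.

```python
def group_lakh_crore(digits: str, sep: str = ",") -> str:
    """Insert separators after 3, 5, 7, ... digits counted from the right."""
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]  # the last three digits (hazaar)
    groups = []
    while len(head) > 2:                   # then groups of two digits
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)                 # leading group of 1 or 2 digits
    return sep.join(groups + [tail])


print(group_lakh_crore("100000"))    # one laakh: 1,00,000
print(group_lakh_crore("10000000"))  # one karoR: 1,00,00,000
```

Passing the thousands separator "٬" as sep gives the same grouping with the Arabic-script separator.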

It is traditional in writing dates to insert a "date separator" or "small slash" symbol ؍ (shift-6) between the day number and month word, or between the numbers designating day, month, and year. The abbreviation equivalent to "A.D." is a Hamza (which looks like the stem of the letter Ain) entered by typing shift-4.

Example: August 14, 1947 is typed as option-1, option-4, shift-6, A, G, S, T, space, option-1, option-9, option-4, option-7, shift-4, and is displayed as ١٤؍اگست ۱۹٤۷ء . Alternatively, this date can be typed as option-1, option-4, shift-6, option-8, shift-6, option-1, option-9, option-4, option-7, shift-4, and is displayed as ١٤؍۸؍۱۹٤۷ء .

NOTE: The Urdu-QWERTY keyboard has made Western digit characters the default action of digit keys because Urdu publications are switching more and more to Western numerals. This move is accompanied by the adoption of British-American style of formatting numbers, that is, using a period for the decimal point and a comma for the thousands separator.

The apostrophe (') key on the Urdu-QWERTY keyboard generates the period symbol, and can be typed as the decimal point to go with Western digit characters. For example, to produce 3.1416, just type the key sequence 3,',1,4,1,6. Western numerals with decimals can thus be typed without needing to switch to the English keyboard or to use any modifier keys.

(Back to CONTENTS)

Inter-word Spaces. (And A Lament!)

People accustomed to Nastaleeq publications will discover that the documents composed in Naskh have spaces and other punctuation separating each pair of adjacent words. This is the correct and rational approach to word processing, shared by every non-Nastaleeq word processor in the world. Nastaleeq word processors stand alone in suppressing inter-word spaces. The user, of course, still has to type spaces to signify ends of words, but those spaces are removed and the words follow each other in a continuous stream.

Just imagine reading this English page if it did not include any spaces between words. Deciphering such a character stream requires, in essence, that you already know what you are trying to learn! But that's exactly what is expected of you when you are reading a text composed in Nastaleeq. Because some Urdu letters (e.g., alif, daal, re, vao) do not connect to the next letter in a word, Urdu words consist of isolated parts that could themselves be thought of as words. For example, the word درخواست (meaning request or appeal) contains as potential words خواست , است , رخوا , خوا , رخو , درخوا , درخو , در , and many more. When inter-word spaces are used, there is no confusion between any unintended "words" and the intended words because the beginning and end of each intended word is clearly delimited. But in the text edited with current Nastaleeq word processors, the only reason you are able to skip over the unintended "words" is that you already know the intended words, not because the text display is of any help!
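
The point can be checked mechanically. The Python sketch below splits a word into its isolated parts at every letter that does not connect to the next one, and then lists every run of adjacent parts, each of which a reader could mistake for a word when spaces are suppressed. The set of non-connecting letters is a partial list assumed for the illustration, and the function names are invented.

```python
# Letters that do not connect to the following letter (partial list):
# alef, alef-madd, dal, thal, reh, zain, jeh, waw, waw-hamza, ddal,
# rreh, yeh barree.
NON_CONNECTING = set(
    "\u0627\u0622\u062F\u0630\u0631\u0632\u0698\u0648\u0624\u0688\u0691\u06D2"
)


def isolated_parts(word: str) -> list[str]:
    """Split a word after every letter that does not connect forward."""
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def candidate_words(word: str) -> list[str]:
    """Every run of adjacent isolated parts, including the whole word."""
    parts = isolated_parts(word)
    return [
        "".join(parts[i:j])
        for i in range(len(parts))
        for j in range(i + 1, len(parts) + 1)
    ]


word = "\u062F\u0631\u062E\u0648\u0627\u0633\u062A"  # the example word above
print(isolated_parts(word))
print(len(candidate_words(word)))
```

For the example word the sketch finds five isolated parts and therefore fifteen candidate strings, of which only one is the intended word.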

When computer typesetting of Nastaleeq was first introduced for Urdu in the 1980s, inter-word spaces were actually employed. The practice of suppressing them is more recent. This unwise retrogression, justified in the name of "tradition and esthetics", is an unnecessary obstacle to anyone trying to learn Urdu. The Nastaleeq script already suffers from too many complexities, obscurities, irregularities, and inconsistencies. It makes no sense to invent more barriers to the accessibility of Urdu. The practice simply prolongs the time it takes students to master the language. It is also hindering the development of optical character recognition and other important electronic processing technologies for Urdu.

Exercise for the reader: Find out what ghatrabood is, and enjoy the story.

(Back to CONTENTS)


Urdu Transliteration of English Words

As more and more words are being imported in Urdu from English, and more and more foreign geographical and personal names are being mentioned in Urdu writings, there is increasing need to systematize the transliteration of English words into Urdu. Not everyone spells English words in Urdu in the same way, but some common conventions are the following:

All English consonants have reasonable equivalents in the Urdu alphabet. Conventionally, the letters "D" and "T" are rendered as the Urdu letters "ڈ " and "ٹ " , respectively. Both "V" and "W" are rendered as "و " . The digraph "th" is rendered as either "د " or "تھ ". Thus, "the" is transliterated as "دی " and "three" as "تھری ".

If an English word begins with the letter "S" that is immediately followed by one of "c", "k", "m", "n", "p", "q", and "t", then the Urdu transliteration often adds an initial "alif" to the normally expected "seen". Examples of this rule are the transliteration of "school" as "اِسکول ", "sketch" as "اِسکیچ ", "smuggler" as "اِسمگلر ", "Spain" as "اِسپین ", and "Steven" as "اِسٹیوین ". However, this rule (addition of "alif") is not followed uniformly. Also the "alif" is never added when the letter following "S" is "l". Thus "slate" is transliterated as "سلیٹ " and "slice" as "سلائِس ".

The transliteration of some English vowels is not phonetically correct, but the practices are too firmly entrenched to do anything about them. Two widely used conventions are the following:

1. The long "a", the open "o", and the diphthongs like "au" and "aw" are generally rendered as "alif". For example, "Paul" is transliterated as "پال ", even though "پَول " would be more correct phonetically. Other examples are: "Dawn" as "ڈان ", "Law" as "لا ", "lot" as "لاٹ ", "collar" as "کالر ", "hot dog" as "ہاٹ ڈاگ ", and "New York" as "نیو یارک ". The diphthong "oi" sometimes gets the same treatment. Thus, "vice" and "voice" (as in "Voice of America"!) are both transliterated as "وائس ".

2. The vowel "e" is usually rendered as "Yeh" (with a majhool sound). Thus, "Web" is transliterated as "ویب " even though "وِب " would be more correct phonetically. The rationale for using "Yeh" (with a majhool sound) for "e" is, presumably, to reserve the "zer" ( –ِ ) diacritic for rendering "i". For example, "bell" is transliterated as "بیل " rather than as "بـِل " so that the latter can be the transliteration of "bill". Other examples are: "set" as "سیـٹ ", "cassette" as "کیسیـٹ ", "pen" as "پین ", and "vowel" as "واویل ".

(Back to CONTENTS)


Urdu Transliteration of Hindi Words

Devanagari, the script of Hindi, is perfectly phonetic, and nearly all of its sounds are representable by Urdu letters. So if you can read a word written in Hindi, you can spell it in Urdu very easily. But many Urdu writers do not read Hindi and rely on English transliteration of Hindi words to render them into Urdu. This is complicated by some differences in Hindi, English, and Urdu spelling conventions. As quite a few people are unaware of these conventions, some very poor transliterations of Hindi words are showing up in Urdu publications. It is a pity that many Hindi words that were once part of Urdu's own rich vocabulary are now being misspelled and mispronounced because Urdu has started to isolate itself from its Indic source and heritage.

When transliterating Hindi words to Urdu, you need to pay special attention to these cases:

  1. Unless modified in some special way, every consonant letter in a Hindi word is supposed to be pronounced as if it is followed by the vowel "a". In actual practice, this implicit "a" is not pronounced, or is articulated too speedily to be perceptible. But the convention in English transliteration of Hindi words has been to record this sound with an "a" anyway. That is why you see so many occurrences of "a" in, and especially at the end of, English spellings of so many Hindi words; for example, "गौतम बुद्ध" (Gautama Buddha), "कर्म" (karma), "धर्म" (dharma), and "योग" (yoga). In Urdu, the standard transliteration practice has been to only represent the actual Hindi letters, and not add this artificial terminal "a". So in Urdu the above words are spelled and pronounced as:
    "گوتم بدھ" (NOT "گوتما بدھا" !),
    "کـَـرْمْ" (NOT "کارما" !),
    "دھرم" (NOT "دھارما" or "دھرما" !),
    and "یوگ" (NOT "یوگا" !).
  2. In Hindi sometimes two words are joined together without any separator between them. (This often happens when the two words are parts of a name.) It is customary in Urdu transliteration to write the two words separately; moreover, if the second word happens to begin with an "alif" ("ا") sound, then an "alif-madd" ("آ") is used to represent that sound. Example: "सहस्राक्ष" (Sahasraksha --- a name of Indra) is transliterated as "سَہَسْرْ آکْش" .
  3. A special Urdu transliteration convention has been made for the consonant "य" (y) standing alone (that is, not followed by a vowel) in Hindi words. This "y" is transliterated as "یه", not as "ی" since in Urdu the letter "ی" at the end of a syllable could be mistaken to be a vowel acting on the letter preceding it. For example, the word "आर्य" (Arya) is transliterated as "آریه" (rather than "آری" in which the letter "ی" would be incorrectly thought of as a vowel).
WARNING: No Urdu transliteration convention seems to have evolved for the consonant "व" (v) standing alone (that is, not followed by a vowel) in Hindi words. Representing the consonant "व" (v) by the Urdu letter "و" creates exactly the same ambiguity that would have been caused by representing the consonant "य" (y) by the letter "ی" alone. But while the latter ambiguity was resolved by representing this "y" as "یه" (as mentioned just above), the former has been left unresolved, and results in wrong pronunciations. For example, consider the historical names "कौरव" (Kaurava) and "पाण्डव" (Pandava). Spelled in Urdu as "کـَوْرَوْ" and "پانڈَوْ", they are liable to be (and are) pronounced as "kaurau" and "pandau" in spite of all the diacritical marks added. Without vocalization, they are often pronounced even more awfully as "koru" and "pandu"! The problem is that in general the Urdu letter "و" is pronounced as the consonant "v" only at the beginning of a syllable. There are, of course, exceptions, such as in the Urdu words like "نشو" (development), "عضو" (organ), and "سرو" (the cypress tree). But in most cases when "v" occurs at the end of a syllable, it is pronounced as the vowel "o" or "u", and rarely as the consonant "v". So, while most Urdu speakers routinely pronounce "देवी" (دیوی) as "day-vee", they are likely to pronounce "देव" (دیو) as "day-o", and would have to make a special effort to pronounce it as "day-v".

Some languages that use the Arabic script have a special letter "ۋ" (i.e., the letter "و" with three dots added above) to designate the "v" sound. In fact, even in several Arab countries one often sees signboards and printed advertisements with another, again non-Arabic, alternative "ڤ" substituted for the letter "v" in the transliteration of English words such as "television". It would make a lot of sense in Urdu too to use "ۋ" for the consonant "v" sound, leaving the letter "و" for its traditional use as a vowel. Thus, some of the words mentioned in the above paragraph could be written as "کـَورَۋ" (Kaura-v), "پانڈَۋ" (Panda-v), "نشۋ" (nash-v), "عضۋ" (az-v), "سرۋ" (sar-v), and "دیۋ" (de-v).

NOTE: A terminal "a" in an English transliteration can sometimes represent the true "ा" (a) vowel of a Hindi word. For example, in Hindi each of the words "आत्मा" (atma), "बिरहा" (birha), "विधवा" (vidhava), and "यात्रा" (yatra) does have the vowel "a" at its end which needs to be represented in Urdu by an "alif" ("ا"). So the Urdu transliterations of the above words are
"آتما",
"بِرہا",
"ودھوا",
and "یاترا" .
Can you tell whether or not the letter "a" in an English transliteration corresponds to a true Hindi vowel "a"? No, because, unfortunately, the English transliteration is too ambiguous to settle that question. So just look up the word in a Hindi dictionary!
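
If the word is available in Devanagari (for instance, copied from an online Hindi dictionary), the check itself is mechanical, as this small Python sketch suggests. The function name ends_in_true_a is invented for the example, and the test only looks at the last character of the word.

```python
AA_SIGN = "\u093E"    # DEVANAGARI VOWEL SIGN AA (the "ा" mark)
AA_LETTER = "\u0906"  # DEVANAGARI LETTER AA (independent form)


def ends_in_true_a(hindi_word: str) -> bool:
    """True if the Devanagari spelling ends in an explicit long 'a'."""
    return hindi_word.endswith((AA_SIGN, AA_LETTER))


yatra = "\u092F\u093E\u0924\u094D\u0930\u093E"  # "yatra": ends in the AA sign
yoga = "\u092F\u094B\u0917"                     # "yoga": ends in a bare consonant
print(ends_in_true_a(yatra), ends_in_true_a(yoga))
```

A word for which the function returns True keeps its final "alif" in Urdu; otherwise the final "a" of the English spelling is the implicit vowel and is dropped.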

The above conventions are illustrated in the following list of Hindi words and their Urdu transliterations:

"आर्यभट" (Aryabhata) "آریه بھٹ",
"भट्टाचार्य" (Bhattacharya) "بھٹ آچاریه",
"ब्रह्मगुप्त" (Brahmagupta) "برَہـم گُپت",
"भास्कर" (Bhaskara) "بھاسکر",
"कालिदास" (Kalidasa) "کالی داس",
"राम" (Rama) "رام",
"अर्जुन" (Arjuna) "اَرجُن",
"कृष्ण चन्द्र" (Krishna Chandra) "کِرِشن چندر",
"अशोक मौर्य" (Ashoka Maurya) "اشوک مَوریه",
"वेद" (Veda) "وید",
"शास्त्र" (Shastra) "شاستر",
"महाभारत" (Mahabharata) "مہا بھارت",
"गीतगोविन्द" (Gita Govinda) "گیت گووِند",
"महाकाव्य" (Mahakavya) "مہا کاویہ",
"सूर्य सिद्धांत" (Surya Siddhanta) "سوریہ سِدّھانت",
"शकुन्तला" (Shakuntala) "شَکُنتَلا",
"सत्यपाल" (Satyapal) "ستیہ پال",
"सत्याग्रह" (Satyagraha) "ستیاگرہ",
"अहिंसा" (Ahinsa) "اہـِنسا",
"मंडल" (Mandala) "مَنڈَل".

(Back to CONTENTS)


Ottoman Turkish

The term Ottoman Turkish refers to both the language and the Arabic-based script in common use in Turkey during the Ottoman period. The Ottoman and Modern Turkish languages do not differ much, but their scripts are totally different because Modern Turkish uses a script based on Latin characters. However, since there exists a huge volume of older Turkish publications and manuscripts written in the Ottoman script, this script still remains of interest as an essential scholarly tool and is taught in most departments of Turkish Studies. It is suitable, for example, for preparing the contents of older texts for linguistic analyses.

There is exactly one character, "Saghir Nun" ڭ (key Option-g) which is unique to Ottoman Turkish and is not used in Urdu or Persian. This character has been added to the Urdu-QWERTY keyboard layout. So with this keyboard layout installed, your computer will be ready for Ottoman Turkish texts.

To download and install this keyboard, see the earlier section Installing the Urdu keyboard layout. To activate the Ottoman Turkish input, follow the steps for Activating Urdu Input. Also, you will need to install some suitable fonts in order to be able to read, edit, and produce documents in Ottoman Turkish. Some beautiful, freely available fonts are recommended in Installing Urdu Fonts. Finally, if you prefer to do the Ottoman Turkish work on a Windows or Linux computer, there is an Urdu-QWERTY keyboard layout for Windows and Linux also, with key settings identical to the Mac one. The fonts and general instructions given here will work on the Windows and Linux machines also. See Urdu QWERTY for Windows and Linux.

The Urdu-QWERTY keyboard provides the various diacritics employed in older Ottoman texts (as well as in Urdu and Persian). The following diacritics are used in a standard way:

  1. Fathah –َ– (key Shift->), Kesrah –ِ– (key Shift-<), Dammah –ُ– (key Shift-P),
  2. Jazm –ْ– (key /),
  3. Fathatain –ً– (key Shift-~), Kesretain –ٍ– (key `), Dammatain –ٌ– (key Shift-8),
  4. Madd –ٓ– (key Shift-M),
  5. Teshdid –ّ– (key Shift-_),
  6. Hamza above –ٔ– (key Shift-3).
Some diacritics that are used in old Ottoman Turkish texts in non-standard ways are the following:
  1. "Wasla" is used over "alef" to form "alef-wasla" ٱ (key Option-w). Alef-Wasla indicates a "silent" alef at the beginning of the second word of a two-word Arabic combination, such as "Daar-a (a)l-ilm" دارٱلعلم (House of Knowledge).
  2. The "little v" –ٚ– (key Option-j) and "little inverted v" –ٛ– (key Option-h) are superscripts that are used for various purposes, e.g., for marking the letter "Vao" و (key w) as the v-sounding consonant rather than a vowel; distinguishing "rounded" vowel sounds "ö" and "ü" from "unrounded" vowel sounds "o" and "u" denoted by the same letter "Vao" و (key w); distinguishing vowel sounds "o" from "u" and "i" from "e", denoted, respectively, by the same letters "Vao" و (key w) and "Yeh" ی (key i); etc.
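
All of the standard diacritics listed above are Unicode "combining marks", stored as separate characters after the letter they decorate. As a rough illustration (not part of the keyboard itself), the following Python sketch records their code points and removes such marks from a vocalized text, which can be handy when searching or sorting. The dictionary and function names are invented for the example.

```python
import unicodedata

# The standard diacritics listed above, by Unicode code point.
DIACRITICS = {
    "Fathah": "\u064E",
    "Kesrah": "\u0650",
    "Dammah": "\u064F",
    "Jazm": "\u0652",
    "Fathatain": "\u064B",
    "Kesretain": "\u064D",
    "Dammatain": "\u064C",
    "Madd": "\u0653",
    "Teshdid": "\u0651",
    "Hamza above": "\u0654",
}


def strip_diacritics(text: str) -> str:
    """Drop every combining mark, keeping only the base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


for name, mark in DIACRITICS.items():
    print(name, f"U+{ord(mark):04X}", unicodedata.name(mark))
```

Note that this simple version removes every combining mark, including "Hamza above", so it should be used only where the bare letter skeleton is what is wanted.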

The example below shows a sample text typed in Ottoman Turkish (right), together with its equivalent in Modern Turkish (left). Incidentally, this entire two-column table has been prepared by using nothing else but the Mac OS's built-in TextEdit utility. The Ottoman text part that you see on the right in the table has been typed using the exquisite Lateef font, which can be downloaded and installed as described in the section Installing Urdu Fonts. The Turkish text is supposedly the translation of a French text, Jean de La Fontaine's fable La Cigale et la Fourmi (The Cicada And The Ant). Both the typed (right) and the transcribed (left) versions shown below have been taken from the Web site of the Department of Turkish Studies at the University of Michigan. There you can also see the image of the original manuscript in the Osmani calligraphic style (referred to as Nastaleeq in Urdu and Persian calligraphy).

Sample of Ottoman Turkish Text


(Back to CONTENTS)


Additional Keyboard Layouts

(This section was added on 2017-04-20.)

Arab QWERTY Keyboard Layout for Windows

This keyboard layout has been designed specifically for those users of our Urdu QWERTY Keyboard Layout who also need to type large amounts of Arabic text. Four letters (namely, "Kaf" ك , "Heh" ه , "Teh marbuta" ة , and "Yeh" ي ) have different Unicode code points in Urdu and Arabic because their shapes are slightly different in these two languages. (See the section Similar Letters in Urdu, Persian, and Arabic.) These letters have therefore been assigned different keys for Urdu and Arabic in Urdu QWERTY and Arab QWERTY keyboard layouts. On the Urdu QWERTY Keyboard Layout, those four letters can be typed for Urdu text without pressing the Option key (on the Mac) or the Right-ALT key (on Windows), but for Arabic text, the Option or Right-ALT key has to be held down also. For large amounts of typing in Arabic, this is inconvenient and error prone. In the Arab QWERTY Keyboard Layout, these letters are typed without pressing the Option or Right-ALT key for Arabic, and with that modifier key held down for Urdu.
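
To show what "different code points" means in practice, here is a minimal Python sketch that converts the four Arabic letters to Urdu counterparts and back. The pairing chosen below is an assumption made for illustration (in particular, an Arabic "Heh" may correspond to either "Goal Hay" or "Dochashmi Hay" depending on the word), and the function names are invented.

```python
# Arabic letter -> assumed Urdu counterpart (illustrative pairing only).
ARABIC_TO_URDU = {
    "\u0643": "\u06A9",  # KAF         -> KEHEH
    "\u0647": "\u06C1",  # HEH         -> HEH GOAL
    "\u0629": "\u06C3",  # TEH MARBUTA -> TEH MARBUTA GOAL
    "\u064A": "\u06CC",  # YEH         -> FARSI YEH
}
URDU_TO_ARABIC = {urdu: arabic for arabic, urdu in ARABIC_TO_URDU.items()}


def to_urdu_letters(text: str) -> str:
    """Replace the four Arabic letters with their Urdu code points."""
    return text.translate(str.maketrans(ARABIC_TO_URDU))


def to_arabic_letters(text: str) -> str:
    """Replace the four Urdu letters with their Arabic code points."""
    return text.translate(str.maketrans(URDU_TO_ARABIC))


sample = "\u0643\u064A"  # Arabic kaf + yeh
print(to_urdu_letters(sample) == sample)  # the converted text differs
```

A blind substitution like this is only safe on text known to be entirely in one language; mixed Urdu and Arabic text needs to be converted passage by passage.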

Except for the above mentioned difference, the Arab QWERTY keyboard layout is identical to the Urdu QWERTY one. It also includes support for Urdu, Persian, Punjabi (Shahmukhi or Pakistani style), and Ottoman Turkish. With this layout, you can type all the Arabic script characters that these languages use. In addition, it also has keys for several mathematical and technical symbols.

To install this keyboard, download Arab QWERTY Keyboard Layout for Windows. Unzipping the downloaded file will produce a folder. Open this folder and follow the instructions in the file readme-win.txt.

The file ArabQWERTYkeyboardWinLnx.pdf is a printable one-page map showing the Arab QWERTY Layout key assignments for Windows and Linux keyboards. It is downloadable from here.

(Back to CONTENTS)


US-Polymath Keyboard Layout for MacOS, Windows, and Linux

This is a fairly general-purpose keyboard meant to facilitate typing literary as well as scientific texts. It provides a variety of "combining diacritical marks" to type "extended characters" of Latin script. ("Extended characters" are composed by decorating ordinary letters of the alphabet with various accents and diacritical marks.) The keyboard also includes several math symbols, especially those commonly used in logic, set theory, and number theory, and the most frequently used Greek letters.

If you use the TeX/LaTeX family of programs for typesetting documents containing math and Greek symbols, then with the aid of packages like unicode-math you can save time by typing some symbols directly rather than typing their TeX code; e.g., rather than typing "\varepsilon", you can just type "ε" (Polymath Keyboard Key: Shift-0 with CAPS on). The TeX package unicode-math is described here, and an example of its use is here.

ALA-LC (American Library Association - Library of Congress) Romanization is a set of standards for the representation of text in other writing systems using the Latin script. In particular, this table specifies the transliteration scheme for Urdu. According to this scheme the Urdu words عجائب , مضبوط , باعِث , and ٹؚـڈّی دؘل , for example, are represented by the Latin script strings "ʿajāʾib", "maẓbūt̤", "bāʿis̱", and "ṭiḍḍī dal". Such transliterations are often required for quoting Urdu text in scholarly articles in foreign languages. The Polymath Keyboard includes the necessary diacritical marks to type the extended Latin script characters needed in the transliteration.

To type an extended character, you first press the letter to be modified, and then press the special key assigned to the combining diacritical mark. How the extended characters are rendered and processed depends on the application receiving the text and the font used. In my own experimentation, the combination LibreOffice + STIX fonts has worked out the best.
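
One reason the results vary between applications is that only some letter-plus-mark pairs have a single precomposed character in Unicode; the others always remain two characters and depend on the font to stack them properly. The Python sketch below illustrates this with the standard unicodedata module. The choice of marks is an assumption about the kind of marks that occur in the transliterations quoted above, and the function name compose is invented.

```python
import unicodedata

MACRON = "\u0304"           # COMBINING MACRON
DOT_BELOW = "\u0323"        # COMBINING DOT BELOW
MACRON_BELOW = "\u0331"     # COMBINING MACRON BELOW
DIAERESIS_BELOW = "\u0324"  # COMBINING DIAERESIS BELOW


def compose(base: str, mark: str) -> str:
    """Letter followed by a combining mark, normalized to NFC."""
    return unicodedata.normalize("NFC", base + mark)


for base, mark in [("a", MACRON), ("z", DOT_BELOW),
                   ("s", MACRON_BELOW), ("t", DIAERESIS_BELOW)]:
    result = compose(base, mark)
    print(result, len(result), [f"U+{ord(ch):04X}" for ch in result])
```

The first two pairs collapse into one precomposed character each, while the last two stay as a letter followed by a separate mark.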

To install this keyboard for MacOS, download US-Polymath Keyboard Layout for MacOS. Unzipping the downloaded file will produce a folder. Open this folder and then follow the instructions in the file readme-mac.txt.

The file US-PolymathKeyboardMac.pdf is a printable one-page map showing the US-Polymath Layout key assignments for the MacOS keyboard. It is downloadable from here.

To install this keyboard for Windows, download US-Polymath Keyboard Layout for Windows. Unzipping the downloaded file will produce a folder. Open this folder and then follow the instructions in the file readme-win.txt.

To install this keyboard for Linux, download US-Polymath Keyboard Layout for Linux. Unzipping the downloaded file will produce a folder. Open this folder and then follow the instructions in the file readme-linux.txt.

The file US-PolymathKeyboardWinLnx.pdf is a printable one-page map showing the US-Polymath Layout key assignments for Windows and Linux keyboards. It is downloadable from here.

(Back to CONTENTS)


Comparing Persian and Urdu Alphabets

The Urdu alphabet contains the following additional symbols that do not exist in Persian:

  1. The "retroflex" letters "Tay" ٹ , "Daal" ڈ , and "Ray" ڑ , each of which has a tiny "toey" mark above. These letters are typed with the keys shift-T, shift-D, and shift-R, respectively.
  2. "Aspirated letters", formed by combining certain consonants with "Dochashmi Hay", e.g., "bh" بھ , "ph" پھ , "th" تھ , "Th" ٹھ , etc. Note that aspirated letters are really combinations, and do not count as letters in the Urdu alphabet.
  3. "Noon Ghunna" ں (key shift-N).
  4. In Urdu, it is traditional to count "Goal Hay" ہ (key O) and "Dochashmi Hay" ھ (key H) as two different letters. In Persian, these are not different letters but simply two visually different forms of the same letter. In Persian, most commonly, the "Dochashmi Hay" form is used when the letter occurs in the initial or medial position in a word, and the "Goal Hay" form is used when the letter occurs isolated or in the terminal position.
  5. In Persian, "Hamza" ء (keys shift-4, -, U, shift-W, and =, depending on the context) is not considered a separate letter but a diacritical mark. Urdu and Persian diverge from Arabic in the treatment of "Hamza". In Arabic, "Alif" is a vowel, and "Hamza" is a consonant that represents the glottal stop. In Urdu and Persian, this consonant is written as an "Alif" when it starts a word and as a "Hamza" when it occurs in the middle or end of a word. Thus in Urdu and Persian, "Alif" is both a vowel and a consonant.
  6. "BaRi Yay" ے (key Y). In Persian, this is not a different letter, but just a visual variant of "ChoTi Yay" ی . It is used for calligraphic effect in decorative writing, and is sparingly employed in computer-generated text.
  7. The combination "Laam Alif" لا used to be listed as a separate letter in old Urdu primers. To continue doing that is anachronistic. That "letter" fulfills no need and serves no purpose. The Urdu-QWERTY keyboard does not have any single key assigned to the "Laam Alif" combination.
Here is a summary of the alphabets: Arabic has 28 letters. To those, Persian adds "pay" پ , "chay" چ , "zhay" ژ , and "gaaf" گ , and thus has a total of 32 letters. Urdu, somewhat arguably, has 39 letters, counting the following as additional letters: "Tay" ٹ , "Daal" ڈ , "Ray" ڑ , "Noon Ghunna" ں , "Hamza" ء , "Dochashmi Hay" ھ , and "BaRi Yay" ے .

"Noon Ghunna" ں , "Hamza" ء , "Dochashmi Hay" ھ , and "BaRi Yay" ے   are letters in a rather weak sense, since no Urdu word can begin with these. (Hence, Urdu dictionaries do not dedicate chapters to these as they do to regular letters.)

Urdu and Persian differ markedly in the pronunciation of vowels.

  1. In Persian, the vowel "Alif" is generally pronounced like au in the English word maul. In Urdu, the same vowel is pronounced like the vowel "a" in father.
  2. In Persian, the short vowels "zer" and "pesh" have majhool sounds and the long vowels "Vaao" and "Yay" have maaroof sounds. (See the note below about maaroof and majhool sounds.) In Urdu, the same vowels do double duty to represent both maaroof and majhool sounds.
These differences do not affect writing unless special marks are used to distinguish maaroof and majhool sounds.

There are minor variations in the placement of "Hamza" between Urdu and Persian orthographic styles. But the needed forms in all cases are adequately provided by the keyboard and the fonts that we have recommended.

The standard ("educated person's") pronunciation of consonants is generally identical in Urdu and Persian, and often different from Arabic. Some of the similarities and differences are as follows:

  1. The consonants contain some groups of separate letters that are homophones (pronounced with the same sound) in Urdu and Persian. These groups of letters and their approximate English pronunciations are:
    • [ ا (as consonant) , ء , ع ] pronounced as the "glottal stop"; ***
    • [ ہ , ح ] pronounced as "H";
    • [ ط , ت ] pronounced as "T";
    • [ ص , س , ث ] pronounced as "S";
    • [ ظ , ض , ز , ذ ] pronounced as "Z".
    *** English does not have any special letter or mark for expressing the glottal stop sound; this sound is simply a part of the sound of a syllable starting with a vowel. Thus it is the initial consonant sound heard in words such as "add", "end", "eye", "it", "ooze", and "up". Urdu, Persian, and Arabic explicitly indicate this consonant via the letters "hamza" (ء) and "`ain" (ع). When it occurs at the beginning of a syllable, Urdu and Persian also indicate it by "alif" (ا). When occurring in the middle of a syllable in Urdu or Persian, "alif" serves as a vowel. In Arabic, this letter is always a vowel.

    In Arabic, the letters within each of the above groups have distinct pronunciations. The Arabic pronunciation is sometimes imitated in Persian and Urdu speech by religious clerics. But in the standard Persian and Urdu pronunciation, the letters in each group have identical sounds. The Urdu-QWERTY keys for these consonants are in a table given earlier.

  2. When occurring as a consonant, the letter "Vaao" و (key W) is pronounced like "v" in Urdu and Persian, but like "w" in Arabic.
  3. The letter "Qaaf" ق (key Q) has the same pronunciation in Urdu and Arabic but a different one in Persian. In Persian, the letters "Qaaf" ق and "Ghain" غ (key Shift-G) are generally pronounced alike, with the same sound as that of "Ghain" غ in Arabic and Urdu.
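
The homophone groups in item 1 can be written down as a small lookup table. The Python sketch below is only an illustration under some assumptions: the names `HOMOPHONE_GROUPS`, `SOUND_OF`, and `same_sound` are made up for this example; the "H" letter is assumed to be the Urdu form U+06C1; and "alif" is left out because, as explained above, it is a consonant only at the beginning of a syllable.

```python
# Groups of letters that share one sound in standard Urdu and Persian
# pronunciation, keyed by the approximate English sound.
HOMOPHONE_GROUPS = {
    "glottal stop": ["\u0621", "\u0639"],        # hamza, `ain
    "H": ["\u06C1", "\u062D"],                   # gol hay (assumed U+06C1), baRi hay
    "T": ["\u0637", "\u062A"],                   # toay, tay
    "S": ["\u0635", "\u0633", "\u062B"],         # suaad, seen, say
    "Z": ["\u0638", "\u0636", "\u0632", "\u0630"],  # zoay, zuaad, zay, zaal
}

# Reverse lookup: letter -> approximate sound.
SOUND_OF = {
    letter: sound
    for sound, letters in HOMOPHONE_GROUPS.items()
    for letter in letters
}


def same_sound(a, b):
    """True if the two letters are identical or belong to the same group."""
    if a == b:
        return True
    sound = SOUND_OF.get(a)
    return sound is not None and sound == SOUND_OF.get(b)


print(same_sound("\u0633", "\u0635"))  # seen vs. suaad
print(same_sound("\u062A", "\u0633"))  # tay vs. seen
```

The first call prints True and the second prints False. The table applies to Urdu and Persian only; in Arabic each of these letters has its own sound.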

Since we have called our keyboard phonetic, we wanted to relate the pronunciation of the alphabet letters with the keys being used to enter them. The tedious details given above will perhaps help you in remembering the keys. As you can see, some Persian and Urdu letters are hard to phonetically map to a Latin-based keyboard!

(Back to CONTENTS)


Note on Maaroof and Majhool Vowels

There is an old classification of certain vowel sounds as maaroof (literally, well-known) or majhool (literally, unknown or unfamiliar). The difference between these can be illustrated with English words as follows:

  1. Short vowel mark Zer:       pill (maaroof), pell or bell (majhool)
  2. Short vowel mark Pesh:     pull (maaroof), *** (majhool)
  3. Long vowel letter Vaao:     pool (maaroof), pole (majhool)
  4. Long vowel letter Yay:        peel (maaroof), pale (majhool).
*** The majhool pesh is difficult to illustrate in English because the letter "O", which is closest to that vowel, is pronounced in several different ways. But this is the vowel sound found in the Persian pronunciation of "gol" گل (meaning flower) and "sokhan" سخن (utterance). An Urdu example is "bahot" بہت (meaning very or a lot). Note, however, that the Hindi-influenced pronunciation of this word is "bahut", which has a maaroof, not majhool, vowel sound.

In general, note that:

  1. The short majhool vowel sounds represented by Zer and Pesh occur in Urdu and Persian but not in Hindi or Arabic.
  2. The long majhool vowel sounds represented by Vaao and Yay occur in Urdu and Hindi but not in Persian or Arabic.

(Back to CONTENTS)


Acknowledgement

Urdu-QWERTY was designed with the aid of Ukelele, a keyboard layout editor for MacOS. I thank John Brownie, the author of Ukelele, for developing this melodious software and for making it available under a freeware license.

I also wish to thank
     Amal Ahmed, Aaron Jakes, Muhammad Javed, Shebab Javed, Karan Misra, Knut S. Vikor, and Muhammad Yusaf
for reporting problems and for offering suggestions to make this page more informative and useful.

I thank Faiz Imam for testing the keyboard on Windows 8 and for writing the instructions for installing and activating the keyboard on that system.

Special thanks to Ghalib Awan for translating some parts of this page into Urdu, thereby producing this excellent Urdu tutorial:
Install and Use Urdu in Mac OS X.


(Back to CONTENTS)