The Document

Sinleqiunnini does not store transliterated documents as text files. Instead, it splits the components of a cuneiform document across separate database tables to make information retrieval more efficient [link].

Nevertheless, to simplify compiling and entering data, the system provides a robust "parser" that accepts as input a simple text file formatted according to a relatively easy-to-remember "diplomatic notation". This shallow markup tells the program how to distinguish words and sign values, and how to map them onto the database-driven architecture [refer to sample 1 and sample 2].

From text to DB: "tokenization"

sample text RE 67

From the user's perspective, a key concept is the use of white-space to separate semantic units, or tokens, which the parser interprets as individual elements (i.e., words). Every word, like every epigraphic notation[^1], is assigned a unique identifier (ID).

scribal layout

id area_id notation
14353 4201 DIŠ=-bu-ul-la
14354 4201 DUMU
14355 4201 ib-ri-a-da-li
14356 4202 u₃
14357 4202 MUNUS=-aš-tar-um-mi
14358 4202 DAM-šu₂
14359 4203 MUNUS=-ba-aʾ-la-ki-mi
14360 4203 DUMU.=MI₂-šu₂-nu
... ... ...

This segmentation process is of primary importance for the DAPCA's search engine, as it allows dealing with any token in the system as an isolated unit. It permits the application of a wide set of search strategies to individual elements of a document (e.g., regex, fuzzy search, similarity measures, etc.). At the same time, it is always possible to keep these elements in their context and perform complex searches, for instance, for syntagmata or "chains of words."

Likewise, the system keeps track of the "coordinates" of each token, so that the document can be reassembled in full for on-screen display.
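The segmentation and reassembly described above can be sketched in Python. This is only an illustration: the `Token` record, its field names, and the `tokenize` and `rebuild` functions are hypothetical and do not reflect the system's actual database schema. Only the white-space splitting and the idea of storing per-token coordinates come from the description.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    token_id: int    # hypothetical unique identifier
    line_index: int  # order of the line within the file (0-based)
    position: int    # order of the token within its line (0-based)
    notation: str    # the token exactly as typed


def tokenize(lines: list[str], first_id: int = 1) -> Iterator[Token]:
    """Split each transliterated line on white-space and number the tokens."""
    next_id = first_id
    for line_index, line in enumerate(lines):
        # str.split() with no argument splits on any run of spaces or tabs.
        for position, notation in enumerate(line.split()):
            yield Token(next_id, line_index, position, notation)
            next_id += 1


def rebuild(tokens: list[Token]) -> list[str]:
    """Reassemble the lines from the stored coordinates of each token."""
    rows: dict[int, list[Token]] = {}
    for tok in tokens:
        rows.setdefault(tok.line_index, []).append(tok)
    return [
        " ".join(t.notation for t in sorted(rows[i], key=lambda t: t.position))
        for i in sorted(rows)
    ]
```

Because every token carries its own coordinates, a search can treat tokens in isolation and still recover the surrounding line afterwards.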

Final result: scribal layout

The parser

The following examples show how a "plain text" format (i.e., .txt) should be structured to be accepted by the parser.

Sample text 1

Diplomatic notation: sample 1
@obverse
1   A.ŠÀ ma-["la ma"]-s,u2-ú
2   i-na t_*BI-*IQ-mi URU g_ra-ab-ba-=KI
3   n_1-1/2 IKU GÍD.DA
4   n_1-1/2 IKU ru-up-šu
5   ÚS.SA.DU AN.TA DUMU.=MEŠ p_za-bi-hi
6   ÚS.SA.DU KI.TA DUMU.=MEŠ p_a-mur-ri
7   SAG.KI n_1.KÁM p_iš-bi-ᵈda-gan
8   DUMU p_na-ap-ši
9   SAG.KI n_2.KÁM {DUMU.=MEŠ} p_da-gal-li DUMU p_ir-am-ᵈda-gan
10  [A.ŠÀ] -ri--ma i-na Top_N_sí-ip-hu
11  n_[X] IKU GÍD.DA
12  n_[X] ši-id-du ru-up-šu
13  [Ú]S.SA.DU AN.TA DUMU.=MEŠ p_da-gal-li DUMU p_ir-am-ᵈda-gan
14  [Ú]S.SA.DU KI.TA DUMU.=MEŠ p_at-tu-wa
15  [SA]G.KI n_1.KÁM p_a-wi-ru DUMU p_il-la-ti
@bottom
16  [SAG].KI n_2.KÁM URU.=KI
17  ša DN_ᵈNIN.URTA
18  ù =.MEŠ=-ši-bu-ut
@reverse
19  GN_[UR]U e-mar-=KI
20  [b]e-lu-ú A.ŠÀ.=HI.=A
21  p_ᵐir-am-ᵈda-gan
22  DUMU p_il-la-ti
23  a-na n_1 me-at n_5/6 ma-na .BABBAR
24  [Š]ÁM TIL.LA
25  A.ŠÀ i-ša-am .BABBAR-^pa mah-
26  ŠÀ-šu-nu DU₁₀-^a-^ab ša ur-ra-am še-ra-am
27  n_2 A.ŠÀ.=HI.=A i-ba-qa-
28  n_1 li-im .BABBAR a-na DN_ᵈNIN.URTA
29  n_1 li-im .BABBAR a-na URU.=KI
30  Ì..E.MEŠ
31  IGI p_ab-ba-nu DUMU p_ᵈIM-GAL
32  IGI p_píl-su-ᵈda-gan ŠEŠ-šu
33  IGI p_ᵈen-ma-lik ŠEŠ-šu-ma
34  IGI p_ᵈra-ša-ap-la-i DUMU p_ki-ir-ra
35  IGI p_ab-da DUMU p_hi-e-mi
36  [IGI] p_ša--da DUMU p_ᵈda-gan-ka
37  [IGI] p_ša--da DUMU p_i3--a-bi
@left
38  [IGI p_i-]t[úr]-ᵈD[a-gan DUMU p_i]a-ah-ṣi-E[N]
39  [IGI p_ir-ib-ᵈIM DUMU p_ha-t]a-ni
40  [IGI p_x-x-x-x w_DU]B.SAR

Sample text 2

Diplomatic notation: sample 2
@obverse
$blank space(2)
1   É-^tu ma-la ma-ṣú-ú
2   n_25 i-na am-ma-ti GÍD.DA-šú
3   n_23 i-na am-ma-ti ru-up!-šú
4   ZAG-šu É.UDUN ša DUMU.=MEŠ p_ᵐga-ni
5   GÙB-šú É DUMU.=MEŠ p_ᵐba-at-ta
6   pa-nu-šú p_ᵐᵈKUR-a-bu DUMU p_ga-ni
7   EGIR-šú p_ᵐat!-tu DUMU p_zu-Ba-la
8   É ša p_ᵐa-hi-ᵈKUR ù p_ᵐÌR-DINGIR.=MEŠ DUMU p_ib-ni-be
9   KI p_ᵐa-hi-ᵈKUR ù p_ᵐÌR-DINGIR.=MEŠ DUMU p_ib-ni-be
10  p_ᵐab-du DUMU p_zu--tar-ti DUMU p_qa-ba-ri
11  a-na n_31 GÍN .BABBAR É-^ta -am
12  ma-an-nu-me-e ur-ra-am še-ra-am É-^ta
13  i-pa-qa-ru .BABBAR.=MEŠ TÉŠ.BI
@bottom
14  a-na p_ᵐab- DUMU p_zu--tar-ti
15  li-din É-^ta lil-
@reverse
$ruling
16  ù a-nu-ma a-šar .BABBAR.=MEŠ e-ru-bu
17  n_20 GÍN .BABBAR.=MEŠ a-na p_ᵐib-ni-ia DUMU p_ma-di-Te
18  n_10 GÍN .BABBAR.=MEŠ a-na DAM p_ᵐᵈKUR-a-bi DUMU p_ga-ni
19  n_1 GÍN .BABBAR a-na p_ᵐAD-DIRI DUMU p_da-a-i
$ruling
20  a-nu-ma ṭup-pu la-be-ru ša É an-ni-i
21  ha-liq šum-ma i-na EGIR u-mi ú-še-lu-šu
22  ṭup-pu an-nu-ú i-hap--e-šú
$ruling
23  NA=.KIŠIB p_ᵐa-hi-ma-lik __ NA=.KIŠIB p_ᵐa-hi-ᵈKUR
$seal(1) ________________________ $seal(1)
24  WN_LÚ=.UGULA __________ DUMU p_ib-ni-ᵈKUR EN É
25  _____ NA=.KIŠIB
$seal(1)
26  __ p_ᵐBe-li DUMU p_Ba-ia
@top
27  IGI p_ᵐi-mu-ut-ha-ma- DUMU p_ᵈKUR-GAL!(MA.)
28  IGI p_ᵐam-za-hi DUMU p_eh-li-ia
29  IGI p_ᵐEN-ma-lik DUMU p_ṣa!-al-
30  ________ IGI p_ᵐÌR-DINGIR.=MEŠ DUMU p_ib-ni-be EN É

1. Physical surfaces of tablet

These are self-explanatory:

  1. @obverse
  2. @bottom
  3. @reverse
  4. @top
  5. @left

Every text file must therefore begin with an @ tag; otherwise, the parser raises an exception and explicitly prompts the user. A line that begins with one of these tags should contain nothing else: anything after the @obverse tag, for example, will be ignored or, in the worst case, will produce an error.

This marking does not require a line number.
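As a rough sketch of the rule above, the following Python function groups the lines of a file under the surface tag that precedes them. The function name and the error messages are assumptions made for illustration, and the sketch takes the strict option of rejecting any line that starts with @ but is not exactly one of the five tags.

```python
SURFACE_TAGS = {"@obverse", "@bottom", "@reverse", "@top", "@left"}


def split_by_surface(text: str) -> dict[str, list[str]]:
    """Group the lines of a file under the surface tag that precedes them."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("@"):
        raise ValueError("the file must begin with a surface tag such as @obverse")

    surfaces: dict[str, list[str]] = {}
    current = ""
    for ln in lines:
        if ln.startswith("@"):
            # Assumption: trailing text after a tag is treated as an error.
            if ln not in SURFACE_TAGS:
                raise ValueError(f"unknown surface tag: {ln!r}")
            current = ln
            surfaces.setdefault(current, [])
        else:
            surfaces[current].append(ln)
    return surfaces
```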

2. Free-text markers

Free-text markers can be inserted to record various aspects of the document as well as metatextual elements. To use this feature, place the $ character before the note.

This marking does not require a line number.

This markup includes some helpers for better document layout:

  1. $ruling is replaced by a horizontal line. Multiple horizontal lines can be represented by placing multiple $rulings. For example:

    10 <text in transliteration>
    $ruling
    $ruling
    11 <text in transliteration>
    

  2. If parentheses follow the markup, the number included in the parentheses tells the system the number of lines the "note" should occupy. For example:

    • $break(3) tells the system that we have a break that corresponds roughly to three lines of text.
    • $blank(2) tells the system that we have a portion of the tablet left blank that occupies the equivalent of two lines of text.
    • $seal(4) indicates that the space occupied by the seal corresponds to four lines.

Additionally, one can add any information with the $ prefix, for instance $the beginning of the column is broken or $an unknown number of columns destroyed, optionally combined with the (n) suffix. This information will be searchable, but please note that it affects the text layout.

It is recommended to use this tool sparingly.
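The $ markers and their optional (n) suffix could be read with a single regular expression, as in the following Python sketch. The `Marker` type and the `parse_marker` function are hypothetical names. The sketch assumes one marker per line, so a line such as the double $seal(1) in sample 2 would need extra handling.

```python
import re
from typing import NamedTuple, Optional

# "$", then free text, then an optional "(n)" at the very end of the line.
MARKER_RE = re.compile(r"^\$(?P<label>.*?)\s*(?:\((?P<lines>\d+)\))?\s*$")


class Marker(NamedTuple):
    label: str
    lines: Optional[int]  # None when no "(n)" suffix is given


def parse_marker(line: str) -> Marker:
    """Split a $ marker into its free-text label and its height in lines."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a $ marker: {line!r}")
    count = match.group("lines")
    return Marker(match.group("label").strip(), int(count) if count else None)
```

For example, `parse_marker("$break(3)")` yields the label `break` with a height of three lines, while `parse_marker("$ruling")` yields the label `ruling` with no height.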

Annotations

For a proper annotation system, see the discussion in: ...

3. Line numbers

Each transliterated line must begin with a line number, as is customary in Assyriological tradition. Currently, there are two possibilities:

  • a simple numeral: 1
  • a numeral followed by a single quote character after breaks: 1'

Customization

A set of parser-specific rules prevents the system from accepting anything other than numbers and/or numbers + single-quote as a line label. Additional rules can, however, be added to allow it to accept different line formats.

The numbering sequence is largely free: one can decide to start with 1' after breaks or to continue the previous number sequence (e.g., ... 10 / $break / 11'...).

Note

Regardless of how one chooses to name the line numbers, the system internally stores their order, which is determined by the order of the lines in the text file. The line numbers should be considered simple labels.

3.1. Line number separators

Every line number must be followed by white-space: a space (i.e. \s) or, alternatively, a tab (i.e. \t). This allows the parser to understand where the line number section ends and the transliteration begins.
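The default label rule (digits, an optional single quote, then a space or tab) can be expressed as a regular expression. The pattern and the function name below are illustrative assumptions, not the parser's actual rule set.

```python
import re

# A label is digits, optionally followed by one single quote, then at least
# one space or tab before the transliteration starts.
LINE_RE = re.compile(r"^(?P<label>\d+'?)[ \t]+(?P<text>\S.*)$")


def split_line_label(line: str) -> tuple[str, str]:
    """Separate the line label from the transliteration that follows it."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"missing or malformed line number: {line!r}")
    return match.group("label"), match.group("text").rstrip()
```

Since the labels are only labels, the true order of the lines would come from their position in the file (for instance via `enumerate`), not from the number returned here.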

4. Transliteration

4.1. Graphic relationships

Character Function Example
[carriage return] line boundary
[space] word boundary ša ur-ra-am še-ra-am
- sign boundary i-ša-am
. intra-logographic boundary ÚS.SA.DU AN.TA
+ used for ligatures i+na
x or × for inclusions AB×ḪA₂
_ for blank-spaces [____ i-]na
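To illustrate how the boundary characters in the table divide a word into signs, here is a small Python sketch. The function name is hypothetical, and only the unambiguous delimiters are handled: the ASCII letter x is left alone because it also stands for an unreadable sign, and modifiers such as = stay attached to their sign.

```python
import re

# Delimiters from the table above: "-" (sign boundary), "." (intra-logographic
# boundary), "+" (ligature) and "×" (inclusion).
BOUNDARY_RE = re.compile(r"([-.+×])")


def split_signs(word: str) -> list[tuple[str, str]]:
    """Return (sign, following_delimiter) pairs; the last delimiter is ''."""
    # With a capturing group, re.split keeps the delimiters at odd positions.
    parts = BOUNDARY_RE.split(word)
    signs = parts[0::2]
    delimiters = parts[1::2] + [""]
    return [(s, d) for s, d in zip(signs, delimiters) if s]
```

Keeping the delimiter next to each sign makes it possible to rebuild the word exactly as it was typed.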

4.1.1. Breaks and lacunae

Tip

White-spaces in digital transliterations are often neglected, whereas they are of primary importance, for instance, for material philology.

Please refer to the following cases:

  • White-spaces for scribal layout

scribal layout

8 u ša EDIN ḪA.LA-ia ma-la it-ti ŠEŠ.=MEŠ-ia
9 i-kaš-ša-da-an-ni ______________ lil-qe
$ruling
10 a-nu-ma a-šar KU.BABBAR.=MEŠ u ŠE.=MEŠ ḫu-bul-‹‹la››-li-ia i-ru-ub
11 n_10 GIN KU.BABBAR.=MEŠ a-na le-et p_DIŠ=-zu-ba-la DUMU p_a-ḫi-ma-lik
12 n_10 _ MIN ________________ a-na le-et p_DIŠ=-DINGIR=-KUR-ta-li-iḫ DUMU p_zi-ik-ri-DINGIR=-KUR
13 n_10 _ MIN ________________ a-na le-et p_DIŠ=-še-i-DINGIR=-KUR DUMU w_tar-ta-ni
  • White-spaces for tablet fractures

Screenshot of the editor's main interface

@obverse
1 [___________________________________]-x
2 [_________________________________ t]a-a-an-=ḪI.=A
3 [________________________________ ]x x IKU.=ḪI.=A
4 [_______________________________ ]-im-i
5 [_______________________________ ]x-ma p_eḫ-li-DINGIR
6 [_______________________________ i]l-la-ak

4.2. Modifiers

Character(s) Function Example
=- Preposed determinatives are followed by = and the sign boundary - LU2=-mu-ti-ia
-= Postposed determinatives are preceded by = which in turn is preceded by the sign boundary designation - ra-ab-ba-an-=KI
=. more complex determinatives LU2=.MEŠ=-ši-bu-ut
.= more complex determinatives A.ŠÀ.=HI.=A
^- Preposed phonetic complements are followed by the symbols ^- li^-lil-lik
-^ Postposed phonetic complements are preceded by the symbol -^ URU-^li3
* In front of uninterpreted signs *BI-*IQ-mi or *bi-*iq-mi
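One possible reading of the modifier table is that the position of = or ^ relative to the boundary tells the parser which sign is the determinative or the phonetic complement. The Python sketch below follows that reading. The role labels and the function name are assumptions, and the real parser may distinguish more cases (for example preposed versus postposed).

```python
import re


def classify_signs(word: str) -> list[tuple[str, str]]:
    """Label each sign of a word according to the modifier attached to it."""
    roles = []
    for raw in re.split(r"[-.]", word):
        if not raw:
            continue
        if raw.startswith("=") or raw.endswith("="):
            role = "determinative"
        elif raw.startswith("^") or raw.endswith("^"):
            role = "phonetic complement"
        elif raw.startswith("*"):
            role = "uninterpreted"
        else:
            role = "sign"
        # Drop the modifier characters so that only the sign value remains.
        roles.append((raw.strip("=^*"), role))
    return roles
```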

4.3. Condition of the text

Character(s) Function Example Info
x unreadable signs x or x-x-x or x x x
X a single unreadable number [X] li-im KU3.BABBAR
[ ] as usual
⸢ ⸣ as usual, but the half brackets must enclose the entire sign ma-⸢la ma⸣-ṣu₂-u₂ is clearer than ma-l⸢a m⸣a-ṣu₂-u₂
⸤ ⸥ as usual, but the half brackets must enclose the entire sign available but not in use in DAPCA
{} for erasures {DUMU.=MEŠ} da-gal-li
< > added by a modern editor
<< >> mistakenly written by the scribe
[()] indicates that there may or may not be a sign present in a break [x-(x)-x] deprecated!
() Alternatives, actual signs, explanatory names, etc. mu-sa!(u2)-ra
? after the sign for uncertain reading
! after the sign for abnormal graphic writing. When possible, the actual sign must be reported[^7]. mu-sa!-ra or mu-sa!(u2)-ra
* before uninterpreted (uppercase) signs i-na *BI-*IQ-mi
° new readings: follow each sign a-na pa°-ni° deprecated!

Notes

  • In broken contexts, it is preferable to indicate the actual, visible space with a series of underscores, for example [________], rather than attempting to predict the number of missing characters. Therefore, although the notation [x (x) x] or [x x x] is accepted by the system, it is arbitrary at best and less preferable than the first one.
  • Since the system is a multi-user platform, this type of marking (e.g., a-na pa°-ni°), which is perfectly acceptable in printed publications, raises doubts as to who actually entered an alternative reading. It can be used, but sparingly, and should be replaced by the annotation system [*ref].
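The ? and ! flags, together with the optional (actual sign) that may follow them, lend themselves to a small pattern. The following Python sketch handles a single sign such as sa!(u2). The type and function names are hypothetical, and brackets, erasures, and editorial additions are deliberately left out.

```python
import re
from typing import NamedTuple, Optional

# reading, then any mix of "!" and "?", then an optional "(actual sign)".
CONDITION_RE = re.compile(
    r"^(?P<reading>[^!?()]+)(?P<flags>[!?]*)(?:\((?P<actual>[^()]+)\))?$"
)


class SignCondition(NamedTuple):
    reading: str
    uncertain: bool        # "?" after the sign
    abnormal: bool         # "!" after the sign
    actual: Optional[str]  # sign actually written, if reported


def parse_condition(sign: str) -> SignCondition:
    """Extract the reading and its condition flags from one sign."""
    match = CONDITION_RE.match(sign)
    if match is None:
        raise ValueError(f"unsupported sign notation: {sign!r}")
    flags = match.group("flags")
    return SignCondition(
        match.group("reading"), "?" in flags, "!" in flags, match.group("actual")
    )
```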

4.5. Punctuation

Character Function Sign Example
\ Glossenkeil[^9] GAM
: gloss marker[^10] KUR.=MEŠ :nu-ku-ur-ti
/ “new line” marker, to be used either alone or within a word a-ḫi-ma-lik / ŠEŠ-šu or E₂ u₃ ḫa-ab-la i-ša-/am

5. Semantic classifiers

To mark the semantic domain of a word, the program accepts the following classifiers, placed in front of the word:

Code Alternative Function Example
p_ PN_ masculine personal name p_za-bi-hi or p_ir-am-d_ᵈda-gan
f_ PNF_ female personal name f_al-ḫa-ti
d_ DN_ divine name DN_ᵈNIN.URTA
g_ GN_ geographical name g_ra-ab-ba-=KI
t_ Top_N_ topographical feature Top_N_sí-ip-hu
n_ NUM_ numerals n_1+1/2
w_ WN_ "work" name w_DUB.SAR or w_LU2=.DUB.SAR
m_ MN_ month name ITI m_ᵈḫal-ma

IMPORTANT! - These semantic classifiers must always precede any other element of the word. Thus, for instance, in the case of [X+1], the classifiers must also precede the initial square bracket: n_[X+1].
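Because the classifier always comes first, it can be detached from the token with a simple prefix lookup. The Python sketch below uses the codes and alternatives from the table. The dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Optional

# Codes and their alternatives from the table above, mapped to their function.
CLASSIFIERS = {
    "p_": "masculine personal name", "PN_": "masculine personal name",
    "f_": "female personal name", "PNF_": "female personal name",
    "d_": "divine name", "DN_": "divine name",
    "g_": "geographical name", "GN_": "geographical name",
    "t_": "topographical feature", "Top_N_": "topographical feature",
    "n_": "numeral", "NUM_": "numeral",
    "w_": '"work" name', "WN_": '"work" name',
    "m_": "month name", "MN_": "month name",
}


def split_classifier(token: str) -> tuple[Optional[str], str]:
    """Return (function, remainder); function is None without a classifier."""
    # Longest prefixes first, so a short code never shadows a longer one.
    for prefix in sorted(CLASSIFIERS, key=len, reverse=True):
        if token.startswith(prefix):
            return CLASSIFIERS[prefix], token[len(prefix):]
    return None, token
```

Since the lookup only inspects the start of the token, a form such as n_[X+1] is recognized as a numeral, whereas a classifier placed inside the bracket would not be.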

6. Language

... forthcoming

7. Allowed Characters[^3]

The system automatically checks for valid characters and returns an error message if unknown glyphs are used. It also recognizes "shortcuts": sequences of characters that are automatically replaced by the intended glyph. For example, the combination of [ and " (i.e., [") is replaced by ⸢, TOP LEFT HALF BRACKET.

For a complete list of these shortcut sequences, see the "alternative" column of the following table.

In any case, please prepare the text files with the desired Unicode glyphs. A "Virtual keyboard" button can be used to insert unusual characters into the texts on the "Insert a new tablet" webpage.
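As an illustration of how such shortcuts might be expanded, the Python sketch below replaces a few sequences with their glyphs. The replacement list, its order, and the function name are assumptions; only a subset of shortcuts is shown, and the half-bracket sequences are applied first so that a comma belonging to a bracket is not consumed by the s, or t, shortcut.

```python
# (shortcut, glyph) pairs, applied in order. The half-bracket shortcuts come
# first so that ",]" is not half-consumed by "s," or "t,".
SHORTCUTS = [
    ('["', "\u2e22"),  # TOP LEFT HALF BRACKET
    ('"]', "\u2e23"),  # TOP RIGHT HALF BRACKET
    ("[,", "\u2e24"),  # BOTTOM LEFT HALF BRACKET
    (",]", "\u2e25"),  # BOTTOM RIGHT HALF BRACKET
    ("s,", "\u1e63"),  # LATIN SMALL LETTER S WITH DOT BELOW
    ("S,", "\u1e62"),  # LATIN CAPITAL LETTER S WITH DOT BELOW
    ("t,", "\u1e6d"),  # LATIN SMALL LETTER T WITH DOT BELOW
    ("T,", "\u1e6c"),  # LATIN CAPITAL LETTER T WITH DOT BELOW
]


def expand_shortcuts(text: str) -> str:
    """Replace each ASCII shortcut sequence with its Unicode glyph."""
    for shortcut, glyph in SHORTCUTS:
        text = text.replace(shortcut, glyph)
    return text
```

Applied to line 1 of sample 1, `expand_shortcuts('ma-["la ma"]-s,u2-ú')` turns the two bracket sequences into half brackets and s, into the dotted s.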

char alternative U. cat name code
0-9 all numbers
a-z all lowercase ascii characters
A-Z all uppercase ascii characters
ḫ h Ll LATIN SMALL LETTER H WITH BREVE BELOW U+1E2B
Ḫ H Lu LATIN CAPITAL LETTER H WITH BREVE BELOW U+1E2A
š sz or sh Ll LATIN SMALL LETTER S WITH CARON U+0161
Š SZ or SH Lu LATIN CAPITAL LETTER S WITH CARON U+0160
ṣ s, Ll LATIN SMALL LETTER S WITH DOT BELOW U+1E63
Ṣ S, Lu LATIN CAPITAL LETTER S WITH DOT BELOW U+1E62
ṭ t, Ll LATIN SMALL LETTER T WITH DOT BELOW U+1E6D
Ṭ T, Lu LATIN CAPITAL LETTER T WITH DOT BELOW U+1E6C
_ Pc LOW LINE / underscore U+005F
- Pd HYPHEN-MINUS U+002D
, Po COMMA U+002C
: Po COLON U+003A
! Po EXCLAMATION MARK U+0021
? Po QUESTION MARK U+003F
. Po FULL STOP U+002E
' Po APOSTROPHE U+0027
" Po QUOTATION MARK U+0022
‹ < Pi SINGLE LEFT-POINTING ANGLE QUOTATION MARK U+2039
› > Pf SINGLE RIGHT-POINTING ANGLE QUOTATION MARK U+203A
( Ps LEFT PARENTHESIS U+0028
) Pe RIGHT PARENTHESIS U+0029
[ Ps LEFT SQUARE BRACKET U+005B
] Pe RIGHT SQUARE BRACKET U+005D
{ Ps LEFT CURLY BRACKET U+007B
} Pe RIGHT CURLY BRACKET U+007D
@ Po COMMERCIAL AT U+0040
/ Po SOLIDUS U+002F
\ Po REVERSE SOLIDUS U+005C
⸢ [" Ps TOP LEFT HALF BRACKET U+2E22
⸣ "] Pe TOP RIGHT HALF BRACKET U+2E23
⸤ [, Ps BOTTOM LEFT HALF BRACKET U+2E24
⸥ ,] Pe BOTTOM RIGHT HALF BRACKET U+2E25
+ Sm PLUS SIGN U+002B
× x Sm MULTIPLICATION SIGN U+00D7
| Sm VERTICAL LINE U+007C
= Sm EQUALS SIGN U+003D
; Po SEMICOLON U+003B
* Po ASTERISK U+002A
^ Sk CIRCUMFLEX ACCENT U+005E
% Po PERCENT SIGN U+0025
° So DEGREE SIGN U+00B0
₀ 0 No SUBSCRIPT ZERO U+2080
₁ 1 No SUBSCRIPT ONE U+2081
₂ 2 No SUBSCRIPT TWO U+2082
² No SUPERSCRIPT TWO U+00B2
₃ 3 No SUBSCRIPT THREE U+2083
₄ 4 No SUBSCRIPT FOUR U+2084
₅ 5 No SUBSCRIPT FIVE U+2085
₆ 6 No SUBSCRIPT SIX U+2086
₇ 7 No SUBSCRIPT SEVEN U+2087
₈ 8 No SUBSCRIPT EIGHT U+2088
₉ 9 No SUBSCRIPT NINE U+2089
ᵈ DINGIR=- Lm MODIFIER LETTER SMALL D U+1D48
ᶠ MUNUS=- Lm MODIFIER LETTER SMALL F U+1DA0
ᵐ DIŠ=- Lm MODIFIER LETTER SMALL M U+1D50
ʾ ' Lm MODIFIER LETTER RIGHT HALF RING U+02BE
Lm MODIFIER LETTER SMALL CHI U+1D6A
Lm MODIFIER LETTER SMALL GREEK GAMMA U+1D67

  1. u2-ú 

  2. Explanation of note 2 

  3. Accented vowels are not present in the list, even though they are generally accepted by the parser. In any case, behind the scenes they are replaced by combinations of letters and subscript digits (e.g., É > E₂, ì > i₃, DU10 > DU₁₀).
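The substitution described in note 3 can be sketched in Python with the standard `unicodedata` module. This is an assumption about how such a normalization might work, not the system's actual code: it treats an acute accent as index 2 and a grave accent as index 3, as in the examples of the note, and places the subscript at the end of the sign. The function name is hypothetical and the input is expected to be a single sign.

```python
import re
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"
TO_SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def normalize_sign(sign: str) -> str:
    """Rewrite one sign with a subscript index instead of accents or digits."""
    # NFD separates the accent from its vowel, so it can be removed on its own.
    decomposed = unicodedata.normalize("NFD", sign)
    index = ""
    if ACUTE in decomposed:
        decomposed, index = decomposed.replace(ACUTE, "", 1), "2"
    elif GRAVE in decomposed:
        decomposed, index = decomposed.replace(GRAVE, "", 1), "3"
    # NFC restores other composed letters such as š.
    base = unicodedata.normalize("NFC", decomposed)
    # A trailing run of ASCII digits after at least one letter is an index too.
    match = re.fullmatch(r"(.*[^\W\d_])(\d+)", base)
    if match and not index:
        base, index = match.group(1), match.group(2)
    return base + index.translate(TO_SUBSCRIPT)
```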
