
Word search

Development Status

The search engine is under active development. The current configuration operates according to the following parameters:

  • Queries are case-insensitive
  • Results are grouped by normalized word forms (sequences stripped of epigraphic markers such as square brackets)
  • Determinatives (e.g., DIŠ, DINGIR, MUNUS) may occasionally produce non-standard sorting behavior
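The grouping rule above can be pictured as a small normalization step. What follows is only a sketch under assumptions, not the engine's actual code: the function name normalize_form is hypothetical, and so is the exact set of stripped markers (the text names square brackets; half-brackets are added here as a guess).

```python
# Hypothetical marker set: the documentation only names square brackets;
# half-brackets are included here as an assumption.
EPIGRAPHIC_MARKERS = "[]⸢⸣"

# Translation table that deletes every marker character.
_STRIP_TABLE = str.maketrans("", "", EPIGRAPHIC_MARKERS)

def normalize_form(form: str) -> str:
    """Return `form` with epigraphic markers removed."""
    return form.translate(_STRIP_TABLE)

# Differently damaged spellings of one word collapse into a single group key.
forms = ["i-ša-am", "[i-ša]-am", "i-⸢ša⸣-am"]
group_keys = {normalize_form(f) for f in forms}
print(group_keys)
```

Under these assumptions, all three spellings end up in the same result group.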

Query Input and Pattern Matching

The DAPCA search engine operates in two complementary modes, each designed to address distinct research requirements:

1. Basic Search for transliterated forms

The search engine accepts any character sequence as input, whether representing complete word forms (e.g., i-ša-am) or partial strings (e.g., ša-).

Substring Pattern Matching Behavior

The search engine is configured to match substring patterns by default. A search for the preposition a-na, for instance, will return all words containing this substring, including the toponym a-ia-la-na-za, the personal name a-na-ak-ka₄, and numerous other attestations. Users seeking exact matches must employ anchoring operators (see below).
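The default behavior can be reproduced with Python's re module, assuming the engine's matching resembles re.search with case-insensitivity enabled. The helper name search and the small word list are illustrative only; the forms are the ones cited above plus i-na as a non-match.

```python
import re

# Illustrative word list built from the forms cited in the text.
words = ["a-na", "a-ia-la-na-za", "a-na-ak-ka₄", "i-na"]

def search(pattern: str, candidates: list[str]) -> list[str]:
    """Return every candidate that contains a match anywhere in the word."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [w for w in candidates if regex.search(w)]

# Unanchored query: every word containing the substring "a-na" is returned,
# including a-ia-la-na-za (the match sits inside "la-na").
substring_hits = search("a-na", words)
print(substring_hits)
```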

2. Lemmatized Search (Dictionary Headword)

In addition, the system supports lemmatized queries based on standardized dictionary headwords. This modality enables users to retrieve all morphological variants and orthographic realizations of a given lexeme without requiring explicit enumeration of attested forms.

For instance, searching for the lemma šâmu (to buy) will return all inflected forms attested in the corpus, including a-ša-am, i-ša-am, i-ša-am-mu, i-ša-am-šu, ta-aš-ta-ma, and other grammatical realizations, regardless of their orthographic representation or morphological complexity.
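Conceptually, a lemmatized query is a lookup from headword to attested forms. The sketch below assumes a simple in-memory index; LEMMA_INDEX and lemma_search are hypothetical names, and the index contains only the forms listed in the example above, not the full corpus.

```python
import re

# Hypothetical lemma index: headword -> attested forms.
# Only the forms named in the documentation are included.
LEMMA_INDEX = {
    "šâmu": ["a-ša-am", "i-ša-am", "i-ša-am-mu", "i-ša-am-šu", "ta-aš-ta-ma"],
}

def lemma_search(pattern: str) -> dict[str, list[str]]:
    """Return the forms of every headword matching `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {
        lemma: forms
        for lemma, forms in LEMMA_INDEX.items()
        if regex.search(lemma)
    }

# One query on the headword retrieves all inflected forms at once.
result = lemma_search("^šâmu$")
for lemma, forms in result.items():
    print(lemma, forms)
```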

Regular Expression Syntax

Both search modes support regular expression (regex) syntax, enabling sophisticated pattern matching through special characters and metacharacters. For comprehensive documentation of regex syntax, consult standard references such as the Wikipedia article on Regular Expressions.

1. Positional Anchors

Anchors tie a pattern to the start (^) or end ($) of a word:

  • ^i-ša matches sequences beginning with i-ša (e.g., i-ša-am), excluding pa-ni-ša;
  • a-am$ matches sequences ending with a-am;
  • ^a-na$ matches exactly the preposition a-na, excluding longer forms.
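The three anchor patterns above can be checked in Python, assuming the engine's anchors behave like those of the re module when each word is matched separately. The helper search and the word list are illustrative.

```python
import re

# Illustrative word list drawn from forms cited in this documentation.
words = ["i-ša-am", "pa-ni-ša", "a-ša-am", "a-na", "a-na-ak-ka₄"]

def search(pattern: str, candidates: list[str]) -> list[str]:
    """Return candidates matched by `pattern`, each word tested on its own."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [w for w in candidates if regex.search(w)]

starts_with = search("^i-ša", words)   # pa-ni-ša contains i-ša, but not at the start
ends_with = search("a-am$", words)     # matching is by character, so ša-am qualifies
exact = search("^a-na$", words)        # both anchors: the whole word must be a-na

print(starts_with, ends_with, exact)
```

Note that a-am$ works at the character level: it matches i-ša-am because the final characters of ša-am are a-am, even though a is not a separate sign there.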

2. Wildcards

Match variable character patterns:

  • . (dot) matches any single character (letter, digit, whitespace); to match a literal dot, escape it as \.;
  • ? (question mark) indicates zero or one occurrence of the preceding element;
  • * (asterisk) indicates zero or more occurrences of the preceding element;
  • + (plus sign) indicates one or more occurrences of the preceding element.

Advanced Pattern Combinations

Wildcards are most useful in combination. Consider the pattern -ru.?$, which matches:

  1. The reading -ru for the sign RU;
  2. Followed by any single character (.);
  3. Where that character may occur zero or one time (?)—i.e., optionally;
  4. At word boundary ($), with no subsequent characters.

This pattern retrieves maḫ-ru, maḫ-ru₃, and maḫ-rum uniformly, thereby capturing homophonic sign variants [1] and forms with mimation [2] through a single query expression.
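The behavior of -ru.?$ can be verified with Python's re module, assuming the engine treats a subscript digit such as ₃ as a single character, as Python does. The forms maḫ-ru-um and ru-uq are added here purely as counterexamples.

```python
import re

pattern = re.compile(r"-ru.?$")

# The first three forms come from the text; the last two are counterexamples.
words = ["maḫ-ru", "maḫ-ru₃", "maḫ-rum", "maḫ-ru-um", "ru-uq"]

hits = [w for w in words if pattern.search(w)]
print(hits)

# maḫ-ru-um fails because three characters (-um) follow -ru, and .? allows at most one.
# ru-uq fails because its ru is not preceded by a hyphen.
```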

3. Alternation Operators

Match multiple alternative patterns:

  • a|b matches either a or b;
  • (šu|su)-nu matches both šu-nu and su-nu;
  • [aeiou] matches any single vowel character.

Examples:

  • ^(a-na|i-na)$ retrieves both prepositions a-na and i-na simultaneously;
  • ma-[ḫh]i-ru matches both ma-ḫi-ru and ma-hi-ru, accommodating orthographic variation;
  • DUMU(\.MEŠ)? matches both DUMU (singular) and DUMU.MEŠ (plural).
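The example patterns above can be tried out as follows, again assuming regex behavior comparable to Python's re module. The NFC normalization step is a precaution added in this sketch (a character such as ḫ can be encoded as one code point or as h plus a combining mark, and a character class only works reliably on the former); whether the engine normalizes input this way is not stated in the documentation.

```python
import re
import unicodedata

def nfc(text: str) -> str:
    """Normalize to NFC so that characters like ḫ are single code points."""
    return unicodedata.normalize("NFC", text)

def search(pattern: str, candidates: list[str]) -> list[str]:
    """Return candidates containing a case-insensitive match of `pattern`."""
    regex = re.compile(nfc(pattern), re.IGNORECASE)
    return [w for w in candidates if regex.search(nfc(w))]

words = ["a-na", "i-na", "a-na-ak-ka₄", "ma-ḫi-ru", "ma-hi-ru", "DUMU", "DUMU.MEŠ"]

prepositions = search(r"^(a-na|i-na)$", words)  # grouped alternation, anchored
variants = search(r"ma-[ḫh]i-ru", words)        # character class for ḫ / h
sons = search(r"DUMU(\.MEŠ)?", words)           # optional group, escaped dot

print(prepositions, variants, sons)
```

Because DUMU(\.MEŠ)? is unanchored, it would also match longer words that merely contain DUMU; adding ^ and $ restricts the query to the two forms themselves.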

Search Results and Output Format

Result Aggregation and Ordering

Query results are aggregated and sorted alphabetically. By default, results are clustered by normalized word form and semantic class (PN = personal name, GN = geographical name, DN = divine name, etc.). The "Additional Groupers" interface element lets users add further grouping criteria on top of these two, which always remain active as the baseline grouping.
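The default aggregation can be sketched as grouping on a (normalized form, semantic class) key and sorting the keys. Everything in the example is an assumption for illustration: the record layout, the tablet identifiers T1 and T2, the empty string for words without a semantic class, and the use of Python's default string ordering, which need not coincide with the engine's alphabetical order.

```python
from collections import defaultdict

# Hypothetical hit records: (form as written, semantic class, tablet id, line).
# Tablet ids and line numbers are invented for the example.
hits = [
    ("a-na-ak-ka₄", "PN", "T1", 7),
    ("[a]-na", "", "T2", 5),
    ("a-na", "", "T1", 2),
]

def strip_markers(form: str) -> str:
    """Remove square brackets and half-brackets (assumed marker set)."""
    return form.translate(str.maketrans("", "", "[]⸢⸣"))

def group_hits(records):
    """Group occurrences by (normalized form, semantic class), keys sorted."""
    groups = defaultdict(list)
    for form, sem_class, tablet, line in records:
        groups[(strip_markers(form), sem_class)].append((tablet, line))
    return dict(sorted(groups.items()))

grouped = group_hits(hits)
for key, occurrences in grouped.items():
    print(key, occurrences)
```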

Occurrence Documentation

For each attested form, the system returns a comprehensive list of occurrences with the following notational conventions:

  • Standard attestations: Line numbers are provided in plain format (e.g., 2, 15)
  • Broken contexts: When the term appears in a damaged passage, the line number is enclosed in square brackets if the passage is completely broken (e.g., [2]) or in half-brackets if it is only partially damaged (e.g., ⸢2⸣)
  • Alternative readings: Attestations occurring exclusively in editorial notes (such as alternative readings or restorations) are marked with an asterisk following the line number (e.g., 3*)
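The notational conventions above amount to a mapping from preservation status to a line-number format. The sketch below makes that mapping explicit; the Status enum and format_line are hypothetical names, and the sketch does not attempt to combine markers (for instance, a note-only attestation in a broken line), since the documentation does not say how such cases are rendered.

```python
from enum import Enum

class Status(Enum):
    """Hypothetical preservation states of an attestation."""
    INTACT = "intact"        # standard attestation
    PARTIAL = "partial"      # partially damaged line
    BROKEN = "broken"        # completely broken passage
    NOTE_ONLY = "note_only"  # attested only in an editorial note

def format_line(line: int, status: Status) -> str:
    """Render a line number according to the conventions listed above."""
    if status is Status.BROKEN:
        return f"[{line}]"
    if status is Status.PARTIAL:
        return f"⸢{line}⸣"
    if status is Status.NOTE_ONLY:
        return f"{line}*"
    return str(line)

print(format_line(2, Status.INTACT), format_line(2, Status.BROKEN),
      format_line(2, Status.PARTIAL), format_line(3, Status.NOTE_ONLY))
```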

Clicking on any tablet identifier in the results list navigates directly to the full text view, where the queried term is automatically highlighted in yellow for rapid visual identification within its documentary context.

Search Results Interface

[Figure: search results display]


  1. Seminara 1998, L'accadico di Emar, pp. 90-97.

  2. "Etymologically justifiable": the reading mahrum is inadmissible, even in post-Old Babylonian contexts, as no Akkadian dialect attests third-person plural masculine verbal suffixes with mimation (Seminara 1998, L'accadico di Emar, p. 91). 
