Word search
Development Status
The search engine is under active development. The current configuration operates according to the following parameters:
- Queries are case-insensitive
- Results are grouped by normalized word forms (sequences stripped of epigraphic markers such as square brackets)
- Determinatives (e.g., DIŠ, DINGIR, MUNUS) may occasionally produce non-standard sorting behavior
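The normalization and case-insensitivity described above might be sketched as follows. This is only an illustration, not the DAPCA implementation: the function names `normalize_form` and `matches_query` are hypothetical, and the set of stripped markers (square brackets and half-brackets) is an assumption.

```python
# Hypothetical sketch: names and the marker set are assumptions,
# not taken from the DAPCA code base.
EPIGRAPHIC_MARKERS = "[]⸢⸣"


def normalize_form(form):
    """Strip the assumed epigraphic markers from a transliterated form."""
    return form.translate({ord(ch): None for ch in EPIGRAPHIC_MARKERS})


def matches_query(query, form):
    """Case-insensitive substring test against the normalized form."""
    return query.casefold() in normalize_form(form).casefold()


print(normalize_form("i-ša-[am]"))           # i-ša-am
print(matches_query("DUMU", "⸢dumu⸣.meš"))   # True
```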
Query Input and Pattern Matching
The DAPCA search engine operates in two complementary modes, each designed to address distinct research requirements:
1. Basic Search for transliterated forms
The search engine accepts any character sequence as input, whether representing complete word forms (e.g., i-ša-am) or partial strings (e.g., ša-).
Substring Pattern Matching Behavior
The search engine is configured to match substring patterns by default. A search for the preposition a-na, for instance, will return all words containing this substring, including the toponym a-ia-la-na-za, the personal name ḫa-na-ak-ka₄, and numerous other attestations. Users seeking exact matches must employ anchoring operators (see below).
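The difference between the default substring behavior and an anchored query can be reproduced with Python's `re` module. This is a sketch that assumes the site's regular expression dialect behaves like Python's for these simple patterns; the helper name `search` is hypothetical, and the word list reuses forms quoted in this guide.

```python
import re

# Word forms quoted in this guide.
forms = ["a-na", "a-ia-la-na-za", "ḫa-na-ak-ka₄", "i-ša-am"]


def search(pattern, words):
    """Return every word in which the pattern matches anywhere (unanchored)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.search(w)]


substring_hits = search("a-na", forms)   # default behavior: substring match
exact_hits = search("^a-na$", forms)     # anchored: whole word only

print(substring_hits)  # ['a-na', 'a-ia-la-na-za', 'ḫa-na-ak-ka₄']
print(exact_hits)      # ['a-na']
```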
2. Lemmatized Search (Dictionary Headword)
In addition, the system supports lemmatized queries based on standardized dictionary headwords. This modality enables users to retrieve all morphological variants and orthographic realizations of a given lexeme without requiring explicit enumeration of attested forms.
For instance, searching for the lemma šâmu (to buy) will return all inflected forms attested in the corpus, including a-ša-am, i-ša-am, i-ša-am-mu, i-ša-am-šu, ta-aš-ta-ma, and other grammatical realizations, regardless of their orthographic representation or morphological complexity.
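To illustrate why a lemmatized query can retrieve forms that a string query misses, the sketch below models the lemma index as a simple mapping from headword to attested forms. The data structure and function names are hypothetical; only the forms listed above are used.

```python
# Hypothetical lemma index: headword -> attested forms (taken from the example above).
LEMMA_INDEX = {
    "šâmu": ["a-ša-am", "i-ša-am", "i-ša-am-mu", "i-ša-am-šu", "ta-aš-ta-ma"],
}


def forms_for_lemma(lemma):
    """Return all forms recorded under a headword (empty list if unknown)."""
    return list(LEMMA_INDEX.get(lemma, []))


def substring_search(fragment, forms):
    """Plain substring match, as in the basic search mode."""
    return [f for f in forms if fragment in f]


all_forms = forms_for_lemma("šâmu")
partial = substring_search("ša-am", all_forms)

print(len(all_forms))  # 5
print(partial)         # ['a-ša-am', 'i-ša-am', 'i-ša-am-mu', 'i-ša-am-šu']
```

The substring query `ša-am` misses ta-aš-ta-ma, which the lemma lookup returns.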
Regular Expression Syntax
Both search modes support regular expression (REGEX) syntax, enabling sophisticated pattern matching through special characters and metacharacters. For comprehensive documentation of REGEX syntax, consult standard references such as the Wikipedia article on Regular Expressions.
1. Positional Anchors
Define exact positions within word boundaries:
- ^i-ša matches sequences beginning with i-ša (e.g., i-ša-am), excluding pa-ni-ša;
- a-am$ matches sequences ending with a-am;
- ^a-na$ matches exactly the preposition a-na, excluding longer forms.
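The three anchor patterns can be tried out against forms quoted in this guide. The sketch uses Python's `re` module as a stand-in for the site's engine, which is an assumption; the variable names are illustrative.

```python
import re

# Forms quoted elsewhere in this guide.
forms = ["i-ša-am", "a-ša-am", "pa-ni-ša", "a-na", "a-ia-la-na-za"]

patterns = ["^i-ša", "a-am$", "^a-na$"]

# Map each pattern to the forms it matches.
results = {p: [f for f in forms if re.search(p, f)] for p in patterns}

for pattern, matched in results.items():
    print(pattern, matched)
# ^i-ša ['i-ša-am']
# a-am$ ['i-ša-am', 'a-ša-am']
# ^a-na$ ['a-na']
```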
2. Wildcards
Match variable character patterns:
- . (dot) matches any single character (letter, digit, whitespace);
- ? (question mark) indicates zero or one occurrence of the preceding element;
- * (asterisk) indicates zero or more occurrences of the preceding element;
- + (plus sign) indicates one or more occurrences of the preceding element.
Advanced Pattern Combinations
Wildcards achieve maximal utility when combined strategically. Consider the pattern -ru.?$, which matches:
- The reading -ru for the sign RU;
- Followed by any single character (.);
- Where that character may occur zero or one time (?), i.e., optionally;
- At word boundary ($), with no subsequent characters.
This pattern retrieves maḫ-ru, maḫ-ru₃, and maḫ-rum uniformly, thereby capturing homophonic sign variants1 and forms with mimation2 through a single query expression.
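The behavior of this pattern can be checked with a short sketch that records which part of each form the expression actually matches. Python's `re` module is assumed to behave like the site's engine here; i-ša-am is included only as a non-matching control.

```python
import re

pattern = re.compile(r"-ru.?$")
forms = ["maḫ-ru", "maḫ-ru₃", "maḫ-rum", "i-ša-am"]

# Record the matched tail for every form the pattern accepts.
matched = {}
for form in forms:
    m = pattern.search(form)
    if m:
        matched[form] = m.group(0)

print(matched)
# {'maḫ-ru': '-ru', 'maḫ-ru₃': '-ru₃', 'maḫ-rum': '-rum'}
```

Because the dot accepts any character, the optional slot is filled by the subscript index in one case and by the final m in another.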
3. Alternation Operators
Match multiple alternative patterns:
- a|b matches either a or b;
- (šu|su)-nu matches both šu-nu and su-nu;
- [aeiou] matches any single vowel character.
Examples:
- ^(a-na|i-na)$ retrieves both prepositions a-na and i-na simultaneously;
- ma-[ḫh]i-ru matches both ma-ḫi-ru and ma-hi-ru, accommodating orthographic variation;
- DUMU(\.MEŠ)? matches both DUMU (singular) and DUMU.MEŠ (plural).
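The alternation examples can be verified in the same way. This is a sketch using Python's `re` module, assumed to agree with the site's engine for these patterns; the helper `hits` is hypothetical.

```python
import re


def hits(pattern, words):
    """Return the words in which the pattern matches somewhere."""
    rx = re.compile(pattern)
    return [w for w in words if rx.search(w)]


prepositions = hits(r"^(a-na|i-na)$", ["a-na", "i-na", "a-ia-la-na-za"])
variants = hits(r"ma-[ḫh]i-ru", ["ma-ḫi-ru", "ma-hi-ru", "maḫ-ru"])

# The optional group extends the match when the plural marker is present.
singular = re.search(r"DUMU(\.MEŠ)?", "DUMU").group(0)
plural = re.search(r"DUMU(\.MEŠ)?", "DUMU.MEŠ").group(0)

print(prepositions)      # ['a-na', 'i-na']
print(variants)          # ['ma-ḫi-ru', 'ma-hi-ru']
print(singular, plural)  # DUMU DUMU.MEŠ
```

The backslash before the dot in the last pattern makes it match a literal full stop rather than any character.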
Search Results and Output Format
Result Aggregation and Ordering
Query results are returned in aggregated form and sorted alphabetically. By default, results are clustered by normalized word form and semantic class (PN = personal name, GN = geographical name, DN = divine name, etc.). The "Additional Groupers" interface element allows users to apply supplementary grouping criteria, though word-form and semantic-class grouping remain active as baseline organizational principles.
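The baseline grouping can be pictured as keying each hit on the pair (normalized form, semantic class) and sorting the keys. The sketch below is an assumption about the general shape of this step, not the actual implementation: the tablet identifiers and line numbers are invented placeholders, and plain code-point order stands in for whatever collation the site applies.

```python
from collections import defaultdict

# Illustrative records only: (normalized form, semantic class, tablet id, line).
# Tablet ids and line numbers are invented placeholders, not corpus data.
hit_records = [
    ("ḫa-na-ak-ka₄", "PN", "T2", 7),
    ("a-ia-la-na-za", "GN", "T1", 3),
    ("ḫa-na-ak-ka₄", "PN", "T1", 12),
]


def group_hits(records):
    """Cluster hits by (form, semantic class) and sort the clusters by key."""
    groups = defaultdict(list)
    for form, sem_class, tablet, line in records:
        groups[(form, sem_class)].append((tablet, line))
    return dict(sorted(groups.items()))


grouped = group_hits(hit_records)
for key, occurrences in grouped.items():
    print(key, occurrences)
# ('a-ia-la-na-za', 'GN') [('T1', 3)]
# ('ḫa-na-ak-ka₄', 'PN') [('T2', 7), ('T1', 12)]
```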
Occurrence Documentation
For each attested form, the system returns a comprehensive list of occurrences with the following notational conventions:
- Standard attestations: Line numbers are provided in plain format (e.g., 2,15)
- Broken contexts: When the term appears in damaged passages, whether partially or completely broken, the line number is enclosed in square brackets (e.g., [2]) or half-brackets for partially damaged lines (e.g., ⸢2⸣)
- Alternative readings: Attestations occurring exclusively in editorial notes (such as alternative readings or restorations) are marked with an asterisk following the line number (e.g., 3*)
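The notational conventions above can be summarized as a small formatting function. This is a hypothetical sketch: the function name, the status labels, and the assumption that the asterisk may be combined with bracketing are illustrative choices, not documented behavior.

```python
def format_line(line, status="intact", note_only=False):
    """Render a line number using the conventions listed above.

    status: "intact", "partial" (half-brackets) or "broken" (square brackets).
    note_only: True if the form occurs only in an editorial note.
    """
    if status == "intact":
        text = str(line)
    elif status == "partial":
        text = f"⸢{line}⸣"
    elif status == "broken":
        text = f"[{line}]"
    else:
        raise ValueError(f"unknown status: {status!r}")
    # Assumed: the asterisk is simply appended after the line number.
    return text + "*" if note_only else text


print(format_line(15))                    # 15
print(format_line(2, status="broken"))    # [2]
print(format_line(2, status="partial"))   # ⸢2⸣
print(format_line(3, note_only=True))     # 3*
```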
Navigation and Highlighting
Clicking on any tablet identifier in the results list navigates directly to the full text view, where the queried term is automatically highlighted in yellow for rapid visual identification within its documentary context.
