Multi-word search

Syntagmatic Search: Multi-Word Pattern Matching

The syntagmatic search interface enables users to query for sequences of multiple words, leveraging DAPCA's token-based architecture (see tokenization process) to apply distinct pattern-matching criteria to each individual term within the sequence. This functionality addresses a fundamental requirement of cuneiform textual analysis: the identification of formulaic expressions, syntactic constructions, and multi-word lexical units that carry specific semantic or pragmatic significance within particular documentary contexts.

Input Parameters and Query Syntax

The search interface accepts sequences of words separated by whitespace. Each word position may incorporate its own pattern-matching criteria through regular expression syntax, allowing for flexible queries that accommodate orthographic variation, morphological uncertainty, or deliberately broad search parameters.

Basic multi-word queries:

a-na pa-ni (the prepositional phrase "before, in the presence of")
ab-ba-nu DUMU IM-GAL (the personal name Abbanu with patronymic "son of Baʿlu-kabar")

Pattern-enhanced queries:

Individual terms within the sequence may incorporate wildcards and regular expression operators to refine or broaden the search criteria. For instance:

ab-ba-n. DUMU .* retrieves any individual named Abbanu (with orthographic variants such as ab-ba-nu, ab-ba-ni) followed by the filiation marker DUMU and any patronymic (the .* pattern matches any character sequence)
^a-na pa-n. matches the syntagma beginning with a-na followed by any form of panum (e.g., pa-ni, pa-na, pa-nu)
i-na .*-ti identifies prepositional phrases with ina followed by any word whose spelling ends in -ti (typically a feminine noun in the genitive case)

This token-level pattern specification enables researchers to construct sophisticated queries that balance precision and recall. A search for personal names with particular patronymics, for instance, can accommodate orthographic variation in the name itself while requiring exact matches for the patronymic, or vice versa. Similarly, queries for syntactic constructions can specify fixed elements (such as prepositions or conjunctions) while allowing variation in nominal or verbal components.
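The per-position matching described above can be sketched in a few lines of Python. This is a minimal illustration, not DAPCA's actual implementation: the function name find_syntagma, the representation of a text line as a list of tokens, and the choice to require each pattern to match its whole token are all assumptions made for the example.

```python
import re

def find_syntagma(query: str, tokens: list[str]) -> list[int]:
    """Return the start indices where the query matches consecutive tokens."""
    # One compiled pattern per word position (terms are whitespace-separated).
    patterns = [re.compile(term) for term in query.split()]
    n = len(patterns)
    if n == 0:
        return []
    hits = []
    # Slide a window of n tokens across the line.
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # Assumption: each term must match its entire token (fullmatch).
        if all(p.fullmatch(tok) for p, tok in zip(patterns, window)):
            hits.append(start)
    return hits

# Hypothetical tokenized line, used only for illustration.
line = ["a-na", "pa-ni", "ab-ba-ni", "DUMU", "IM-GAL"]
print(find_syntagma("ab-ba-n. DUMU .*", line))  # [2]
print(find_syntagma("^a-na pa-n.", line))       # [0]
```

Swapping a wildcard term for a literal one (for example, IM-GAL instead of .*) tightens that position only, which is how a query can trade recall for precision one word at a time.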

Results and Output Format

Result Aggregation

Query results are returned in aggregated form and sorted alphabetically. Unlike the basic word search, syntagmatic searches organize results primarily by the complete word sequence rather than by individual terms. The "Additional Groupers" interface allows users to apply supplementary organizational criteria (such as text genre, chronology, or archaeological provenance) to facilitate pattern analysis across different documentary contexts.
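As a rough sketch of this aggregation, the following Python groups matches by the complete word sequence and, optionally, by one additional grouper. The record layout, the tablet identifiers, the genre labels, and the function name aggregate are hypothetical and chosen only for illustration. Python's default string ordering is used as a stand-in for the interface's alphabetical sort.

```python
from collections import defaultdict

# Hypothetical match records: (matched sequence, tablet identifier, genre).
matches = [
    ("ab-ba-nu DUMU IM-GAL", "T2", "legal"),
    ("ab-ba-ni DUMU IM-GAL", "T3", "letter"),
    ("ab-ba-nu DUMU IM-GAL", "T1", "legal"),
]

def aggregate(records, by_genre=False):
    """Group tablets by word sequence, optionally also by genre."""
    groups = defaultdict(set)
    for sequence, tablet, genre in records:
        # The complete sequence is always the primary grouping key.
        key = (sequence, genre) if by_genre else (sequence,)
        groups[key].add(tablet)
    # Sort the groups by key, and the tablets within each group.
    return [(key, sorted(groups[key])) for key in sorted(groups)]

for key, tablets in aggregate(matches):
    print(key, tablets)
# ('ab-ba-ni DUMU IM-GAL',) ['T3']
# ('ab-ba-nu DUMU IM-GAL',) ['T1', 'T2']
```

Calling aggregate(matches, by_genre=True) adds the genre to each key, so the same sequence attested in two genres would appear as two separate groups.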

Occurrence Documentation

The result set displays tablet identifiers for documents containing the queried syntagma. Unlike the single-word search interface, which provides line-by-line occurrence lists, syntagmatic search results present document-level attestations. This design decision reflects the typical use case for multi-word searches: identifying documents containing specific formulaic expressions or syntactic patterns for subsequent detailed analysis.

Clicking on any tablet identifier navigates directly to the full text view, where all terms in the queried syntagma are automatically highlighted in yellow, enabling rapid visual identification of the matched sequence within its broader textual context.
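The highlighting step can be pictured with a short Python sketch. It assumes the matched sequence is known by its start position and length within a tokenized line, and it uses HTML mark tags as a stand-in for the yellow highlight; the function name highlight and this markup are assumptions, not a description of how DAPCA renders its text view.

```python
from html import escape

def highlight(tokens, start, length):
    """Wrap the matched tokens in <mark> tags and join the line with spaces."""
    out = []
    for i, tok in enumerate(tokens):
        safe = escape(tok)  # transliterations may contain < or >
        if start <= i < start + length:
            out.append(f"<mark>{safe}</mark>")
        else:
            out.append(safe)
    return " ".join(out)

# Hypothetical line in which the first two tokens matched the query.
print(highlight(["a-na", "pa-ni", "LUGAL"], 0, 2))
# <mark>a-na</mark> <mark>pa-ni</mark> LUGAL
```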

Methodological Applications

The syntagmatic search functionality supports diverse research applications:

Formulaic analysis: Identification of recurring administrative formulae (e.g., a-na pa-ni PN in juridical texts, i-na ITI ŠE for temporal specifications in contracts)

Prosopographic research: Retrieval of individuals with specific patronymics or titles (e.g., PN DUMU PN₂, PN LÚ GN)

Syntactic studies: Corpus-wide analysis of particular grammatical constructions (e.g., prepositional phrases, verbal chains, genitive constructions)

Orthographic variation: Documentation of orthographic alternatives for multi-word expressions across different scribal traditions or chronological periods

Development Status

The syntagmatic search engine is under active development. Determinatives (e.g., DIŠ, DINGIR, MUNUS) may occasionally produce non-standard sorting behavior. Enhanced features for handling determinatives and improved result visualization are planned for future releases.

Input parameters

The search expects any sequence of words separated by whitespace. For instance:

a-na pa-ni
ab-ba-nu DUMU IM-GAL

Each term may have its own wildcards in order to refine or extend the search criteria. For instance, searching for ab-ba-n. DUMU .* will return any individual named ʾAbbānu (e.g. ab-ba-nu, ab-ba-ni) followed by any possible patronymic.

Results

The results of the query are returned in aggregate form and ordered alphabetically. In this case the basic criterion is the word only. The additional groupers input box lets users choose different grouping options. Unlike the 'Basic Search', the result set provides only the tablet names (and not the relevant lines). However, clicking on a tablet name still redirects to the text, where the search result is highlighted in yellow.

Nota bene: the search engine is currently under development. Determinatives (e.g. DIŠ, DINGIR, MUNUS) may cause incorrect sorting.