Dataset
The SR Passives dataset is a collection of approximately 120,000 passive sentences in multiple languages, annotated with properties such as word order, inflection, and part-of-speech (POS) tags. MOF2WS-120K is a subset of SR Passives designed to capture the specific features of Modern Standard Arabic (MSA). It consists of 115,641 synthetic sentences generated with a machine-learning approach and 4,377 real sentences extracted from various MSA sources. The synthetic sentences target phenomena that are under-represented in other corpora or specific to MSA, such as the different types of passive constructions. All sentences carry POS tags and treebank-style dependency annotations, which support fine-grained syntactic analysis. The corpus can be used for tasks such as syntactic parsing, discourse analysis, machine translation, and speech recognition.
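The two portions of MOF2WS-120K add up to the figure in the subset's name. The minimal Python sketch below uses only the counts stated above to make that arithmetic and the synthetic/real split explicit; the function name `composition` is purely illustrative and not part of any released tooling.

```python
# Sentence counts as stated for MOF2WS-120K.
SYNTHETIC = 115_641
REAL = 4_377


def composition(synthetic: int, real: int) -> tuple[int, float, float]:
    """Return the total count and the fraction contributed by each portion."""
    total = synthetic + real
    return total, synthetic / total, real / total


total, synthetic_share, real_share = composition(SYNTHETIC, REAL)
print(total)                       # 120018
print(f"{synthetic_share:.1%}")    # 96.4%
print(f"{real_share:.1%}")         # 3.6%
```

Roughly 96% of the subset is synthetic, so any evaluation that should reflect naturally occurring MSA may want to report results on the 4,377 real sentences separately.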