9 thoughts on “Corpus Files”

  1. Dear Maha,

    First of all, this is great work that you are doing! I am interested in working with corpora of classical Arabic, but more from the perspective of social and religious history than of linguistics strictly (although, of course, the former is not possible without the latter; some of my work is on my website). According to the metadata table, you have taken all the texts from shamela.ws. I wonder what changes, if any, you have made to the original texts? Have you collated them with printed editions?

    Best regards,
    Maxim

    • Hi Maxim,

      I’ve taken a look at your site; it is an interesting project you are working on.

      You are right that the use of KSUCCA is not limited to linguistic studies; it can also be used, for example, in literary, historical, and social studies, as well as in language learning.

      Regarding the text of the documents included in the corpus, they are the original texts from the books with nothing added, except that some books include the remarks of the annotators who transcribed the original manuscripts. Almost all of the documents have been verified to match the printed editions.

  2. This is an excellent corpus. Can you let me know which text editor or software you find best for searching through this corpus?

    Some text editors do not have good support for Arabic. Similarly, standard text editors do not easily allow one to perform the sorts of searches that are needed for corpus-based linguistics, such as finding word frequencies, collocations, etc. I would be grateful if you could provide any advice on this, and suggest any software (preferably free, but proprietary is also fine) that addresses this problem for Arabic text files.

    Also, do you know if there is a more comprehensive corpus for classical Arabic poetry, e.g. one that would contain all the poems attributed to Imru’ul-Qays, Al-Nabighah, etc., as published in their dawaween?
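
The searches mentioned in the comment above (word frequencies and collocations) can be sketched with Python's standard library alone. This is only a minimal illustration and not a tool recommended by the corpus author. The function names, the sample sentence, and the choice to strip diacritics before counting are all assumptions made for the example; the encoding and layout of the actual corpus files are not stated here, so the sketch works on an in-memory string rather than on a file.

```python
import re
from collections import Counter

# Arabic short-vowel marks (tashkeel, U+064B..U+0652) and tatweel (U+0640).
# Assumption: we strip them so vocalized and unvocalized spellings of the
# same word are counted together.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# A token is a run of characters in the basic Arabic letter range
# (hamza U+0621 through ya U+064A). Punctuation and digits are skipped.
TOKEN = re.compile(r"[\u0621-\u064A]+")


def tokenize(text):
    """Return the Arabic word tokens in text, with diacritics removed."""
    return TOKEN.findall(DIACRITICS.sub("", text))


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs, a simple stand-in for collocations."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, not taken from the corpus.
sample = "قال الشاعر قال الراوي قال الشاعر"
tokens = tokenize(sample)
freq = word_frequencies(tokens)
pairs = bigram_counts(tokens)

for word, count in freq.most_common(3):
    print(word, count)
for pair, count in pairs.most_common(3):
    print(" ".join(pair), count)
```

Raw bigram counts are the simplest possible notion of collocation; a real study would usually weight them with an association measure, and a dedicated concordancer may be more convenient than a script for browsing results.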
