2 thoughts on “KSUCCA Metadata”

  1. Dear Maha,

    I would like to congratulate you on this great achievement, which reflects the amount of effort you put into this project.

    Are you planning to develop or adapt any tool for searching or analysing the corpus?

    It may be easier to access the files if you give each file its whole name (like BA1.txt) instead of splitting the code across the folders. That would also make the real file names match the codes in your Excel sheet.

    I wish you the best in your PhD degree.

    • Thanks Abdullah,

      Actually, the names in the folders are exactly the same as in the Excel sheet. The first letter stands for the genre, the second letter stands for the subgenre within that genre, and the number stands for the file’s sequence number within that subgenre. I think that grouping the files into folders (genres) is more helpful than spreading them all out, since it provides some means of organization. Don’t you agree?

      There are of course other ways to group the files; for example, they could be grouped chronologically, by the time period (Hijri century) in which they were written. But I have chosen the first way. Whoever wants to study a specific aspect of the corpus can consult the Excel sheet, click the sort button, and sort the files by document title, author name, time period, number of words, and so on. He/she can then get the list of files that meet the criteria and look for them in the corpus files. If, for example, the name of a file begins with the letters ‘A’ and then ‘B’, he/she can find the file with the same name in the folder ‘A’, in the subfolder ‘B’.
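
      The lookup described above could be sketched roughly as follows. This is only an illustration, not a tool shipped with the corpus: the function name `corpus_path`, the root folder name `KSUCCA`, the `.txt` extension, and the assumption that every code is exactly two capital letters followed by a number are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed code shape: genre letter, subgenre letter, sequence number (e.g. "BA1").
CODE_PATTERN = re.compile(r"([A-Z])([A-Z])(\d+)")


def corpus_path(code, root="KSUCCA"):
    """Map a file code from the Excel sheet to its assumed location on disk."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected file code: {code!r}")
    genre, subgenre, _sequence = match.groups()
    # Folder = genre letter, subfolder = subgenre letter, file keeps its full code.
    return PurePosixPath(root) / genre / subgenre / f"{code}.txt"


print(corpus_path("BA1"))  # KSUCCA/B/A/BA1.txt
```

      With such a helper, a list of codes obtained by sorting or filtering the Excel sheet could be turned into file paths in one pass, under the assumptions stated above.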

      Regarding your first question, the answer is yes. The main reason I built this corpus was to use it in my PhD research on distributional semantics. I will (in sha Allah) add other applications to facilitate analyzing the corpus, such as a concordance and a frequency calculator, as Mr. Abdulbaqi suggested. The reason I published it before adding the tools was that other researchers needed it urgently.
