TXT to Table to XML

  • gvanassche

    Dear all,

    this is what I want to do:

    I have 2 plain text files (UTF-8) with the same number of lines in both files. One is a source file, the other is a translation of the source file. Line X in the source file corresponds to line X in the target file.

    I would like to create an XML file (TMX format defined by LISA) with this simple structure:


    <?xml version="1.0"?>
    <tmx version="1.4">
    <header creationtool="AMS" datatype="PlainText" segtype="sentence">
    </header>
    <body>
    <tu tuid="1"> <!-- this is the line number -->
    <tuv xml:lang="EN">
    <seg>Line number one</seg> <!-- this is from the first file -->
    </tuv>
    <tuv xml:lang="FR">
    <seg>Ligne numéro un</seg> <!-- this is from the second file -->
    </tuv>
    </tu>
    </body>
    </tmx>


    There are existing tools for this (written in Java), but they all run into memory problems when I try to merge text files of 20 MB.

    Is this something that I could do with AMS / Lua?
    Should I read the TXT files into tables first, and then merge the tables into XML?

    If anyone can give me advice or point me to some example code, that would surely help.

    thanks

    Gert
  • TJ_Tigger
    Indigo Rose Customer
    • Sep 2002
    • 3159

    #2
    Lua is very fast at processing data. I have found, however, that it slows down when it has to read in or write out files. What you are doing should be easy to do within AMS. I would recommend reading each file into a table; you can then reference the same index in both tables inside a for loop.

    How many lines are you attempting to combine?
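
The table approach recommended above could be sketched as follows. This is an illustration in Python rather than AMS code: Python lists stand in for the two tables, and `read_lines` and `pair_by_index` are hypothetical names chosen for this sketch.

```python
def read_lines(path):
    """Read a UTF-8 text file into a list, one entry per line, without line endings."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


def pair_by_index(source_lines, target_lines):
    """Pair line x of the source with line x of the target (x counts from 1)."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = []
    for x in range(1, len(source_lines) + 1):
        # The same index x is used for both tables.
        pairs.append((x, source_lines[x - 1], target_lines[x - 1]))
    return pairs
```

Both files are held in memory in full here, so this variant is only reasonable if the line count stays manageable.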
    TJ-Tigger
    "A common mistake that people make when trying to design something completely foolproof was to underestimate the ingenuity of complete fools."
    "Draco dormiens nunquam titillandus."
    Map of IR Forum Users - IR Project CodeViewer - Online Help - TiggTV - QuizEngine

    • gvanassche

      #3
      TJ,

      the 2 source files are often 20 to 60 MB each... The TMX generated from them can be 150 to 200 MB.

      I have no idea how to do this, not even in AMS.
      Should I read one file into one table, the other into a second table, and then combine the two?
      And how do I turn the "merged" table into an XML structure?

      I tested a couple of things, and I'm not sure the UTF-8 support is great... But that may be down to my lack of knowledge...

      thanks

      gert
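
With inputs of 20 to 60 MB each, one alternative to building two full tables is to read both files in step and handle each pair of lines as soon as it is read. The following is a minimal sketch in Python (not AMS code), under these assumptions: both files are UTF-8, possibly with a byte-order mark, and `iter_line_pairs` is a hypothetical name used only here.

```python
from itertools import zip_longest


def iter_line_pairs(source_path, target_path):
    """Yield (x, source_line, target_line) one pair at a time.

    Only the current pair is kept, so memory use does not grow with file size.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    with open(source_path, encoding="utf-8-sig") as src, \
         open(target_path, encoding="utf-8-sig") as tgt:
        for x, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, which exposes a mismatch.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {x}")
            yield x, s.rstrip("\r\n"), t.rstrip("\r\n")
```

A caller would loop over `iter_line_pairs(...)` and write each unit to the output file immediately, so neither input nor the 150 to 200 MB result has to exist in memory as a whole.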

      • TJ_Tigger

        #4
        I have not worked through the code yet but here is how I would approach this task.

        Read text file one into tableone.
        Read text file two into tabletwo.
        Compare tableone and tabletwo to ensure they have the same number of lines.
        Create the XML file that you will add this information to.
        You can build a string in your program with the basic structure and load that into the XML file.
        Once this is loaded, you can step through your files and add their lines to the XML file.
        Use a for loop to step through the table: for x = 1, Table.Count(tableone) do
          Grab line x from file one and add/insert it into your XML file.
          Grab line x from file two and add/insert it into your XML file.

        Use the variable x to populate your tuid in the following line:
        <tu tuid="1">

        Then use file one to populate the appropriate language and reference tableone[x]:
        <tuv xml:lang="EN">
        <seg>Line number one from file one</seg>
        </tuv>

        Then do the same thing for file two, using tabletwo[x]:
        <tuv xml:lang="FR">
        <seg>Line number one from file two</seg>
        </tuv>
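
Filling in the three fragments above for one line pair could look like the sketch below. It is written in Python as an illustration, not as AMS code, and `make_tu` is a hypothetical helper name. The escaping step is an addition of this sketch that the outline does not mention: a line containing `&`, `<` or `>` would otherwise produce invalid XML.

```python
from xml.sax.saxutils import escape


def make_tu(x, source_text, target_text, source_lang="EN", target_lang="FR"):
    """Return one <tu> element as a string, with tuid set to the line number x."""
    return (
        f'<tu tuid="{x}">\n'
        f'<tuv xml:lang="{source_lang}">\n'
        # escape() replaces &, < and > with their XML entities.
        f"<seg>{escape(source_text)}</seg>\n"
        "</tuv>\n"
        f'<tuv xml:lang="{target_lang}">\n'
        f"<seg>{escape(target_text)}</seg>\n"
        "</tuv>\n"
        "</tu>\n"
    )
```

Writing the fixed header once, then one `make_tu(...)` result per line, then the closing `</body>` and `</tmx>` tags gives the complete file without ever assembling it as a single string.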

        I hope that helps to start your coding. Try to put some code together and post it here if you get stuck. If I have the time I will attempt to put something together.

        Tigg

        • gvanassche

          #5
          TJ,

          thanks for the feedback... I was ill for almost a week, and now I'm running behind on all my schedules.

          As soon as I have some time, I will try to follow your advice.

          Thanks

          gert
