Indigo Rose Software

Professional Software Development Tools

 
  1. #1
    Join Date
    Nov 2004
    Location
    Belgium, Leuven
    Posts
    145

    TXT to Table to XML

    Dear all,

    this is what I want to do:

    I have 2 plain text files (UTF-8) with the same number of lines in both files. One is a source file, the other is a translation of the source file. Line X in the source file corresponds to line X in the target file.

    I would like to create an XML file (TMX format defined by LISA) with this simple structure:


    <?xml version="1.0"?>
    <tmx version="1.4">
    <header creationtool="AMS" datatype="PlainText" segtype="sentence">
    </header>
    <body>
    <tu tuid="1">----------------------------- this is the line number
    <tuv xml:lang="EN">
    <seg>Line number one</seg>-------------- this is from the first file
    </tuv>
    <tuv xml:lang="FR">
    <seg>Ligne numéro un</seg>-------------- this is from the second file
    </tuv>
    </tu>
    </body>
    </tmx>
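
To make the structure above concrete, here is a minimal sketch of rendering a single translation unit. It is written in Python rather than AMS / Lua, purely as an illustration; the function name `make_tu` and its parameters are hypothetical. One detail that is easy to miss: `&`, `<` and `>` inside a line have to be escaped, otherwise the resulting TMX is not well-formed XML.

```python
from xml.sax.saxutils import escape

def make_tu(tuid, source, target, src_lang="EN", tgt_lang="FR"):
    """Render one <tu> element (make_tu is a hypothetical helper name)."""
    return (
        f'<tu tuid="{tuid}">\n'
        f'<tuv xml:lang="{src_lang}">\n'
        f'<seg>{escape(source)}</seg>\n'   # escape() replaces &, < and >
        f'</tuv>\n'
        f'<tuv xml:lang="{tgt_lang}">\n'
        f'<seg>{escape(target)}</seg>\n'
        f'</tuv>\n'
        f'</tu>\n'
    )

print(make_tu(1, "Line number one", "Ligne numéro un"))
```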


    Java-based tools exist for this, but they all run into memory problems when I try to merge text files of 20 MB.

    Is this something I could do with AMS / Lua?
    Should I read the TXT files into a table first, and then merge the tables into XML?

    If anyone can give me advice or point me to some example code, that would surely help.

    thanks

    Gert

  2. #2
    Join Date
    Sep 2002
    Location
    Sol 3
    Posts
    3,160
    Lua is very fast when processing data. I have found, however, that it slows down when it has to read or write files. What you are doing should be easy to do within AMS. I would recommend reading each file into a table; you can then use the same index in a for loop to reference the matching line in both files.
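
The "same index for both tables" idea can be sketched as follows. This is Python rather than AMS / Lua, and `pair_lines` is a hypothetical name; the lists stand in for the two tables after the files have been read.

```python
def pair_lines(source_lines, target_lines):
    """Pair line x of the source with line x of the target via a shared index."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target have different line counts")
    pairs = []
    for x in range(len(source_lines)):
        # x + 1 is the 1-based line number, usable as the tuid
        pairs.append((x + 1, source_lines[x], target_lines[x]))
    return pairs

source = ["Line number one", "Line number two"]
target = ["Ligne numéro un", "Ligne numéro deux"]
for tuid, en, fr in pair_lines(source, target):
    print(tuid, en, "|", fr)
```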

    How many lines are you attempting to combine?
    TJ-Tigger
    "A common mistake that people make when trying to design something completely foolproof was to underestimate the ingenuity of complete fools."
    "Draco dormiens nunquam titillandus."

  3. #3
    Join Date
    Nov 2004
    Location
    Belgium, Leuven
    Posts
    145
    TJ,

    the 2 source files are often 20 to 60 MB each... The TMX generated from it can be 150 to 200 MB.

    I have no idea how to do this, not even in AMS.
    Should I read one file into one table, the other into a second table, and then combine the two?
    And how do I turn the "merged" table into an XML structure?
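
With inputs of 20 to 60 MB, one alternative to building two full tables is to read both files in lockstep and handle one pair of lines at a time, so memory use stays flat regardless of file size. The sketch below shows that idea in Python rather than AMS / Lua; `iter_pairs` is a hypothetical name, and `io.StringIO` merely stands in for the two real files.

```python
import io
from itertools import zip_longest

def iter_pairs(source_file, target_file):
    """Yield (line_number, source_line, target_line) one pair at a time.

    Only the current pair is held in memory.
    """
    missing = object()  # sentinel: marks that one file ended before the other
    pairs = zip_longest(source_file, target_file, fillvalue=missing)
    for number, (src, tgt) in enumerate(pairs, start=1):
        if src is missing or tgt is missing:
            raise ValueError(f"files differ in length at line {number}")
        yield number, src.rstrip("\r\n"), tgt.rstrip("\r\n")

# io.StringIO stands in for the two open text files here.
source = io.StringIO("Line number one\nLine number two\n")
target = io.StringIO("Ligne numéro un\nLigne numéro deux\n")
for number, src, tgt in iter_pairs(source, target):
    print(number, src, "|", tgt)
```

Each yielded pair can be written to the output file immediately, which also avoids holding the 150 to 200 MB result in memory.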

    I tested a couple of things, and I'm not sure the UTF-8 support is great, but that may be down to my lack of knowledge.
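
On the UTF-8 point, one common source of trouble is a byte order mark (BOM) at the start of a file, which ends up glued to the first line if the reader does not strip it. The following Python sketch (not AMS / Lua, and saying nothing about how AMS itself handles UTF-8) shows reading a UTF-8 file with an optional BOM; `read_utf8_lines` is a hypothetical name.

```python
import os
import tempfile

def read_utf8_lines(path):
    """Read a UTF-8 text file into a list of lines, dropping a leading BOM if present."""
    # "utf-8-sig" decodes plain UTF-8 and also strips the optional byte order mark.
    # newline=None turns \r\n and \r line endings into \n while reading.
    with open(path, encoding="utf-8-sig", newline=None) as handle:
        return [line.rstrip("\n") for line in handle]

# Round-trip demo: a temporary file that starts with a BOM.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "target.txt")
    with open(path, "wb") as handle:
        handle.write(b"\xef\xbb\xbf" + "Ligne numéro un\n".encode("utf-8"))
    print(read_utf8_lines(path))
```

When writing the TMX, the output should be encoded as UTF-8 as well; an XML file without an encoding declaration is read as UTF-8 by default, and `encoding="UTF-8"` can be stated explicitly in the XML declaration.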

    thanks

    gert

  4. #4
    Join Date
    Sep 2002
    Location
    Sol 3
    Posts
    3,160
    I have not worked through the code yet but here is how I would approach this task.

    1. Read text file one into tableone.
    2. Read text file two into tabletwo.
    3. Compare tableone and tabletwo to ensure they have the same number of lines.
    4. Create the XML file that will receive this information. You can build a string in your program with the basic structure and load that into the XML file.
    5. Once this is loaded, step through both tables with a for loop: for x = 1, Table.Count(tableone) do
    6. Inside the loop, grab line x from file one and insert it into your XML file, then grab line x from file two and insert it as well.

    Use the variable x to populate the tuid in the following line:
    <tu tuid="1">

    Then use file one to populate the appropriate language and reference tableone[x]:
    <tuv xml:lang="EN">
    <seg>Line number one from file one</seg>
    </tuv>

    Then do the same thing for file two, using tabletwo[x]:
    <tuv xml:lang="FR">
    <seg>Line number one from file two</seg>
    </tuv>
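
The outline above can be sketched end to end as follows. This is Python rather than AMS / Lua, intended only to show the shape of the loop; `write_tmx`, `HEADER` and `FOOTER` are hypothetical names, the header attributes are copied from the sample in the first post, and `io.StringIO` stands in for the output file.

```python
import io
from xml.sax.saxutils import escape

# Fixed parts of the document, written once before and after the loop.
HEADER = (
    '<?xml version="1.0"?>\n'
    '<tmx version="1.4">\n'
    '<header creationtool="AMS" datatype="PlainText" segtype="sentence">\n'
    '</header>\n'
    '<body>\n'
)
FOOTER = '</body>\n</tmx>\n'

def write_tmx(out, tableone, tabletwo):
    """Write a TMX document to `out` from two equally long lists of lines."""
    if len(tableone) != len(tabletwo):
        raise ValueError("both files must have the same number of lines")
    out.write(HEADER)
    for x in range(len(tableone)):
        out.write(f'<tu tuid="{x + 1}">\n')  # tuid is the 1-based line number
        out.write(f'<tuv xml:lang="EN">\n<seg>{escape(tableone[x])}</seg>\n</tuv>\n')
        out.write(f'<tuv xml:lang="FR">\n<seg>{escape(tabletwo[x])}</seg>\n</tuv>\n')
        out.write('</tu>\n')
    out.write(FOOTER)

buffer = io.StringIO()
write_tmx(buffer, ["Line number one"], ["Ligne numéro un"])
print(buffer.getvalue())
```

Writing each unit as soon as it is built, instead of concatenating one large string, keeps the output side cheap even when the finished TMX is very large.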

    I hope that helps to start your coding. Try to put some code together and post it here if you get stuck. If I have the time I will attempt to put something together.

    Tigg

  5. #5
    Join Date
    Nov 2004
    Location
    Belgium, Leuven
    Posts
    145
    TJ,

    thanks for the feedback... I was ill for almost a week, and now I'm running behind on all my schedules.

    As soon as I have some time, I will try to follow your advice.

    Thanks

    gert
