[REL] StrEdit : V

» Wed Nov 28, 2012 1:33 am

StrEdit

Introduction

StrEdit is a minimalistic editor of the .STRINGS, .ILSTRINGS and .DLSTRINGS files that TES V: Skyrim plugins use to provide support for multiple languages. StrEdit can be used to create or update translations of strings files, without requiring their associated plugin to be present.

StrEdit does not contain any plugin localisation or delocalisation functionality, i.e. you need to create the strings files before StrEdit can be used to edit them. http://skyrim.nexusmods.com/mods/25859/ is recommended for doing that.

Downloads

https://github.com/WrinklyNinja/stredit/downloads

Screenshots

https://raw.github.com/WrinklyNinja/stredit/master/images/main.png
https://raw.github.com/WrinklyNinja/stredit/master/images/open-files.png

Future Plans

Support for translations of StrEdit's interface.
Some sort of translation progress information in the status bar (e.g. "51 of 400 strings translated").

» Wed Nov 28, 2012 4:31 am

What language substrings are accepted in strings file names? E.g. Skyrim_English.STRINGS is valid, is Skyrim_French.STRINGS also valid?

First of all, I must encourage you in developing this potentially useful third-party tool.

About STRINGS files: all of Skyrim's localized STRINGS files are already present in the "Skyrim - Interface.bsa" archive, but they are overridden and invalidated by the ones present as loose files. They are slightly different from the loose ones, however. I don't really know in what way they differ, but it is surely better not to use them "as is". Well, to answer your question: yes, STRINGS file names have this format, and the USKP French ones (as generated by TES5Edit) are:

"Unofficial Skyrim Patch_French.DLSTRINGS"
"Unofficial Skyrim Patch_French.ILSTRINGS"
"Unofficial Skyrim Patch_French.STRINGS"

Obtaining all localized STRINGS files for Skyrim and its Update (vanilla) only requires switching the game language in the Steam settings. It is, however, very time-consuming, as Steam will download every localized file (mainly the localized voice-acting BSA archives). I can easily send you the French STRINGS files if you wish. Even if they are unusable "as they are", they remain part of the original game and may fall under copyright, unless a moderator or administrator explicitly states that they can be shared on this thread.

DLC STRINGS files, however, do not have loose versions and are all embedded in the associated DLC BSA archive.

» Wed Nov 28, 2012 8:17 am

Does anyone know if editing a string via the CK changes its ID?

IIRC, the CK - in its infinite wisdom - re-embeds strings into their corresponding records on saving a delocalized plugin.

» Tue Nov 27, 2012 5:21 pm

IIRC, the CK - in its infinite wisdom - re-embeds strings into their corresponding records on saving a delocalized plugin.

Well, delocalised plugins aren't my concern really (unless the modder goes localized -> delocalized -> localized). I suppose there could be a more advanced mode which takes the old original strings file, the new original strings file and a translated strings file and matches all the strings up to their new IDs.

That's something I don't understand, actually - how does one tell the CK to switch between a localized plugin and a delocalized plugin? What happens if you save a delocalized plugin as a delocalized plugin, do the IDs alter even then?

...I suppose I should really be messing with the CK to find out, now that I have Skyrim...

EDIT: @ nico2137: I don't need you to send me anything, the info on how to get the different localisations should be enough.

» Tue Nov 27, 2012 8:32 pm

That's something I don't understand, actually - how does one tell the CK to switch between a localized plugin and a delocalized plugin? What happens if you save a delocalized plugin as a delocalized plugin, do the IDs alter even then?

Whoops, I think I may have added an unintended 'de' to my previous post. What I meant to say was that the CK will always re-embed strings on saving. One would have to use the editor's command line to export the string data and compile that into a string table after each save.

» Tue Nov 27, 2012 7:22 pm

If you're curious, the way Bash goes about looking for Strings files:

Look for loose files with the appropriate name:
```
<plugin>_<language>.STRINGS
<plugin>_<language>.DLSTRINGS
<plugin>_<language>.ILSTRINGS
```
Wrye Bash reads the Skyrim.ini for 'language' but obviously that's not appropriate in this case.
If it's not there, look inside any BSA associated with that plugin (.bsa). Note: That reminds me, I should check to see which BSA names will automatically be loaded for a given plugin...
Check all the registered BSAs - read Skyrim.ini, the [Archive] section, 'sResourceArchiveList' and 'sResourceArchiveList2', generate a list of all the BSAs, then search in reverse order.
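
The lookup steps above could be sketched roughly like this in C++. This is only an illustration, not Wrye Bash's actual code: the function names `StringsFileNames` and `BsaSearchOrder` are made up for the example, and it assumes the two ini values are plain comma-separated lists of archive names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build the three loose strings file names for a plugin,
// e.g. "Skyrim.esm" + "English" -> "Skyrim_English.STRINGS", and so on.
std::vector<std::string> StringsFileNames(const std::string& plugin,
                                          const std::string& language) {
    // Drop the plugin extension (.esp / .esm) if there is one.
    const std::string base = plugin.substr(0, plugin.rfind('.'));
    const std::string stem = base + "_" + language;
    return {stem + ".STRINGS", stem + ".DLSTRINGS", stem + ".ILSTRINGS"};
}

// Hypothetical helper: combine the values of sResourceArchiveList and
// sResourceArchiveList2 (assumed to be comma-separated) and return the
// archives in reverse order, which is the order they would be searched in.
std::vector<std::string> BsaSearchOrder(const std::string& list1,
                                        const std::string& list2) {
    std::vector<std::string> archives;
    for (const std::string& list : {list1, list2}) {
        std::istringstream stream(list);
        std::string item;
        while (std::getline(stream, item, ',')) {
            // Trim surrounding whitespace and skip empty entries.
            const std::string::size_type first = item.find_first_not_of(" \t");
            if (first == std::string::npos) continue;
            const std::string::size_type last = item.find_last_not_of(" \t");
            archives.push_back(item.substr(first, last - first + 1));
        }
    }
    std::reverse(archives.begin(), archives.end());
    return archives;
}
```

Reading Skyrim.ini itself and opening the BSAs are left out, since they depend on whatever ini and archive code is already at hand.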

Anyway, you probably have most stuff figured out already, think you're looking for UI suggestions, which I don't have right now.

» Wed Nov 28, 2012 3:54 am

Well, delocalised plugins aren't my concern really (unless the modder goes localized -> delocalized -> localized). I suppose there could be a more advanced mode which takes the old original strings file, the new original strings file and a translated strings file and matches all the strings up to their new IDs.

Exactly what TES5Edit does when translating. I also tried to use http://en.wikipedia.org/wiki/Levenshtein_distance instead of exact matching to achieve better translations, and the results were indeed good, but it took too much time to translate even a small plugin, so I removed that code.
About string IDs: in TES5Edit they are auto-incremented indexes of lstring occurrences in a plugin, assigned while traversing records/subrecords starting from the TES4 header. I don't know how the CK generates them, though.
So matching on ID is a bad idea; string matching is much more consistent, but still not the best. You should really try Levenshtein or another similar method. It will work very well in StrEdit, since the amount of data is much smaller when only 2 strings files are loaded and compared. You could add a "match percentage" option for the comparison, and a "Google Translate" button would help a lot too, I think.

» Wed Nov 28, 2012 7:34 am

Whoops, I think I may have added an unintended 'de' to my previous post. What I meant to say was that the CK will always re-embed strings on saving. One would have to use the editor's command line to export the string data and compile that into a string table after each save.

Hmm, I had assumed that delocalisation was fairly straightforward. Well, it shouldn't be an issue, since this will be dealing with strings after localisation. The advanced mode will probably be a requirement though.

If you're curious, the way Bash goes about looking for Strings files:

Look for loose files with the appropriate name:
```
<plugin>_<language>.STRINGS
<plugin>_<language>.DLSTRINGS
<plugin>_<language>.ILSTRINGS
```
Wrye Bash reads the Skyrim.ini for 'language' but obviously that's not appropriate in this case.
If it's not there, look inside any BSA associated with that plugin (.bsa). Note: That reminds me, I should check to see which BSA names will automatically be loaded for a given plugin...
Check all the registered BSAs - read Skyrim.ini, the [Archive] section, 'sResourceArchiveList' and 'sResourceArchiveList2', generate a list of all the BSAs, then search in reverse order.

Anyway, you probably have most stuff figured out already, think you're looking for UI suggestions, which I don't have right now.

Looking for strings files won't be an issue, they'll be picked by the user. I'd forgotten about BSAs though. Well, I can add support for looking inside them after I get loose string file reading done. Thanks for the reminder.

Exactly what TES5Edit does when translating. I also tried to use http://en.wikipedia.org/wiki/Levenshtein_distance instead of exact matching to achieve better translations, and the results were indeed good, but it took too much time to translate even a small plugin, so I removed that code.
About string IDs: in TES5Edit they are auto-incremented indexes of lstring occurrences in a plugin, assigned while traversing records/subrecords starting from the TES4 header. I don't know how the CK generates them, though.
So matching on ID is a bad idea; string matching is much more consistent, but still not the best. You should really try Levenshtein or another similar method. It will work very well in StrEdit, since the amount of data is much smaller when only 2 strings files are loaded and compared. You could add a "match percentage" option for the comparison, and a "Google Translate" button would help a lot too, I think.

I'm not entirely sure we're on the same page, so just to be clear, here are the use cases I've been considering:

A user has A.esp and a corresponding A_English.STRINGS file, but they'd like a French translation. They then open A_English.STRINGS in StrEdit and translate the strings inside, then save the translation as A_French.STRINGS. All that's changed is the text, the IDs must match between the files because otherwise the plugin will load the wrong strings for each record, or fail to find strings.

If the user doesn't translate all the strings in one go, but decides to finish the translation off later, they then load both files in StrEdit - it matches the original and translation using their associated IDs, so that the user can see which strings they've already translated. They can then continue from where they left off and save the updated translation file.

The author of A.esp then releases an update, and the user finds that his translation no longer works with the update because the IDs have changed, or more things have been added so the translation is now incomplete again. The user then opens 3 files in StrEdit: their translation file, the strings file for the new version and the strings file for the old version. StrEdit uses IDs to match the translated strings to the old untranslated strings, then uses the strings themselves to match the old untranslated strings to the new untranslated strings, allowing it to match the translated strings to the new IDs.

The new and old strings might differ slightly, so I might use Levenshtein matching - I hadn't considered that possibility, so thanks.
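
As a rough illustration of the three-file case, here is a minimal C++ sketch. It is not StrEdit's actual code: the `StringTable` type and the `CarryOverTranslation` name are invented for the example, IDs are assumed to be 32-bit, and it only does exact text matching (a Levenshtein comparison could be swapped in for the exact lookup).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical in-memory form of a strings file: string ID -> text.
using StringTable = std::map<std::uint32_t, std::string>;

// Carry a translation over to the IDs of a new plugin version.
//  - oldOriginal / oldTranslated are joined on their shared IDs.
//  - oldOriginal / newOriginal are joined on exact text equality.
// New strings with no match keep their untranslated text.
StringTable CarryOverTranslation(const StringTable& oldOriginal,
                                 const StringTable& oldTranslated,
                                 const StringTable& newOriginal) {
    // Step 1: old original text -> translated text, matched by ID.
    std::unordered_map<std::string, std::string> textToTranslation;
    for (const auto& entry : oldOriginal) {
        const auto translated = oldTranslated.find(entry.first);
        if (translated != oldTranslated.end()) {
            textToTranslation.emplace(entry.second, translated->second);
        }
    }

    // Step 2: give each new ID the translation of the identical old text.
    StringTable result;
    for (const auto& entry : newOriginal) {
        const auto match = textToTranslation.find(entry.second);
        result[entry.first] =
            (match != textToTranslation.end()) ? match->second : entry.second;
    }
    return result;
}
```

If two old strings share the same text but were translated differently, this sketch simply keeps the first translation it sees, so a real tool would want to flag such cases for the user.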

» Wed Nov 28, 2012 4:29 am

The user then opens 3 files in StrEdit: their translation file, the strings file for the new version and the strings file for the old version. StrEdit uses IDs to match the translated strings to the old untranslated strings, then uses the strings themselves to match the old untranslated strings to the new untranslated strings, allowing it to match the translated strings to the new IDs.
The new and old strings might differ slightly, so I might use Levenshtein matching - I hadn't considered that possibility, so thanks.

We are talking about just the same thing. That is exactly what happens in TES5Edit: when translating, it has 2 sets of strings files from the previous plugin version (and/or other plugins to increase the vocabulary, like the Skyrim.esm strings) and a new list of strings from the plugin being localized. But exact matching won't always work in this case; Levenshtein will do a better job if you manage to make it work fast. Go for it :biggrin:

» Wed Nov 28, 2012 1:54 am

Thanks, yet again, for your spirit of initiative wrinklyninja. You have all my support! I'll post later, if I do find anything worth suggesting!

» Tue Nov 27, 2012 9:17 pm

OK, I've decided to do this, and I've created a repository for the code (link in OP). It won't compile yet, but I think I've identified all the functions/structures I'll need.

» Wed Nov 28, 2012 7:48 am

https://www.dropbox.com/s/1qjiei2buhet9bo/Screenshot%20from%202012-11-24%2021%3A06%3A05.png

...sort of. The ugliness is because I'm running it in Wine, but aside from the list building and clicking on rows, nothing is implemented yet.

I need to find a fast Levenshtein implementation in C++, but aside from that it's only UI work that needs doing.
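
For reference, a common way to compute the Levenshtein distance is the two-row dynamic programming version sketched below. This is a generic textbook-style sketch rather than any particular library's code; the `MatchPercentage` helper is a hypothetical way of turning the distance into the "match percentage" suggested earlier in the thread, and both functions compare bytes, so multi-byte encoded characters count as several edits.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance using two rows of the DP table:
// O(|a| * |b|) time, O(|b|) memory.
std::size_t LevenshteinDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Distance from the empty prefix of a to each prefix of b.
    std::iota(prev.begin(), prev.end(), std::size_t{0});

    for (std::size_t i = 0; i < a.size(); ++i) {
        curr[0] = i + 1;  // Deleting the first i + 1 characters of a.
        for (std::size_t j = 0; j < b.size(); ++j) {
            const std::size_t cost = (a[i] == b[j]) ? 0 : 1;
            curr[j + 1] = std::min({prev[j + 1] + 1,   // deletion
                                    curr[j] + 1,       // insertion
                                    prev[j] + cost});  // substitution
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical similarity score in [0, 100]: 100 means identical strings.
double MatchPercentage(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;  // Two empty strings match fully.
    const double distance = static_cast<double>(LevenshteinDistance(a, b));
    return 100.0 * (1.0 - distance / static_cast<double>(longest));
}
```

Comparing every new string against every old string is quadratic in the number of strings, so a cheap pre-filter (for example, skipping pairs whose lengths differ by more than the allowed distance) may help if speed becomes a problem.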

» Wed Nov 28, 2012 6:50 am

Amazing work.

» Wed Nov 28, 2012 4:53 am

Still To Do:

- Implement Levenshtein distance calculation (there must be an existing library I can use somewhere).
- "Open file..." UI.
- Documentation.
- Keyboard navigation.
- Support for translations of StrEdit.

However, I have a question: ATM when edits are saved, the resulting strings file is a complete copy of the original strings file with the edited strings changed. Would it be better if the resulting strings file only contained the edited strings, or should the user be able to choose?

I might just get this done today...

» Tue Nov 27, 2012 8:05 pm

However, I have a question: ATM when edits are saved, the resulting strings file is a complete copy of the original strings file with the edited strings changed. Would it be better if the resulting strings file only contained the edited strings, or should the user be able to choose?

For translation purposes, I would say that a complete copy of the original strings file with the edited strings changed is what I expect.

I do, however, have 2 suggestions that may be very convenient for translating huge mods (such as the USKP):

1) Rather than making innumerable text copy/pastes, would it be possible to have a simple drag 'n' drop feature? (Possibly with an undo/redo feature for the last few modifications.)

2) When porting the translated USKP (yes, that one again) from one version to its next update, it may be necessary, convenient and safe to implement a "string ID correspondence" function. Here are more details:

- Let's say I have the USKP v1.2.4 fully translated to French. Thanks to TES5Edit, all strings are re-embedded in the esp, there are no more strings files, and the esp is fully functional. It is, however, possible to regenerate them with TES5Edit if necessary.

- Let's say Arthmoor releases the USKP v1.2.5 in two weeks. In order to translate it to French, it would be extremely convenient to re-use the strings from the translated USKP v1.2.4 but, as I said on the TES5Edit thread, that would be safe and possible only if the string IDs are identical from one version to the next. As the USKP v1.2.5 may contain new edits (and thus new English strings), these new string IDs may clash with the USKP v1.2.4 ones. As Zilav stated, the string IDs are generated incrementally, but from what? What is the origin? How is the first USKP-specific additional string ID generated? Is it the next one available after those present in vanilla Skyrim?
And let's say a specific string ID generated for USKP v1.2.5 points to a weapon name. As USKP v1.2.5 may contain new edits, what guarantees that the corresponding string ID from the translated USKP v1.2.4 doesn't point to (let's say) an ingredient name?

I will double-post this report on the TES5Edit thread.

» Wed Nov 28, 2012 7:44 am

For translation purposes, I would say that a complete copy of the original strings file with the edited strings changed is what I expect.

I do, however, have 2 suggestions that may be very convenient for translating huge mods (such as the USKP):

1) Rather than making innumerable text copy/pastes, would it be possible to have a simple drag 'n' drop feature? (Possibly with an undo/redo feature for the last few modifications.)

2) When porting the translated USKP (yes, that one again) from one version to its next update, it may be necessary, convenient and safe to implement a "string ID correspondence" function. Here are more details:

- Let's say I have the USKP v1.2.4 fully translated to French. Thanks to TES5Edit, all strings are re-embedded in the esp, there are no more strings files, and the esp is fully functional. It is, however, possible to regenerate them with TES5Edit if necessary.

- Let's say Arthmoor releases the USKP v1.2.5 in two weeks. In order to translate it to French, it would be extremely convenient to re-use the strings from the translated USKP v1.2.4 but, as I said on the TES5Edit thread, that would be safe and possible only if the string IDs are identical from one version to the next. As the USKP v1.2.5 may contain new edits (and thus new English strings), these new string IDs may clash with the USKP v1.2.4 ones. As Zilav stated, the string IDs are generated incrementally, but from what? What is the origin? How is the first USKP-specific additional string ID generated? Is it the next one available after those present in vanilla Skyrim?
And let's say a specific string ID generated for USKP v1.2.5 points to a weapon name. As USKP v1.2.5 may contain new edits, what guarantees that the corresponding string ID from the translated USKP v1.2.4 doesn't point to (let's say) an ingredient name?

I will double-post this report on the TES5Edit thread.

1) Not sure what you're referring to with the innumerable copy/pastes - why would you be doing any copy/pastes?

2) If you have the original strings file from 1.2.4, your translated strings file from 1.2.4, and the 1.2.5 strings file, it is possible to match up your translation to the strings in the 1.2.5 strings file via a three-way merge. The OP touches on the method. It might not always get an exact or correct match, but most of the time it should be right, and that's a lot better than having to manually match everything up.

» Wed Nov 28, 2012 3:39 am

- Implement Levenshtein distance calculation (there must be an existing library I can use somewhere).

I found these two links ( http://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm | http://cplus.about.com/od/programmingchallenges/a/Programming-Challenge-39-Calculate-Levenshtein-Distance.htm ) last year on the Levenshtein distance while I was doing a bit of research (somewhat) related to a C++ class I was taking at the time. No library or anything, but they might help.

» Tue Nov 27, 2012 7:50 pm

I need to find a fast Levenshtein implementation in C++, but aside from that it's only UI work that needs doing.

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
If it won't work fast enough, you can try to implement http://thedigitalstandard.blogspot.ru/2009/11/why-fuzzy-hashing-is-really-cool.html (http://flamingo.ics.uci.edu/releases/4.1/src/partenum/ and http://asterix.ics.uci.edu/fuzzyjoin/ too)

» Wed Nov 28, 2012 1:00 am

I found these two links ( http://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm | http://cplus.about.com/od/programmingchallenges/a/Programming-Challenge-39-Calculate-Levenshtein-Distance.htm ) last year on the Levenshtein distance while I was doing a bit of research (somewhat) related to a C++ class I was taking at the time. No library or anything, but they might help.

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
If it won't work fast enough, you can try to implement http://thedigitalstandard.blogspot.ru/2009/11/why-fuzzy-hashing-is-really-cool.html (http://flamingo.ics.uci.edu/releases/4.1/src/partenum/ and http://asterix.ics.uci.edu/fuzzyjoin/ too)

Thanks for those - I've used the implementation in the wikibook for now, I'll test them all out when I start doing some proper testing.

To Do:
- Documentation.
- Keyboard navigation.
- Support for translations of StrEdit.
- Testing.

» Tue Nov 27, 2012 6:57 pm

I think that I understood what http://www.gamesas.com/user/799441-nico2137/ is trying to say about "copy/paste" problem, and I think Wrinkly has relieved our concerns by writing what he wrote here:

2) If you have the original strings file from 1.2.4, your translated strings file from 1.2.4, and the 1.2.5 strings file, it is possible to match up your translation to the strings in the 1.2.5 strings file via a three-way merge. The OP touches on the method. It might not always get an exact or correct match, but most of the time it should be right, and that's a lot better than having to manually match everything up.

By the way, here is a general list of what I am expecting to be able to do with StrEdit:

- Being able to extract English strings from any mod that needs a translation, and then translate every string into our language using a simple interface, which should include at least basic functions (such as cut, copy, paste, undo/redo...) and allow us to write as many characters as we want, because that could be very useful when translating quests.
- Being able to load translated strings from a previous version of the mod and copy their contents over the strings from the new mod file (be it an update, an optional plugin, whatever), so that string IDs match, and I can be sure, in most cases, that the program has not translated the name of a weapon with the name of an armor, for example. To further make sure that this doesn't happen, could you implement a sort of "side by side strings comparison" function? I mean, could I compare, side by side, an English and an Italian string taken from, for example, MOD_English.DLStrings and MOD_Italian.DLStrings respectively? That way, I can check whether every new string is correctly translated or not.
- Being able to AUTOSORT strings: by name (the initial letter of each string's text) or by string ID.

After having completed our translation and having embedded translated strings into new mod esp/esm , we should use TES5EDIT in order to check for errors that the file may contain after our translation process.

This is what I need from strEdit. I think that's enough, or at least, at the moment ahah! Pardon me if what I wrote sounds repetitive or it is already stated in the OP.

» Wed Nov 28, 2012 4:21 am

Well, perhaps what I've written so far in the readme (https://dl.dropbox.com/u/17043363/StrEdit%20Readme.html) will shed some light on what StrEdit can do.

Of what you've written, I think that the only things that aren't implemented are:

- undo/redo, though you can undo/redo while editing a single string (but can't undo through the strings you've recently changed). I don't see the point of adding this, because if you mess up a translation and move on to the next one, you just move back to the one you messed up and fix the translation. It's not going to save you any time or effort undoing what you just did instead. Undo/redo also happens to be one of those things that seems trivial at first, but really isn't once you start thinking about it.

- autosort: the list is sorted in the following order: all untranslated strings, followed by all inexact matches, followed by all other strings. Within each block, the strings are sorted in the alphabetical order of their original text.
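
The ordering described in the autosort point could be expressed as a two-key sort, sketched below. The `Row` struct and `MatchState` enum are hypothetical stand-ins rather than StrEdit's real types, and "alphabetical" is assumed here to mean a plain byte-wise string comparison.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row states, declared in the order the blocks are displayed.
enum class MatchState { Untranslated = 0, Inexact = 1, Other = 2 };

// Hypothetical list row: original text, its translation, and its state.
struct Row {
    std::string original;
    std::string translation;
    MatchState state;
};

// Sort by block first (untranslated, inexact matches, everything else),
// then by original text within each block.
void SortRows(std::vector<Row>& rows) {
    std::stable_sort(rows.begin(), rows.end(),
                     [](const Row& lhs, const Row& rhs) {
                         return std::tie(lhs.state, lhs.original) <
                                std::tie(rhs.state, rhs.original);
                     });
}
```

A locale-aware or case-insensitive comparison could replace the plain string comparison if that ordering feels more natural for translated text.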

String IDs are meaningless to the end-user, they're only useful to Skyrim itself and to programmers trying to match things up like Skyrim does. A translator has no business with string IDs. (I have them in the display for debugging purposes, but they'll likely be removed once I have verified it's working correctly).

As for sorting in any order other than the one I gave above, please provide a motivation for doing so, and the orderings which would be useful. Note that there is a filter box which can be used to filter for a subset of strings, in case finding strings more easily is why you want the autosort.

» Wed Nov 28, 2012 9:02 am

I talked about an autosorting feature by StringsID because, as it happens when using TES5EDIT, you have no way to see whether that string, which describes an effect, talks about a spell, an armor etc. You can just search for additional information in the readme or in mod's page, and sometimes you don't find anything about that effect. Knowing if that effect belongs to a spell or an armor is really useful, at least for translators, as we can translate text in different ways. Anyway, this shouldn't be a problem, since you have implemented that search function, which doesn't exist in TES5EDIT. About any other kind of autosort, you are right, I do not need any other than the one you are going to implement, as I didn't know about the search function!

» Tue Nov 27, 2012 8:14 pm

I talked about an autosorting feature by StringsID because, as it happens when using TES5EDIT, you have no way to see whether that string, which describes an effect, talks about a spell, an armor etc. You can just search for additional information in the readme or in mod's page, and sometimes you don't find anything about that effect. Knowing if that effect belongs to a spell or an armor is really useful, at least for translators, as we can translate text in different ways. Anyway, this shouldn't be a problem, since you have implemented that search function, which doesn't exist in TES5EDIT. About any other kind of autosort, you are right, I do not need any other than the one you are going to implement, as I didn't know about the search function!

Ah, OK - you were wanting contextual information, and using the ID to look that up in the plugin itself. It will probably still be a problem, but there's no better solution available yet. Perhaps in the future it could optionally scan a given plugin and display the record type a string is attached to.

» Wed Nov 28, 2012 1:14 am

StrEdit v0.1 released!

OP updated with a link to it. I hope people will find it useful, feedback would be appreciated.

It's v0.1 because I think it works, but I can't say that I'm absolutely sure there are no issues. If you use it, be sure to keep a backup of whatever you're working on and check to see that it hasn't broken anything.

» Tue Nov 27, 2012 8:22 pm

This looks great! :goodjob:

I'll give it a try when I have enough time; I've been testing Wrye Bash's UAC support recently. :wink: