Samples a specific project and creates a separate sample per supplier. Use cases:
- Evaluate the quality of work done in a project and per supplier
- Evaluate supplier translation quality
- Evaluate supplier machine translation post editing quality
- Many more
How it works:
The system starts by analyzing all the jobs (see filters below) in the project. This gives the list of samples to create: one per language combination and assigned supplier.
The system then attempts to sample segments that were worked on by all the suppliers, so that the same segments are selected for every supplier (in the different languages).
If the suppliers did not work on the same documents, then the samples will of course not share the same segments.
URL
(POST) /resources/segments/randomsample/new
PARAMETERS
The parameters are a JSON object included in the request body:
...
Optional list of target languages. Filters jobs with this target language. If not specified, the system will sample jobs (and their suppliers) in any project target language.
Note: If you specify trgs you must also specify src.
...
suppliers
Optional array of suppliers to sample. To restrict sampling to these suppliers only. Each array element is:
- cid: The supplier company id. See companies/list to enumerate companies.
- pid: The supplier person id (internal suppliers only)
...
The expected sample size. Default is 10.
This must be a value between 1 and 50.
...
Optionally specify the segments' fields to include in the results. This is done using a layout JSON object.
If not specified, the system will include:
- Segment level properties such as all IDs, custom fields and labels
- Source language text, flags, custom fields and labels
- Source text related comments
- Target language text, flags, custom fields and labels
- Target text related comments
- Target text revisions.
...
Optional boolean. Default is false.
Only set to true if required. If true, then the results are temporarily saved and assigned a token (see sampletoken in results).
You need this token when using the QA workflow API methods in order to create a workflow/jobs for the sample.
...
Optional boolean. Default is true. If true then the returned JSON includes the result node. Otherwise only the summary statistics are returned.
If you further process results using the sampletoken you may not need the results with this call.
...
...
You can further fine tune the sample with these additional parameters:
...
Optional filter on the initial translation done. Values are:
- Any: No filter. Equivalent to omitting the property.
- MachinePretranslation: The initial translation was a machine translation. This lets you obtain a sample of post-edits.
- MemoryPretranslation: The initial translation was from a translation memory or a previous document version. This lets you obtain a sample of post-edits.
- NoPretranslation: The initial translation is not a pretranslation (machine, memory or previous document version)
- Human: The initial translation was imported from XLIFF or other file formats and marked as human translated by the respective file filter.
...
Optional filter on the current translation. Values are:
- Any: No filter. Equivalent to omitting the property.
- MachinePretranslation: The current version of the translation is a machine translation and was never post-edited by a human. This filter lets you verify that the machine translation is indeed of sufficient quality and did not require correction.
- MemoryPretranslation: The current version of the translation is a memory pretranslation and was never post-edited by a human. This filter lets you verify that the leveraged translation is indeed of sufficient quality and did not require correction.
- NoPretranslation: The current version of the translation is not a pretranslation (and is thus either a human translation or an automatic markup fix)
- Human: The current version of the translation was done by a human.
...
Optional filter on the date of last translation edit. If set, the sample will include translations edited at or after this date only.
...
boostWordsMin
boostWordsMax
This option lets you express a preferred word count for the segments to retain. Segments whose word count falls in the preferred range are then sampled with a higher probability than segments with fewer or more words. The word count is that of the source text, not the translated text.
- boostWordsMin: The minimum preferred number of words in the segment. Optional, int?
- boostWordsMax: The maximum preferred number of words in the segment. Optional, int?
Explanation:
If min is 10 and max is 15, the system will sample more segments with a word count in that range than other segments. Mathematically, the probability falls off below min and above max along a Gaussian curve: it drops below 0.2 once the word count lies a certain distance outside the limits (between 3 words and twice the range width).
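The weighting described above can be pictured with a small sketch. This is not the system's actual formula: the function name `boost_weight` is hypothetical, and the sketch assumes that the weight is 1.0 inside the preferred range and reaches 0.2 at a cutoff distance of `max(3, 2 * width)` words outside it, which is only one possible reading of "between 3 words and twice the range width".

```python
import math


def boost_weight(words: int, boost_min: int, boost_max: int) -> float:
    """Relative sampling weight for a segment with `words` source words.

    Hypothetical illustration only: the real curve parameters are not documented.
    """
    if boost_min <= words <= boost_max:
        return 1.0  # inside the preferred range: full weight

    # Assumption: the weight falls to 0.2 at this distance outside the range.
    width = boost_max - boost_min
    cutoff = max(3, 2 * width)

    # Choose sigma so that exp(-cutoff^2 / (2 * sigma^2)) equals 0.2.
    sigma = cutoff / math.sqrt(2 * math.log(5))

    distance = boost_min - words if words < boost_min else words - boost_max
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))
```

With min 10 and max 15, a 12-word segment keeps full weight, while a 25-word segment sits exactly at the assumed cutoff and is weighted 0.2.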
RESULTS
A JSON with these properties:
...
An array of samples. There is one sample per target language and per supplier.
Example: Your project has source language English and target languages French and German. You want to create samples for translation work (tsk is "TR"). If you have assigned each target language to one supplier, then you obtain 2 samples: one for translation English/French/Supplier 1 and one for translation English/German/Supplier 2.
...
If persist was set to true, then this field is a token. It is required to push the sample into a QA evaluation workflow (see related API methods).
...
Each samples array element has these properties:
...
Contains all the segments in the sample, information on the resources to which the segments belong as well as worker names.
- The results node is the same as the one returned by method: resources/segments/view/get
- Interesting details can be found in this page: Segment Details (Object)
...
The list of segments.
Includes main segment properties as well as the data columns specified in the layout parameter.
The format is explained further down in this page.
...
A dictionary with all documents that appear in the results.
This lets you show document names and more information per segment (see the did property of a segment).
The format is explained further down in this page.
...
A dictionary with all users/persons that are referenced by the segments included with the results.
A segment references the persons that have last changed a text, a status, a bookmark etc.
The format is explained further down in this page.
...
ACCESS RIGHTS
The user must be authorized to access the project and all its content.
Warning |
This API method is for multilingual projects, where the documents to translate are all assigned to the same suppliers (e.g. one per target language). If your project does not follow this simple assignment pattern, the method may not produce the results you are looking for. In that case you are better off with New sample: Internal or external supplier. The method lets you sample translators, revisers or other workflow steps. |
How it works:
The system starts by analyzing all the jobs (see filters below) in the project. This gives a list of all suppliers to sample as well as the different language combinations.
The system now attempts to find segments that were worked on by ALL the suppliers. If there is no such segment, the system will remove one supplier from the list and try again (this is done in a loop). If there is still no "shared" segment available, you will receive an error message. The latter may happen for example if a project has 2 documents and each is translated into just 1 target language: There is then no segment shared by the 2 target languages.
If the suppliers did not work on the same documents, then the samples will of course not share the same segments.
If shared segments are found, the system returns a list of samples, one per supplier and language combination. Each of these samples contains the same segments.
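The search for shared segments can be sketched as a loop over set intersections. This is a minimal illustration, not the system's implementation: the function name, the input shape (a map from supplier to the ids of the segments they worked on) and the order in which suppliers are dropped are all assumptions.

```python
from typing import Dict, List, Set, Tuple


def find_shared_segments(work: Dict[str, Set[int]]) -> Tuple[List[str], Set[int]]:
    """Return the suppliers kept and the segment ids that all of them worked on."""
    # Assumption: suppliers with the fewest segments are dropped first.
    suppliers = sorted(work, key=lambda s: len(work[s]), reverse=True)

    while suppliers:
        shared = set.intersection(*(work[s] for s in suppliers))
        if shared:
            return suppliers, shared
        suppliers.pop()  # remove one supplier and try again

    # No shared segment at all: mirrors the error shown in EXAMPLE 2.
    raise ValueError("The list of candidate segments is empty. A sample cannot be created.")


# Usage: supplier "C" worked on a different document, so it is dropped.
kept, shared = find_shared_segments({"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}})
```

In this usage example the loop keeps suppliers A and B, whose common segments are 2 and 3.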
URL
(POST) /resources/segments/sampling/new
PARAMETERS
The parameters are a JSON object included in the request body:
type | Value must be: Project | Mandatory, string |
pid | The project id. See projects/list to enumerate or find projects. | Mandatory, int |
Filter task type and languages | ||
tsk | The task code such as "TR", "RV" etc. Only such jobs are considered. | Mandatory, string |
src | Optional source language. Filters jobs with this source language. If not specified, the system will sample jobs (and their suppliers) in any source language. | Optional, string |
trgs | Optional list of target languages. Filters jobs with this target language. If not specified, the system will sample jobs (and their suppliers) in any project target language. Note: If you specify trgs you must also specify src. | Optional, string[]? |
jobsdone | Optional boolean. Default is true. If true, sampling considers completed supplier jobs only. | Optional, bool? |
Filter suppliers | ||
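suppliers | Optional array of suppliers to sample. To restrict sampling to these suppliers only. Each array element is:
- cid: The supplier company id. See companies/list to enumerate companies.
- pid: The supplier person id (internal suppliers only)
| Optional, object[]? |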
size | The expected sample size. Default is 10. This must be a value between 1 and 50. | Optional, int? |
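layout | Optionally specify the segments' fields to include in the results. This is done using a layout JSON object. If not specified, the system will include:
- Segment level properties such as all IDs, custom fields and labels
- Source language text, flags, custom fields and labels
- Source text related comments
- Target language text, flags, custom fields and labels
- Target text related comments
- Target text revisions
| Optional, object? |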
persist | Optional boolean. Default is false. Only set to true if required. If true, then the results are temporarily saved and assigned a token (see sampletoken in results). | Optional, bool? |
includeresults | Optional boolean. Default is true. If true then the returned JSON includes the result node. Otherwise only the summary statistics are returned. If you further process results using the sampletoken you may not need the results with this call. | Optional, bool? |
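As a rough sketch of how a client might assemble the request body from the parameters above, the helper below applies the two documented constraints (size between 1 and 50, src required whenever trgs is given) before serializing. The function name `build_sample_request` is hypothetical; only the JSON property names come from the table.

```python
import json
from typing import Any, Dict, List, Optional


def build_sample_request(pid: int, tsk: str, src: Optional[str] = None,
                         trgs: Optional[List[str]] = None, size: int = 10,
                         persist: bool = False) -> str:
    """Build the JSON request body for a project sample (illustrative helper)."""
    if not 1 <= size <= 50:
        raise ValueError("size must be a value between 1 and 50")
    if trgs and not src:
        raise ValueError("src must be specified when trgs is specified")

    body: Dict[str, Any] = {"type": "Project", "pid": pid, "tsk": tsk, "size": size}
    # Optional properties are only sent when set, so the defaults apply otherwise.
    if src:
        body["src"] = src
    if trgs:
        body["trgs"] = trgs
    if persist:
        body["persist"] = True
    return json.dumps(body)


# Usage: sample 2 segments of translation work in project 1678.
request_body = build_sample_request(pid=1678, tsk="TR", size=2)
```

Sending the body (base URL, authentication headers) is left out because those details are not covered on this page.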
You can further fine tune the sample with additional parameters: filters on the initial translation, on the current translation and on the date of the last translation edit, as well as the boostWordsMin / boostWordsMax word count preference. Their values are described earlier on this page.
RESULTS
A JSON with these properties:
samples | An array of samples. There is one sample per target language and per supplier. Example: Your project has source language English and target languages French and German. You want to create samples for translation work (tsk is "TR"). If you have assigned each target language to one supplier, then you obtain 2 samples: one for translation English/French/Supplier 1 and one for translation English/German/Supplier 2. | object[] |
sampletoken | If persist was set to true, then this field is a token. It is required to push the sample into a QA evaluation workflow (see related API methods). | string? |
Each samples array element has these properties:
segments | Total segments in sample. Note that this number will be less than the expected sample count if there is no or not enough data or the filter is too restrictive. | int |
words | Total source text words in sample. | int |
tsk | The task code such as "TR", "RV" etc. of the sample. | string |
src | The source language of the sample | string |
trg | The target language of the sample | string |
cid | The supplier company id | int |
uid | The internal supplier person id | int? |
dsid | The resource id. This is by definition the project memory. | int |
pid | The project id | int |
info | Usually null unless no segments could be sampled for this specific supplier and languages. A typical message here would be:
| string? |
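result | Contains all the segments in the sample, information on the resources to which the segments belong as well as worker names.
- The results node is the same as the one returned by method: resources/segments/view/get
- Interesting details can be found in this page: Segment Details (Object)
| object |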
result.rows | The list of segments. Includes main segment properties as well as the data columns specified in the layout parameter. The format is explained further down in this page. | object[] |
result.docs | A dictionary with all documents that appear in the results. This lets you show document names and more information per segment (see the did property of a segment). The format is explained further down in this page. | object |
result.users | A dictionary with all users/persons that are referenced by the segments included with the results. A segment references the persons that have last changed a text, a status, a bookmark etc. The format is explained further down in this page. | object |
columns | An array with the columns in the result.rows property. Each array element describes one column, see here: Spreadsheet Column (Object) | object[] |
ACCESS RIGHTS
The user must be authorized to access the project and all its content.
EXAMPLE 1
Here we want to sample translations done in a project. We expect one sample per supplier and language combination. If your project has 3 target languages and 1 translator for each language then we will obtain 3 samples. The condition here is that a segment shall only be retained if all translators were involved: The first for the first target language, the second for the second and the third for the third. If a segment was translated into just one target language, it is excluded. The idea is to sample the exact same segments for all the suppliers/languages. Given this condition, the system may not be able to sample your project if your suppliers did not work on the same segments.
Code Block |
---|
POST /resources/segments/randomsample/new
BODY:
{
"type": "Project",
"pid": 1678,
"tsk": "TR",
"size": 2
} |
The method returns the requested sample. To keep the JSON below small, we removed the sample details in the result node.
The system identified translation jobs for 2 target languages and 2 suppliers. It thus created 2 samples.
Code Block |
---|
{
  "samples": [
    {
      "segments": 2,
      "words": 50,
      "src": "en",
      "trg": "es",
      "tsk": "TR",
      "cid": 102,
      "uid": null,
      "dsid": 1849,
      "pid": 1678,
      "result": { "rows": [], "docs": {}, "users": {} },
      "columns": [
        { "index": 0, "fkey": "1~en~0", "fkeyLayout": "1~en~0", "ftype": 1, "fqualifier": 0, "name": "Anglais", "loc": "en", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 1, "fkey": "1~es~0", "fkeyLayout": "1~es~0", "ftype": 1, "fqualifier": 0, "name": "Espagnol", "loc": "es", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 2, "fkey": "12~es~0", "fkeyLayout": "12~es~0", "ftype": 12, "fqualifier": 0, "name": "Revisions - Espagnol", "loc": "es", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 3, "fkey": "9~en~0", "fkeyLayout": "9~en~0", "ftype": 9, "fqualifier": 0, "name": "Comments - Anglais", "loc": "en", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 4, "fkey": "9~es~0", "fkeyLayout": "9~es~0", "ftype": 9, "fqualifier": 0, "name": "Comments - Espagnol", "loc": "es", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false }
      ]
    },
    {
      "segments": 2,
      "words": 50,
      "src": "en",
      "trg": "fr",
      "tsk": "TR",
      "cid": 75,
      "uid": null,
      "dsid": 1849,
      "pid": 1678,
      "result": { "rows": [], "docs": {}, "users": {} },
      "columns": [
        { "index": 0, "fkey": "1~en~0", "fkeyLayout": "1~en~0", "ftype": 1, "fqualifier": 0, "name": "Anglais", "loc": "en", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 1, "fkey": "1~fr~0", "fkeyLayout": "1~fr~0", "ftype": 1, "fqualifier": 0, "name": "Français", "loc": "fr", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 2, "fkey": "12~fr~0", "fkeyLayout": "12~fr~0", "ftype": 12, "fqualifier": 0, "name": "Revisions - Français", "loc": "fr", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 3, "fkey": "9~en~0", "fkeyLayout": "9~en~0", "ftype": 9, "fqualifier": 0, "name": "Comments - Anglais", "loc": "en", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false },
        { "index": 4, "fkey": "9~fr~0", "fkeyLayout": "9~fr~0", "ftype": 9, "fqualifier": 0, "name": "Comments - Français", "loc": "fr", "loc_rtl": false, "loc_cmplx": false, "loc_ea": false }
      ]
    }
  ]
} |
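A client will typically reduce a response like this to one line per supplier and language pair. The sketch below does that using only property names documented in RESULTS; the function name `summarize_samples` and the shape of the summary are illustrative choices, not part of the API.

```python
import json
from typing import Any, Dict, List


def summarize_samples(response_text: str) -> List[Dict[str, Any]]:
    """Reduce the response to one summary entry per sample (illustrative helper)."""
    data = json.loads(response_text)
    summary = []
    for sample in data.get("samples", []):
        summary.append({
            "pair": f'{sample["src"]}>{sample["trg"]}',  # language combination
            "cid": sample["cid"],                        # supplier company id
            "segments": sample["segments"],
            "words": sample["words"],
            "info": sample.get("info"),                  # set when nothing could be sampled
        })
    return summary


# Usage with a trimmed version of the response from this example.
response_text = json.dumps({"samples": [
    {"segments": 2, "words": 50, "src": "en", "trg": "es", "tsk": "TR", "cid": 102},
    {"segments": 2, "words": 50, "src": "en", "trg": "fr", "tsk": "TR", "cid": 75},
]})
for entry in summarize_samples(response_text):
    print(entry["pair"], entry["cid"], entry["segments"], entry["words"])
```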
EXAMPLE 2
If you sample work of translators in a project, you may get a result like the following:
Code Block |
---|
{
"error": {
"operation": null,
"date": "2018-08-01T16:41:25.1227825Z",
"title": "The list of candidate segments is empty. A sample cannot be created.",
"details": null,
"errorurl": null,
"messages": []
}
} |
This means that the system was not able to find any segment that was worked on by ALL suppliers in the project. For example, if you assign a document to 3 translators in 3 languages, the segments of this document are shared. However, if a 4th translator works on a different document, then there is no single segment worked on by all 4 translators. You then get the message above.
To resolve this, restrict the query to the suppliers that worked on the same documents (see the suppliers parameter).
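A client could detect this error and retry with a restricted supplier list along the following lines. This is a sketch under assumptions: both helper names are hypothetical, and the company ids 102 and 75 are simply the ones that appear in EXAMPLE 1.

```python
import json
from typing import Any, Dict, List, Optional


def error_title(response_text: str) -> Optional[str]:
    """Return the error title if the response carries an error node, else None."""
    error = json.loads(response_text).get("error")
    return error["title"] if error else None


def restrict_to_suppliers(body: Dict[str, Any], company_ids: List[int]) -> Dict[str, Any]:
    """Copy the request body and limit sampling to the given supplier companies."""
    retry = dict(body)  # leave the original request untouched
    retry["suppliers"] = [{"cid": cid} for cid in company_ids]
    return retry


# Usage: the first attempt failed, so retry with two suppliers only.
failed = '{"error": {"title": "The list of candidate segments is empty. A sample cannot be created."}}'
request = {"type": "Project", "pid": 1678, "tsk": "TR", "size": 2}
if error_title(failed) is not None:
    request = restrict_to_suppliers(request, [102, 75])
```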