Estimating space requirements
For new customers, Wordbee Beebox is a 'Software as a Service' (SaaS) application. This section of the documentation is retained for existing customers who have the application installed on a local server.
Always configure a maximum memory usage limit on the Beebox settings page (coming soon). A limit improves performance significantly because repetitions are consolidated and legacy, low-quality or unedited MT translations are discarded.
The figures below are theoretical worst-case memory and disk usage. Real-world memory usage will be lower because you would apply a maximum memory usage limit. For example, if the worst case calls for 8 GB of RAM, you may set a 4 GB limit if you know that the text repetition rate is 30%, that 20% of last year's documents are legacy data and that, in any case, you hold copies of all translations in a TMS. Avoid limits above 32 GB per Beebox server.
Start by estimating the following parameters:
Parameter | Description |
DocsSizeGB | Total size in GB of the documents saved to the Beebox "in" directory at any given time. Files can be deleted once translated to reduce space requirements. Example: A typical text-heavy document of 40 pages may have about 80,000 characters and a file size of 0.5 MB. 1000 such source documents would consume about 0.5 GB. |
Langs | Average number of target languages into which files are translated. |
TMSegs | Total translation memory (TM) size in terms of segments: By default all segments extracted from files by the Beebox are saved to the internal translation memory. Deleting a physical file does not delete the contents from the TM. Adding a new version of a document replaces the memory contents (thus does not add new contents). |
TMSegSize | Average characters per source segment. A standard average may be 40 characters. |
TMRepetitionRate | Approximate segment repetition rate. If each source sentence is unique, the repetition rate is 0%. If you do not know the repetition rate, assume a conservative value such as 20%. |
Then estimate worst case space requirements:
Disk space requirements (GB) | = (DocsSizeGB * (Langs + 1)) * 1.2 |
Memory requirements (GB) | = 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / (1024 * 1024 * 1024) |
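The two formulas above can be sketched as small Python functions. This is only an illustration: the function and argument names are hypothetical (they mirror the parameter table), and the constants 1.2, 3 and the 1 GB base are copied from the formulas without further interpretation.

```python
BYTES_PER_GB = 1024 * 1024 * 1024


def disk_space_gb(docs_size_gb: float, langs: float) -> float:
    """Worst-case disk space in GB: (DocsSizeGB * (Langs + 1)) * 1.2."""
    return docs_size_gb * (langs + 1) * 1.2


def memory_gb(tm_segs: float, tm_seg_size: float, langs: float) -> float:
    """Worst-case memory in GB: 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / 1024^3."""
    return 1 + (tm_segs * tm_seg_size * (langs + 1) * 3) / BYTES_PER_GB


if __name__ == "__main__":
    # Illustrative inputs only: 0.5 GB of source files, 2 target languages,
    # 1 million TM segments of 40 characters each.
    print(f"Disk:   {disk_space_gb(0.5, 2):.2f} GB")
    print(f"Memory: {memory_gb(1_000_000, 40, 2):.2f} GB")
```

Note that TMRepetitionRate does not appear in either formula; the worst case assumes no repetitions, and the repetition rate only helps when choosing a memory limit below the worst case.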
Worst case example 1 - Average data volume
At any given time there are 1000 Word documents of about 40 pages each, each 500 KB in size. Each document is translated into 2 target languages on average. 100 new documents are added per month and 100 documents are removed from the Beebox per month.
Disk space requirements (GB) | = (0.5 * 1000 * 3 / 1024) * 1.2 = approximately 1.8 GB |
Memory requirements (GB) | = 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / (1024 * 1024 * 1024). Each document has 80,000 characters, or 2000 segments (assuming 40 characters per segment). Over a period of 2 years, a total of 100 * 12 * 2 = 2400 documents (100 documents per month, 12 months, 2 years) are added to the TM. Thus we obtain TMSegs = 2400 * 2000 = 4 800 000 segments, and TMSegs * TMSegSize * (Langs + 1) * 3 = about 1.7 billion bytes, or 1.6 GB. Adding the 1 GB base gives approximately 2.6 GB RAM. |
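As a cross-check, the memory arithmetic for example 1 can be written out step by step. The sketch below evaluates the memory formula exactly as it is stated in this section, with the (Langs + 1) * 3 factor and the 1 GB base; the variable names are illustrative and not part of the Beebox.

```python
# Inputs taken from worst case example 1.
DOCS_PER_MONTH = 100
MONTHS = 12 * 2          # 2 years of operation
CHARS_PER_DOC = 80_000
TM_SEG_SIZE = 40         # average characters per source segment
LANGS = 2                # average number of target languages

segs_per_doc = CHARS_PER_DOC // TM_SEG_SIZE   # segments per document
docs_added = DOCS_PER_MONTH * MONTHS          # documents added to the TM
tm_segs = docs_added * segs_per_doc           # TMSegs

# Memory formula: 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / 1024^3
tm_bytes = tm_segs * TM_SEG_SIZE * (LANGS + 1) * 3
memory_gb = 1 + tm_bytes / (1024 ** 3)

print(f"TMSegs: {tm_segs:,}")
print(f"TM bytes: {tm_bytes:,}")
print(f"Worst-case memory: {memory_gb:.1f} GB")
```

The documents removed from the Beebox each month do not reduce TMSegs, because deleting a physical file does not delete its contents from the TM.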
Worst case example 2 - High data volume
At any given time there are 10 000 Word documents of about 40 pages each, each 500 KB in size. Each document is translated into 2 target languages on average. 5 000 new documents are added per month. Old documents are removed from the Beebox server. We assume that there are no repetitions in the documents. The space requirement after 2 years of operation is:
Disk space requirements (GB) | = ((0.5 / 1024) * 5000 * 3) * 1.2 = approximately 8.8 GB |
Memory requirements (GB) | = 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / (1024 * 1024 * 1024). Each document has 80,000 characters, or 2000 segments (assuming 40 characters per segment). Over a period of 2 years, a total of 5000 * 12 * 2 = 120 000 documents (5000 documents per month, 12 months, 2 years) are added to the TM. Thus we obtain TMSegs = 120 000 * 2000 = 240 million segments, and TMSegs * TMSegSize * (Langs + 1) * 3 = 86.4 billion bytes, or about 80 GB. Adding the 1 GB base gives approximately 81 GB RAM. This is a worst case figure if:
|
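To see how the worst-case estimate for example 2 grows over time, the memory formula from this section can be evaluated at several points. This is a rough sketch with a hypothetical helper name; it assumes a constant intake of 5000 documents per month and no repetitions, as in the example, and compares the result with the 32 GB figure given earlier as the upper bound for a memory limit.

```python
RECOMMENDED_MAX_LIMIT_GB = 32  # "Avoid limits above 32 GB per Beebox server."


def worst_case_memory_gb(docs_per_month, months, chars_per_doc, tm_seg_size, langs):
    """Evaluate 1 + (TMSegs * TMSegSize * (Langs + 1) * 3) / 1024^3 after `months`."""
    # TMSegs: every document added so far contributes chars_per_doc / tm_seg_size segments.
    tm_segs = docs_per_month * months * (chars_per_doc / tm_seg_size)
    return 1 + (tm_segs * tm_seg_size * (langs + 1) * 3) / (1024 ** 3)


if __name__ == "__main__":
    for months in (6, 12, 24):
        gb = worst_case_memory_gb(5000, months, 80_000, 40, 2)
        position = "above" if gb > RECOMMENDED_MAX_LIMIT_GB else "within"
        print(f"{months:>2} months: {gb:6.1f} GB worst case ({position} the 32 GB bound)")
```

Once the worst case passes 32 GB, any memory limit that follows the guidance in this section necessarily sits below the worst-case figure.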
Copyright Wordbee - Buzzin' Outside the Box since 2008