Sunday, December 04, 2011

Microsoft Translator | Windows Azure Marketplace


"Marketplace for data/services",
powered by Microsoft Azure,
with a useful tool: Microsoft Translator

1 tx = 1000 characters
Free: 2000 tx/month
$40/month: 4000 tx/month
$60/month: 6000 tx/month
...
There is also a 2-month free trial.
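
The tier arithmetic can be worked out with a small sketch. It takes the listed figures at face value and assumes that each plan's price covers its whole monthly quota, which the listing does not state explicitly. The names `TIERS`, `chars_per_month` and `usd_per_million_chars` are made up for this illustration.

```python
# Tier figures copied from the list above; all names here are hypothetical.
CHARS_PER_TX = 1000  # 1 tx = 1000 characters

# plan name -> (monthly price in USD, monthly quota in tx)
TIERS = {
    "free": (0, 2000),
    "$40": (40, 4000),
    "$60": (60, 6000),
}


def chars_per_month(quota_tx: int) -> int:
    """Convert a monthly quota in tx into characters."""
    return quota_tx * CHARS_PER_TX


def usd_per_million_chars(price_usd: float, quota_tx: int) -> float:
    """Price per 1 M characters, assuming the whole quota is used."""
    return price_usd / (chars_per_month(quota_tx) / 1_000_000)


for name, (price, quota) in TIERS.items():
    print(name, chars_per_month(quota), usd_per_million_chars(price, quota))
```

Under that assumption, the paid tiers listed work out to $10 per 1 M characters when the quota is fully used; a partly used quota costs more per character.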

currently supported languages

By the way,
the Google Translate service
is also not free anymore, and the price is comparable: $20.00 per 1 M characters of text.
Google just does not appear to offer any free options or promotions...
Also, Google's service is still in "labs", that is, it could change at any time...
It is quite usable, but the resulting translation does require human editing.


There could be some "moral dilemma" here, since machine translation services
are trained on already translated web sites,
which were in many cases very hard and expensive to produce.
Also, in many cases Google provides an option to "suggest better translation".

Clearly, the machine learning algorithms
implemented by Google and Microsoft add significant value,
as does the real-time service they provide,
but there is also some value in the original "data".
It could be mutually beneficial to have
the original data available in a "free form",
like Wikipedia.

The services mentioned have likely learned
from existing Wikipedia translations.
They could be made available for further translation of Wikipedia,
and the translation programs could then learn more from human editing feedback...

JSON-C YouTube API

Developer's Guide: JSON-C / JavaScript - YouTube APIs and Tools - Google Code

The YouTube site has not only very rich content,
but also a very powerful set of APIs!

Besides the very popular and standardized Atom (XML) REST API,
there are also JSON and JSON-C ("C" stands for "Compact", most likely).
While there is not much "talk" about "JSON-C" to be found by googling,
the format does appear to be much more efficient for storage: good engineering, YouTube!

In addition to a much smaller size (= less time to process), it makes programming simpler!

Comparing JSON and JSON-C

  • JSON-C feeds can exclude duplicate, irrelevant or easily calculated values.
  • They do not preserve XML namespaces or schema information.
  • They minimize the number of JSON objects that are created, in favor of simple properties.


    Atom (XML):
    <category scheme='http://gdata.youtube.com/schemas/2007/categories.cat'
      term='Sports' label='Sports'/>
    <category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat' term='dog'/>
    <category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat' term='catch'/>
    <category scheme='http://gdata.youtube.com/schemas/2007/keywords.cat' term='frisbee'/>
    ...
    <media:group>
      <media:category label='Sports'
        scheme='http://gdata.youtube.com/schemas/2007/categories.cat'>Sports</media:category>
      <media:keywords>dog, catch, frisbee</media:keywords>
      ...
    </media:group>
    
    JSON:
    "category": [
      {"scheme": "http://gdata.youtube.com/schemas/2007/categories.cat", "term": "Sports", "label": "Sports"},
      {"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat", "term": "dog"},
      {"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat", "term": "catch"},
      {"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat", "term": "frisbee"}
    ],
    "media$group": {
      "media$category": [{"$t": "Sports", "label": "Sports", "scheme": "http://gdata.youtube.com/schemas/2007/categories.cat"}],
      "media$keywords": {"$t": "dog, catch, frisbee"}
    }

    JSON-C:
    "category": "Sports",
    "tags": ["dog", "catch", "frisbee"]
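
To make the difference concrete, here is a minimal Python sketch that collapses the scheme-tagged "category" array of the JSON form into the two simple properties of the JSON-C form, then compares the serialized sizes. This is only an illustration and says nothing about how YouTube actually produces its feeds. The helper `to_compact` and the variable names are hypothetical; the data is copied from the example above.

```python
import json

CATEGORIES = "http://gdata.youtube.com/schemas/2007/categories.cat"
KEYWORDS = "http://gdata.youtube.com/schemas/2007/keywords.cat"

# The "category" array from the JSON example above.
json_entry = {
    "category": [
        {"scheme": CATEGORIES, "term": "Sports", "label": "Sports"},
        {"scheme": KEYWORDS, "term": "dog"},
        {"scheme": KEYWORDS, "term": "catch"},
        {"scheme": KEYWORDS, "term": "frisbee"},
    ]
}


def to_compact(entry: dict) -> dict:
    """Hypothetical helper: turn scheme-tagged objects into simple properties."""
    compact = {"category": None, "tags": []}
    for item in entry["category"]:
        if item["scheme"] == CATEGORIES:
            compact["category"] = item["term"]    # the single category name
        elif item["scheme"] == KEYWORDS:
            compact["tags"].append(item["term"])  # one tag per keyword entry
    return compact


compact_entry = to_compact(json_entry)
print(json.dumps(compact_entry))

# Compare the serialized sizes of the two shapes.
print(len(json.dumps(json_entry)), "vs", len(json.dumps(compact_entry)), "characters")
```

Reading the tags from the compact form is then a plain `compact_entry["tags"]`, with no filtering by scheme URL.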