Post subject: archive upload from the command line
Player (66)
Joined: 4/21/2011
Posts: 232
This is a work in progress. I can create an archive page with this, but it is a "data" entry in community_texts.
curl --header "authorization: LOW access:secret" ^
     -i -X PUT http://s3.us.archive.org/lethalenforcers-tas-phil/
I uploaded the files with this, but it didn't update the metadata.
curl --location ^
     --header "authorization: LOW access:secret" ^
     --header 'x-archive-meta01-collection:speed_runs' ^
     --header 'x-archive-meta-mediatype:movies' ^
     --header 'x-archive-queue-derive:0' ^
     --upload-file lethalenforcers-tas-phil.mkv ^
     http://s3.us.archive.org/lethalenforcers-tas-phil/lethalenforcers-tas-phil.mkv
This is the flag that should create the page and upload at the same time. I tried it a few different ways before resorting to PUT, but I couldn't get it to work.
--header 'x-amz-auto-make-bucket:1'
This is the flag that should enable changes of metadata, but I couldn't get it working either.
--header 'x-archive-ignore-preexisting-bucket:1'
No progress bar, which is frustrating.
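For anyone who would rather script this in Python than in a batch file, here is a minimal sketch of the same S3-style PUT using only the standard library. The header names are the ones from the curl commands above; the function names `ias3_headers` and `ias3_put_request` and their parameters are invented for illustration, and the request is only built, never sent. One untested guess about the batch version: cmd.exe does not treat single quotes as quoting characters, so headers written as '...' may go out with the quotes attached, which could be why the metadata was ignored.

```python
import urllib.request


def ias3_headers(access, secret, collection, mediatype,
                 make_item=True, derive=False):
    """Build the headers used in the curl commands above.

    Header names are copied from the post; the parameters are hypothetical.
    """
    headers = {
        "authorization": "LOW {}:{}".format(access, secret),
        "x-archive-meta01-collection": collection,
        "x-archive-meta-mediatype": mediatype,
        "x-archive-queue-derive": "1" if derive else "0",
    }
    if make_item:
        # The flag that is supposed to create the item on first upload.
        headers["x-amz-auto-make-bucket"] = "1"
    return headers


def ias3_put_request(item, filename, data, headers):
    """Return an unsent PUT request; the item name is part of the URL."""
    url = "http://s3.us.archive.org/{}/{}".format(item, filename)
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


if __name__ == "__main__":
    hdrs = ias3_headers("access", "secret", "speed_runs", "movies")
    req = ias3_put_request("lethalenforcers-tas-phil",
                           "lethalenforcers-tas-phil.mkv", b"", hdrs)
    print(req.get_method(), req.full_url)
    # Actually sending it would be urllib.request.urlopen(req); not done here.
```

Building the headers in one place makes it easy to try the auto-make and no-derive flags in different combinations without retyping the whole command.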
Editor, Emulator Coder, Site Developer
Joined: 5/11/2011
Posts: 1108
Location: Murka
I think this is a very useful thing to pursue. I don't have any other useful input, though ><
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
In advance, will it be faster than manual uploading? Do you still need to enter filenames (which are different each time)? How to pick the Item name from the submission page automatically?
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
Player (66)
Joined: 4/21/2011
Posts: 232
feos wrote:
In advance, will it be faster than manual uploading?
Probably no change in kbps, but with a programmatic interface you could start the upload as soon as the first encode is done, and it would likely finish before the next encode does.
How to pick the Item name from the submission page automatically?
It is part of the url. By PUT-ing to http://s3.us.archive.org/lethalenforcers-tas-phil/ I created an item called lethalenforcers-tas-phil. The title can be set separately.
Do you still need to enter filenames (which are different each time)?
Hopefully you'd only enter it once (or maybe scrape it from the #S page).
P.S. This is a Python script I made a while ago that goes from a movie number to the author's nickname.
Language: python

import requests
from bs4 import BeautifulSoup
import re

movie_number = 22

movie_url = "http://tasvideos.org/" + str(movie_number) + "M.html"
movie_page_src = BeautifulSoup(requests.get(movie_url).text)
submission_tag = movie_page_src.find("a", text=re.compile("^Submission #[0-9]+$"))

submission_url = "http://tasvideos.org/" + submission_tag.get('href')
submission_page_src = BeautifulSoup(requests.get(submission_url).text)
nickname_tag = submission_page_src.find("th", text=re.compile("^Author's nickname: $"))
nickname = nickname_tag.next_sibling.text

print(nickname)
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
Cool, though we still can't name encodes automatically, because only a human can come up with their names, following the guidelines. But yeah, the Item header must be autocopypastable from the submission title (or just equal to the encode names?)
Player (66)
Joined: 4/21/2011
Posts: 232
The encode name is unique enough for archive. Letting archive convert the title to the item name makes for some long and ugly urls.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
On a side note, what do we do with cross-platform TASes of the same title by the same author? They're allowed now, but how do we name encodes for, say, SNES Ghouls'n'Ghosts by Nach and Arcade Ghouls'n'Ghosts by Nach? natt?
Emulator Coder, Skilled player (1114)
Joined: 5/1/2010
Posts: 1217
nanogyth wrote:
P.S. This is a Python script I made a while ago that goes from a movie number to the author's nickname.
I presume Python has a JSON decoder. If so, maybe use things like http://tasvideos.org/subinfo/2000M.json and http://tasvideos.org/subinfo/3500S.json. Those are designed to be machine-parseable. The submission id is the second element of the id field in xxxxM.json files. The submission author nickname is the first element of the player field in xxxxS.json files.
Also, until the problem of it uploading to community_texts is resolved, please don't use it. opensource_movies, even if it is the wrong category, is considerably less problematic (of course, speed_runs is preferred).
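A rough sketch of what that could look like in Python, using only the standard library. The field positions are the ones described above; the function names and the sample documents are invented for illustration (the values are placeholders, not real site data), and nothing here actually fetches from the site.

```python
import json


def subinfo_url(number, kind):
    """URL of the machine-parseable info; kind is "M" (movie) or "S" (submission)."""
    return "http://tasvideos.org/subinfo/{}{}.json".format(number, kind)


def submission_id_from_movie(movie_info):
    """Second element of the "id" field of a decoded xxxxM.json document."""
    return movie_info["id"][1]


def nickname_from_submission(submission_info):
    """First element of the "player" field of a decoded xxxxS.json document."""
    return submission_info["player"][0]


if __name__ == "__main__":
    # Placeholder documents containing only the fields mentioned above.
    sample_movie = json.loads('{"id": [0, 1234]}')
    sample_submission = json.loads('{"player": ["SomeAuthor"]}')

    sub_id = submission_id_from_movie(sample_movie)
    print(subinfo_url(sub_id, "S"))
    print(nickname_from_submission(sample_submission))
```

With a real fetch, the decoded movie document would give the submission number, and the submission document fetched from `subinfo_url(sub_id, "S")` would give the nickname, with no HTML scraping involved.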
Player (66)
Joined: 4/21/2011
Posts: 232
The ftp version seems to be better behaved (docs). The first curl is used to upload each of the files. The second curl gets archive.org to process them.
set "curl=C:\mf\NGcode\lib\curl.exe"
set "user=jstout.physics@gmail.com"
set "pass=12345"
set "item=deathduel-tas-trask"
set "file1=%item%_files.xml"
set "file2=%item%_meta.xml"
set "file3=%item%.mkv"
set "file4=%item%_10bit444.mkv"
set "file5=%item%_512kb.mp4"

%curl% -v -T %file1% --user %user%:%pass% ^
       --ftp-create-dirs ^
       ftp://items-uploads.archive.org/%item%/

pause

%curl% -v "http://archive.org/services/contrib-submit.php?user_email=%user%&server=items-uploads.archive.org&dir=%item%"

pause
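The same two steps could be sketched in Python with the standard library, in case the batch file gets folded into a script later. The server name and the contrib-submit.php parameters are taken from the curl commands above; the function names are hypothetical, the FTP part is untested, and the submit URL is only built, not requested. Note that `urlencode` percent-encodes the "@" in the email address, which the raw curl URL above does not.

```python
from ftplib import FTP, error_perm
from urllib.parse import urlencode

UPLOAD_SERVER = "items-uploads.archive.org"


def ftp_upload(user, password, item, filenames, server=UPLOAD_SERVER):
    """Upload each file into a directory named after the item (step one)."""
    with FTP(server) as ftp:
        ftp.login(user, password)
        try:
            ftp.mkd(item)  # rough equivalent of curl's --ftp-create-dirs
        except error_perm:
            pass  # assume the directory already exists
        ftp.cwd(item)
        for name in filenames:
            with open(name, "rb") as fh:
                ftp.storbinary("STOR " + name, fh)


def contrib_submit_url(user_email, item, server=UPLOAD_SERVER):
    """URL that asks archive.org to process the uploaded directory (step two)."""
    query = urlencode({"user_email": user_email, "server": server, "dir": item})
    return "http://archive.org/services/contrib-submit.php?" + query


if __name__ == "__main__":
    print(contrib_submit_url("user@example.com", "deathduel-tas-trask"))
```

Passing the whole list of files to `ftp_upload` covers the "upload each of the files" part in a single login.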
deathduel-tas-trask_meta.xml
Language: xml

<metadata>
  <mediatype>movies</mediatype>
  <collection>opensource_movies</collection>
  <title>Genesis Death Duel (USA) in 08:24.73 by Trask</title>
  <identifier>deathduel-tas-trask</identifier>
  <creator>Trask</creator>
  <rerecord_count>1048</rerecord_count>
  <subject>Tool-Assisted Speedrun; Genesis; Death Duel; Trask</subject>
  <runtime>10 minutes</runtime>
  <description>
    ''Death Duel'' is a side-scrolling first person shooter where the player controls a mech versus a set of nine other mechs.
    There is a bonus stage after each battle where the player must qualify for the next battle.
    The player has to purchase repairs, weapons and ammunition with the credits that are earned for destroying the enemy body parts and completing the rounds quickly.
    In this run, Trask takes out the enemies mostly through well placed missiles and the more expensive homing missiles.
    At one point a missile is used to punch through a wall to open up a path for more missiles.
  </description>
  <contact>
    This is a tool-assisted speedrun. For more information visit http://tasvideos.org/1291S.html
  </contact>
</metadata>
deathduel-tas-trask_files.xml
Language: xml

<files>
  <file name="deathduel-tas-trask.mkv">
    <format>Matroska</format>
  </file>
  <file name="deathduel-tas-trask_10bit444.mkv">
    <format>Matroska</format>
  </file>
  <file name="deathduel-tas-trask_512kb.mp4">
    <format>512Kb MPEG4</format>
  </file>
</files>
If this doesn't prevent the auto derived files, then simply "<files/>" would work as well.
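Since the file names change with every encode, the _files.xml could be generated rather than typed. A minimal sketch with the standard library's ElementTree; the function name and the name-to-format mapping are assumptions for illustration, and the format strings are just the ones from the example above.

```python
import xml.etree.ElementTree as ET


def build_files_xml(formats):
    """Build a _files.xml string from a {file name: format} mapping.

    An empty mapping yields a bare, self-closed <files /> element.
    """
    root = ET.Element("files")
    for name, fmt in formats.items():
        file_el = ET.SubElement(root, "file", name=name)
        ET.SubElement(file_el, "format").text = fmt
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    item = "deathduel-tas-trask"
    print(build_files_xml({
        item + ".mkv": "Matroska",
        item + "_10bit444.mkv": "Matroska",
        item + "_512kb.mp4": "512Kb MPEG4",
    }))
```

The output is a single line without indentation, which should be equivalent as XML to the hand-written version.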
Emulator Coder, Skilled player (1114)
Joined: 5/1/2010
Posts: 1217
nanogyth wrote:
If this doesn't prevent the auto derived files, then simply "<files/>" would work as well.
As far as I can tell from the docs, derivations are disabled using a file called "_rules.conf" or something like that (documented on the deriver page).
Player (66)
Joined: 4/21/2011
Posts: 232
Then the other question is whether disabling derivatives is beneficial. Deriving from both the primary and the 10bit is a waste, but it might be worthwhile to let archive make one set of derivatives. Next thing I'll try is making the archive with the ftp upload and the 512. Then use the s3 upload for the primary and 10bit with the no derive flag set on the primary.
Player (66)
Joined: 4/21/2011
Posts: 232
feos wrote:
On a side note, what do we do with cross-platform TASes of the same title by the same author? They're allowed now, but how do we name encodes for, say, SNES Ghouls'n'Ghosts by Nach and Arcade Ghouls'n'Ghosts by Nach?
We could refer to the archive item by its submission number:
archive.org/download/TASVideos-181/jackiechankungfu-tasv2-jeffc.mkv
The old way is ugly:
archive.org/download/NesJackieChansActionKungFuusaIn1738.25ByArc/jackiechankungfu-tasv2-jeffc.mkv
What I've been doing recently is redundant and risks name clashes:
archive.org/download/jackiechankungfu-tasv2-jeffc/jackiechankungfu-tasv2-jeffc.mkv
Here is the Python I've been working on recently. It turns the submission number into a _meta.xml file.
Language: python

import requests
import json
from lxml import objectify, etree
import re

sub_number = 2136 #1582 #1319 #3772
sub_url = "http://tasvideos.org/subinfo/" + str(sub_number) + "S.json"
data = json.loads(requests.get(sub_url).text)

console = data['system'][0]
console_long = data['system'][1]
game = data['game'][0]
branch = data['game'][1]
version = data['game'][2]
author = data['player'][0]
rerecord = data['movie'][0][1]
secs = data['movie'][0][2]

# Split the run length into parts and pick matching title/runtime formats.
hours, minutes, seconds = int(secs//3600), int(secs%3600//60), secs%60
if hours > 1:
    time_form = '{0}:{1:02d}:{2:05.2f}'
    runt_form = '{0} hours {1} minutes'
elif hours > 0:
    time_form = '{0}:{1:02d}:{2:05.2f}'
    runt_form = '1 hour {1} minutes'
elif minutes == 1:
    time_form = '{1:02d}:{2:05.2f}'
    runt_form = '1 minute {2} seconds'
else:
    time_form = '{1:02d}:{2:05.2f}'
    runt_form = '{1} minutes {2} seconds'
time = time_form.format(hours, minutes, seconds)

if branch:
    title_form = '{0} {1} ({2}) "{3}" in {4} by {5}'
    ident_form = '{0}-tas-{1}-{2}'
else:
    title_form = '{0} {1} ({2}) in {4} by {5}'
    ident_form = '{0}-tas-{2}'

# Join multiple authors with underscores, then keep only characters
# that are safe in an item identifier.
multi_author = re.sub(', and |, | & ', '_', author)
raw_ident = ident_form.format(game, branch, multi_author)
identifier = re.sub('[^a-z0-9._-]', '', raw_ident.lower())

E = objectify.ElementMaker(annotate=False)
meta = E.metadata(
    E.mediatype('movies'),
    E.collection('opensource_movies'),
    E.title(title_form.format(console, game, version, branch, time, author)),
    E.identifier(identifier),
    E.creator(author),
    E.rerecord_count(str(rerecord)),
    E.subject('Tool-Assisted Speedrun; {}; {}; {}'.format(console_long, game, author)),
    E.runtime(runt_form.format(hours, minutes, int(seconds))),
    E.description('This is a tool-assisted speedrun.'),
    E.contact('For more information visit http://tasvideos.org/{}S.html'.format(sub_number))
)

print(etree.tostring(meta, pretty_print=True).decode("utf-8"))

#with open(identifier + '_meta.xml', 'w') as f:
#    f.write(etree.tostring(meta, pretty_print=True).decode("utf-8"))
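The naming pieces of that script can be tried on their own without any network access. This sketch reuses the two regular expressions from the script and the TASVideos-NNN download path suggested above; the helper names are invented for illustration.

```python
import re


def join_authors(author):
    """Collapse "A, B, and C" or "A & B" style author lists into "A_B_C"."""
    return re.sub(', and |, | & ', '_', author)


def clean_identifier(raw_ident):
    """Lowercase and drop everything except a-z, 0-9, '.', '_' and '-'."""
    return re.sub('[^a-z0-9._-]', '', raw_ident.lower())


def download_url(sub_number, filename):
    """Download path when the item is named after the submission number."""
    return "archive.org/download/TASVideos-{}/{}".format(sub_number, filename)


if __name__ == "__main__":
    print(clean_identifier("Death Duel-tas-" + join_authors("Trask")))
    print(download_url(181, "jackiechankungfu-tasv2-jeffc.mkv"))
```

Naming the item after the submission number sidesteps the cross-platform clash entirely, since two runs of the same title by the same author still have different submission numbers.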