#archiveteam-bs 2013-06-23,Sun

Time Nickname Message
00:55 🔗 joepie91 http://www.bitcointrezor.com/
00:55 🔗 joepie91 this is kinda awesome
02:10 🔗 godane i grabbed this website today: https://archive.org/details/iiipercent.blogspot.com-20130622
02:10 🔗 godane i did so cause he was on glenn beck show yesterday
05:28 🔗 omf_ So I am going to do a podcast episode on ArchiveTeam. The question is what should I talk about? SketchCow has already done great work explaining what AT is and the origin story. I was thinking of tackling the more technical issues
05:36 🔗 xmc sure
05:39 🔗 GLaDOS How about the usual process that happens whenever we hear of a shutdown, and the possible ways that we have for archiving sites (warrior, manual scripts, etc.)
05:39 🔗 omf_ I could literally spend more than one one-hour episode just on crawling
05:40 🔗 GLaDOS Also, how we deal with the resistance that we sometimes come up against
05:40 🔗 GLaDOS xmc: did you get my text I sent on Wednesday?
05:40 🔗 xmc I have no SMSes from you ... ?
05:41 🔗 GLaDOS ..must have an outdated number then.
05:41 🔗 xmc what're the last few digits of the number you sent to?
05:41 🔗 GLaDOS 4499
05:41 🔗 xmc that's a landline :P
05:41 🔗 GLaDOS hurr!
05:41 🔗 xmc :P
05:42 🔗 GLaDOS Well, that's what I get for not knowing how the American phone system assigns numbers.
05:42 🔗 xmc there's no way to tell, actually
05:42 🔗 GLaDOS ah
05:43 🔗 omf_ yeah and you can switch a number from being land line to cell and back again
05:43 🔗 GLaDOS Well that's handy.
05:43 🔗 xmc indeed
05:44 🔗 GLaDOS In 'straya, if a number starts with 04, it's a mobile.
05:46 🔗 arrith1 omf_: could spend a small amount of time doing an overview, but still a technical overview, of each stage in the archiving process
05:55 🔗 omf_ The only problem with an overview is the details I might miss because some of the information is still only in a few brains instead of also being on the wiki
06:02 🔗 arrith1 omf_: iirc best way to go about building a comprehensive thing is doing an outline then filling in each part as detailed as possible
06:06 🔗 GLaDOS Alternatively, http://pad.archivingyoursh.it/p/atpodcast
06:22 🔗 omf_ GLaDOS and I are fucking banging it out
06:22 🔗 GLaDOS WOO IDEAS
06:22 🔗 omf_ you got deets we need to know, speak now
06:32 🔗 xmc i-motherfucking-deas
06:33 🔗 arrith1 they speak english in 'what'
06:36 🔗 arrith1 that's a good outline
07:35 🔗 winr4r DFJustin: by the way, thanks for your additions to the In The Media page :)
07:37 🔗 winr4r any AT wiki admins here?
07:45 🔗 GLaDOS Sup
07:45 🔗 GLaDOS winr4r
07:47 🔗 winr4r GLaDOS: would you link the words "press and discussion" on the main page to [[In The Media]]? :)
07:50 🔗 GLaDOS Done
07:50 🔗 GLaDOS I think we should possibly update the quote.
07:51 🔗 GLaDOS "It sounds like you're holding hands with your userbase on the beach and walking with them into the sunset, when in fact you're choking them to death in the ocean" how about?
07:51 🔗 winr4r "Google is a library or an archive like a supermarket is a food museum."
07:51 🔗 winr4r ^ our jason
07:53 🔗 GLaDOS Hm..
08:00 🔗 arrith1 hahah
08:00 🔗 arrith1 that quote is frame-worthy
08:02 🔗 winr4r jason is very quotable
08:15 🔗 arrith1 indeed
08:30 🔗 arrith1 "
08:30 🔗 arrith1 was just opening a "xul" file in gvim and whispered to myself "only zuul", but then right at the top it says " xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
08:31 🔗 omf_ The first javascript debugger/ide was called venkman
08:32 🔗 arrith1 haha neat
08:34 🔗 omf_ about:robots in firefox makes an Asimov and Battlestar Galactica reference
08:37 🔗 arrith1 wow and a futurama reference
08:37 🔗 arrith1 also the day after tomorrow. iirc there was some about:authors thing in fx for a while
08:40 🔗 omf_ just check about:about
08:47 🔗 godane i decided not to mirror the mp3s of www.boilingfrogpost.com
08:47 🔗 godane only so i can get the html dump first cause that would be smaller
08:47 🔗 godane then i can sed the hell out of it for a mp3 list and stuff
08:48 🔗 Cameron_D http://neocities.org/ heh
08:55 🔗 arrith1 Cameron_D: should preemptively maintain a crawl of that i'd think
08:55 🔗 Cameron_D yeah
08:55 🔗 winr4r Cameron_D: yeah, that can only end well
08:55 🔗 Cameron_D it should be somewhat easy, as it lists all the accounts
08:56 🔗 arrith1 Copenhagen Suborbitals Sapphire Launch (rocket launch) somewhere between 6min and like 25min http://copenhagensuborbitals.com/ http://www.youtube.com/watch?v=ADq5sOfq-lI (live stream)
08:58 🔗 winr4r arrith1: stream is frozen for me
08:58 🔗 winr4r weird
08:58 🔗 arrith1 winr4r: just froze for me too. cuts in and out
08:58 🔗 arrith1 their budget is odd
09:01 🔗 winr4r okay, so it just went through T-00.00.00 and then looped around
09:01 🔗 arrith1 haha yeah, last i heard was reset to '18min'
09:03 🔗 winr4r so am i seeing the same thing as you?
09:03 🔗 winr4r guys on a boat
09:03 🔗 winr4r not sure how that's going into orbit
09:03 🔗 omf_ https://www.youtube.com/watch?v=Kjf4RogusJc
09:03 🔗 omf_ completely relevant
09:10 🔗 arrith1 hm the navy must love that song
09:21 🔗 arrith1 Copenhagen Suborbitals - The Open Source, non-profit, and volunteer based space initiative
10:00 🔗 arrith1 2min 30sec
10:02 🔗 arrith1 reset to 10min nvm
10:41 🔗 Smiley hmmm
10:41 🔗 Smiley 2 min until what?
10:41 🔗 Smiley well, 10 min until what?
10:42 🔗 arrith1 Smiley: rocket launch :)
10:42 🔗 arrith1 Copenhagen Suborbitals Sapphire Launch (rocket launch) somewhere between 6min and like 25min http://copenhagensuborbitals.com/ http://www.youtube.com/watch?v=ADq5sOfq-lI (live stream)
10:42 🔗 Smiley Nice
10:42 🔗 arrith1 goin now
10:42 🔗 Smiley This the one which you can have it display a photo?
10:42 🔗 arrith1 1min
10:42 🔗 arrith1 Smiley: not sure
10:42 🔗 arrith1 40s
10:42 🔗 arrith1 youtube livestream
10:42 🔗 Smiley Launching from a ship?
10:42 🔗 arrith1 audio is a bit ahead
10:42 🔗 Smiley yey launch \o./
10:43 🔗 arrith1 yeah, brirish sea
10:43 🔗 Smiley uhoh the timer overflowed :D
10:45 🔗 Smiley anymore to see, ?
10:46 🔗 arrith1 Smiley: i'm not sure.. maybe not
10:46 🔗 arrith1 was a test thing, i guess wasn't going to space
10:47 🔗 arrith1 not sure if cameras are on the rocket, no livestream from them if there are ;/
10:48 🔗 Smiley still very cool
10:49 🔗 arrith1 yeah :)
11:49 🔗 joepie91 so
11:49 🔗 joepie91 funfair here
11:49 🔗 joepie91 has poptstations
11:49 🔗 joepie91 lol
11:49 🔗 joepie91 popstations *
11:50 🔗 joepie91 http://imgur.com/a/j88jz
12:06 🔗 antomatic Podcast sounds interesting!
14:17 🔗 GLaDOS if anyone wants an x@archivingyoursh.it address, ask me.
15:04 🔗 joepie91 uploading _all_ the funfair videos!
15:04 🔗 joepie91 :P
15:05 🔗 winr4r GLaDOS: hey awesome
15:06 🔗 joepie91 https://www.youtube.com/playlist?list=PLT324XogjOA9rpuZBrX37SC3xwDybH-BT
15:09 🔗 winr4r sup joepie
15:37 🔗 joepie91 hai
16:09 🔗 dashcloud this is pretty cool: http://www.vesalia.de/e_indivisionagamk2.htm
16:26 🔗 joepie91 https://www.youtube.com/watch?v=w2BssnWcUlQ&list=PLT324XogjOA9rpuZBrX37SC3xwDybH-BT&index=15
16:26 🔗 joepie91 ^ the fun part starts at 1:25
19:37 🔗 Schbirid -buttsmiley
19:39 🔗 Smiley :D
19:52 🔗 Aranje why is that not an option
19:52 🔗 Aranje `wget-warc --buttsmiley` should just do everything automatically
19:56 🔗 Smiley @:D
19:56 🔗 Schbirid :D
19:56 🔗 Schbirid nighty
19:56 🔗 Aranje I liked the at-hat better
19:56 🔗 Aranje (almost said ass-hat)
19:57 🔗 Smiley arrrfuck we seriously need to be so much faster with xanga D:
19:57 🔗 Smiley tomorrow I might try building a EC2 instance again
19:57 🔗 Smiley I just don't get how you pass the variables in D:
20:08 🔗 ivan` what variables
20:40 🔗 godane now i'm paranoid
20:40 🔗 godane one of the files uploaded at 100% but there is no history of the item at all
20:41 🔗 godane those files are going to get lost if i think 100% means uploaded but it turns out archive didn't take it for some reason
20:46 🔗 omf_ Were you using your custom scripts to do the upload?
20:50 🔗 omf_ Someone on HN said Archive Team is not serious
20:50 🔗 omf_ lol, yeah cause we don't do shit around here
20:52 🔗 Aranje lol taking anything HN says even 1/4 seriously
20:53 🔗 Aranje I wrote a rant that got good coverage, and there were probably 3 good comments
20:53 🔗 Aranje not positive, but ones that added to a discussion
20:54 🔗 godane omf_: i sadly always will have to use custom scripts
20:55 🔗 godane mostly for the g4 videos cause i have to sed the desc and stuff to it
20:56 🔗 godane i have thought of a way to find out what is missing
20:56 🔗 godane make an index.txt file of all the urls i uploaded and download the html
20:56 🔗 godane then sed for what is not uploaded
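The idea godane describes here, comparing a list of everything that should have been uploaded against what is confirmed to exist, comes down to a set difference. A minimal sketch in Python, assuming one identifier or URL per line; the function name `find_missing` and the input format are hypothetical, and how the confirmed list is obtained (scraped HTML, an API query) is deliberately left out:

```python
def find_missing(expected_ids, confirmed_ids):
    """Return entries from expected_ids that are absent from confirmed_ids.

    Both arguments are iterables of strings, for example the lines of an
    index.txt file. Blank lines and duplicates are skipped, and the
    original order of expected_ids is preserved.
    """
    confirmed = {entry.strip() for entry in confirmed_ids}
    seen = set()
    missing = []
    for entry in expected_ids:
        entry = entry.strip()
        if not entry or entry in seen:
            continue
        seen.add(entry)
        if entry not in confirmed:
            missing.append(entry)
    return missing
```

Called as `find_missing(open("index.txt"), confirmed)`, it would return the entries that still need to be re-uploaded.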
21:06 🔗 omf_ Collecting and preparing metadata has nothing to do with actually uploading the file and verifying the upload
21:06 🔗 omf_ Your upload part of the process appears to be failing so fix it or replace it
21:06 🔗 godane just give me the line of code then?
21:07 🔗 godane cause i don't know it
21:07 🔗 godane use curl to upload my stuff
21:12 🔗 godane the only way to verify an uploaded file is to check for the meta txt file
21:12 🔗 omf_ well there is your first mistake. A programmer for the Internet Archive already wrote an uploader script that handles a lot of the use cases for bulk file uploading so that no one would have to mess with curl on the command line. ias3upload is the program name and quite a few people in here use it on a regular basis because it has a simple interface
21:13 🔗 omf_ wrong, you can check the response code on the s3 query after the upload finishes
21:13 🔗 omf_ So there are at least 2 ways, also you could hit the json api and check it exists and the file size
21:14 🔗 omf_ or the md5sum or the sha1sum
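The size and checksum checks omf_ lists can be sketched roughly as follows. This is a minimal sketch in Python, not the ias3upload or ia-tools implementation. The function names and the shape of `remote_entry` (a dict that may carry `size`, `md5` and `sha1` keys) are assumptions made for illustration, and fetching that record from the Internet Archive is left out:

```python
import hashlib
import os


def local_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex, sha1_hex) for a local file."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    # Read in chunks so large video files do not have to fit in memory.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return os.path.getsize(path), md5.hexdigest(), sha1.hexdigest()


def upload_verified(path, remote_entry):
    """Compare a local file against one remote file record.

    remote_entry is an assumed, hypothetical record shape: a dict with
    any of the keys "size", "md5" and "sha1". Keys that are absent are
    skipped, but at least one key must be present and match.
    """
    size, md5_hex, sha1_hex = local_fingerprint(path)
    local = {"size": str(size), "md5": md5_hex, "sha1": sha1_hex}
    checked = 0
    for key, value in local.items():
        if key in remote_entry:
            if str(remote_entry[key]).lower() != value:
                return False
            checked += 1
    return checked > 0
```

A file that reported 100% in the upload client but has no matching remote record would fail this check, which is the situation described earlier in the conversation.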
21:14 🔗 godane but you have to make a meta.xml file for the upload script
21:15 🔗 godane i'm doing very bulk uploading and i don't see how to work t
21:15 🔗 godane *it
21:15 🔗 omf_ no you make a metadata.csv for ias3upload so normal people can use a spreadsheet to set it up
21:15 🔗 omf_ I have uploaded hundreds of gigabytes and thousands of files using this method because it is the one recommended for bulk uploading on the Internet Archive's Wiki
21:16 🔗 godane still not getting it
21:17 🔗 godane also i only understand bash not perl
21:17 🔗 omf_ you do not write any code to use it
21:17 🔗 omf_ that is how simple it is
21:17 🔗 omf_ The repo includes full docs and an example metadata.csv for getting started https://github.com/kimmel/ias3upload
21:22 🔗 omf_ Take Smiley for example, he knows bash and does not know Perl. He has uploaded 700+ gb of content using ias3upload and has never bothered with the code
21:22 🔗 godane yes but my data is in warc.gz
21:22 🔗 omf_ so
21:22 🔗 godane how do i sed that data and put into meta
21:22 🔗 godane i have to sed it to get the video file the right info
21:23 🔗 godane bash i understand
21:23 🔗 omf_ okay I can explain this to you. First tell me which meta fields you put in your file
21:23 🔗 godane this i don't know
21:23 🔗 godane its automated
21:23 🔗 godane i search for the videokey when a new one needs to be uploaded
21:23 🔗 omf_ You missed the point. What meta fields are generated by your automated script?
21:24 🔗 godane yes
21:24 🔗 omf_ That wasn't a yes or no question.
21:24 🔗 godane oh
21:25 🔗 godane the title and desc are in meta data
21:25 🔗 omf_ So the standard needed data to make an item?
21:27 🔗 omf_ Did you build your scripts off the examples in http://archive.org/help/abouts3.txt ?
21:27 🔗 godane here is my script: http://pastebin.com/N6AASNSB
21:27 🔗 godane yes
21:27 🔗 godane but again very custom to my needs
21:28 🔗 godane i don't want to have to look at my scripts when uploading stuff
21:30 🔗 omf_ I am just blown away by overly complicated that whole script is
21:30 🔗 godane all i want is this to have error codes of status
21:30 🔗 omf_ how overly complicated that script is. Just using the ias3upload script gets rid of lines 20 through 37
21:31 🔗 omf_ and builds in numerous levels of error checking and correction that this script doesn't have
21:31 🔗 godane but you don't understand
21:32 🔗 godane i may not even have some of the right perl stuff to use it
21:32 🔗 godane and i'm still on slitaz for right now
21:32 🔗 omf_ godane, I have 27 years experience as a programmer. I understand the entire problem just by looking at this script. You scrape urls for videos from warcs using command line tools and brittle regex
21:32 🔗 omf_ and I already knew you use slitaz
21:33 🔗 omf_ then download the video and upload it to IA
21:34 🔗 godane its still not going to fix past failed videos
21:35 🔗 godane i will upload what i can now then do a warc.gz sed of all the archive.org posts to see what failed when i think i'm done
21:35 🔗 omf_ One problem at a time, you just realized that things were failing because you didn't build in any kind of error checking.
21:35 🔗 omf_ if you have a video list in a file you can use ia-dirdiff to check the items on IA
21:37 🔗 godane i will look at it later
21:37 🔗 omf_ failed uploads are a known problem and I already wrote a script to solve it. Been using it for months. https://github.com/kimmel/ia-tools
21:39 🔗 omf_ There are many of us constantly updating the tools we use to make things easier and faster because any problem that has to be solved more than once should be and is automated away.
21:44 🔗 godane looks like i can't use the script
21:45 🔗 godane i have no utf8/all.pm file
21:46 🔗 omf_ you have to install the dependencies, the command is in the docs: cpan autodie utf8::all HTTP::Tiny JSON::XS
21:48 🔗 godane i'm going to bed for now
21:48 🔗 godane bbl
22:21 🔗 omf_ To everyone, is it crazy to want us all to use the same open source tools so we can have multiple knowledgeable people to help each other out?
22:21 🔗 omf_ A lot of the problems I keep hearing about with archiving are solvable
22:23 🔗 Aranje no need to reinvent the wheel, usually
22:24 🔗 omf_ Reinventing the wheel has two valid use cases from my experiences. One is if you want to learn how it works end to end. Two if you know there is a design flaw with an existing wheel and the new one is going to fix that
22:28 🔗 Aranje aye, and there's no shortage of brilliant people around to do that if necessary
22:31 🔗 omf_ I think Archive Team could easily use 4-5 more programmers who are going to stick around for the long term
23:19 🔗 arrith1 omf_: if you're referring to the comment by "chid" here https://news.ycombinator.com/item?id=5925970 i don't read it as a slight against AT, but more commenting that Snapjoy's data is private rather than public, so it's quite hard for AT to do a more traditional grab like we do
23:24 🔗 balrog Aranje: try site:snapjoy.com in google
23:24 🔗 arrith1 also having the tools everyone is supposed to use front-and-center on a short 'how to emergency download a site'
23:25 🔗 Aranje urg :(
23:25 🔗 balrog arrith1: *
23:26 🔗 balrog Aranje: sorry! :(
23:26 🔗 arrith1 balrog: ah, comments i'm seeing are calling things private. i guess that's quite incorrect. thanks for the correction
23:27 🔗 balrog most things appear to be private.
23:27 🔗 balrog some users did have public sets.
23:48 🔗 arrith1 ping omf_, re:cuil data for Google Reader archiving
23:50 🔗 ivan` I can provide the domain prefixes and some Python script to parse out what I need, let me know if I can make it even more convenient
