#archiveteam 2016-03-09,Wed


Time Nickname Message
00:03 🔗 JW_work eddiedean: btw, you can get unmodified versions of wayback pages by suffixing id_ to the date part (see http://www.archiveteam.org/index.php?title=Internet_Archive#Downloading_from_archive.org )
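A minimal Python sketch of the `id_` trick JW_work mentions. Given a capture timestamp and the original URL, it builds the Wayback Machine address for the unmodified capture, and it can also rewrite an ordinary Wayback URL into its `id_` form. The function names and the digits-only timestamp check are illustrative assumptions, not part of any official tool.

```python
import re


def raw_wayback_url(timestamp: str, url: str) -> str:
    """Return the Wayback Machine URL for the unmodified (id_) capture."""
    # Wayback timestamps are digits (YYYYMMDDhhmmss); this sketch only
    # checks for 1 to 14 digits.
    if not re.fullmatch(r"\d{1,14}", timestamp):
        raise ValueError(f"not a Wayback timestamp: {timestamp!r}")
    # Suffixing "id_" to the date part asks for the original bytes,
    # without the injected toolbar or rewritten links.
    return f"https://web.archive.org/web/{timestamp}id_/{url}"


def to_raw(wayback_url: str) -> str:
    """Rewrite an ordinary Wayback URL into its id_ form (no-op if already raw)."""
    return re.sub(r"/web/(\d{1,14})/", r"/web/\1id_/", wayback_url, count=1)
```

For example, `to_raw("https://web.archive.org/web/20160309000000/http://python.org/")` gives the same address as `raw_wayback_url("20160309000000", "http://python.org/")`.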
00:03 🔗 eddiedean JW_work yes, that is what my script does.
00:04 🔗 JW_work ah, good — it wasn't clear if you were trying to retroactively clean up the modified versions, or were just downloading the unmodified ones.
00:05 🔗 eddiedean I may have explained myself badly, english isn't my first language :P
00:05 🔗 JW_work and the github link doesn't seem to go anywhere. :-/
00:06 🔗 eddiedean It is fun to monitor the tool, so I can get performance data. Some guy is retrieving library.gnome.org, 32K files huhuhu
00:07 🔗 eddiedean JW_work haven't published it yet, the icons in the footer are WIP right now :P
00:07 🔗 JW_work heh
00:07 🔗 JW_work I tried python.org and ludios.org — both 504'ed.
00:08 🔗 eddiedean When?
00:08 🔗 JW_work just now
00:08 🔗 eddiedean Hmm, let me check :)
00:09 🔗 Famicoma1 has joined #archiveteam
00:10 🔗 eddiedean It might be the API. 1 min, I'm going to test something
00:20 🔗 eddiedean JW_work retry now :). It was some Google Cloud issue, now it is solved :)
00:23 🔗 JW_work heh, ok
00:24 🔗 JW_work hm, the wayback machine appears to be having some maintenance issues right now
00:24 🔗 xmc yeah they just rebooted it like 20 minutes ago
00:24 🔗 xmc reeeelllllaaaaaxxxxx
00:25 🔗 JW_work eddiedean: so, I'd think people would be … reticent … about providing you with email addresses, rather than you just, you know, providing a normal download link.
00:26 🔗 eddiedean They can use a disposable one. But I could provide a temp link that shows the download link when it is ready. The thing is that each download request goes to a queue, and it takes some time to be processed because each snapshot has a different size
00:27 🔗 JW_work I think a download link would be preferable.
00:28 🔗 eddiedean Good idea. I'll add a download link option too, so people can choose.
00:30 🔗 eddiedean I guess that the issues that I'm having right now are because archive.org servers have been rebooted?
00:31 🔗 eddiedean Because I've added some caching to this, and a previously requested URL can be shown, but not a new one.
00:51 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:54 🔗 dashcloud has joined #archiveteam
00:56 🔗 chfoo0 is now known as chfoo
01:10 🔗 BnA-Rob1n Anyone here who can add some more items for fotolog?
01:29 🔗 bai_ is now known as bai
01:30 🔗 eddiedean has quit IRC (Ping timeout: 260 seconds)
01:36 🔗 pft has joined #archiveteam
01:38 🔗 JesseW has joined #archiveteam
01:40 🔗 dxrt_ is now known as dxrt
01:48 🔗 dashcloud has quit IRC (Read error: Operation timed out)
01:48 🔗 hawc145 is now known as HCross
01:52 🔗 dashcloud has joined #archiveteam
01:57 🔗 Guest_ has joined #archiveteam
02:36 🔗 tomwsmf-a has joined #archiveteam
02:36 🔗 arkiver has quit IRC (Ping timeout: 260 seconds)
02:37 🔗 Guest_ has quit IRC (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
02:43 🔗 dxrt has quit IRC (Read error: Operation timed out)
02:43 🔗 dxrt has joined #archiveteam
02:43 🔗 dxrt- sets mode: +o dxrt
02:44 🔗 Peetz0r_ has quit IRC (Read error: Operation timed out)
02:44 🔗 Jonimus has quit IRC (Read error: Operation timed out)
02:44 🔗 mhazinsk has quit IRC (Read error: Operation timed out)
02:44 🔗 bai has quit IRC (Read error: Operation timed out)
02:44 🔗 bai has joined #archiveteam
02:45 🔗 aMunster has quit IRC (Read error: Operation timed out)
02:45 🔗 vegbrasil has quit IRC (Read error: Operation timed out)
02:46 🔗 beardicus has quit IRC (Read error: Operation timed out)
02:47 🔗 maseck has quit IRC (Read error: Operation timed out)
02:47 🔗 toad1 has quit IRC (Read error: Operation timed out)
02:48 🔗 phuz has quit IRC (Read error: Operation timed out)
02:48 🔗 SimpBrai1 has quit IRC (Read error: Operation timed out)
02:49 🔗 is- has joined #archiveteam
02:49 🔗 jmad980 has joined #archiveteam
02:49 🔗 closure has quit IRC (Ping timeout: 633 seconds)
02:50 🔗 jmad980_ has quit IRC (Read error: Operation timed out)
02:50 🔗 Peetz0r has joined #archiveteam
02:50 🔗 nwf has quit IRC (Read error: Operation timed out)
02:51 🔗 maseck has joined #archiveteam
02:52 🔗 MMovie has quit IRC (Ping timeout: 633 seconds)
02:52 🔗 is-_ has quit IRC (Read error: Connection reset by peer)
02:54 🔗 phuzion has joined #archiveteam
02:56 🔗 arkiver has joined #archiveteam
02:57 🔗 SimpBrai1 has joined #archiveteam
03:07 🔗 VADemon has quit IRC (Quit: left4dead)
03:12 🔗 toad1 has joined #archiveteam
03:14 🔗 beardicus has joined #archiveteam
03:14 🔗 vegbrasil has joined #archiveteam
03:14 🔗 winr4r has quit IRC (Ping timeout: 260 seconds)
03:14 🔗 winr4r has joined #archiveteam
03:15 🔗 closure has joined #archiveteam
03:30 🔗 JesseW has quit IRC (Quit: Leaving.)
03:31 🔗 aMunster has joined #archiveteam
03:35 🔗 MMovie has joined #archiveteam
03:58 🔗 SketchCow * rebooted
03:58 🔗 SketchCow * rsync back up
03:58 🔗 bwn has quit IRC (Ping timeout: 492 seconds)
04:03 🔗 ErkDog nice
04:03 🔗 ErkDog but 100k/sec still :(
04:05 🔗 SketchCow Is it.
04:06 🔗 ErkDog yeah I got failed rsync messages
04:06 🔗 ErkDog then it connected back
04:06 🔗 ErkDog and 70-100 K/sec
04:06 🔗 ErkDog http://puu.sh/nzYJA/b677b392d7.png
04:06 🔗 SketchCow Well, let's assume it's you, and that's why mommy and daddy aren't together.
04:06 🔗 SketchCow Guess we'll have to wait until I'm onsite.
04:06 🔗 SketchCow Unless you want to be an extra credit traceroute noodle
04:07 🔗 ErkDog sure
04:08 🔗 ErkDog http://paste.nerds.io/cumoqoloqi.avrasm
04:08 🔗 ErkDog goku.ecansol.net will get you back this way
04:09 🔗 dxrt It's still buggered from France, 30KB/s.
04:11 🔗 nwf has joined #archiveteam
04:17 🔗 Jonimus has joined #archiveteam
04:19 🔗 bwn has joined #archiveteam
04:20 🔗 bwn has quit IRC (Client Quit)
04:23 🔗 mhazinsk has joined #archiveteam
04:31 🔗 FalconK if anyone wants, the bits I am running are https://github.com/falconkirtaran/ArchiveBot
04:31 🔗 xXx_ndidd has quit IRC (Read error: Connection reset by peer)
04:31 🔗 FalconK my pipeline is (slowly) emptying its buffer out.
04:31 🔗 FalconK 17GB free now.
04:34 🔗 logan2 has joined #archiveteam
04:36 🔗 logan has quit IRC (Read error: Operation timed out)
04:36 🔗 khaoohs_ has joined #archiveteam
04:39 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
04:39 🔗 vtyl has joined #archiveteam
04:40 🔗 Froggypwn has joined #archiveteam
04:40 🔗 lytv has quit IRC (Read error: Operation timed out)
04:41 🔗 khaoohs has quit IRC (Read error: Operation timed out)
04:48 🔗 Simpbrai_ has quit IRC (Read error: Operation timed out)
04:48 🔗 maseck has quit IRC (Read error: Connection reset by peer)
04:48 🔗 Simpbrai_ has joined #archiveteam
04:49 🔗 maseck has joined #archiveteam
05:01 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
05:11 🔗 acridAxid has quit IRC (marauder)
05:12 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:12 🔗 acridAxid has joined #archiveteam
05:15 🔗 ErkDog How is a message board with 200,000 threads considered "small" lol
05:19 🔗 Sk1d has joined #archiveteam
05:45 🔗 logan2 has quit IRC (Read error: Operation timed out)
05:46 🔗 logan has joined #archiveteam
06:16 🔗 Famicoma1 has quit IRC (Ping timeout: 260 seconds)
06:25 🔗 WinterFox has joined #archiveteam
06:28 🔗 JesseW has joined #archiveteam
07:20 🔗 metalcamp has joined #archiveteam
07:22 🔗 davidar has joined #archiveteam
07:23 🔗 davidar ping arkiver
07:24 🔗 davidar or Fletcher
07:31 🔗 JesseW ?
07:33 🔗 davidar hey JesseW
07:34 🔗 davidar just need to clarify some more details on how this pdf crawl will work
07:34 🔗 davidar arkiver said the pdfs can be delivered to an rsync target
07:34 🔗 davidar are we also able to attach original URLs to those pdfs?
07:37 🔗 Famicoma1 has joined #archiveteam
07:38 🔗 JesseW we certainly could, yeah
07:39 🔗 JesseW simplest way is probably to make the filename a normalized form of the URL.
07:40 🔗 JesseW i.e. if the URL is http://forge.fh-potsdam.de/~IFLA/INSPEL/96-1riea.pdf the resulting file name would be: forge.fh-potsdam.de_~IFLA_INSPEL_96-1riea.pdf
07:41 🔗 JesseW (with the http:// removed and slashes converted into underscores)
07:42 🔗 JesseW If you wanted to normalize them more reversibly, you could convert the slashes to _SLASH_ , instead.
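The naming scheme JesseW describes can be sketched in a few lines of Python. This is a hypothetical helper, not code from any Archive Team project. Note that dropping the scheme means `http://` and `https://` copies of the same path get the same name, and characters such as `?` in query strings are left untouched here even though some filesystems dislike them.

```python
def url_to_filename(url: str, reversible: bool = False) -> str:
    """Flatten a URL into a filename by stripping the scheme and replacing slashes."""
    # Strip the scheme ("http://" or "https://").
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    # Plain mode turns "/" into "_"; reversible mode uses the "_SLASH_" marker.
    separator = "_SLASH_" if reversible else "_"
    return url.replace("/", separator)


print(url_to_filename("http://forge.fh-potsdam.de/~IFLA/INSPEL/96-1riea.pdf"))
# forge.fh-potsdam.de_~IFLA_INSPEL_96-1riea.pdf
```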
07:42 🔗 JesseW davidar:
07:46 🔗 davidar that's true
07:47 🔗 davidar could we also decide on identifiers for each URL beforehand, and then just use that?
07:47 🔗 davidar (not sure if that would be simpler than normalising URLs)
07:50 🔗 davidar JesseW: but the filename thing sounds fine (so long as it doesn't introduce any ambiguity)
07:52 🔗 JesseW eh, we could prefix a hash of the url to the filename, but that wouldn't really get us anything more than including the actual url. :-)
07:52 🔗 JesseW but IDK about how warrior jobs work in detail -- so this may already be handled
07:53 🔗 JesseW Assuming no URL actually contains "_SLASH_" there wouldn't be any ambiguity.
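A sketch of how one might test the ambiguity condition before accepting a URL, assuming the `_SLASH_` scheme with the URL scheme prefix already stripped (`encode`, `decode`, and `is_unambiguous` are illustrative names). Because the marker itself begins and ends with `_`, checking only for the literal substring is not quite sufficient: `a_SLASH/b` and `a/SLASH_b` both encode to `a_SLASH_SLASH_b`. A round-trip check catches that case as well.

```python
MARKER = "_SLASH_"


def encode(path: str) -> str:
    """Replace every "/" with the marker."""
    return path.replace("/", MARKER)


def decode(name: str) -> str:
    """Inverse mapping: every marker becomes "/" again."""
    return name.replace(MARKER, "/")


def is_unambiguous(path: str) -> bool:
    # If two different paths encode to the same name, decoding that name can
    # return at most one of them, so the other fails this round-trip test.
    return decode(encode(path)) == path
```

Paths that fail the check could be handled separately, for example by percent-encoding them instead.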
07:53 🔗 JesseW davidar:
07:54 🔗 davidar cool
08:00 🔗 JesseW has quit IRC (Quit: Leaving.)
08:14 🔗 signius has joined #archiveteam
08:35 🔗 FalconK yo, anyone know any details about archive.org rationing on S3 API uploads?
08:36 🔗 xmc nope, never hit it
08:36 🔗 xmc i think it's when the edge boxes get congested?
08:36 🔗 FalconK I'm wondering what the rationing enabled condition means exactly and how long it might taje to clear
08:36 🔗 FalconK **take
08:36 🔗 xmc probably bitrate
08:36 🔗 FalconK probably
08:36 🔗 xmc i never hit it uploading a zillion tiny items
08:36 🔗 FalconK there are two expressions of it
08:36 🔗 FalconK one is whether rationing is enabled, and another is whether you hit your limit in particular
08:37 🔗 xmc mmm ok
08:37 🔗 FalconK the implementation seems poor: if you begin an upload while you are not at your limit, it will happily cancel a large upload halfway done (even knowing your size hint), only to let you begin anew and do the same thing
08:38 🔗 FalconK so I told it to block whenever rationing is enabled at all, but I wonder if there is a better answer
08:38 🔗 FalconK I mean, I'm hitting it with a *lot* of traffic
08:39 🔗 FalconK like in the past 6 hours, maybe as much as 10GB/hr
08:41 🔗 roninski has joined #archiveteam
08:42 🔗 yipdw_ I've received 503 Slow Down before
08:42 🔗 yipdw_ usually the condition will reset once the machines are ok to take more
08:42 🔗 roninski has left
08:43 🔗 xmc it cancels uploads midstream?
08:43 🔗 xmc are you sending a size hint?
08:43 🔗 roninski has joined #archiveteam
08:43 🔗 xmc x-archive-size-hint:19327352832 (in bytes)
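For reference, a sketch of the request headers for a single IA S3 PUT carrying the size hint shown above. The header names (`authorization: LOW access:secret`, `x-archive-size-hint`, `x-archive-queue-derive`) follow IA's S3 notes (abouts3.txt); check there before relying on them. `ias3_headers` is a hypothetical helper, and actually sending the request is left to curl or an HTTP library.

```python
def ias3_headers(access_key: str, secret_key: str, size_bytes: int,
                 queue_derive: bool = True) -> dict:
    """Build headers for one PUT to IA's S3-like API (sketch)."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return {
        # IA's "LOW" auth scheme: access key and secret joined by a colon.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Expected size of the item in bytes, which per the S3 notes helps
        # IA place it on a disk with enough room.
        "x-archive-size-hint": str(size_bytes),
        # "0" asks IA not to queue a derive task for this PUT.
        "x-archive-queue-derive": "1" if queue_derive else "0",
    }


# 18 GiB, the figure from the example header
print(ias3_headers("ACCESS", "SECRET", 18 * 1024**3)["x-archive-size-hint"])
# 19327352832
```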
08:44 🔗 yipdw_ as far as I know, detecting when to resume isn't possible client-side; megawarc factory handles this by just applying its usual retry strategy
08:45 🔗 yipdw_ this does mean that you might get consecutive failures, but so far nobody has complained
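Retrying on 503 Slow Down, as described here, can be sketched generically with exponential backoff and a little random jitter. This is not megawarc factory's actual retry code, and `do_upload` is a hypothetical callable that performs one PUT and returns the HTTP status.

```python
import random
import time
from typing import Callable


def upload_with_retry(do_upload: Callable[[], int], max_attempts: int = 8,
                      base_delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry do_upload() while it returns 503; return the last status seen."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 503
    for attempt in range(max_attempts):
        status = do_upload()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Wait 5 s, 10 s, 20 s, ... plus up to 1 s of jitter so that
            # many clients do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return status
```

The `sleep` parameter is there only so the delays can be observed or skipped in tests.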
08:46 🔗 midas also, don't upload huge files. s3 hates it. you might not get a slowdown, but a "disk is full" error really sucks when you upload a 1TB tar file
08:53 🔗 SketchCow This is why I cut megawarc files down to 50gb
08:53 🔗 SketchCow Big enough to not make 100,000 objects, small enough the system doesn't fucking explode.
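The size cap can be illustrated with a simple greedy grouping of WARC files into batches that stay under a limit. This is a hypothetical sketch, not how megawarc itself decides where to cut, and reading "50gb" as 50 * 10**9 bytes is an assumption.

```python
from typing import List, Tuple

LIMIT_BYTES = 50 * 10**9  # assumption: "50gb" read as decimal gigabytes


def pack(files: List[Tuple[str, int]], limit: int = LIMIT_BYTES) -> List[List[str]]:
    """Group (name, size) pairs, in order, into batches of at most `limit` bytes.

    A single file larger than the limit ends up alone in its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Start a new batch if this file would push the current one over the cap.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```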
08:54 🔗 schbirid has joined #archiveteam
08:56 🔗 SketchCow So. I just turned on the archivebot uploading.
08:57 🔗 FalconK I'm only uploading 5GB warcs at the moment
08:58 🔗 FalconK you can query the status of the throttling, which exposes counters at the API key, bucket, and general levels, and indicates whether your API key in particular is at its limit right now, which I believe means an upload right now would get a 503 error
08:59 🔗 yipdw_ oh
08:59 🔗 FalconK I don't really understand what the counters mean and I don't see any documentation on them
08:59 🔗 * yipdw_ never bothered
08:59 🔗 FalconK well I noticed that my uploader would upload like 500mb, and then get 503 and curl would fail out
08:59 🔗 FalconK then it would upload the same 500mb and do it again
08:59 🔗 yipdw_ I've noticed that sometimes as well
08:59 🔗 FalconK so that it would keep wasting everyone's bandwidth
08:59 🔗 yipdw_ I just let it loop
08:59 🔗 midas ^
08:59 🔗 midas that
09:00 🔗 FalconK so I wrote a little thing that queries first, and tries to predict what will happen, in a loop.
09:00 🔗 yipdw_ because I figure the more intelligence I add, the more it will fuck up
09:00 🔗 FalconK right now if it sees that it's throttled or that rationing is globally on, it waits 5 seconds and asks again
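The "ask first, wait 5 seconds while throttled" approach can be sketched as below. The `check_limit=1` query on `s3.us.archive.org` and the JSON layout (`over_limit` at the top level, `rationing_engaged` inside `detail`) are assumptions based on the field names mentioned in this discussion; the `detail` block is described as internal, so treat it as unstable. The linked uploader.py remains the authoritative version of FalconK's logic.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

# Assumed endpoint for IA's rate-limit status query.
CHECK_LIMIT_URL = "https://s3.us.archive.org/"


def fetch_limit_status(access_key: str, bucket: str) -> dict:
    """Ask IA whether uploads for this key/bucket are currently throttled."""
    query = urllib.parse.urlencode(
        {"check_limit": 1, "accesskey": access_key, "bucket": bucket})
    with urllib.request.urlopen(f"{CHECK_LIMIT_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def should_wait(status: dict) -> bool:
    # over_limit != 0: an upload now would likely get 503 Slow Down.
    # rationing_engaged != 0: the conservative choice of also waiting
    # whenever rationing is on at all.
    detail = status.get("detail") or {}
    return bool(status.get("over_limit")) or bool(detail.get("rationing_engaged"))


def wait_until_clear(fetch: Callable[[], dict], poll_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll fetch() until neither throttling flag is set."""
    while should_wait(fetch()):
        sleep(poll_seconds)
```

Typical use would be `wait_until_clear(lambda: fetch_limit_status("ACCESS", "my-item"))` before starting a PUT, where `"my-item"` is a hypothetical bucket name.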
09:00 🔗 HCross2 FalconK: can you pm me that?
09:01 🔗 FalconK HCross2: it's in github - https://github.com/falconkirtaran/ArchiveBot/blob/master/uploader/uploader.py#L35
09:01 🔗 midas S3 status messages are just as informative as a blank sheet of paper, i wouldn't bother and would just keep pushing it in until it accepts
09:01 🔗 yipdw_ eh the code's there
09:02 🔗 yipdw_ if it works, whichever
09:02 🔗 FalconK it's trying to tell me *something*
09:02 🔗 yipdw_ I just never bothered for the above reasons
09:02 🔗 Guest__ has joined #archiveteam
09:02 🔗 FalconK over_limit is for sure informative as it means any activity will block
09:02 🔗 FalconK er, will 503
09:04 🔗 yipdw_ it looks like over_limit is the only flag that you can rely on
09:04 🔗 FalconK I would be interested to know what constitutes a "task" for these counters
09:05 🔗 yipdw_ so I guess it's probably fine to just wait until it's cleared and assume zero means go-ahead
09:05 🔗 yipdw_ if it cuts you off midstream, oh well, what's a few gigabytes between friends
09:05 🔗 FalconK well, yes, but it looked like any attempt at all to upload giant things at the rate I am capable of uploading them will exceed my ration
09:05 🔗 yipdw_ today just might not be a good day
09:06 🔗 FalconK it depends on "task"
09:06 🔗 yipdw_ I've pushed stuff in at ~600 Mbps
09:06 🔗 yipdw_ it does, but the detail field is also explicitly documented as internal
09:06 🔗 FalconK yeah, while rationing_engaged=0, it doesn't seem to care what kind of mess I make
09:06 🔗 yipdw_ as a result it doesn't seem like it's anything to rely on
09:07 🔗 SketchCow s3 is overloaded, so there's that.
09:07 🔗 FalconK I'm trying to be a good netizen and not create ops problems ;)
09:07 🔗 FalconK what is it overloaded by? semantic processing? bandwidth? data migration?
09:09 🔗 FalconK it looks as though people are enqueueing tasks at a very high rate tonight
09:09 🔗 midas http://i.imgur.com/Q45H7.gif <-- s3 graph
09:10 🔗 FalconK lol
09:10 🔗 midas not to worry, it will unclog at a certain moment
09:11 🔗 yipdw_ there might be someone at IA you can email; I'm not sure who that'd be or if they're available for this sort of thing
09:11 🔗 schbirid i hope s3 gave their consent
09:11 🔗 schbirid ;P
09:11 🔗 midas lol
09:11 🔗 FalconK ... it seems as though it should be possible to approximate the desired rate by seeing that accesskey_tasks_queued<accesskey_ration, bucket_tasks_queued<bucket_ration, and total_tasks_queued<total_global_limit
09:11 🔗 midas SketchCow: nice work on the busy gif btw
09:12 🔗 midas https://monitor.archive.org/about/busy.gif
09:12 🔗 FalconK and long polling while querying that until the condition is true
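The proposed three-way test, written out as a predicate over the `detail` counters. The field names are the ones quoted in the message; what a queued "task" corresponds to is not documented, so this is only a guess at a usable heuristic, and `has_headroom` is a hypothetical name.

```python
def has_headroom(detail: dict) -> bool:
    """True if queued tasks are below the ration at every level."""
    return (
        detail["accesskey_tasks_queued"] < detail["accesskey_ration"]
        and detail["bucket_tasks_queued"] < detail["bucket_ration"]
        and detail["total_tasks_queued"] < detail["total_global_limit"]
    )
```

A long-polling uploader would re-fetch the status and call this until it returns True before starting each upload.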
09:12 🔗 FalconK but that is WAY too much work
09:12 🔗 FalconK and I can't do that while shelling out to curl for uploads really
09:12 🔗 pikhq has quit IRC (Ping timeout: 506 seconds)
09:12 🔗 FalconK and that would really put too much logic in there
09:12 🔗 FalconK ugh
09:17 🔗 midas what you can do FalconK is have a look at http://monitor.archive.org/stats/s3.php
09:18 🔗 midas but that's mostly old data, as in not live feed
09:18 🔗 yipdw_ I don't know how you'd use that in a script
09:18 🔗 midas mostly to check with your eyes and see "well this seems to be a bad time to upload something" ;-)
09:19 🔗 FalconK not even a bit amenable to automation but it's nice to have the stats.
09:19 🔗 yipdw_ nobody really does that with the archivebot uploader
09:19 🔗 yipdw_ hell I forget it's running
09:19 🔗 yipdw_ I'd prefer to keep forgetting that it's there
09:19 🔗 FalconK :P
09:19 🔗 midas that's because the archivebot uploader keeps looping anyway
09:20 🔗 yipdw_ anyway good luck I guess -- I just haven't hit 503 Slow Down often enough to really look into optimizing backoff/retry
09:20 🔗 yipdw_ if you hear back from (say) IA staff it'd be cool to have that info
09:21 🔗 yipdw_ it'd be super-cool if http://archive.org/help/abouts3.txt was updated with detail info, but the way that's written sounds like the rate limiting policy is in flux
09:21 🔗 HCross2 I'm talking to Mark from the IA tonight. I'll bring up the slowdowns and see
09:21 🔗 FalconK it doesn't look well-correlated with anything in particular
09:22 🔗 FalconK yeah!
09:22 🔗 FalconK it would be nice to know like
09:22 🔗 SketchCow Mark will not be able to help you.
09:22 🔗 FalconK anything at all that relates upload size to tasks
09:22 🔗 bwn has joined #archiveteam
09:22 🔗 SketchCow I mean, I know why FOS is slow and why S3 has problems.
09:23 🔗 FalconK it'd be awful nice if it could use my size hint to dispose of the request fast
09:28 🔗 HCross2 Ok
09:35 🔗 SketchCow s3 unjammed
09:35 🔗 FalconK so it is!
09:35 🔗 FalconK I also notice that most of my uploads have been flagged for admin intervention due to being on full disks as derive.php started
09:37 🔗 FalconK or things like connect to host iw600504 port 22: No route to host
09:37 🔗 FalconK guessing it's related to the downtime today and derivation will continue anon.
09:55 🔗 roninski1 has joined #archiveteam
09:56 🔗 MMovie has quit IRC (Read error: Operation timed out)
09:57 🔗 vitzli has joined #archiveteam
10:01 🔗 roninski has quit IRC (Read error: Operation timed out)
10:06 🔗 metalcamp has quit IRC (Ping timeout: 250 seconds)
10:52 🔗 atomotic has joined #archiveteam
11:11 🔗 metalcamp has joined #archiveteam
11:21 🔗 db48x has quit IRC (Read error: Connection reset by peer)
11:41 🔗 metalcamp has quit IRC (Ping timeout: 250 seconds)
11:53 🔗 roninski has joined #archiveteam
11:58 🔗 roninski1 has quit IRC (Read error: Operation timed out)
12:00 🔗 khaoohs_ has quit IRC (Read error: Operation timed out)
12:00 🔗 khaoohs has joined #archiveteam
12:07 🔗 roninski1 has joined #archiveteam
12:12 🔗 MMovie has joined #archiveteam
12:13 🔗 roninski has quit IRC (Read error: Operation timed out)
12:31 🔗 jmad980 has quit IRC (Read error: Operation timed out)
12:33 🔗 metalcamp has joined #archiveteam
12:35 🔗 Jonimus has quit IRC (Read error: Operation timed out)
12:35 🔗 nwf has quit IRC (Read error: Operation timed out)
12:36 🔗 aMunster has quit IRC (Read error: Operation timed out)
12:36 🔗 toad1 has quit IRC (Read error: Operation timed out)
12:36 🔗 mhazinsk has quit IRC (Read error: Operation timed out)
12:37 🔗 MMovie has quit IRC (Read error: Operation timed out)
12:37 🔗 vegbrasil has quit IRC (Read error: Operation timed out)
12:38 🔗 closure has quit IRC (Read error: Operation timed out)
12:38 🔗 vtyl has quit IRC (Read error: Operation timed out)
12:41 🔗 beardicus has quit IRC (Read error: Operation timed out)
12:42 🔗 lytv has joined #archiveteam
12:42 🔗 toad1 has joined #archiveteam
12:49 🔗 jmad980 has joined #archiveteam
12:51 🔗 metal_cam has joined #archiveteam
12:51 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
12:54 🔗 WinterFox has quit IRC (Remote host closed the connection)
12:57 🔗 metal_cam is now known as metalcamp
13:04 🔗 VADemon has joined #archiveteam
13:05 🔗 beardicus has joined #archiveteam
13:06 🔗 vegbrasil has joined #archiveteam
13:08 🔗 closure has joined #archiveteam
13:08 🔗 dserodio has joined #archiveteam
13:16 🔗 aMunster has joined #archiveteam
13:54 🔗 maseck has quit IRC (Quit: No Ping reply in 180 seconds.)
13:55 🔗 maseck has joined #archiveteam
14:05 🔗 vitzli has quit IRC (Leaving)
14:06 🔗 Jonimus has joined #archiveteam
14:10 🔗 MMovie has joined #archiveteam
14:15 🔗 mhazinsk has joined #archiveteam
14:23 🔗 tomwsmf-a has joined #archiveteam
14:26 🔗 nwf has joined #archiveteam
14:35 🔗 pgoetz has quit IRC (Remote host closed the connection)
15:06 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
15:07 🔗 Start http://www.theverge.com/2016/3/9/11184518/flickr-photo-uploader-now-paid-feature
15:10 🔗 Start has quit IRC (Quit: Disconnected.)
15:15 🔗 Ungstein1 has quit IRC (Read error: Connection reset by peer)
15:17 🔗 Morbus has joined #archiveteam
15:21 🔗 Ungstein has joined #archiveteam
15:24 🔗 SketchCow Bad sign
15:28 🔗 * ersi starts the Death Watch
15:36 🔗 Ungstein has quit IRC (Quit: Leaving.)
15:41 🔗 Ungstein has joined #archiveteam
15:45 🔗 Start has joined #archiveteam
16:23 🔗 RichardG_ has quit IRC (Ping timeout: 258 seconds)
16:30 🔗 RichardG has joined #archiveteam
16:34 🔗 arkiver2 has joined #archiveteam
16:50 🔗 vOYtEC has quit IRC (rm -r *)
16:51 🔗 Start_ has joined #archiveteam
16:51 🔗 Start has quit IRC (Read error: Connection reset by peer)
16:51 🔗 Start_ is now known as Start
16:59 🔗 JesseW has joined #archiveteam
17:03 🔗 MMovie has quit IRC (Read error: Connection reset by peer)
17:05 🔗 MMovie has joined #archiveteam
17:07 🔗 Start has quit IRC (Quit: Disconnected.)
17:13 🔗 JesseW has quit IRC (Quit: Leaving.)
17:20 🔗 arkiver2 has quit IRC (Ping timeout: 258 seconds)
17:23 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
17:28 🔗 HCross FalconK, does your s3 thingy need the secret key, or just the access key?
17:29 🔗 godane SketchCow: 2010 mp3s of kpfa will be all uploaded by tonight
17:29 🔗 godane i'm up to 2010-12-09 right now
17:31 🔗 atomotic has joined #archiveteam
17:42 🔗 metalcamp has joined #archiveteam
17:44 🔗 xmc FalconK: are you uploading with the magic flag set that blocks derive operations until you're done putting files into the item?
17:44 🔗 xmc it could be that you are putting too many derive jobs in the queue
17:46 🔗 VADemon has quit IRC (Quit: left4dead)
17:51 🔗 vOYtEC has joined #archiveteam
17:51 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
17:58 🔗 Frogging SketchCow: I thought it was already clear that Flickr was in trouble since Yahoo told their shareholders they were going to kill it :p
18:01 🔗 HCross just the words "owned by Yahoo" puts anything close to death
18:02 🔗 Frogging mhmm
18:02 🔗 philpem has joined #archiveteam
18:20 🔗 johtso is there anywhere that caches of github repositories can be found?
18:28 🔗 MrRadar Cached as in the Git repository or cached as in the other Github stuff (issues, pull requests, etc)?
18:39 🔗 jut has joined #archiveteam
18:40 🔗 Froggypwn has quit IRC (Ping timeout: 258 seconds)
18:42 🔗 johtso MrRadar: the files themselves, I can see the code in google's cache, but can't get at the one DLL file I need :(
18:43 🔗 johtso https://webcache.googleusercontent.com/search?q=cache:7NqX3jT4LUQJ:https://github.com/jgoewert/USBCD-Module+&cd=1&hl=en&ct=clnk&gl=uk
18:43 🔗 johtso a shame internet archive doesn't crawl github :(
18:47 🔗 MrRadar Yeah, though GitHub repositories are gigantic if you actually capture the source code through the web interface
18:47 🔗 MrRadar It doesn't look like anyone forked that, unfortunately
18:47 🔗 DFJustin there have been people here pulling github repos but I don't know what the current status is
18:50 🔗 GChriss has joined #archiveteam
18:51 🔗 GChriss a shutdown notice from pyvideo.org that hasn't been added to the wiki yet: http://bluesock.org/~willkg/blog/pyvideo/status_20160115.html
18:51 🔗 MrRadar I think we already grabbed that through ArchiveBot
18:52 🔗 MrRadar Yes: http://archive.fart.website/archivebot/viewer/job/eilj3
18:52 🔗 GChriss yes, checking degree of completeness now
18:53 🔗 GChriss youtube embedding is broken but I'm guessing that's expected
18:58 🔗 bwn has quit IRC (Read error: Operation timed out)
19:01 🔗 yipdw_ if you're checking completeness, try multiple replay tools
19:04 🔗 JW_work Hm; it might be worth making a tool to automatically pick through the github firehose and fork any repository that didn't have any forks after a couple of days.
19:05 🔗 JW_work (if one has enough space, cloning them privately might be even better, to avoid cases where github decides they don't want to host the repo *or* mirrors they know about)
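A rough sketch of the selection step for this idea, assuming repository records shaped like GitHub REST API repository objects (`created_at`, `forks_count`, `clone_url`). Fetching those records from the event stream is left out, and the two-day threshold and function names are illustrative. It produces `git clone --mirror` commands for a private copy rather than a public fork.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def needs_backup(repo: dict, now: datetime,
                 min_age: timedelta = timedelta(days=2)) -> bool:
    """True for repos that are at least min_age old and still have no forks."""
    # GitHub timestamps look like "2016-03-06T12:00:00Z".
    created = datetime.fromisoformat(repo["created_at"].replace("Z", "+00:00"))
    return repo["forks_count"] == 0 and now - created >= min_age


def mirror_command(repo: dict) -> List[str]:
    """Command line for a private, complete clone of the repository."""
    return ["git", "clone", "--mirror", repo["clone_url"]]


now = datetime(2016, 3, 9, 19, 4, tzinfo=timezone.utc)
repo = {"created_at": "2016-03-06T12:00:00Z", "forks_count": 0,
        "clone_url": "https://github.com/jgoewert/USBCD-Module.git"}
if needs_backup(repo, now):
    print(" ".join(mirror_command(repo)))
```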
19:05 🔗 phuzion Project name: Giterdun
19:06 🔗 JW_work :-)
19:09 🔗 schbirid has quit IRC (Quit: Leaving)
19:14 🔗 joepie91 lol
19:23 🔗 bwn has joined #archiveteam
19:28 🔗 jut has quit IRC (Read error: Connection reset by peer)
19:28 🔗 schbirid has joined #archiveteam
19:46 🔗 roninski has joined #archiveteam
19:50 🔗 roninski1 has quit IRC (Read error: Operation timed out)
20:38 🔗 Start has joined #archiveteam
20:45 🔗 Start has quit IRC (Quit: Disconnected.)
21:02 🔗 K4k has quit IRC (Ping timeout: 260 seconds)
21:06 🔗 K4k has joined #archiveteam
21:07 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
21:10 🔗 Lord_Nigh has joined #archiveteam
21:11 🔗 balrog sets mode: +o Lord_Nigh
21:16 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
21:39 🔗 schbirid has quit IRC (Quit: Leaving)
21:40 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
22:01 🔗 johtso or clone to a private bitbucket repo
22:28 🔗 lysobit has quit IRC (Quit: that hurt deep, nsh)
22:32 🔗 lysobit has joined #archiveteam
22:38 🔗 ndiddy has joined #archiveteam
22:44 🔗 Kenshin has quit IRC (Ping timeout: 260 seconds)
22:48 🔗 Kenshin has joined #archiveteam
22:49 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:50 🔗 arkiver Is someone using the IA CDX API for anything flickr related currently?
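For anyone wanting to try the CDX API, a sketch that builds a query URL for captures under a domain. The parameters used (`url`, `matchType`, `output`, `limit`, `fl`) are standard Wayback CDX server options; the helper name, the default limit, and the field list are illustrative choices.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, match_type: str = "domain", limit: int = 1000,
                  fields=("timestamp", "original", "statuscode")) -> str:
    """Build a Wayback CDX query URL (sketch)."""
    params = {
        "url": target,
        "matchType": match_type,   # "domain" also covers subdomains
        "output": "json",          # first row of the result is a header row
        "limit": limit,
        "fl": ",".join(fields),    # which columns to return
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(cdx_query_url("flickr.com"))
```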
22:57 🔗 dashcloud has joined #archiveteam
23:09 🔗 godane https://archive.org/details/ET_And_Friends_1982
23:26 🔗 MMovie has quit IRC (Read error: Operation timed out)
23:26 🔗 aMunster has quit IRC (Read error: Operation timed out)
23:27 🔗 vegbrasil has quit IRC (Read error: Operation timed out)
23:27 🔗 nwf has quit IRC (Read error: Operation timed out)
23:28 🔗 mhazinsk has quit IRC (Read error: Operation timed out)
23:28 🔗 beardicus has quit IRC (Read error: Operation timed out)
23:30 🔗 closure has quit IRC (Read error: Operation timed out)
23:41 🔗 Jonimus has quit IRC (Ping timeout: 633 seconds)
23:54 🔗 beardicus has joined #archiveteam
23:57 🔗 vegbrasil has joined #archiveteam
