#archiveteam 2016-08-18,Thu

Time Nickname Message
00:01 πŸ”— DoomTay Did it ever allow for multiple pages on a site?
00:02 πŸ”— * joepie91 walks in
00:02 πŸ”— ErkDog hey Joe, we're trying to figure out the most graceful way of saving YTMD
00:03 πŸ”— joepie91 yeah, saw in -bs
00:03 πŸ”— arkiver max: do you think you can provide a list of sites?
00:04 πŸ”— ErkDog when you say list, you mean the list of YTMD's?
00:04 πŸ”— ErkDog YTMND*'s?
00:04 πŸ”— nicolas17 arkiver: I think you could go through http://blah.ytmnd.com/info/{number}/json incrementally?
00:04 πŸ”— * joepie91 is currently reading
00:04 πŸ”— ErkDog should be pretty easy actually he could just send us the zone file
00:04 πŸ”— ErkDog since literally every YTMND is on its own host
00:04 πŸ”— nicolas17 ErkDog: I assume he has a wildcard *.ytmnd.com at the DNS level
00:04 πŸ”— joepie91 max: do you by any chance still have the source of the HTML5 version, even if it's incomplete?
00:04 πŸ”— ErkDog ohhhhhh good point :(
00:05 πŸ”— ErkDog ohhhh well then in his database table
00:05 πŸ”— arkiver Thanks for pointing that out nicolas17
00:05 πŸ”— joepie91 max: if released as open-source it might drive people to continue developing it, even if just as a future-proof way of viewing the YTMBD stuff
00:05 πŸ”— joepie91 er
00:05 πŸ”— arkiver max: would the http://blah.ytmnd.com/info/{number}/json method get us all sites?
00:05 πŸ”— joepie91 YTMND *
00:05 πŸ”— ErkDog he should have <whatever>.ytmnd.com so it could match
00:05 πŸ”— arkiver Or are there some special cases
00:08 πŸ”— Frogging http://archiveteam.org/index.php?title=YTMND
00:08 πŸ”— arkiver Awesome Frogging!
00:09 πŸ”— ErkDog well I've been trying random numbers in the json
00:09 πŸ”— ErkDog up to 25000 with a response so far
00:09 πŸ”— ErkDog some have said error no site
00:09 πŸ”— arkiver ok
00:09 πŸ”— xmc hey max do you have a list of names that you could share?
00:10 πŸ”— arkiver ^that would be very helpful
00:10 πŸ”— ErkDog https://puu.sh/qF8Bx/9d14b6c213.png
00:11 πŸ”— ErkDog so basically we could crawl the json's up
00:11 πŸ”— ErkDog and it gives you the "domain" in the json
00:11 πŸ”— Froggypwn has quit IRC (Read error: Operation timed out)
00:11 πŸ”— ErkDog then we crawl the domain.ytmnd.com for WARCing
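
The plan ErkDog outlines here (walk the numeric IDs, read the "domain" field from each JSON response, then crawl that subdomain) could be sketched roughly as follows. This is a minimal sketch, not the crawler Archive Team actually ran. The JSON layout is not shown in the log, so `find_domain` just searches the decoded payload for a "domain" key. The `blah` host label comes from nicolas17's example URL, all function names are illustrative assumptions, and the sketch assumes the "domain" value is a bare subdomain label.

```python
import json

# URL shape quoted in the log; the host label is a placeholder.
INFO_URL = "http://{host}.ytmnd.com/info/{site_id}/json"


def info_url(site_id, host="blah"):
    """Build the per-ID info URL for one numeric site ID."""
    return INFO_URL.format(host=host, site_id=site_id)


def find_domain(payload):
    """Return the first non-empty string stored under a 'domain' key, or None.

    The real response layout is an assumption, so nested dicts and lists
    are searched rather than relying on a fixed path.
    """
    if isinstance(payload, dict):
        value = payload.get("domain")
        if isinstance(value, str) and value:
            return value
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_domain(child)
        if found:
            return found
    return None


def crawl_target(raw_json):
    """Map one JSON response body to the subdomain URL to crawl, or None."""
    try:
        domain = find_domain(json.loads(raw_json))
    except ValueError:  # body was not JSON, e.g. an HTML error page
        return None
    return "http://{}.ytmnd.com/".format(domain) if domain else None
```

IDs that answer with an error (ErkDog mentions "error no site" responses) simply yield `None` and can be skipped.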
00:13 πŸ”— ErkDog https://puu.sh/qF8MY/96a6c609bd.png
00:13 πŸ”— ErkDog https://puu.sh/qF8NY/de3ee558f9.png
00:13 πŸ”— ErkDog I joined YTMND 12 yrs ago, jesus
00:19 πŸ”— ErkDog http://ateam-test-1.ytmnd.com/
00:19 πŸ”— ErkDog just made that
00:20 πŸ”— ErkDog OK I just made one, its ID is: 1008765
00:22 πŸ”— howdoicom has quit IRC (Quit: Page closed)
00:24 πŸ”— ErkDog wow you have got to be kidding me
00:24 πŸ”— ErkDog I can't edit the YTMND wiki page
00:24 πŸ”— ErkDog https://puu.sh/qF9sQ/fa44f78678.png
00:25 πŸ”— ErkDog because YTMND.com is blacklisted external site
00:25 πŸ”— joepie91 lol
00:26 πŸ”— ErkDog Frogging must have super powers
00:26 πŸ”— Petri152 has joined #archiveteam
00:28 πŸ”— ErkDog I had to put spaces in the URLs I guess someone will have to fix it besides me
00:28 πŸ”— RedType_ has quit IRC (Read error: Operation timed out)
00:29 πŸ”— arkiver ytmnd is on the tracker page
00:30 πŸ”— arkiver max: any limits or special status codes?
00:30 πŸ”— ErkDog I summarised the info we had so far that would allow a successful crawl arkiver
00:30 πŸ”— arkiver thanks
00:30 πŸ”— ErkDog barring additional resources/info provided by Max
00:32 πŸ”— ErkDog Also if it's only 1.7 TB, I could crawl and push that out in a few days, if you didn't want to do all the extra crazy stuff to add it into the warrior
00:42 πŸ”— nicolas17 ErkDog: I think the 1.7TB includes stuff that isn't publicly or easily accessible
00:43 πŸ”— nicolas17 so crawling you would get less :P
00:43 πŸ”— DoomTay Hmm.. for a handful of sites, forcing HTML5 results in an error message saying the audio could not be decoded
00:51 πŸ”— SketchCow max: Jason Scott here, we can also do a full version of the 1.7tb collection and put it into the Internet Archive's dark archives for safekeeping.
00:51 πŸ”— arkiver We should do both
00:51 πŸ”— SketchCow We should. That's what I'm saying.
00:51 πŸ”— arkiver awesome
00:52 πŸ”— arkiver I think we're going to start the crawls from for example http://ytmnd.com/sites/991586/profile
00:53 πŸ”— arkiver I'm off
00:54 πŸ”— arkiver max: if you can, please provide a list of sites, users and keywords (if that isn't easily possible, we can extract some ourselves too)
00:54 πŸ”— * arkiver is afk for the night
00:59 πŸ”— JesseW has joined #archiveteam
01:10 πŸ”— DoomTay has quit IRC (Quit: Page closed)
01:21 πŸ”— tomwsmf has quit IRC (Read error: Operation timed out)
01:26 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
01:34 πŸ”— RedType has joined #archiveteam
01:37 πŸ”— DoomTay has joined #archiveteam
01:50 πŸ”— JesseW has joined #archiveteam
02:00 πŸ”— Froggypwn has joined #archiveteam
02:01 πŸ”— Honno has joined #archiveteam
02:02 πŸ”— DoomTay has quit IRC (Quit: Page closed)
02:06 πŸ”— DoomTay has joined #archiveteam
02:45 πŸ”— DoomTay has quit IRC (Quit: Page closed)
02:51 πŸ”— tuankiet6 has joined #archiveteam
02:52 πŸ”— tuankiet6 is now known as tuankiet
03:04 πŸ”— RichardG has quit IRC (Read error: Connection reset by peer)
03:04 πŸ”— RichardG has joined #archiveteam
03:12 πŸ”— RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
03:15 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
03:39 πŸ”— RichardG has joined #archiveteam
03:50 πŸ”— mutoso has quit IRC (Ping timeout: 260 seconds)
03:58 πŸ”— mutoso has joined #archiveteam
04:16 πŸ”— JesseW has joined #archiveteam
04:18 πŸ”— nicolas17 has quit IRC (Read error: Operation timed out)
04:22 πŸ”— Sk1d has quit IRC (Ping timeout: 194 seconds)
04:30 πŸ”— Sk1d has joined #archiveteam
04:42 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
04:52 πŸ”— RedType has quit IRC (Read error: Operation timed out)
05:02 πŸ”— JesseW has joined #archiveteam
05:31 πŸ”— RichardG has quit IRC (Read error: Operation timed out)
05:31 πŸ”— RichardG has joined #archiveteam
05:54 πŸ”— dan- has quit IRC (Ping timeout: 633 seconds)
05:54 πŸ”— RedType has joined #archiveteam
05:57 πŸ”— dan- has joined #archiveteam
06:25 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
06:29 πŸ”— uosdwis has joined #archiveteam
06:31 πŸ”— uosdwis hi. I'm running the warrior but it's not getting any items. is the tracker down?
06:32 πŸ”— aschmitz has quit IRC (Read error: Operation timed out)
06:39 πŸ”— aschmitz has joined #archiveteam
07:02 πŸ”— uosdwis has quit IRC (Quit: Page closed)
07:11 πŸ”— RichardG has quit IRC (Ping timeout: 255 seconds)
07:21 πŸ”— xmc tracker is up, but we complete projects faster than we can start them
07:21 πŸ”— xmc but uosdwis is gone anyway
07:22 πŸ”— Atluxity we are too good
07:25 πŸ”— BlueMaxim has quit IRC (Read error: Operation timed out)
07:26 πŸ”— tuankiet Anyone got this error while running googlecode-grab? Lua runtime error: googlecode.lua:375: invalid use of '%' in replacement string. (It's just on Arch Linux I think, my VPS running Ubuntu 16.04 doesn't have this error)
07:26 πŸ”— BlueMaxim has joined #archiveteam
07:32 πŸ”— les has joined #archiveteam
07:32 πŸ”— les WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
07:33 πŸ”— xmc yahoosucks
07:33 πŸ”— les got it, thanks
07:33 πŸ”— les has quit IRC (Client Quit)
07:52 πŸ”— db48x has quit IRC (Read error: Operation timed out)
08:02 πŸ”— kristian_ has joined #archiveteam
08:33 πŸ”— fie has joined #archiveteam
08:38 πŸ”— RichardG has joined #archiveteam
08:47 πŸ”— BlueMaxim has quit IRC (Read error: Operation timed out)
08:48 πŸ”— BlueMaxim has joined #archiveteam
09:26 πŸ”— SketchCow Go forth
10:23 πŸ”— kristian_ has quit IRC (Leaving)
10:36 πŸ”— Peetz0r_ is now known as Peetz0r
10:41 πŸ”— db48x has joined #archiveteam
10:49 πŸ”— db48x has quit IRC (Remote host closed the connection)
10:59 πŸ”— wp494 has quit IRC (Read error: Connection reset by peer)
11:00 πŸ”— WinterFox has joined #archiveteam
11:14 πŸ”— db48x has joined #archiveteam
11:41 πŸ”— fie_ has joined #archiveteam
11:43 πŸ”— fie has quit IRC (Read error: Operation timed out)
11:58 πŸ”— swonsy has joined #archiveteam
12:01 πŸ”— swonsy Hello everybody. tell me, addons.mozilla.org already archived? if so, where you can download files? Thank you
12:01 πŸ”— swonsy where can i download*
12:04 πŸ”— Igloo^ Try the Wayback Machine?
12:04 πŸ”— Igloo^ It might be archived.
12:05 πŸ”— swonsy Could you give me a link to the archives?
12:06 πŸ”— db48x https://web.archive.org/web/*/addons.mozilla.org
12:07 πŸ”— db48x but the actual extensions aren't archived
12:09 πŸ”— swonsy but i need the archived extension files themselves, not just the extension pages on AMO
12:09 πŸ”— swonsy yes, i know
12:10 πŸ”— swonsy This is bad
12:12 πŸ”— swonsy then tell me, your team will be archived AMO with extensions files? in future
12:14 πŸ”— swonsy archive*
12:17 πŸ”— db48x it's a good idea
12:19 πŸ”— swonsy of course))
12:21 πŸ”— swonsy because Mozilla dies
12:23 πŸ”— swonsy many extensions have already disappeared from the AMO site, as well as from the developers' sites
12:25 πŸ”— swonsy you need to preserve at least those that have
12:39 πŸ”— BlueMaxim has quit IRC (Quit: Leaving)
12:43 πŸ”— vitzli has joined #archiveteam
12:43 πŸ”— WinterFox has quit IRC (Read error: Operation timed out)
12:58 πŸ”— max so i was thinking if you want to warc the entire site, i can write a quick script that reads from the db and just generates a massive list of every page on ytmnd.com as well as all the subdomains
12:59 πŸ”— max SketchCow: also hi!
13:00 πŸ”— max also there's an API that has been down for a few years because no one ever used it and it was a pain to maintain, but it could give a ton of access to otherwise hidden data if i turned it back on and made it work.
13:00 πŸ”— max or i could make the json on the subdomains include more information
13:01 πŸ”— max i colocate and bandwidth is cheap, so frankly i dont care if the archiving is ddosing the site.
13:01 πŸ”— max that said, i can copy the 1.7tb assets archive to you guys faster than using single http gets
13:01 πŸ”— arkiver We should do both
13:02 πŸ”— max the whole asset dir is /#/#/<md5>.<gif|jpg|mp3|etc>
13:02 πŸ”— max and they're immutable
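
Because the asset names carry an MD5 and the files never change, a bulk copy can in principle be checked file by file. The sketch below assumes (the log does not say so explicitly) that the `<md5>` in the file name is the MD5 of the file's contents. The two `#` directory levels are treated as opaque, and the function names are hypothetical.

```python
import hashlib
import posixpath


def asset_digest(path):
    """Return the <md5> part of a path shaped like /#/#/<md5>.<ext>."""
    filename = posixpath.basename(path)
    stem, _ext = posixpath.splitext(filename)
    return stem.lower()


def asset_matches(path, data):
    """True if the MD5 of the bytes equals the digest in the file name.

    Assumption: the file name is the MD5 of the contents.
    """
    return hashlib.md5(data).hexdigest() == asset_digest(path)
```

If the assumption holds, a mismatch would point to a corrupted or truncated transfer rather than a changed file.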
13:02 πŸ”— arkiver If we don't crawl this through HTTP GETs it will not be in the wayback machine
13:02 πŸ”— max ok
13:03 πŸ”— arkiver A copy of the data would be nice to have too, besides the crawl
13:03 πŸ”— max the only data im hesitant to provide is email/password hashes/private messages
13:04 πŸ”— arkiver I'm not sure about a dump, but it won't be in the crawled data if it is not publicly available information
13:04 πŸ”— arkiver As for the dump and private information,
13:04 πŸ”— max people dated people they met on ytmnd, so there are likely some very personal messages
13:05 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
13:05 πŸ”— arkiver Internet Archive can keep items dark ('inaccessible'), so that might be what you want for a copy for the data
13:05 πŸ”— arkiver Or you can encrypt the private data only and send it that way
13:06 πŸ”— arkiver Or leave it out, of course
13:06 πŸ”— max im just not sure i see the value in archiving private messages
13:06 πŸ”— arkiver SketchCow ^
13:06 πŸ”— max i checked last night and the view data on sites is the majority of the data, at 460 million rows
13:07 πŸ”— arkiver ok
13:07 πŸ”— arkiver I think it'd be good to talk with SketchCow about what we should do with the copy of the data.
13:08 πŸ”— arkiver For the crawl, it would be great if you can create that list of every page and subdomain
13:08 πŸ”— max and i guess it might be worth making the html5 player a bit nicer just so people in the future can see them
13:11 πŸ”— BartoCH has joined #archiveteam
13:16 πŸ”— arkiver That might be a nice idea
13:31 πŸ”— Frogging this is exciting :)
14:02 πŸ”— nicolas17 has joined #archiveteam
14:18 πŸ”— wp494 has joined #archiveteam
14:30 πŸ”— tuankiet has quit IRC (Ping timeout: 246 seconds)
14:38 πŸ”— tuankiet6 has joined #archiveteam
15:37 πŸ”— DoomTay has joined #archiveteam
15:37 πŸ”— DoomTay max: Out of curiosity, how does YTMND handle busy servers?
15:38 πŸ”— DoomTay Because we once dealt with a website where in such a situation, the site would instead serve a "servers are busy" message while still having a status code of 200
15:49 πŸ”— max well
15:50 πŸ”— max it doesn't do anything special
15:50 πŸ”— max but spidering would technically pollute the view data
15:50 πŸ”— max not that it's really that important
15:50 πŸ”— max but it was tuned to get a lot more traffic than it does now so it should probably be fine in that regard
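
The situation DoomTay describes, an error page served with status 200, cannot be caught by status code alone. One common workaround is to scan the body for known error phrases. A minimal sketch follows; the marker phrase and function name are hypothetical, and max's answer suggests YTMND itself does not need this.

```python
# Hypothetical phrase taken from DoomTay's description; tune per site.
BUSY_MARKERS = ("servers are busy",)


def is_soft_error(status, body, markers=BUSY_MARKERS):
    """Flag a 200 response whose body looks like an error page."""
    if status != 200:
        return False  # genuine error codes are left to normal retry logic
    text = body.lower()
    return any(marker in text for marker in markers)
```

A crawler would typically retry a flagged URL later instead of saving the response as the real page.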
15:51 πŸ”— vitzli has quit IRC (Quit: Leaving)
15:55 πŸ”— DoomTay Anyway, it looks like they're going to look at JSONs with the domain always being on picard.ytmnd.com, which I'm not sure about. Unless they will eventually crawl the JSON at a given site's actual domain, it will probably wind up broken once the site is on Wayback Machine
16:02 πŸ”— SketchCow max: Can you provide Internet Archive a copy of the data with all private messages removed?
16:06 πŸ”— Frogging if the private messages are removed then can the dump be made public?
16:07 πŸ”— arkiver <max>but spidering would technically pollute the view data
16:08 πŸ”— arkiver It's probably best to first get a copy to IA and after that do the crawl, so the original statistics are saved
16:17 πŸ”— tomwsmf has joined #archiveteam
16:26 πŸ”— joepie91 max: what are your thoughts on my question last night, regarding providing the HTML5 player as an open-source thing so that people can continue to develop it?
16:27 πŸ”— joepie91 (if they so desire)
16:52 πŸ”— Morbus has quit IRC (Quit: http://www.disobey.com/)
17:16 πŸ”— SketchCow GAWKER.COM closes down last week.
17:16 πŸ”— SketchCow Anything left to grab? We were pretty comprehensive.
17:17 πŸ”— riordan has joined #archiveteam
17:26 πŸ”— W has joined #archiveteam
17:27 πŸ”— riordan @SketchCow Forgive me for being a perma-n00b/admirer but when you grabbed gawker, did you also grab the whole gawker media/kinja network?
17:27 πŸ”— riordan There’s a bunch of real weird shit in there like dog.gawker.com that’s well… fascinating
17:27 πŸ”— SketchCow We're likely to double-check
17:28 πŸ”— riordan also tons of their posts are crazy reliant on embedded content (embedded tweets)
17:28 πŸ”— riordan awesome
17:28 πŸ”— kristian_ has joined #archiveteam
17:30 πŸ”— riordan On behalf of the staff of old-school cultural heritage orgs: thank you all for doing this when we won’t
17:30 πŸ”— riordan because computers scare us and we’ve been told they’re very expensive
17:30 πŸ”— xmc <3
17:31 πŸ”— xmc embedded tweets archive pretty well
17:31 πŸ”— xmc because they're a <blockquote> with some javascript that makes it look fancy
17:32 πŸ”— bithippo has joined #archiveteam
17:32 πŸ”— max joepie91: it's sort of already open source. i originally wanted to make all the code open source but never ended up doing it because i was ashamed of some of the older code
17:33 πŸ”— nicolas17 "available and non-obfuscated if you click 'view source'" != "open source and under a free license" :)
17:33 πŸ”— max right
17:34 πŸ”— W has quit IRC (Ping timeout: 268 seconds)
17:34 πŸ”— max it's open source minus the license, i'd have no problem making it gpl or whatever you guys suggest
17:34 πŸ”— bithippo Is ArchiveTeam picking up gawker.com? gawker.com/gawker-com-to-end-operations-next-week-1785455712
17:37 πŸ”— SketchCow bithippo: Can we give you a task?
17:37 πŸ”— bithippo I accept all sorts of tasks.
17:37 πŸ”— SketchCow 1. Sit in this channel
17:38 πŸ”— SketchCow 2. For the next 12 hours, when someone with a new name comes in and goes "WHAT ABOUT THE GAWKERZ"
17:38 πŸ”— SketchCow 3. You say "We're on it!"
17:38 πŸ”— bithippo Point taken :) My apologies.
17:38 πŸ”— SketchCow No point
17:38 πŸ”— SketchCow I'm assigning you this task
17:38 πŸ”— SketchCow Pretty simple one
17:38 πŸ”— nicolas17 we were talking about it literally right before you joined :P
17:39 πŸ”— bithippo Engage maximum regret.
17:39 πŸ”— Frogging [13:16:24] <@SketchCow> GAWKER.COM closes down last week.
17:39 πŸ”— Frogging is it last week or next week :p
17:39 πŸ”— SketchCow Next week.
17:39 πŸ”— SketchCow I'm ..... distracted today.
17:39 πŸ”— Frogging I thought it was a metaphor or something, heh
17:41 πŸ”— DoomTay The combination of tense and time frame was pretty confusing
17:41 πŸ”— verifiedJ has joined #archiveteam
17:42 πŸ”— SketchCow sets mode: +b *!*webchat@*.res.bhn.net
17:42 πŸ”— DoomTay was kicked by SketchCow (DoomTay)
17:42 πŸ”— SketchCow (I'm interested if he sticks around if he's just in #archivebot)
17:43 πŸ”— nicolas17 o.o
17:44 πŸ”— nicolas17 why was that?
17:44 πŸ”— SketchCow nicolas17.
17:44 πŸ”— SketchCow If you come into Act 2 of the play
17:44 πŸ”— SketchCow Please avoid trying to ask why everyone's doing everything on stage
17:45 πŸ”— SketchCow https://archive.org/download/Uptime_Magazine_Volume_11_Number_5_1985_Side_1/screenshot_00.jpg
17:48 πŸ”— Morbus has joined #archiveteam
17:51 πŸ”— schbirid has joined #archiveteam
17:54 πŸ”— ikreymer has joined #archiveteam
17:57 πŸ”— alembic has joined #archiveteam
18:00 πŸ”— phuzion Who is the main point of contact for archiving Gawker at this point?
18:01 πŸ”— joepie91 max: one sec
18:01 πŸ”— joepie91 max: have a look here: http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/
18:01 πŸ”— joepie91 max: and don't be afraid about code quality, I can assure you that people would much rather have crappy code be open-source, than not available/reusable at all :)
18:01 πŸ”— joepie91 (and that's assuming that it's crappy to begin with)
18:01 πŸ”— joepie91 at least when it's open-source, they can safely improve it
18:02 πŸ”— joepie91 (also, technically speaking, something cannot be "open-source" unless it's licensed under an OSI-compliant license :P)
18:02 πŸ”— joepie91 (er, sorry, OSD)
18:04 πŸ”— gfscott has joined #archiveteam
18:05 πŸ”— Nemo_bis joepie91: CC0 is not a license
18:06 πŸ”— joepie91 it is
18:06 πŸ”— Nemo_bis No.+
18:06 πŸ”— joepie91 it is an attempt at public domain dedication that falls back to a license
18:06 πŸ”— Atluxity lets not discuss that here
18:06 πŸ”— * Nemo_bis shuts up the nitpicker before it gets too late.
18:06 πŸ”— xmc ^
18:09 πŸ”— m4rk3r has joined #archiveteam
18:10 πŸ”— ikreymer has quit IRC ()
18:11 πŸ”— ikreymer has joined #archiveteam
18:16 πŸ”— AlexLehm has joined #archiveteam
18:32 πŸ”— kristian_ has quit IRC (Leaving)
18:48 πŸ”— riordan_ has joined #archiveteam
18:50 πŸ”— riordan has quit IRC (Read error: Operation timed out)
18:50 πŸ”— riordan_ is now known as riordan
18:55 πŸ”— SketchCow CC0 is a license.
18:55 πŸ”— SketchCow There, we're done.
18:56 πŸ”— SketchCow It's allowed to be a license you think is a fucking joke, just like POSIX is a joke
18:56 πŸ”— SketchCow (Get up get up get and get down / POSIX is a joke in your town)
18:56 πŸ”— SketchCow So, I'm on a show tonight.
18:56 πŸ”— SketchCow http://amyontheradio.com/
18:58 πŸ”— nicolas17 SketchCow: I heard RMS regrets renaming the POSIX_ME_HARDER environment variable to POSIXLY_CORRECT
19:05 πŸ”— alembic has quit IRC (Ping timeout: 268 seconds)
19:09 πŸ”— riordan has quit IRC (riordan)
19:10 πŸ”— riordan_ has joined #archiveteam
19:17 πŸ”— riordan_ has quit IRC (Read error: Operation timed out)
19:28 πŸ”— riordan has joined #archiveteam
19:49 πŸ”— swonsy has quit IRC (Quit: Page closed)
19:56 πŸ”— riordan has quit IRC (riordan)
19:57 πŸ”— riordan has joined #archiveteam
19:58 πŸ”— riordan_ has joined #archiveteam
19:58 πŸ”— riordan has quit IRC (Read error: Operation timed out)
20:11 πŸ”— riordan_ has quit IRC (Ping timeout: 633 seconds)
20:21 πŸ”— AlexLehm SketchCow: will the radio show be archived by you?
20:22 πŸ”— SketchCow Well, by archive team
20:23 πŸ”— HCross What time are you on?
20:25 πŸ”— bithippo "Who Will Archive ArchiveTeam?"
20:25 πŸ”— schbirid has quit IRC (Quit: Leaving)
20:25 πŸ”— arkiver http://www.deeptalkradio.com/network-schedule/
20:27 πŸ”— AlexLehm i wonder if i can just start wget and keep it open, the show time is too late for europe
20:28 πŸ”— HCross 9pm ET
20:29 πŸ”— HCross or 2am London
20:30 πŸ”— Kaz bithippo: we do
20:31 πŸ”— bithippo Kaz: Should've added the /s, sorry about that
20:31 πŸ”— Kaz To be fair though, (I say this because I haven't seen you around here before), there were/are plans to back up the IA
20:32 πŸ”— schbirid has joined #archiveteam
20:32 πŸ”— Kaz so, you joke but there is some actual project there :)
20:32 πŸ”— bithippo I joke, but I know you're entirely serious. One of my projects on the backburner is to figure out how to dynamically assign IA torrents to torrent client endpoints that exist solely to backup a shard of the IA
20:32 πŸ”— bithippo _in my spare time_
20:33 πŸ”— xmc so like ia.bak but a different way
20:33 πŸ”— bithippo Similar to ArchiveTeam warriors, but for distributed storage
20:33 πŸ”— bithippo yeah
20:33 πŸ”— bithippo So you'd spin up the VM on a machine with a lot of store, and IA would hand you torrents to consume and backup locally that were currently least distributed to backup clients.
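
The "least distributed first" assignment bithippo describes comes down to picking the items with the fewest known copies. A minimal sketch, assuming the coordinator keeps a simple mapping from item ID to replica count (the data structure and function name are hypothetical, not how IA or ia.bak tracks this):

```python
import heapq


def least_distributed(replica_counts, n):
    """Return the n item IDs with the fewest known backup copies.

    replica_counts maps item ID -> number of clients already holding it.
    Ties are broken by item ID so the result is deterministic.
    """
    return heapq.nsmallest(
        n, replica_counts, key=lambda item: (replica_counts[item], item)
    )
```

For example, with counts `{"a": 3, "b": 0, "c": 1, "d": 0}`, asking for two items returns `["b", "d"]`.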
20:37 πŸ”— arrith has joined #archiveteam
21:19 πŸ”— ikreymer has quit IRC (Read error: Connection reset by peer)
21:20 πŸ”— ikreymer has joined #archiveteam
21:27 πŸ”— bithippo has quit IRC (Quit: Page closed)
21:48 πŸ”— verifiedJ has left
22:09 πŸ”— Jogie has joined #archiveteam
22:15 πŸ”— m4rk3r has quit IRC (m4rk3r)
22:17 πŸ”— gfscott has quit IRC (gfscott)
22:26 πŸ”— Stiletto has quit IRC (Ping timeout: 246 seconds)
22:38 πŸ”— BlueMaxim has joined #archiveteam
22:39 πŸ”— ikreymer has quit IRC (Remote host closed the connection)
22:41 πŸ”— ikreymer has joined #archiveteam
22:46 πŸ”— ikreymer has quit IRC (Remote host closed the connection)
22:47 πŸ”— ikreymer has joined #archiveteam
22:49 πŸ”— William has joined #archiveteam
22:49 πŸ”— William Does Archiveteam plan on jamming Gawker.com into the warrior? - http://gawker.com/gawker-com-to-end-operations-next-week-1785455712
22:50 πŸ”— godane William: it's done: https://archive.org/search.php?query=subject%3A%22gawker.com%22
22:51 πŸ”— William Says sitemap, is the content downloaded?
22:52 πŸ”— godane http://gawker.com/sitemap_bydate.xml?startTime=2016-08-01T00:00:00&endTime=2016-12-31T23:59:59
22:52 πŸ”— godane all gawker.com sites have sitemaps
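
The sitemap URL godane pasted takes a start and end timestamp, so URLs for other date windows can be generated the same way. A small sketch based only on that one URL's shape; the function name is hypothetical, and whether other Gawker sites accept exactly the same parameters is an assumption.

```python
from datetime import datetime
from urllib.parse import urlencode

STAMP = "%Y-%m-%dT%H:%M:%S"  # timestamp format seen in the pasted URL


def sitemap_url(start, end, host="gawker.com"):
    """Build a sitemap_bydate.xml URL for the window [start, end]."""
    query = urlencode(
        [("startTime", start.strftime(STAMP)), ("endTime", end.strftime(STAMP))],
        safe=":",  # keep colons unescaped, as in the original URL
    )
    return "http://{}/sitemap_bydate.xml?{}".format(host, query)
```

Calling `sitemap_url(datetime(2016, 8, 1), datetime(2016, 12, 31, 23, 59, 59))` reproduces the URL above.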
22:53 πŸ”— William has quit IRC (Client Quit)
23:08 πŸ”— joepie91 actually, I'll post it here as well
23:08 πŸ”— joepie91 https://searx.me/
23:08 πŸ”— joepie91 this search engine lets you get results as JSON
23:08 πŸ”— joepie91 can be useful for discovery
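
For discovery, the JSON output joepie91 mentions could be consumed along these lines. This is a sketch under assumptions: it relies on searx's `format=json` query parameter and on a top-level `results` list whose entries carry a `url` field, and a given instance may disable or rate-limit JSON output. The function names are hypothetical.

```python
from urllib.parse import urlencode


def searx_query_url(terms, base="https://searx.me"):
    """Build a search URL that asks the instance for JSON output."""
    query = urlencode({"q": terms, "format": "json"})
    return "{}/search?{}".format(base.rstrip("/"), query)


def result_urls(response):
    """Collect result URLs from a decoded JSON response.

    Assumes a top-level 'results' list of objects with a 'url' field.
    """
    return [
        entry["url"]
        for entry in response.get("results", [])
        if isinstance(entry, dict) and "url" in entry
    ]
```

The decoded response would come from fetching the built URL and parsing the body with `json.loads`.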
23:09 πŸ”— Honno has quit IRC (Read error: Operation timed out)
23:43 πŸ”— Stiletto has joined #archiveteam
23:56 πŸ”— W has joined #archiveteam
