Time |
Nickname |
Message |
00:01
|
DoomTay |
Did it ever allow for multiple pages on a site? |
00:02
|
* |
joepie91 walks in |
00:02
|
ErkDog |
hey Joe, we're trying to figure out the most graceful way of saving YTMD |
00:03
|
joepie91 |
yeah, saw in -bs |
00:03
|
arkiver |
max: do you think you can provide a list of sites? |
00:04
|
ErkDog |
when you say list, you mean the list of YTMD's? |
00:04
|
ErkDog |
YTMND*'s? |
00:04
|
nicolas17 |
arkiver: I think you could go through http://blah.ytmnd.com/info/{number}/json incrementally? |
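A minimal Python sketch of the incremental approach, assuming site IDs are plain integers and that the info path answers on any subdomain (`blah` is kept as the placeholder from the message). It only builds the URL list; fetching, rate limiting and IDs that come back as "error no site" are left out.

```python
# Sketch: enumerate YTMND info-JSON URLs for a range of numeric site IDs.
# Assumption: IDs are integers and the /info/{number}/json path works on any
# subdomain ("blah" is the placeholder used in the chat).
INFO_URL = "http://blah.ytmnd.com/info/{number}/json"


def info_urls(start: int, stop: int) -> list[str]:
    """Return info-JSON URLs for IDs start .. stop-1 (half-open range)."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    return [INFO_URL.format(number=n) for n in range(start, stop)]


if __name__ == "__main__":
    for url in info_urls(1, 4):
        print(url)
```

A half-open range makes it easy to split the ID space into non-overlapping chunks, for example one chunk per work item.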
00:04
|
* |
joepie91 is currently reading |
00:04
|
ErkDog |
should be pretty easy actually he could just send us the zone file |
00:04
|
ErkDog |
since literally every YTMND is on its own host |
00:04
|
nicolas17 |
ErkDog: I assume he has a wildcard *.ytmnd.com at the DNS level |
00:04
|
joepie91 |
max: do you by any chance still have the source of the HTML5 version, even if it's incomplete? |
00:04
|
ErkDog |
ohhhhhh good point :( |
00:05
|
ErkDog |
ohhhh well then in his database table |
00:05
|
arkiver |
Thanks for pointing that out nicolas17 |
00:05
|
joepie91 |
max: if released as open-source it might drive people to continue developing it, even if just as a future-proof way of viewing the YTMBD stuff |
00:05
|
joepie91 |
er |
00:05
|
arkiver |
max: would the http://blah.ytmnd.com/info/{number}/json method get us all sites? |
00:05
|
joepie91 |
YTMND * |
00:05
|
ErkDog |
he should have <whatever>.ytmnd.com so it could match |
00:05
|
arkiver |
Or are there some special cases |
00:08
|
Frogging |
http://archiveteam.org/index.php?title=YTMND |
00:08
|
arkiver |
Awesome Frogging! |
00:09
|
ErkDog |
well I've been trying random numbers in the json |
00:09
|
ErkDog |
up to 25000 with a response so far |
00:09
|
ErkDog |
some have said error no site |
00:09
|
arkiver |
ok |
00:09
|
xmc |
hey max do you have a list of names that you could share? |
00:10
|
arkiver |
^that would be very helpful |
00:10
|
ErkDog |
https://puu.sh/qF8Bx/9d14b6c213.png |
00:11
|
ErkDog |
so basically we could crawl the json's up |
00:11
|
ErkDog |
and it gives you the "domain" in the json |
00:11
|
|
Froggypwn has quit IRC (Read error: Operation timed out) |
00:11
|
ErkDog |
then we crawl the domain.ytmnd.com for WARCing |
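The two-step plan discussed here (read the info JSON, then crawl `<domain>.ytmnd.com`) could look roughly like this. It is a sketch only: it assumes the response has a top-level `"domain"` key holding the bare subdomain label, as described in the chat, since the exact JSON layout is only shown in the screenshots. Responses without that key are treated as "no site".

```python
import json
from typing import Optional

# Sketch: turn one info-JSON response into the subdomain URL to WARC.
# Assumption: a top-level "domain" key holds the subdomain label; anything
# else (error responses, non-JSON bodies) is treated as "no site".


def site_url_from_info(raw: str) -> Optional[str]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all: treat as a miss
    if not isinstance(data, dict):
        return None
    domain = data.get("domain")
    if not isinstance(domain, str) or not domain:
        return None  # no site behind this ID
    return f"http://{domain}.ytmnd.com/"
```

For the test site created later in the log, `site_url_from_info('{"domain": "ateam-test-1"}')` would give `http://ateam-test-1.ytmnd.com/`.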
00:13
|
ErkDog |
https://puu.sh/qF8MY/96a6c609bd.png |
00:13
|
ErkDog |
https://puu.sh/qF8NY/de3ee558f9.png |
00:13
|
ErkDog |
I joined YTMND 12 yrs ago, jesus |
00:19
|
ErkDog |
http://ateam-test-1.ytmnd.com/ |
00:19
|
ErkDog |
just made that |
00:20
|
ErkDog |
OK I just made one it's ID is: 1008765 |
00:22
|
|
howdoicom has quit IRC (Quit: Page closed) |
00:24
|
ErkDog |
wow you have got to be kidding me |
00:24
|
ErkDog |
I can't edit the YTMND wiki page |
00:24
|
ErkDog |
https://puu.sh/qF9sQ/fa44f78678.png |
00:25
|
ErkDog |
because YTMND.com is blacklisted external site |
00:25
|
joepie91 |
lol |
00:26
|
ErkDog |
Frogging must have super powers |
00:26
|
|
Petri152 has joined #archiveteam |
00:28
|
ErkDog |
I had to put spaces in the URLs I guess someone will have to fix it besides me |
00:28
|
|
RedType_ has quit IRC (Read error: Operation timed out) |
00:29
|
arkiver |
ytmnd is on the tracker page |
00:30
|
arkiver |
max: any limits or special status codes? |
00:30
|
ErkDog |
I summarised the info we had so far that would allow a successful crawl arkiver |
00:30
|
arkiver |
thanks |
00:30
|
ErkDog |
barring additional resources/info provided by Max |
00:32
|
ErkDog |
Also if it's only 1.7 TB, I could crawl and push that out in a few days, if you didn't want to do all the extra crazy stuff to add it into the warrior |
00:42
|
nicolas17 |
ErkDog: I think the 1.7TB includes stuff that isn't publicly or easily accessible |
00:43
|
nicolas17 |
so crawling you would get less :P |
00:43
|
DoomTay |
Hmm.. for a handful of sites, forcing HTML5 results in an error message saying the audio could not be decoded |
00:51
|
SketchCow |
max: Jason Scott here, we can also do a full version of the 1.7tb collection and put it into the Internet Archive's dark archives for safekeeping. |
00:51
|
arkiver |
We should do both |
00:51
|
SketchCow |
We should. That's what I'm saying. |
00:51
|
arkiver |
awesome |
00:52
|
arkiver |
I think we're going to start the crawls from for example http://ytmnd.com/sites/991586/profile |
00:53
|
arkiver |
I'm off |
00:54
|
arkiver |
max: if you can, please provide a list of sites, users and keywords (if that isn't easily possible, we can extract some ourselves too) |
00:54
|
* |
arkiver is afk for the night |
00:59
|
|
JesseW has joined #archiveteam |
01:10
|
|
DoomTay has quit IRC (Quit: Page closed) |
01:21
|
|
tomwsmf has quit IRC (Read error: Operation timed out) |
01:26
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
01:34
|
|
RedType has joined #archiveteam |
01:37
|
|
DoomTay has joined #archiveteam |
01:50
|
|
JesseW has joined #archiveteam |
02:00
|
|
Froggypwn has joined #archiveteam |
02:01
|
|
Honno has joined #archiveteam |
02:02
|
|
DoomTay has quit IRC (Quit: Page closed) |
02:06
|
|
DoomTay has joined #archiveteam |
02:45
|
|
DoomTay has quit IRC (Quit: Page closed) |
02:51
|
|
tuankiet6 has joined #archiveteam |
02:52
|
|
tuankiet6 is now known as tuankiet |
03:04
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
03:04
|
|
RichardG has joined #archiveteam |
03:12
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
03:15
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
03:39
|
|
RichardG has joined #archiveteam |
03:50
|
|
mutoso has quit IRC (Ping timeout: 260 seconds) |
03:58
|
|
mutoso has joined #archiveteam |
04:16
|
|
JesseW has joined #archiveteam |
04:18
|
|
nicolas17 has quit IRC (Read error: Operation timed out) |
04:22
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:30
|
|
Sk1d has joined #archiveteam |
04:42
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
04:52
|
|
RedType has quit IRC (Read error: Operation timed out) |
05:02
|
|
JesseW has joined #archiveteam |
05:31
|
|
RichardG has quit IRC (Read error: Operation timed out) |
05:31
|
|
RichardG has joined #archiveteam |
05:54
|
|
dan- has quit IRC (Ping timeout: 633 seconds) |
05:54
|
|
RedType has joined #archiveteam |
05:57
|
|
dan- has joined #archiveteam |
06:25
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:29
|
|
uosdwis has joined #archiveteam |
06:31
|
uosdwis |
hi. I'm running the warrior but it's not getting any items. is the tracker down? |
06:32
|
|
aschmitz has quit IRC (Read error: Operation timed out) |
06:39
|
|
aschmitz has joined #archiveteam |
07:02
|
|
uosdwis has quit IRC (Quit: Page closed) |
07:11
|
|
RichardG has quit IRC (Ping timeout: 255 seconds) |
07:21
|
xmc |
tracker is up, but we complete projects faster than we can start them |
07:21
|
xmc |
but uosdwis is gone anyway |
07:22
|
Atluxity |
we are too good |
07:25
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
07:26
|
tuankiet |
Anyone got this error while running googlecode-grab? Lua runtime error: googlecode.lua:375: invalid use of '%' in replacement string. (It's just on Arch Linux I think, my VPS running Ubuntu 16.04 doesn't have this error) |
07:26
|
|
BlueMaxim has joined #archiveteam |
07:32
|
|
les has joined #archiveteam |
07:32
|
les |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
07:33
|
xmc |
yahoosucks |
07:33
|
les |
got it, thanks |
07:33
|
|
les has quit IRC (Client Quit) |
07:52
|
|
db48x has quit IRC (Read error: Operation timed out) |
08:02
|
|
kristian_ has joined #archiveteam |
08:33
|
|
fie has joined #archiveteam |
08:38
|
|
RichardG has joined #archiveteam |
08:47
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
08:48
|
|
BlueMaxim has joined #archiveteam |
09:26
|
SketchCow |
Go forth |
10:23
|
|
kristian_ has quit IRC (Leaving) |
10:36
|
|
Peetz0r_ is now known as Peetz0r |
10:41
|
|
db48x has joined #archiveteam |
10:49
|
|
db48x has quit IRC (Remote host closed the connection) |
10:59
|
|
wp494 has quit IRC (Read error: Connection reset by peer) |
11:00
|
|
WinterFox has joined #archiveteam |
11:14
|
|
db48x has joined #archiveteam |
11:41
|
|
fie_ has joined #archiveteam |
11:43
|
|
fie has quit IRC (Read error: Operation timed out) |
11:58
|
|
swonsy has joined #archiveteam |
12:01
|
swonsy |
Hello everybody. tell me, addons.mozilla.org already archived? if so, where you can download files? Thank you |
12:01
|
swonsy |
where can i download* |
12:04
|
Igloo^ |
Try the Wayback Machine? |
12:04
|
Igloo^ |
It might be archived. |
12:05
|
swonsy |
Could you give me a link to the archives? |
12:06
|
db48x |
https://web.archive.org/web/*/addons.mozilla.org |
12:07
|
db48x |
but the actual extensions aren't archived |
12:09
|
swonsy |
but i need the archives files the extensions, not just pages of extensions to the AMO |
12:09
|
swonsy |
yes, i know |
12:10
|
swonsy |
This is bad |
12:12
|
swonsy |
then tell me, your team will be archived AMO with extensions files? in future |
12:14
|
swonsy |
archive* |
12:17
|
db48x |
it's a good idea |
12:19
|
swonsy |
of course)) |
12:21
|
swonsy |
because Mozilla dies |
12:23
|
swonsy |
many extensions already disappeared from the AMO site, as well with the developers sites |
12:25
|
swonsy |
you need to preserve at least those that have |
12:39
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:43
|
|
vitzli has joined #archiveteam |
12:43
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
12:58
|
max |
so i was thinking if you want to warc the entire site, i can write a quick script that reads from the db and just generates a massive list of every page on ytmnd.com as well as all the subdomains |
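For illustration, a script of the kind max describes might look like the following. The table `sites(id, domain)` is hypothetical (the real YTMND schema is not described here), SQLite stands in for whatever database the site actually uses, and the profile URL pattern is taken from arkiver's `http://ytmnd.com/sites/991586/profile` example.

```python
import sqlite3

# Sketch of a "dump every page" script. Assumption: a hypothetical table
# sites(id INTEGER, domain TEXT); the real YTMND schema is not known here.


def url_list(conn: sqlite3.Connection) -> list[str]:
    urls = []
    for site_id, domain in conn.execute("SELECT id, domain FROM sites ORDER BY id"):
        urls.append(f"http://{domain}.ytmnd.com/")                # the site itself
        urls.append(f"http://ytmnd.com/sites/{site_id}/profile")  # its profile page
    return urls


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, domain TEXT)")
    # Illustrative row: the test site and ID ErkDog posted in this log.
    conn.execute("INSERT INTO sites VALUES (1008765, 'ateam-test-1')")
    print("\n".join(url_list(conn)))
```

User pages, keyword pages and the other sections arkiver asks for would need their own queries in the same style.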
12:59
|
max |
SketchCow: also hi! |
13:00
|
max |
also there's an API that has been down for a few years because no one ever used it and it was a pain to maintain, but it could give a ton of access to otherwise hidden data if i turned it back on and made it work. |
13:00
|
max |
or i could make the json on the subdomains include more information |
13:01
|
max |
i colocate and bandwidth is cheap, so frankly i dont care if the archiving is ddosing the site. |
13:01
|
max |
that said, i can copy the 1.7tb assets archive to you guys faster than using single http gets |
13:01
|
arkiver |
We should do both |
13:02
|
max |
the whole asset dir is /#/#/<md5>.<gif|jpg|mp3|etc> |
13:02
|
max |
and they're immutable |
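Because the assets are named by MD5 and never change, a bulk copy could be spot-checked against the filenames. This sketch assumes, without confirmation from the chat, that `<md5>` is the hash of the file's bytes rather than of something else such as the original upload name.

```python
import hashlib
from pathlib import Path

# Sketch: verify a copied asset against the MD5 in its filename.
# Assumption: <md5> in /#/#/<md5>.<ext> is the MD5 of the file contents.


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        # Read in chunks so large MP3s are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def asset_is_intact(path: Path) -> bool:
    expected = path.stem.lower()  # filename without the .gif/.jpg/.mp3 suffix
    return md5_of(path) == expected
```

Since the files are immutable, one pass after the copy would be enough; there is no need to re-check files that already passed.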
13:02
|
arkiver |
If we don't crawl this through HTTP GETs it will not be in the wayback machine |
13:02
|
max |
ok |
13:03
|
arkiver |
A copy of the data would be nice to have too, besides the crawl |
13:03
|
max |
the only data im hesitant to provide is email/password hashes/private messages |
13:04
|
arkiver |
I'm not sure about a dump, but it won't be in the crawled data if it is not publicly available information |
13:04
|
arkiver |
As for the dump and private information, |
13:04
|
max |
people dated people they met on ytmnd, so there are likely some very personal messages |
13:05
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
13:05
|
arkiver |
Internet Archive can keep items dark ('inaccessible'), so that might be what you want for a copy for the data |
13:05
|
arkiver |
Or you can encrypt the private data only and send it that way |
13:06
|
arkiver |
Or leave it out, of course |
13:06
|
max |
im just not sure i see the value in archiving private messages |
13:06
|
arkiver |
SketchCow ^ |
13:06
|
max |
i checked last night and the view data on sites is the majority of the data, at 460 million rows |
13:07
|
arkiver |
ok |
13:07
|
arkiver |
I think it'd be good to talk with SketchCow about what we should do with the copy of the data. |
13:08
|
arkiver |
For the crawl, it would be great if you can create that list of every page and subdomain |
13:08
|
max |
and i guess it might be worth making the html5 player a bit nicer just so people in the future can see them |
13:11
|
|
BartoCH has joined #archiveteam |
13:16
|
arkiver |
That might be a nice idea |
13:31
|
Frogging |
this is exciting :) |
14:02
|
|
nicolas17 has joined #archiveteam |
14:18
|
|
wp494 has joined #archiveteam |
14:30
|
|
tuankiet has quit IRC (Ping timeout: 246 seconds) |
14:38
|
|
tuankiet6 has joined #archiveteam |
15:37
|
|
DoomTay has joined #archiveteam |
15:37
|
DoomTay |
max: Out of curiosity, how does YTMND handle busy servers? |
15:38
|
DoomTay |
Because we once dealt with a website where in such a situation, the site would instead serve a "servers are busy" message while still having a status code of 200 |
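A grab script can guard against that kind of "soft error" by checking the body as well as the status code. The marker string below is just the phrase from DoomTay's example; a real project would collect its own markers from the site in question.

```python
# Sketch: flag pages served with HTTP 200 whose body is really an error notice.
# The marker phrase is hypothetical, taken from the example in the chat.
BUSY_MARKERS = ("servers are busy",)


def is_soft_error(status: int, body: str, markers=BUSY_MARKERS) -> bool:
    if status != 200:
        return False  # real HTTP errors are handled by normal status checks
    text = body.lower()
    return any(m.lower() in text for m in markers)
```

A response flagged this way would typically be retried later rather than written to the WARC as if it were the real page.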
15:49
|
max |
well |
15:50
|
max |
it doesn't do anything special |
15:50
|
max |
but spidering would technically pollute the view data |
15:50
|
max |
not that it's really that important |
15:50
|
max |
but it was tuned to get a lot more traffic than it does now so it should probably be fine in that regard |
15:51
|
|
vitzli has quit IRC (Quit: Leaving) |
15:55
|
DoomTay |
Anyway, it looks like they're going to look at JSONs with the domain always being on picard.ytmnd.com, which I'm not sure about. Unless they will eventually crawl the JSON at a given site's actual domain, it will probably wind up broken once the site is on the Wayback Machine |
16:02
|
SketchCow |
max: Can you provide Internet Archive a copy of the data with all private messages removed? |
16:06
|
Frogging |
if the private messages are removed then can the dump be made public? |
16:07
|
arkiver |
<max>but spidering would technically pollute the view data |
16:08
|
arkiver |
It's probably best to first get a copy to IA and after that do the crawl, so the original statistics are saved |
16:17
|
|
tomwsmf has joined #archiveteam |
16:26
|
joepie91 |
max: what are your thoughts on my question last night, regarding providing the HTML5 player as an open-source thing so that people can continue to develop it? |
16:27
|
joepie91 |
(if they so desire) |
16:52
|
|
Morbus has quit IRC (Quit: http://www.disobey.com/) |
17:16
|
SketchCow |
GAWKER.COM closes down last week. |
17:16
|
SketchCow |
Anything left to grab? We were pretty comprehensive. |
17:17
|
|
riordan has joined #archiveteam |
17:26
|
|
W has joined #archiveteam |
17:27
|
riordan |
@SketchCow Forgive me for being a perma-n00b/admirer but when you grabbed gawker, did you also grab the whole gawker media/kinja network? |
17:27
|
riordan |
There's a bunch of real weird shit in there like dog.gawker.com that's well… fascinating |
17:27
|
SketchCow |
We're likely to double-check |
17:28
|
riordan |
also tons of their posts are crazy reliant on embedded content (embedded tweets) |
17:28
|
riordan |
awesome |
17:28
|
|
kristian_ has joined #archiveteam |
17:30
|
riordan |
On behalf of the staff of old-school cultural heritage orgs: thank you all for doing this when we won't |
17:30
|
riordan |
because computers scare us and we've been told they're very expensive |
17:30
|
xmc |
<3 |
17:31
|
xmc |
embedded tweets archive pretty well |
17:31
|
xmc |
because they're a <blockquote> with some javascript that makes it look fancy |
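To illustrate xmc's point: Twitter's embed code puts the tweet text inside `<blockquote class="twitter-tweet">`, and the accompanying script only restyles it, so the text survives in the saved HTML even if the script never runs. A small standard-library sketch that pulls that text back out of a page, assuming the usual embed markup:

```python
from html.parser import HTMLParser

# Sketch: collect the plain text of embedded tweets from static HTML.
# Assumption: embeds use <blockquote class="twitter-tweet">.


class TweetTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a twitter-tweet blockquote
        self.tweets = []  # one normalized string per embedded tweet
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "blockquote":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "twitter-tweet" in classes:
            self.depth += 1  # also track blockquotes nested inside a tweet

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.tweets.append(" ".join("".join(self._buf).split()))
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def tweet_texts(html: str) -> list[str]:
    parser = TweetTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.tweets
```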
17:32
|
|
bithippo has joined #archiveteam |
17:32
|
max |
joepie91: it's sort of already open source. i originally wanted to make all the code open source but never ended up doing it because i was ashamed of some of the older code |
17:33
|
nicolas17 |
"available and non-obfuscated if you click 'view source'" != "open source and under a free license" :) |
17:33
|
max |
right |
17:34
|
|
W has quit IRC (Ping timeout: 268 seconds) |
17:34
|
max |
it's open source minus the license, i'd have no problem making it gpl or whatever you guys suggest |
17:34
|
bithippo |
Is ArchiveTeam picking up gawker.com? gawker.com/gawker-com-to-end-operations-next-week-1785455712 |
17:37
|
SketchCow |
bithippo: Can we give you a task? |
17:37
|
bithippo |
I accept all sorts of tasks. |
17:37
|
SketchCow |
1. Sit in this channel |
17:38
|
SketchCow |
2. For the next 12 hours, when someone with a new name comes in and goes "WHAT ABOUT THE GAWKERZ" |
17:38
|
SketchCow |
3. You say "We're on it!" |
17:38
|
bithippo |
Point taken :) My apologies. |
17:38
|
SketchCow |
No point |
17:38
|
SketchCow |
I'm assigning you this task |
17:38
|
SketchCow |
Pretty simple one |
17:38
|
nicolas17 |
we were talking about it literally right before you joined :P |
17:39
|
bithippo |
Engage maximum regret. |
17:39
|
Frogging |
[13:16:24] <@SketchCow> GAWKER.COM closes down last week. |
17:39
|
Frogging |
is it last week or next week :p |
17:39
|
SketchCow |
Next week. |
17:39
|
SketchCow |
I'm ..... distracted today. |
17:39
|
Frogging |
I thought it was a metaphor or something, heh |
17:41
|
DoomTay |
The combination of tense and time frame was pretty confusing |
17:41
|
|
verifiedJ has joined #archiveteam |
17:42
|
|
SketchCow sets mode: +b *!*webchat@*.res.bhn.net |
17:42
|
|
DoomTay was kicked by SketchCow (DoomTay) |
17:42
|
SketchCow |
(I'm interested if he sticks around if he's just in #archivebot) |
17:43
|
nicolas17 |
o.o |
17:44
|
nicolas17 |
why was that? |
17:44
|
SketchCow |
nicolas17. |
17:44
|
SketchCow |
If you come into Act 2 of the play |
17:44
|
SketchCow |
Please avoid trying to ask why everyone's doing everything on stage |
17:45
|
SketchCow |
https://archive.org/download/Uptime_Magazine_Volume_11_Number_5_1985_Side_1/screenshot_00.jpg |
17:48
|
|
Morbus has joined #archiveteam |
17:51
|
|
schbirid has joined #archiveteam |
17:54
|
|
ikreymer has joined #archiveteam |
17:57
|
|
alembic has joined #archiveteam |
18:00
|
phuzion |
Who is the main point of contact for archiving Gawker at this point? |
18:01
|
joepie91 |
max: one sec |
18:01
|
joepie91 |
max: have a look here: http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/ |
18:01
|
joepie91 |
max: and don't be afraid about code quality, I can assure you that people would much rather have crappy code be open-source, than not available/reusable at all :) |
18:01
|
joepie91 |
(and that's assuming that it's crappy to begin with) |
18:01
|
joepie91 |
at least when it's open-source, they can safely improve it |
18:02
|
joepie91 |
(also, technically speaking, something cannot be "open-source" unless it's licensed under an OSI-compliant license :P) |
18:02
|
joepie91 |
(er, sorry, OSD) |
18:04
|
|
gfscott has joined #archiveteam |
18:05
|
Nemo_bis |
joepie91: CC0 is not a license |
18:06
|
joepie91 |
it is |
18:06
|
Nemo_bis |
No.+ |
18:06
|
joepie91 |
it is an attempt at public domain dedication that falls back to a license |
18:06
|
Atluxity |
lets not discuss that here |
18:06
|
* |
Nemo_bis shuts up the nitpicker before it gets too late. |
18:06
|
xmc |
^ |
18:09
|
|
m4rk3r has joined #archiveteam |
18:10
|
|
ikreymer has quit IRC () |
18:11
|
|
ikreymer has joined #archiveteam |
18:16
|
|
AlexLehm has joined #archiveteam |
18:32
|
|
kristian_ has quit IRC (Leaving) |
18:48
|
|
riordan_ has joined #archiveteam |
18:50
|
|
riordan has quit IRC (Read error: Operation timed out) |
18:50
|
|
riordan_ is now known as riordan |
18:55
|
SketchCow |
CC0 is a license. |
18:55
|
SketchCow |
There, we're done. |
18:56
|
SketchCow |
It's allowed to be a license you think is a fucking joke, just like POSIX is a joke |
18:56
|
SketchCow |
(Get up get up get and get down / POSIX is a joke in your town) |
18:56
|
SketchCow |
So, I'm on a show tonight. |
18:56
|
SketchCow |
http://amyontheradio.com/ |
18:58
|
nicolas17 |
SketchCow: I heard RMS regrets renaming the POSIX_ME_HARDER environment variable to POSIXLY_CORRECT |
19:05
|
|
alembic has quit IRC (Ping timeout: 268 seconds) |
19:09
|
|
riordan has quit IRC (riordan) |
19:10
|
|
riordan_ has joined #archiveteam |
19:17
|
|
riordan_ has quit IRC (Read error: Operation timed out) |
19:28
|
|
riordan has joined #archiveteam |
19:49
|
|
swonsy has quit IRC (Quit: Page closed) |
19:56
|
|
riordan has quit IRC (riordan) |
19:57
|
|
riordan has joined #archiveteam |
19:58
|
|
riordan_ has joined #archiveteam |
19:58
|
|
riordan has quit IRC (Read error: Operation timed out) |
20:11
|
|
riordan_ has quit IRC (Ping timeout: 633 seconds) |
20:21
|
AlexLehm |
SketchCow: will the radio show be archived by you? |
20:22
|
SketchCow |
Well, by archive team |
20:23
|
HCross |
What time are you on? |
20:25
|
bithippo |
"Who Will Archive ArchiveTeam?" |
20:25
|
|
schbirid has quit IRC (Quit: Leaving) |
20:25
|
arkiver |
http://www.deeptalkradio.com/network-schedule/ |
20:27
|
AlexLehm |
i wonder if i can just start wget and keep it open, the show time is too late for europe |
20:28
|
HCross |
9pm ET |
20:29
|
HCross |
or 2am London |
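A quick check of HCross's conversion with Python's `datetime`, using fixed offsets and assuming both sides are on summer time (EDT, UTC-4, and BST, UTC+1); the date is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Sketch: 9pm US Eastern to London time with fixed summer-time offsets.
# Assumption: EDT (UTC-4) and BST (UTC+1) are in effect; date is illustrative.
EDT = timezone(timedelta(hours=-4), "EDT")
BST = timezone(timedelta(hours=1), "BST")

show_start = datetime(2016, 8, 19, 21, 0, tzinfo=EDT)  # 9pm Eastern
in_london = show_start.astimezone(BST)
print(in_london.strftime("%H:%M %Z"))  # 02:00 BST, on the next calendar day
```

Outside summer time both offsets shift by an hour, so the five-hour gap, and the 2am answer, still holds.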
20:30
|
Kaz |
bithippo: we do |
20:31
|
bithippo |
Kaz: Should've added the /s, sorry about that |
20:31
|
Kaz |
To be fair though, (I say this because I haven't seen you around here before), there were/are plans to back up the IA |
20:32
|
|
schbirid has joined #archiveteam |
20:32
|
Kaz |
so, you joke but there is some actual project there :) |
20:32
|
bithippo |
I joke, but I know you're entirely serious. One of my projects on the backburner is to figure out how to dynamically assign IA torrents to torrent client endpoints that exist solely to backup a shard of the IA |
20:32
|
bithippo |
_in my spare time_ |
20:33
|
xmc |
so like ia.bak but a different way |
20:33
|
bithippo |
Similar to ArchiveTeam warriors, but for distributed storage |
20:33
|
bithippo |
yeah |
20:33
|
bithippo |
So you'd spin up the VM on a machine with a lot of storage, and IA would hand you torrents to consume and back up locally that were currently least distributed to backup clients. |
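bithippo's "least distributed first" rule can be sketched with a min-heap keyed on copy count. Everything here is hypothetical (item names, counts, one item per client per round); it only illustrates the idea as described, not an existing IA interface.

```python
import heapq

# Sketch: hand each backup client the item that currently has the fewest
# copies. Item names and replica counts are hypothetical placeholders.


def assign(replicas: dict[str, int], clients: list[str]) -> dict[str, str]:
    """Give each client, in turn, the least-replicated item."""
    heap = [(count, item) for item, count in replicas.items()]
    heapq.heapify(heap)
    plan: dict[str, str] = {}
    for client in clients:
        if not heap:
            break
        count, item = heapq.heappop(heap)
        plan[client] = item
        heapq.heappush(heap, (count + 1, item))  # this assignment adds a copy
    return plan


if __name__ == "__main__":
    print(assign({"a": 2, "b": 0, "c": 1}, ["x", "y", "z"]))
```

Ties are broken by item name because the heap compares `(count, item)` tuples, and the input dict is left unmodified. A real tracker would also need to confirm each copy before counting it.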
20:37
|
|
arrith has joined #archiveteam |
21:19
|
|
ikreymer has quit IRC (Read error: Connection reset by peer) |
21:20
|
|
ikreymer has joined #archiveteam |
21:27
|
|
bithippo has quit IRC (Quit: Page closed) |
21:48
|
|
verifiedJ has left |
22:09
|
|
Jogie has joined #archiveteam |
22:15
|
|
m4rk3r has quit IRC (m4rk3r) |
22:17
|
|
gfscott has quit IRC (gfscott) |
22:26
|
|
Stiletto has quit IRC (Ping timeout: 246 seconds) |
22:38
|
|
BlueMaxim has joined #archiveteam |
22:39
|
|
ikreymer has quit IRC (Remote host closed the connection) |
22:41
|
|
ikreymer has joined #archiveteam |
22:46
|
|
ikreymer has quit IRC (Remote host closed the connection) |
22:47
|
|
ikreymer has joined #archiveteam |
22:49
|
|
William has joined #archiveteam |
22:49
|
William |
Does Archiveteam plan on jamming Gawker.com into the warrior? - http://gawker.com/gawker-com-to-end-operations-next-week-1785455712 |
22:50
|
godane |
William: its done: https://archive.org/search.php?query=subject%3A%22gawker.com%22 |
22:51
|
William |
Says sitemap, is the content downloaded? |
22:52
|
godane |
http://gawker.com/sitemap_bydate.xml?startTime=2016-08-01T00:00:00&endTime=2016-12-31T23:59:59 |
22:52
|
godane |
all gawker.com sites have sitemaps |
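Turning such a sitemap into a crawl list is straightforward with the standard library. This sketch assumes the Gawker sitemaps use the standard `http://www.sitemaps.org/schemas/sitemap/0.9` namespace and leaves fetching the XML out.

```python
import xml.etree.ElementTree as ET

# Sketch: extract <loc> URLs from a sitemap document.
# Assumption: the standard sitemaps.org namespace is used.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    # Matches <loc> in both <urlset> (pages) and <sitemapindex> (sub-sitemaps).
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]
```

If a file turns out to be a `<sitemapindex>`, the returned URLs are further sitemaps to fetch and parse the same way.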
22:53
|
|
William has quit IRC (Client Quit) |
23:08
|
joepie91 |
actually, I'll post it here as well |
23:08
|
joepie91 |
https://searx.me/ |
23:08
|
joepie91 |
this search engine lets you get results as JSON |
23:08
|
joepie91 |
can be useful for discovery |
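A sketch of using a searx instance for URL discovery. It assumes the instance has JSON output enabled (`format=json` on the `search` endpoint) and returns a top-level `results` list whose entries carry a `url` field; individual instances may turn the JSON format off. Fetching is left out.

```python
import json
from urllib.parse import urlencode

# Sketch: build a JSON search URL and collect result URLs from the response.
# Assumptions: format=json is enabled and results carry a "url" field.
INSTANCE = "https://searx.me/"


def search_url(query: str) -> str:
    return INSTANCE + "search?" + urlencode({"q": query, "format": "json"})


def result_urls(raw_json: str) -> list[str]:
    data = json.loads(raw_json)
    return [r["url"] for r in data.get("results", []) if "url" in r]
```

For example, a `site:` query against a doomed domain could seed a crawl list with pages that no sitemap mentions.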
23:09
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:43
|
|
Stiletto has joined #archiveteam |
23:56
|
|
W has joined #archiveteam |