#archiveteam-bs 2017-09-17,Sun


***qwebirc57 has joined #archiveteam-bs [00:03]
qwebirc57unstable fucking piece of shit [00:04]
***dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
qwebirc57 is now known as dd0a13f37
Honno has quit IRC (Read error: Operation timed out)
[00:04]
dd0a13f37Have I missed anything? [00:07]
***dd0a13f3T has joined #archiveteam-bs
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
[00:14]
JAANah [00:18]
***refeed has joined #archiveteam-bs
dd0a13f3T is now known as dd0a13f37
[00:21]
..... (idle for 21mn)
drumstick has quit IRC (Read error: Operation timed out) [00:43]
dd0a13f37If something is on usenet, is it considered archived? And would it be a good idea to upload library genesis torrents to archive.org, or would that be considered wasting space/bandwidth for piracy? [00:44]
JAAI've heard that there might be a copy of libgen at IA already (but not publicly available). Not sure if it's true though.
And although Usenet is safe-ish, I wouldn't consider it archived. Stuff still disappears from it sooner or later.
[00:53]
dd0a13f37You can upload a torrent to IA and have them download it, right? [00:54]
JAAYes, I believe so. [00:54]
dd0a13f37Then you could download their zip file of torrents, upload them to archive.org, then wait for them to pull it
But is it worth it? It's 30tb of data, and it will likely be hidden
The databases are archived
https://archive.org/details/libgen-meta-20150824
[00:54]
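(A minimal sketch of that torrent-to-IA workflow, assuming the internetarchive Python library; the item identifier, metadata, and torrent filename below are placeholders, not an existing item.)

```python
# Sketch: upload a .torrent to a new IA item so the archive's backend can
# fetch the content itself. Assumes `pip install internetarchive` and
# `ia configure` have been run; identifier and metadata are made up.
from internetarchive import upload

responses = upload(
    'libgen-torrents-example',              # hypothetical item identifier
    files=['2092000.torrent'],              # placeholder, following the XXXX000 naming
    metadata={
        'mediatype': 'data',
        'title': 'Library Genesis torrent (example)',
    },
)
print([r.status_code for r in responses])   # all 200s means the upload worked
```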
***dd0a13f3 has joined #archiveteam-bs
BlueMaxim has joined #archiveteam-bs
[00:57]
JAAI wouldn't be surprised if either https://archive.org/details/librarygenesis or https://archive.org/details/gen-lib contained a full (hidden) archive. [01:01]
***dd0a13f37 has quit IRC (Ping timeout: 268 seconds) [01:01]
dd0a13f3Should I avoid uploading it, or will it recognize and deduplicate? [01:04]
***dd0a13f3 is now known as dd0a13f37 [01:05]
dd0a13f37both of these are 3 years old, so they're outdated at any rate [01:08]
godaneso i'm going through my web archives that i have not uploaded
or at least thought i uploaded and turned out i didn't
[01:08]
dd0a13f37Okay, so if I have a url pointing to a zip file of torrents, can I just give them the URL?
No, apparently not. How does this "derive" stuff work, can I have them unpack a zip file for me?
[01:10]
JAAdd0a13f37: That's when the collection was created, not when any items in the collection were added/last updated.
By the way, the graph for the number of items in the second collection of the two looks interesting...
[01:17]
dd0a13f37Sure, but who would update such a collection? [01:17]
JAASomeone from IA? [01:18]
dd0a13f372k items is much too small, they have 2m books. Or is it the number of folders? [01:18]
JAAAn item can hold an arbitrary number of directories and files (more or less, there seem to be some issues if the items get very large).
If they have a copy, they certainly wouldn't throw it all into one item, and they also certainly wouldn't throw each book/article into its own item.
[01:20]
dd0a13f37The torrents are folders named XXXX000, where XXXX is the unique identifier (from 0-2092) [01:21]
JAAWell, then 2k sounds about right? [01:21]
dd0a13f37So that could mean there are 2k different folders
Yeah
Although, looking at the graph it seems more like 1.4k, or is it log?
[01:21]
JAAJAA shrugs
Looks like it might be rounded, so the top of the graph is 1.5k.
[01:24]
godanei'm reuploading my images.g4tv.com dumps [01:25]
dd0a13f37Should I upload them again then?
They're also missing sci-mag, which is around 50tb
[01:26]
JAADefinitely ask IA about this first.
But I doubt that that dataset is going to disappear anytime soon.
There are certainly several copies stored in various places.
(Including the ones publicly available via Usenet or torrents.
)
[01:26]
dd0a13f37Yes, that's true. The torrents are seeded, and various mirrors have more or less complete copies. [01:28]
godanelooks like i uploaded them, never mind [01:30]
dd0a13f37Sci-mag is worse off, but on the other hand they have sci-hub which has multiple servers run by people who are not subject to any jurisdiction
So both collections should be fine
[01:32]
.... (idle for 19mn)
***drumstick has joined #archiveteam-bs [01:51]
............ (idle for 58mn)
VADemon_ has quit IRC (left4dead) [02:49]
hook54321Should I check if a piece of software is already on archive.org before going through all my CDs? [02:57]
dd0a13f37To upload or to download?
If they're somehow part of a collection then it might not be such a huge deal
[03:06]
hook54321What do you mean? [03:21]
dd0a13f37If you have some collection of software on 10 different disks that you bought as a bundle then it might have historical value as a whole even if all the software exists separately [03:24]
...... (idle for 25mn)
hook54321it's mostly single disks, bought separately. [03:49]
dd0a13f37Well, it can't be that much storage wasted even if you do upload it twice
could be different versions as well
[03:57]
hook54321If it has a different cover then I would definitely upload it [03:58]
***drumstick has quit IRC (Read error: Operation timed out)
drumstick has joined #archiveteam-bs
[04:02]
..... (idle for 24mn)
hook54321arkiver: I left the channel [04:28]
.... (idle for 18mn)
***Sk1d has quit IRC (Ping timeout: 194 seconds) [04:46]
Sk1d has joined #archiveteam-bs [04:52]
refeed has quit IRC (Ping timeout: 600 seconds) [04:59]
....... (idle for 34mn)
pizzaiolo has quit IRC (Quit: pizzaiolo)
refeed has joined #archiveteam-bs
[05:33]
....... (idle for 32mn)
icedice has quit IRC (Quit: Leaving)
Dimtree has quit IRC (Read error: Operation timed out)
[06:05]
........... (idle for 50mn)
hook54321Did we grab all the duckduckgo stuff? [06:57]
***Dimtree has joined #archiveteam-bs [07:01]
..... (idle for 20mn)
Soni has quit IRC (Ping timeout: 272 seconds) [07:21]
Stilett0 has joined #archiveteam-bs
DFJustin has quit IRC (Remote host closed the connection)
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
[07:28]
......... (idle for 43mn)
Asparagir has quit IRC (Asparagir) [08:17]
kristian_ has joined #archiveteam-bs [08:25]
Honno has joined #archiveteam-bs [08:37]
.... (idle for 15mn)
kristian_ has quit IRC (Quit: Leaving) [08:52]
....... (idle for 32mn)
schbirid has joined #archiveteam-bs
refeed has quit IRC (Read error: Operation timed out)
[09:24]
tuluu has quit IRC (Read error: Operation timed out) [09:35]
.... (idle for 17mn)
underscor has joined #archiveteam-bs
swebb sets mode: +o underscor
[09:52]
tuluu has joined #archiveteam-bs [10:02]
BartoCH has joined #archiveteam-bs [10:15]
zhongfu_ has quit IRC (Ping timeout: 260 seconds)
zhongfu has joined #archiveteam-bs
[10:29]
.... (idle for 15mn)
Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[10:44]
.... (idle for 16mn)
noirscape has joined #archiveteam-bs [11:00]
BlueMaxim has quit IRC (Quit: Leaving)
drumstick has quit IRC (Read error: Operation timed out)
[11:09]
joepie91_hook54321: definitely upload it; if it turns out to be a duplicate it can always be removed later
hook54321: there are often many different editions of the same thing
[11:19]
***Soni has joined #archiveteam-bs [11:26]
pizzaiolo has joined #archiveteam-bs [11:36]
tuluu_ has joined #archiveteam-bs
tuluu has quit IRC (Read error: Operation timed out)
[11:48]
...... (idle for 25mn)
dd0a13f37 has quit IRC (Ping timeout: 268 seconds) [12:14]
.... (idle for 19mn)
JAAhttp://www.instructables.com/id/How-to-fix-a-Samsung-external-m3-hard-drive-in-und/ :-) [12:33]
.......... (idle for 46mn)
***wp494 has quit IRC (Read error: Connection reset by peer)
wp494 has joined #archiveteam-bs
[13:19]
........... (idle for 51mn)
schbirid has quit IRC (Quit: Leaving) [14:11]
etudier has joined #archiveteam-bs
Stilett0 has quit IRC (Read error: Operation timed out)
[14:17]
............ (idle for 58mn)
etudier has quit IRC (Remote host closed the connection) [15:19]
secondThey say archive.org did a faulty job of archiving something, but they have the new forums up, can you guys archive their backup? http://gamehacking.org/ Scroll down to news for Aug 10th
Or I can archive it but where do I upload it to get it into the archive and what is the proper way to do so?
[15:26]
JAAsecond: Is GameHacking itself also in danger, or is this just about the WiiRd forum archive?
Whatever. GH isn't that big anyway. I'll throw it into ArchiveBot.
Scratch the "not that big", but it's worth archiving the entire thing. Looks like it has tons of useful resources.
[15:31]
***mls has quit IRC (Read error: Connection reset by peer)
mls has joined #archiveteam-bs
[15:39]
.... (idle for 16mn)
secondJAA: just the WiiRd forum
JAA: you're going to have a hard time archiving the gamehacking parts though
Lots of javascript on the page, I was doing it but chrome headless crashed with the setup I was using in docker w/ warcproxy
I'll redo it when I get some time and hopefully when firefox headless comes out
I have a Jupyter notebook with the code for doing it
going through each page of the manuals and clicking expand
If you can archive the other stuff / whatever you can that would be great because I'm only going for the cheat codes
Very useful for emulators / games old and new
There are some games which are pretty much unplayable without cheat codes because they required certain hardware things
Think pokemon trading to evolve or Django the Solar boy requiring the literal sun
[15:56]
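(For reference, a rough sketch of the kind of click-to-expand crawl second describes: headless Chrome behind a recording proxy, visiting each game page and clicking the expanders. The proxy port, CSS selector, and ID range are guesses, not taken from the actual notebook.)

```python
# Sketch only: drive headless Chrome through an archiving proxy (e.g. warcprox
# on its default localhost:8000) and click every expand arrow so the cheat
# codes render. The '.code-expand' selector is a placeholder, not
# gamehacking.org's real markup.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--proxy-server=http://localhost:8000')
options.add_argument('--ignore-certificate-errors')   # the proxy re-signs TLS

driver = webdriver.Chrome(options=options)
for game_id in range(1, 4367):                         # guessed ID range
    driver.get(f'http://gamehacking.org/game/{game_id}')
    for arrow in driver.find_elements(By.CSS_SELECTOR, '.code-expand'):
        arrow.click()                                  # expand the cheat listing
        time.sleep(0.2)                                # let the JS fill it in
driver.quit()
```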
JAAHm, I haven't found anything that didn't work for me without JavaScript yet.
Do you have an example?
[15:59]
secondhttp://gamehacking.org/game/4366
Click the down arrows on the side
[16:02]
JAAAh yeah, just saw that now. [16:02]
secondThey require javascript and output the codes for each cheat device
Even includes notes
It's too bad archivebot can't accept javascript to run on each page, or something like selenium commands, but archivebot doesn't even work like that from what I gather
It's more like a distributed wget
perhaps one day it can be upgraded to a very light and small browser, or even a proxy that an archive browser uses to hit pages
Still a partial archive is better than no archive
JAA: is there an archive of allrecipes?
And are you adding gamehacking.org to the archive?
[16:02]
JAAArchiveBot does have PhantomJS, but that doesn't work too well and wouldn't help in this case at all.
Or to be precise, wpull supports PhantomJS, and ArchiveBot uses wpull internally.
[16:05]
secondwpull hasn't been updated in the longest!
And isn't taking pull requests either
[16:06]
JAABut that's just for scrolling and loading scripted stuff. It doesn't work for clicking on things etc.
Yes, I know. chfoo's been pretty busy, from what I gathered.
[16:06]
secondIs there a more updated version and does it work with youtube-dl now / still?
hmm they are actually in here
[16:06]
JAAI know that youtube-dl is broken on at least most pipelines. [16:07]
secondThey could try giving permissions for others to merge code in or push to the project [16:07]
JAANo idea if it works when used directly with wpull.
There's the fork by FalconK, which has a few bug fixes, but other than that I'm not aware of anyone working on it.
I've been working on URL prioritisation for a while now, but I haven't spent much time on it really.
FalconK's also pretty busy currently, so yeah, nobody's even trying to maintain it.
[16:07]
secondURL prioritisation?
What is everyone busy with?
Is there a good way to save wikia websites?
So I have a lot of questions; it's not often I'm on efnet (maybe I'll fix that) and I've been interested in archiving for a long time
[16:11]
JAAhttps://gist.github.com/JustAnotherArchivist/b82f7848e3c14eaf7717b9bd3ff8321a
This is what I wrote a while ago about my plans.
It's semi-implemented, but there's still some stuff to do, in particular there is no plugin interface yet, which is necessary to then implement it into ArchiveBot (and grab-site).
People are busy with real-life stuff, I guess.
Wikia's just Mediawiki, isn't it? There are two ways to save that, either through WikiTeam (no idea how active that is) or through ArchiveBot.
[16:14]
secondCan the archivebot archive a flakey site which requires login? [16:16]
JAAAnd regarding your earlier questions: there is no record of an archive of allrecipes in ArchiveBot; someone shared a dump in here a few months ago, but that's not a proper archive and can't be included in the Wayback Machine.
Yes, I added gamehacking.org to ArchiveBot.
[16:17]
secondYeah, I found that one [16:18]
JAANo, login isn't supported by ArchiveBot.
Neither is CloudFlare DDoS protection and stuff like that, by the way.
[16:18]
seconddang, did not know about cloudflare
Why not cloudflare?
That is a lot of sites we can't archive then
[16:18]
JAAJust the DDoS protection bit, i.e. the "Checking your browser" message thingy.
That requires you to solve a JS challenge...
There was some discussion on this in here a few days ago.
[16:19]
secondhttps://github.com/ArchiveTeam/ArchiveBot/issues/216 [16:25]
JAAYes, but cloudflare-scrape is a really shitty and insecure solution.
second: http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-09-14,Thu&sel=124-150#l120
[16:27]
***brayden has quit IRC (Read error: Connection reset by peer)
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden
[16:28]
cf has quit IRC (Ping timeout: 260 seconds)
cf has joined #archiveteam-bs
[16:36]
etudier has joined #archiveteam-bs [16:51]
....... (idle for 33mn)
Stilett0- has joined #archiveteam-bs
Stilett0- is now known as Stiletto
[17:24]
chfooi haven't been feeling like maintaining wpull unfortunately :/ it became a big ball of code [17:41]
***kristian_ has joined #archiveteam-bs
dd0a13f37 has joined #archiveteam-bs
[17:44]
dd0a13f37JAA: cloudflare whitelists tor using some strange voodoo magic (it's not just the user agent and it works without JS), can we utilize this somehow?
Or, well, it depends on the protection level, but for 90% you can browse Tor. It didn't use to be this way, and if you do "copy as curl" from dev tools and paste into terminal w/ torsocks you still get the warning page
[17:46]
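(As an aside, a minimal sketch of how one could test what dd0a13f37 is describing, assuming a local Tor SOCKS proxy on 127.0.0.1:9050 and requests with SOCKS support installed; the URL is a placeholder.)

```python
# Sketch: check whether a CF-fronted site serves the real page over Tor
# without the JS challenge, and whether its __cfduid cookie still works
# after switching circuits. Requires `pip install requests[socks]`.
import requests

TOR = {'http': 'socks5h://127.0.0.1:9050',
       'https': 'socks5h://127.0.0.1:9050'}
url = 'https://cf-protected.example/'        # placeholder

first = requests.get(url, proxies=TOR, timeout=60)
print('challenge page' if 'Checking your browser' in first.text else 'real page')
print('cookies:', first.cookies.get_dict())  # look for __cfduid here

# ...ask Tor for a new circuit (e.g. NEWNYM via the control port), then
# replay the same cookie from what is now a different exit IP:
second = requests.get(url, proxies=TOR, cookies=first.cookies, timeout=60)
print('cookie still accepted' if 'Checking your browser' not in second.text
      else 'challenged again')
```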
JAAdd0a13f37: Interesting. If we knew more about it, we could perhaps use it, yes. I wonder how reliable it is though. [17:53]
dd0a13f37It could be details in how SSL is handled
That seems like the only difference I can think of
[17:53]
JAAThat would be painful to replicate. [17:53]
***balrog has quit IRC (Ping timeout: 1208 seconds) [17:54]
JAAI guess implementing joepie91_'s code in a wpull plugin is probably easier. [17:54]
dd0a13f37Even if you do "new circuit for this site" and issue the request with a cookie that shouldn't be valid for that IP it still works [17:54]
JAAHow do you get that cookie initially? [17:54]
dd0a13f37Can't you just add a hook to get a valid cookie without changing any structure?
The site sets it
[17:54]
JAAHm [17:55]
dd0a13f37You get a __cfduid cookie
when connecting to a cf site
[17:55]
JAASo the normal procedure, right. [17:55]
dd0a13f37Are those tied to IPs? [17:55]
JAAYeah, you could implement it as a hook, but the problem is that there is no proper implementation of a bypass. [17:55]
dd0a13f37Because if I copy the exact request and issue it with curl (same cookies, headers, ua) using torsocks it doesn't work
That's the spooky thing
What do you want to bypass? "one more step" or "please turn on js"?
[17:56]
JAA"Checking your browser" [17:56]
dd0a13f37Isn't there? [17:57]
JAAWhich is "please turn on JavaScript" if you have JS disabled.
Not as far as I know.
[17:57]
dd0a13f37So what does joepie91's code do? [17:57]
***balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
[17:57]
JAAIt parses the challenge and calculates the correct response without executing JavaScript. [17:58]
dd0a13f37Isn't that a bypass?
Or what exactly are you looking to do?
[17:58]
JAAYes, it is.
But it's written in JavaScript, not in Python.
https://gist.github.com/joepie91/c5949279cd52ce5cb646d7bd03c3ea36
[17:58]
dd0a13f37Modify it so it prints the cookie to stdout, then just do shell exec
easy solution
[17:59]
JAAYeah, we'd like a pure-Python version so we can avoid installing NodeJS or equivalent.
I mean, it might work on ArchiveBot where we have PhantomJS anyway, but it'd also be nice to have it in the warrior, for example.
[18:00]
dd0a13f37Can't you set it up as a web service? Send the challenge page, get the response back
You only need to do it once
[18:00]
JAAHuh, that's a nice idea actually.
A CF protection cracker API :-)
[18:00]
dd0a13f37"""protection"""
"""cracker"""
[18:01]
JAAHehe [18:01]
dd0a13f37And what about https://github.com/Anorov/cloudflare-scrape ? [18:01]
JAAThat executes CF's code in NodeJS and is inherently insecure. [18:02]
dd0a13f37So it needs node? [18:02]
JAAYou can easily trick it into executing arbitrary code, i.e. use it for RCE.
Yep
[18:02]
dd0a13f37Oh ok
So how does the script work, does it take an entire page and return a cookie?
[18:02]
JAAWhich script? [18:07]
dd0a13f37https://gist.github.com/joepie91/c5949279cd52ce5cb646d7bd03c3ea36 [18:07]
JAAI'm not sure. I've never used it, and I'm not familiar with using JavaScript like that (i.e. outside of a browser) at all. [18:09]
dd0a13f37Me neither
What is executed first? Or is it like a library, so you should look at the exports?
[18:10]
JAAAs far as I can tell, the function in index.js takes the challenge site as an HTML string as the argument and throws out the relevant parts of the JS challenge that you need to combine somehow to get the response.
The challenge looks like this, in case you're not familiar with it:
fVbMmUH={"twaBkDiNOR":+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))};
fVbMmUH.twaBkDiNOR-=+((+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));fVbMmUH.twaBkDiNOR*=+((!+[]+!![]+!![]+[])+...
So you need to transform each of those JSFuck-like expressions into a number and then -=, *=, etc. those numbers to get the correct response.
[18:10]
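(For reference, a rough pure-Python sketch of the arithmetic just described, i.e. the sort of thing a Python port of joepie91_'s gist would do. The parsing regex, function names, and the final step of adding the domain-name length are assumptions, not taken from the gist or the log.)

```python
# Sketch: evaluate CloudFlare's challenge arithmetic without running JS.
# `snippet` is assumed to be just the arithmetic lines quoted above
# (x={"k":+((...))}; x.k-=+((...)); ...), not the whole challenge page.
import re

def jsfuck_int(expr):
    # Each parenthesised group encodes one digit: every !+[] or !![] term
    # counts as 1, so the digit is just the number of such terms; the
    # groups then concatenate into a decimal number.
    digits = [str(len(re.findall(r'!\+\[\]|!!\[\]', group)))
              for group in re.findall(r'\(([^()]+)\)', expr)]
    return int(''.join(digits))

def solve(snippet, domain):
    # Pull out the initial value (after ':') and each following -=, +=, *= op.
    parts = re.findall(r'([-+*/]?[:=])\s*(\+\(\([^;}]+)', snippet)
    value = 0
    for op, rhs in parts:
        n = jsfuck_int(rhs)
        value = {':': n, '=': n, '+=': value + n,
                 '-=': value - n, '*=': value * n}[op]
    # The answer CF expected (at the time) was this value plus the length
    # of the domain name -- an assumption here, not stated in the log.
    return value + len(domain)
```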
dd0a13f37Can't you just use a regex to sanitize it and then execute them unsafely? [18:12]
JAAHahaha, good luck sanitising JSFuck.
I think cloudflare-scrape tries, but yeah...
[18:12]
dd0a13f37Oh, it can execute code, not just return a value?
well then you're fucked
[18:13]
JAAYeah. The code would be huge, but you can write *any* JS script with just the six characters ()[]+! used in the challenge.
https://en.wikipedia.org/wiki/JSFuck
[18:14]
dd0a13f37Was that an actual example or just randomly generated? [18:15]
JAAThat's an actual example. [18:16]
dd0a13f37Where can I find one?
A complete one
[18:16]
JAAhttps://gist.github.com/anonymous/85c9b2b57726135a2500a8425b370095 [18:18]
dd0a13f37I don't understand the purpose
Anyone who wants to do evil stuff would just use one of those scripts, and they're using a botnet so they wouldn't care about cloudflare infecting them
What's the point?
[18:23]
JAAIdk either [18:24]
***etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) [18:26]
dd0a13f37I don't get it, why can't you just use proxies for the really unfriendly sites? [18:28]
***Asparagir has joined #archiveteam-bs [18:28]
JAAAnd by the way, it's not just about CloudFlare serving evil code. Anyone could easily trigger cloudflare-scrape from their own server with an appropriate response. [18:29]
***svchfoo3 sets mode: +o Asparagir
svchfoo1 sets mode: +o Asparagir
[18:29]
dd0a13f37Well, I doubt you care about ACE when running a botnet [18:29]
JAASpecifically: https://github.com/Anorov/cloudflare-scrape/blob/ee17a7a145990d6975de0be8d8bf5b0abbd87162/cfscrape/__init__.py#L41-L47
Yeah, I just mean in general.
[18:30]
dd0a13f37There are commercial proxy providers with clean IPs, the cost of renting a bunch would probably be cheaper than what you spend on hard drives
Got another response from itorrents, he said he would upload the database to archive.org and send a link; the other three still haven't responded
[18:31]
JAA: Looking at generated jsfuck code, it's usually very long
CF is quite short
so you should be able to use a regex and limit the length
for example, encoding the character 'a' takes 846 chars encoded
http://www.jsfuck.com/
And CF's brackets are always empty - [], jsfuck needs to have something inside to eval
[18:42]
JAAYeah, I'm aware of that. It's still sloppy though. [18:48]
dd0a13f37It should be safe though [18:49]
JAAI don't think you strictly need something inside the brackets to do things in JSFuck, but it probably helps shorten the obfuscated code. [18:49]
dd0a13f37You can never get the eval() you need to do bad things
It shouldn't be turing complete
[18:50]
JAAPossible
I don't really know enough about JSFuck to say for sure.
[18:53]
***arkhive has joined #archiveteam-bs [18:57]
dd0a13f37https://esolangs.org/wiki/JSFuck
it needs a big blob which is not possible to encode in under a certain number of characters; it's ugly as fuck but it should be safe
the eval blob is 831 characters, so if you set an upper limit at 200 you should be fine
[18:57]
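(A tiny sketch of the whitelist-plus-length check being proposed here; the 200-character cutoff is the one dd0a13f37 suggests above, and whether that bound is actually sufficient is exactly the open question.)

```python
import re

# Accept an expression only if it is built purely from JSFuck's six
# characters and is too short to smuggle in an eval-style blob.
def looks_like_plain_cf_arithmetic(expr, max_len=200):
    return len(expr) <= max_len and re.fullmatch(r'[()\[\]+!]+', expr) is not None
```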
***etudier has joined #archiveteam-bs
etudier has quit IRC (Client Quit)
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
[19:02]
mundusWhat's the best tool for large site archival? [19:07]
***arkhive has quit IRC (Quit: My iMac has gone to sleep. ZZZzzz…) [19:07]
.... (idle for 15mn)
JAAmundus: Define "large"? [19:22]
munduslike a million pages [19:23]
JAAwpull can handle that easily, assuming you have sufficient disk space. [19:23]
mundusOkay
I was guessing wpull
[19:23]
JAANot sure if it's the "best" tool, but it works well.
I've run multi-million URL archivals with wpull several times.
[19:24]
mundusalright, what options do you normally use? [19:24]
JAAI think I mostly copied those used in ArchiveBot, then adapted them a bit in some cases.
https://github.com/ArchiveTeam/ArchiveBot/blob/a6e6da8ba37e733e4b10b7090b5fc4a6cffc9119/pipeline/archivebot/seesaw/wpull.py#L18-L53
[19:25]
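(For flavour, a trimmed-down sketch of the sort of argument list that file builds; the flags shown are a guessed subset and the URL/filenames are placeholders, so check the linked wpull.py for the exact ArchiveBot set.)

```python
# Sketch: build and run a wpull invocation for a large standalone crawl.
import subprocess

url = 'https://example.com/'                 # placeholder target
args = [
    'wpull', url,
    '--recursive', '--level', 'inf',
    '--page-requisites', '--span-hosts-allow', 'page-requisites',
    '--warc-file', 'example.com-2017-09',    # wpull adds .warc.gz itself
    '--warc-max-size', str(5 * 1024**3),     # split WARCs at ~5 GiB
    '--database', 'example.com.db',          # lets the crawl resume after a crash
    '--tries', '3', '--timeout', '60',
    '--output-file', 'wpull.log',
]
subprocess.run(args)
```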
munduscool, thanks [19:26]
joepie91_mundus: you may find grab-site useful also
sort of like a local archivebot
mundus: ref https://github.com/ludios/grab-site
[19:35]
mundusoh nice [19:36]
secondchfoo: do you have a doc explaining how wpull works with youtube-dl etc or how it should work? [19:47]
How do I become a member of the ArchiveTeam and what would that mean?
JAA: is there a doc somewhere with how the IA archives things and keeps bacups?
backups
[19:55]
***etudier has joined #archiveteam-bs
BartoCH has quit IRC (Ping timeout: 260 seconds)
[19:59]
JAAsecond: You become a member by doing stuff that aligns with AT's activities. There isn't anything formal.
There is some stuff in the "help" section of archive.org, and also some blog entries. Not sure what else exists.
I don't think the individual archival strategies etc. are documented well (publicly) though.
[20:04]
***BartoCH has joined #archiveteam-bs [20:12]
jrwrsecond: anyone can do /something/; we are more of a method than anything. What do you want to do? [20:21]
***kristian_ has quit IRC (Remote host closed the connection) [20:26]
secondnot sure, I'm more working on file categorization / curation right now
What kind of things shouldn't we archive?
[20:26]
jrwrWell
That's a hard question
If you are doing web archival, I would make sure to save everything as WARCs
(wget supports this, so does wpull)
Anything else, just do the best quality you can; the more metadata the better
make an account on IA and go to town uploading things
check out SketchCow's IA and see how he uploads things
(for things like CDs, Tapes, Paper)
[20:28]
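(As a small illustration of the WARC advice above, a hedged sketch of a wget invocation using its WARC options; URL and filename are placeholders.)

```python
# Sketch: mirror a small site with wget while also writing a WARC alongside
# the normal on-disk copy. --warc-file takes a prefix; wget adds .warc.gz.
import subprocess

subprocess.run([
    'wget', '--mirror', '--page-requisites',
    '--warc-file=example-site',      # placeholder WARC name
    '--warc-cdx',                    # also write a CDX index
    'https://example.com/',          # placeholder URL
])
```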
...... (idle for 27mn)
***DFJustin has quit IRC (Remote host closed the connection) [20:58]
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
[21:08]
.... (idle for 17mn)
ZexaronS has quit IRC (Quit: Leaving) [21:25]
............ (idle for 55mn)
drumstick has joined #archiveteam-bs [22:20]
Honno has quit IRC (Read error: Operation timed out) [22:25]
Soni has quit IRC (Ping timeout: 506 seconds) [22:30]
Soni has joined #archiveteam-bs [22:41]
secondDoes the internet archive have deduplication active?
I wouldn't want to upload a bunch of stuff and waste their space
[22:41]
***ZexaronS has joined #archiveteam-bs [22:41]
secondJAA: has this been archived? https://www.reddit.com/r/opendirectories/comments/6zuk7v/alexandria_library_38029_ebooks_from_5268_author/
https://alexandria-library.space/Ebooks/Author/
https://alexandria-library.space/Ebooks/ComputerScience/
https://alexandria-library.space/Images/ww2/north-american-aviation-world-war-2/
https://alexandria-library.space/Images/
[22:43]
JAANot yet, as far as I know, but arkiver just added them to ArchiveBot. [22:45]
arkiveryeah [22:46]
***BartoCH has quit IRC (Quit: WeeChat 1.9) [22:47]
secondDid you do it because I said something or was it already added? I'm wondering if you guys watch that and other reddit(s)
Is there an archive of scihub?
[22:50]
JAAI watch some subreddits, but not opendirectories (yet). [22:53]
arkiveradded because you said it
it looks like something we want to archive
[22:53]
JAAWe were discussing libgen several times in the past few days. See the logs: http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-09-17,Sun
Basically, at this point, I assume that IA has a darked copy of it, and even if they don't, the dataset won't disappear anytime soon and can still be archived *if* libgen actually gets in trouble.
[22:54]
secondIsn't libgen always possibly in trouble?
Different governments / institutions trying to shut it down
JAA are you Jason Scott?
[22:59]
JAAPossible, but I wouldn't be worried about the data until libgen actually goes offline or similar.
The data is available in (active) torrents and on Usenet...
No, that's SketchCow.
[22:59]
secondHow does one setup a Usenet account / get one, is there a guide somewhere? [23:01]
JAAFirst rule of Usenet... [23:01]
secondDammit [23:02]
JAA:-P
Check out /r/usenet. They have a ton of good information.
[23:02]
secondWill you guys archive porn? [23:03]
JAAWell, we did archive Eroshare, so there's that. [23:03]
***Soni has quit IRC (Read error: Connection reset by peer) [23:04]
JAAThere's also that 2 PB webcam archive by /u/Beaston02. [23:04]
secondEh, I found a wiki which lists actors in porn but you need to log in [23:04]
JAAThat's not on IA though. [23:04]
secondCan you archive it?
Why not?
All this stuff on the IA and the most viewed stuff in the art museum is vintage porn
http://95.31.3.127/pbc/Main_Page
[23:05]
JAAWell, I don't think IA is interested in spending 3-4 million dollars over the next few years for random porn webcams.
(That number is based on https://twitter.com/textfiles/status/885527796583284741 )
[23:06]
secondHow do people archive 2PB of data?! [23:11]
JAAI'm not saying it shouldn't be archived. In general, my opinion is that everything should be kept. Unfortunately though, that's not very realistic, and I think there are more important things to preserve than random porn webcams.
Amazon Cloud Drive and now Google Drive.
[23:11]
secondWait a minute, Jason Scott is the same guy behind textfiles.com, interesting [23:12]
JAASome people suspect that ACD only killed the unlimited offer because of Beaston02 storing those webcam recordings there. [23:12]
secondJAA: are there any upcoming storage breakthroughs that you can think of?
Lol, "this is why we can't have nice things"
[23:13]
***ld1 has quit IRC (Read error: Connection reset by peer) [23:14]
JAANo idea really. HAMR will come, but that probably won't really reduce storage costs massively, i.e. not a real breakthrough. DNA storage is still far away, I guess. Otherwise, I don't really know too much about other technologies currently in development. [23:17]
***ld1 has joined #archiveteam-bs [23:20]
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
etudier has joined #archiveteam-bs
[23:32]
jrwrI think DNA might be a good ROM
not WMRM
or like old school tape drives
[23:38]
JAAYeah, it sounds pretty perfect for long-term archival. [23:49]
