#archiveteam 2014-10-02,Thu

Time Nickname Message
00:03 🔗 SketchCow underscor shows up and fixes things
00:03 🔗 SketchCow Although when he shows up, it usually means six weeks more winter
00:15 🔗 balrog DFJustin: I thought that was archived long ago?
00:16 🔗 balrog xmc: emijrp is still around, he's in charge of #wikiteam but is never on IRC
00:16 🔗 balrog he does respond to emails afaict
00:22 🔗 DFJustin what was
00:27 🔗 balrog DFJustin: opensolaris
00:29 🔗 DFJustin where is it then
01:05 🔗 GChriss kyan: guessing this is you? https://archive.org/details/mail.google.com-saved-1Oct2014
01:06 🔗 GChriss (just noticed author field)
01:33 🔗 kyan GChriss: you figured it out, but yes :)
01:54 🔗 SketchCow -----------------------------------------------------
01:54 🔗 SketchCow Archive Team Members Up For Being in SF on October 28th:
01:54 🔗 SketchCow https://ia601401.us.archive.org/34/items/LibraryBuildingInvitation/LibraryBuildingInvitation-nolink.html
01:54 🔗 SketchCow -----------------------------------------------------
06:40 🔗 tfgbd sup
06:42 🔗 tfgbd So are you guys like archive.org only better?
06:42 🔗 BlueMaxim More badass would be more appropriate
06:43 🔗 tfgbd I bet.
06:44 🔗 tfgbd Archive.org is indispensable for finding old software downloads.
06:44 🔗 yipdw better is probably the wrong word
06:44 🔗 yipdw IA hosts most of what we get
06:44 🔗 tfgbd I see. So they take submissions
06:45 🔗 tfgbd I wish they had competition, honestly.
06:45 🔗 yipdw yeah, get an account there, and you can upload whatever
06:45 🔗 tfgbd I guess because I've seen so many "long running" sites suddenly disappear eventually
06:45 🔗 tfgbd Do they take MHTs?
06:45 🔗 tfgbd I have TONS of those
06:45 🔗 tfgbd (unfortunately)
06:45 🔗 yipdw what's an MHT
06:45 🔗 yipdw the Microsoft thing
06:46 🔗 tfgbd It's the "save as single file" in IE
06:46 🔗 tfgbd And Opera
06:46 🔗 tfgbd And Chrome if you enable it
06:46 🔗 yipdw I mean, you can upload them, but Wayback won't read them as of yet
06:46 🔗 tfgbd Yeah, I figured
06:46 🔗 tfgbd the other problem with them is figuring out what the URL was
06:46 🔗 tfgbd I imagine google can help sometimes
06:46 🔗 yipdw they're missing a lot of vital information
06:46 🔗 yipdw like the URL
06:46 🔗 tfgbd yeah
06:46 🔗 yipdw and all the request/response headers
06:47 🔗 tfgbd it makes me regret having so many but they ARE so convenient
06:47 🔗 tfgbd and I guess I'm usually too lazy to mess with a spider
06:47 🔗 yipdw consider using instead https://webrecorder.io/
06:47 🔗 yipdw we also have archivebot
06:47 🔗 tfgbd Thanks to IA, I just got H.264 videos playing on Windows NT 3.51
06:48 🔗 tfgbd One thing I hate: robots.txt
06:49 🔗 tfgbd Also, how would one archive dynamic Web 2.0/DHTML/HTML5/cloud crap
06:50 🔗 tfgbd if it's one thing I HATE about the web, it's that stuff
06:51 🔗 yipdw https://webrecorder.io/
06:51 🔗 yipdw we also have archivebot, which has a PhantomJS mode
06:51 🔗 yipdw it is not as robust as webrecorder.io but does not require human interaction
06:52 🔗 yipdw there are also WARC-generating proxy servers, which serve a similar purpose to webrecorder.io
06:52 🔗 tfgbd Also, is there anything that archives with multiple user agents? There is so much mobile only and wap stuff out there and it displays different depending on your UA
06:52 🔗 yipdw we have archivebot
06:52 🔗 yipdw which has a --user-agent-alias option
06:52 🔗 yipdw webrecorder.io also works with whatever browser you choose to use
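The idea of fetching the same page under several user agents can be sketched roughly as below. This is only an illustration using Python's standard library; the alias names and user-agent strings are made up for the example and are not ArchiveBot's actual `--user-agent-alias` values.

```python
import urllib.request

# Hypothetical alias table: these names and strings are illustrative only,
# not the aliases ArchiveBot ships with.
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0",
    "mobile": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) Mobile Safari",
}


def build_requests(url, aliases=USER_AGENTS):
    """Return one unsent urllib Request per user-agent alias for the same URL."""
    return {
        alias: urllib.request.Request(url, headers={"User-Agent": ua})
        for alias, ua in aliases.items()
    }


if __name__ == "__main__":
    # Nothing is fetched here; the requests are only constructed and printed.
    for alias, req in build_requests("http://example.com/").items():
        print(alias, "->", req.get_header("User-agent"))
```

Each request would then be fetched and stored separately, since a site that varies by user agent can return a different body for every alias.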
06:52 🔗 tfgbd Would it even be possible to archive something like this? pcjs.org/
06:53 🔗 yipdw I don't see why not
06:53 🔗 yipdw archiving *sessions* done in the emulated PCjr environment is tricky
06:53 🔗 yipdw archiving the *code* is not that hard
06:53 🔗 yipdw in fact
06:54 🔗 tfgbd well it's open source, sure
06:54 🔗 tfgbd you can just download it
06:54 🔗 yipdw I added it to the queue in #archivebot, you can join that channel to check on it
06:54 🔗 tfgbd but how could it archive the state of the emulated pc
06:54 🔗 yipdw it won't
06:54 🔗 yipdw no tool does
06:54 🔗 tfgbd yeah, I imagine that is impossible
06:55 🔗 yipdw however, retrieving individual disk images is not hard
06:55 🔗 tfgbd the tools have not kept up with the "advances" of the web
06:55 🔗 yipdw so you can approximate state
06:55 🔗 tfgbd Do you guys do software too?
06:56 🔗 yipdw to some degree, that's inevitable. if there's a webpage that chooses to represent state in a form that isn't capturable via a URL or some other parameter not present in a request, you aren't going to get it as a request/response pair
06:56 🔗 yipdw there's no way around that one except not capturing the result as a request/response pair
06:56 🔗 yipdw so you can save it as, say, an image, or a PDF, a DOM snapshot that could perhaps be unfrozen at some later time
06:56 🔗 SketchCow Lotta talkin'
06:56 🔗 yipdw and yes, there's a huge software archive on IA now
06:56 🔗 yipdw #-bs etc
06:57 🔗 tfgbd Can you submit your own stuff too?
06:57 🔗 yipdw in fact the curator of that software collection just showed up
06:57 🔗 tfgbd in the channel?
06:59 🔗 SketchCow He's going to bed sooner rather than later.
06:59 🔗 tfgbd is there some place I can submit IRC logs?
14:19 🔗 DFJustin tfgbd: https://archive.org/upload/
15:59 🔗 SketchCow ----------------------------------------
15:59 🔗 SketchCow Which Archive Team Members want an Ello Invite
15:59 🔗 SketchCow Condition: Gotta be a troublemaker
15:59 🔗 SketchCow ----------------------------------------
16:01 🔗 yipdw I'll take one if I can sign up as index.html
16:01 🔗 yipdw or about, terms-of-service, etc
16:07 🔗 Kazzy public-beta-profiles
16:08 🔗 DFJustin robots.txt
16:09 🔗 SketchCow Msg me with an e-mail, I'll send one. They might be closing the invites, but I have them.
16:28 🔗 SketchCow I will say, by the way, these tele2 and verizon jobs are rough - just moving the files into position has taken weeks, with all the tiny files on them.
16:30 🔗 midas im still working on the verizon megawarc box
16:34 🔗 Nemo_bis When rsync is slow because of small files, tar is the solution, isn't it?
16:34 🔗 midas megawarcing is slow
16:34 🔗 Nemo_bis Oh, right
16:34 🔗 midas because of the ~10 or 20 million files
16:35 🔗 Nemo_bis Still, the sounds of "10 or 20 million files" is delicious
16:36 🔗 midas yeah but to create a 25 or 50GB warc file is a killer for your system
16:44 🔗 antomatic Remember that a lot of those files were bruteforces - so there are lots of 404s underneath the 'millions of files' headline.
16:44 🔗 antomatic Don't know if a little pre-processing of the Warcs just to extract the good stuff would help any
16:48 🔗 antomatic Or if the megawarcing process could work in size order - e.g. largest files first [most likely to have content] etc.
16:51 🔗 DFJustin looking into the content of each file to see if it's a 404 isn't likely to be faster than just catting it
16:52 🔗 antomatic I'd go by size order - that way you're getting the obvious cream off the top, then once you hit the small files that are all 404s, you can take the view on whether to continue or not.
16:52 🔗 antomatic but I don't know how practical that is, admittedly.
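The size-order idea could look something like the sketch below. It is not how megawarc works; it only shows listing files largest first using file sizes from the filesystem, without opening any file. With tens of millions of files, even this directory walk may take a long time.

```python
import os
import tempfile


def files_largest_first(root):
    """Return (size, path) for every file under root, largest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # getsize only stats the file; the contents are never read.
            entries.append((os.path.getsize(path), path))
    # Sort by descending size, then by path so ties are ordered predictably.
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return entries


if __name__ == "__main__":
    # Small demo with throwaway files standing in for WARCs.
    demo_root = tempfile.mkdtemp()
    for name, size in [("big.warc", 2048), ("tiny.warc", 12)]:
        with open(os.path.join(demo_root, name), "wb") as handle:
            handle.write(b"x" * size)
    for size, path in files_largest_first(demo_root):
        print(size, path)
```

Processing could then stop at a chosen size threshold, although whether small files really are mostly 404s is an assumption that would need checking first.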
16:52 🔗 Arkiver2 SketchCow: you can skip some files if you want
16:53 🔗 Arkiver2 We did kind of two stages
16:53 🔗 Arkiver2 1. Bruteforce, we didn't have the sites list, so we just followed the patterns we could find
16:53 🔗 Arkiver2 2. Downloading list of sites. We got a list of sites from Tele2 and downloaded everything from the list
16:54 🔗 Arkiver2 SketchCow: You can skip the warc's downloaded with the first stage, since that's just bruteforces and whatever has been downloaded there has also been downloaded during the second stage when we had the list of sites
16:55 🔗 Arkiver2 I'll take a look at how to find the warc's from the second stage. give me some time
16:57 🔗 SketchCow Not going to skip.
16:57 🔗 SketchCow Stop thinking of it.
17:00 🔗 Arkiver2 I thought your system might not be able to handle it or so due to the large amount of items
17:00 🔗 Arkiver2 It's always the best to do everything
17:00 🔗 Arkiver2 good luck!
17:01 🔗 SketchCow No, it can handle it, it's just boring
17:01 🔗 SketchCow Windows in screen open for weeks.
17:01 🔗 Arkiver2 Ah, that's ok then
17:02 🔗 SketchCow Forced to blast Pink Floyd "Animals" album while considering next moves.
17:03 🔗 Arkiver2 I'm going to start testing qwiki-grab. Can you please tell me how much space is still left on FOS?
17:04 🔗 Arkiver2 So we won't run out of space during the grab.
17:04 🔗 SketchCow 1.2T
17:05 🔗 SketchCow 4.8T on another drive
17:06 🔗 Arkiver2 Ok, thank you, I'll let you know how big the average item of qwiki is
17:08 🔗 yipdw DFJustin: heh, https://ello.co/robots.txt
17:15 🔗 yipdw "username may only contain letters, numbers, dashes, and underscores"
17:15 🔗 yipdw boo
17:17 🔗 xmc -- i'm also providing ello invites, just /msg me --
17:17 🔗 xmc -- no email address needed --
17:17 🔗 Rotab yipdw: and somebody has already registered ck. :(
17:18 🔗 yipdw ha
17:19 🔗 yipdw nobody got ckfight
17:19 🔗 DFJustin cks?
17:19 🔗 yipdw taken
17:20 🔗 Jonimus SketchCow: what sort of hell raising is being done at ello?
17:22 🔗 Jonimus xmc: didn't you have an amusing facebook custom url, or perhaps I'm thinking of bsparks?
17:22 🔗 DFJustin cksucker
17:22 🔗 Jonimus nice DFJustin
17:23 🔗 Rotab :D
17:27 🔗 yipdw https://ello.co/cksucker
17:27 🔗 yipdw ok done
17:27 🔗 Rotab heyooo
17:28 🔗 yipdw now I get to see how long it takes for them to ban me
17:29 🔗 yipdw in the meantime maybe I should glitchr the shit out of this account
17:30 🔗 tfgbd Do you guys know if there is any effort to archive descriptions/app versions from the various mobile app stores?
17:31 🔗 tfgbd That is one piece of content that is almost lost due to not even using the web sometimes
17:31 🔗 tfgbd ugh, stupid cloud shit
17:31 🔗 tfgbd At least the web is scrapable even if some efforts generally don't get everything
17:32 🔗 yipdw tfgbd: most of that stuff is web-based
17:32 🔗 yipdw steamcommunity, app store, etc are
17:33 🔗 tfgbd In the earlier days some of it wasn't
17:33 🔗 tfgbd it's only been SORT of recently that Google's Android Market has even been available on the website
17:34 🔗 yipdw ok, but nowadays you can access Google Play, the Apple App Store, the Steam store listing, etc all via the Web
17:34 🔗 yipdw I don't know of any particular effort to get that data, but if you want to start one feel free
17:34 🔗 tfgbd Though, even before Google had an official website, there were scrapers
17:34 🔗 tfgbd I'm wondering if there is something that downloads the apps/archives them along with the descriptions and comments
17:35 🔗 tfgbd I assume most of the early Apple iPhone stuff is lost, though
17:35 🔗 tfgbd I know there are a few mirrors of the Android Market/Play Store but they seem to have the same (current) stuff the official store had
17:36 🔗 * yipdw shrugs
17:36 🔗 yipdw no point in getting bent up about data that's gone
17:36 🔗 DFJustin https://archive.org/details/android_apps
17:37 🔗 tfgbd and yeah, there are lots of warezish sites full of ads that seem to archive some old .apk/.xap/etc for download
17:37 🔗 tfgbd but it's hardly a perfect effort
17:37 🔗 tfgbd and those can likely go down too
17:37 🔗 tfgbd lol
17:37 🔗 tfgbd it seems we need to archive the archives
17:37 🔗 tfgbd Why not cry? It's depressing to lose good information/tools/data
17:38 🔗 tfgbd s'why pisses me the most off about the damn internets
17:38 🔗 tfgbd what*
17:38 🔗 SketchCow 13:33 < tfgbd> Do you guys know if there is any effort to archive descriptions/app versions from the various mobile app stores?
17:38 🔗 tfgbd Neat, Archive.org already has an effort. Love those guys :P
17:38 🔗 SketchCow Yes, I am working with people who are downloading the entirety of Google Play
17:39 🔗 tfgbd do they have some kind of app that runs and downloads everything (free) it can?
17:39 🔗 tfgbd what about the paid stuff?
17:39 🔗 tfgbd oh, and did anyone get Handango and PocketGear?
17:39 🔗 SketchCow These are a lot of questions.
17:39 🔗 tfgbd sorry
17:40 🔗 tfgbd There was so much nice stuff there. And when they merged it also killed lots of older freeware
17:40 🔗 Arkiver2 tfgbd: if you ever come by big/small/whatever size websites that are going down inform us asap
17:41 🔗 tfgbd do I just paste it in channel?
17:41 🔗 tfgbd There is a lot I know of, probably
17:41 🔗 SketchCow I think you'll find on Archive Team that instead of crying over what is lost, we tend to focus on what needs to be saved or preserved now.
17:42 🔗 tfgbd I have a pretty nice collection of Windows CE software (including stuff I've compiled myself) but I unfortunately don't have most of the pages that hosted the tuff
17:42 🔗 SketchCow Let the many, many, many other people in the world who do nothing sit around going "oh, if only"
17:42 🔗 tfgbd stuff*
17:42 🔗 SketchCow Coffee is for Closers
17:42 🔗 tfgbd some stuff is from archive.org, anyway
17:43 🔗 Arkiver2 <tfgbd>There is a lot I know of, probably
17:43 🔗 Arkiver2 Please post it all here
17:43 🔗 tfgbd is there some keyword I should use so you guys find it?
17:43 🔗 tfgbd i mean those are a lot of logs to sort through
17:45 🔗 tfgbd While they're not going down, I can think of quite a few fairly large WinCE/general mobile sites that have huge collections of useful stuff
17:45 🔗 DFJustin make a hitlist on the wiki
17:45 🔗 tfgbd do I just need to make an account?
17:46 🔗 midas yep
17:46 🔗 midas secret word is yahoosucks
17:46 🔗 tfgbd www.hpcfactor.com is pretty much the only source left for support of the Handheld PC platform and some other similar WinCE devices along with some things for Win9x/old NT
17:46 🔗 tfgbd BUT it's pretty much run by one guy
17:47 🔗 tfgbd And it has lost forum posts before
17:47 🔗 yipdw wayback's latest snapshot of that site is from July 23, 2014
17:47 🔗 yipdw how recent is that considering its update rate?
17:47 🔗 tfgbd Very
17:48 🔗 tfgbd but wayback generally doesn't get the whole thing.
17:48 🔗 tfgbd it seems to have trouble with forums and the like
17:48 🔗 yipdw got most of it
17:48 🔗 yipdw probably just need to grab the forum
17:48 🔗 yipdw s
17:48 🔗 tfgbd is there any way to do it with a login so the download links can be obtained?
17:49 🔗 Arkiver2 yeah that's possible
17:49 🔗 Arkiver2 but not likely to be done
17:49 🔗 yipdw most tools have support for cookie jars and/or HTTP basic auth
17:49 🔗 tfgbd I know the author. maybe I can ask him to contribute his own archive
17:49 🔗 Arkiver2 that'd be great
17:49 🔗 tfgbd The other biggie is xda-developers.com
17:49 🔗 yipdw I mentioned last night that ArchiveBot doesn't, which is less a technical question and more of a self-imposed rule
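As a rough illustration of what "cookie jar and/or HTTP basic auth" support means in a fetching tool, here is a minimal sketch with Python's standard library. The function name and the credentials are placeholders, and this is not how any of the tools mentioned here implement it.

```python
import http.cookiejar
import urllib.request


def build_authenticated_opener(base_url, username, password):
    """Return (opener, jar): an opener that keeps cookies and answers basic auth."""
    jar = http.cookiejar.CookieJar()
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under base_url".
    password_mgr.add_password(None, base_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPBasicAuthHandler(password_mgr),
    )
    return opener, jar


if __name__ == "__main__":
    # Placeholder site and credentials; nothing is fetched in this demo.
    opener, jar = build_authenticated_opener(
        "http://forum.example.com/", "someuser", "not-a-real-password"
    )
    print("cookies stored so far:", len(jar))
```

A forum login through a web form would set session cookies that land in the jar on the first request made through `opener.open(...)`, and later requests would send them back automatically.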
17:49 🔗 xmc xda-developers is on my hitlist
17:50 🔗 tfgbd they are HUGE and hopefully aren't going anywhere but still, it would be a crime to lose all that info and software
17:50 🔗 xmc yes
17:50 🔗 yipdw heh
17:50 🔗 yipdw xda already loses data and self-censors at a hilarious rate
17:50 🔗 tfgbd Some stuff was already lost before due to MS requesting stuff be removed from the FTP
17:51 🔗 xmc not going to talk about it much, but i'm working on a tool to archive forums specifically
17:51 🔗 tfgbd though, I know of a few people who got the FTP before that was done
17:51 🔗 tfgbd but the other problem is that a lot of newer ROMs/tools use shit like rapidshare or mediafire for hosting
17:51 🔗 xmc tool/service i guess
17:52 🔗 tfgbd how does archive.org treat warez? Do they generally just look the other way if it ends up in their archive?
17:53 🔗 DFJustin it's the same as youtube or any other site, you're not supposed to upload it but in practice it probably won't go anywhere unless someone asks for it to be removed
17:53 🔗 tfgbd That sucks
17:53 🔗 tfgbd i assume they still take backups for posterity, though
17:53 🔗 tfgbd there needs to be an alternative to them for the illegal stuff, I guess
17:53 🔗 Arkiver2 it isn't removed
17:53 🔗 Arkiver2 it's made unavailable
17:54 🔗 tfgbd mmm
17:55 🔗 tfgbd This is another one: http://www.yetanotherhomepage.com/j7xx/
17:55 🔗 tfgbd It's been stable for years but you never know
17:55 🔗 tfgbd also, I imagine many of the links to authors sites are dead
17:55 🔗 DFJustin do a quick check with the wayback machine and if there are pages or files missing, bring it up in #archivebot
17:56 🔗 DFJustin note that wayback will auto-grab images or file downloads but you can tell if the archive date is today's date that they didn't have it
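The "archive date is today's date" check can be sketched as follows, assuming the 14-digit `YYYYMMDDhhmmss` timestamp that appears in Wayback URLs and is given in UTC. The function name is made up for this example.

```python
from datetime import date, datetime, timezone


def captured_on_demand(wayback_timestamp, today=None):
    """True if a 14-digit Wayback timestamp falls on today's (UTC) date.

    A capture dated today suggests the file was grabbed only because it was
    just requested, i.e. the archive did not have it before.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    captured = datetime.strptime(wayback_timestamp, "%Y%m%d%H%M%S").date()
    return captured == today


if __name__ == "__main__":
    # The July snapshot mentioned earlier, checked against the day of this log.
    print(captured_on_demand("20140723000000", today=date(2014, 10, 2)))
```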
17:56 🔗 tfgbd does anyone archive cvs/svn/git stuff
17:56 🔗 tfgbd fortunately there is sourceforge but the self hosted stuff tends to randomly die
17:56 🔗 DFJustin somebody in here was downloading a large percentage of github
17:57 🔗 tfgbd does archive.org preserve the server time stamps too?
17:57 🔗 DFJustin I believe so, all of the http headers should go into the warc archive
17:57 🔗 yipdw they do
17:58 🔗 xmc i think it's time for us to have an faq
17:58 🔗 yipdw ^
17:58 🔗 tfgbd is there anyone working on a modified jdownloader that could automate mirroring mediafire/rapidshare stuff?
17:58 🔗 xmc after work today i'll put a list of faq questions up on the wiki
17:58 🔗 tfgbd can the warc tools get something like a rapidshare if you manually do the wait?
17:59 🔗 DFJustin https://github.com/espes/jdget
17:59 🔗 joepie91 due to the reliance on generate download URLs, that's unlikely to work very well, even with something like webrecorder..
17:59 🔗 xmc oh shit we do have an faq already http://archiveteam.org/index.php?title=FAQ
17:59 🔗 joepie91 on javascript to generate*
17:59 🔗 tfgbd it doesn't cover most of my questions, unfortunately
18:00 🔗 joepie91 DFJustin: ?!
18:00 🔗 joepie91 DFJustin: native?!
18:00 🔗 xmc tfgbd: yes. it needs some expansion.
18:00 🔗 DFJustin I dunno someone linked it in here the other day, think it's unfinished though
18:00 🔗 joepie91 very cool regardless...
18:09 🔗 xmc there i added section headings or whatever
18:09 🔗 tfgbd what do you guys think of offline explorer?
18:10 🔗 xmc what is that
18:11 🔗 yipdw http://www.metaproducts.com/mp/Offline_Explorer.htm probably
18:11 🔗 tfgbd Wha? you've never heard of it?
18:11 🔗 tfgbd it's pretty famous
18:11 🔗 tfgbd supposedly it can even do dhtml and html5 now
18:13 🔗 tfgbd It also has a download while you browse function
18:13 🔗 yipdw product page doesn't detail its file format
18:13 🔗 tfgbd i can send you one if you'd like
18:14 🔗 tfgbd it mostly just seems to be html in folders
18:14 🔗 * yipdw shrugs
18:14 🔗 tfgbd with a mirror of the original url
18:14 🔗 tfgbd try it and add it to the FAQ?
18:15 🔗 tfgbd can archive.org accept that?
18:15 🔗 yipdw yes, but WARC is preferred
18:15 🔗 tfgbd does httrack support warc now?
18:15 🔗 yipdw I don't know
18:16 🔗 tfgbd I also have a number of 4chan/other imageboard thread archives I've done with the tool ChanThreadWatch
18:16 🔗 tfgbd does archive.org take those too?
18:16 🔗 tfgbd or someone else
18:16 🔗 yipdw part of what we're trying to do here is also to archive stuff in file formats that are de jure/de facto standards
18:16 🔗 yipdw if there's stuff in older formats that's fine too
18:16 🔗 yipdw but going forward
18:17 🔗 tfgbd unfortunately, it's not always possible to use these standards
18:17 🔗 yipdw you have probably seen WARC thrown around here, and no it's not perfect, but it is also the second serious attempt at formalizing a standard for archive access
18:17 🔗 yipdw web archive access anyway
18:17 🔗 yipdw the first one being IA's original ARC
18:18 🔗 yipdw as for whether or not IA will accept it, the answer is usually yes
18:18 🔗 yipdw whether or not the data can be used by automated processes to derive other product is a harder question
18:18 🔗 yipdw one reason why WARC is attractive is that its format lends itself nicely to request replay
18:19 🔗 yipdw also, wayback/pywb/webrecorder/archivebot/wget/wpull etc all speak it
18:19 🔗 yipdw warcproxy, warcmitmproxy, warctools, ...
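To show why the format suits request replay, here is a minimal sketch of one WARC/1.0 `response` record: a short header block followed by the raw HTTP response exactly as it came off the wire. It is a simplification for illustration (no `warcinfo` record, no digests, no compression); the tools listed above are the ones to use for real captures.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri, http_response_bytes, when=None):
    """Return the bytes of a single, minimal WARC/1.0 response record."""
    when = when or datetime.now(timezone.utc)
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Length of the record block, i.e. the stored HTTP response.
        ("Content-Length", str(len(http_response_bytes))),
    ]
    head = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % h for h in headers) + "\r\n"
    # A record is: header block, the payload, then two CRLFs as a terminator.
    return head.encode("utf-8") + http_response_bytes + b"\r\n\r\n"


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
    print(build_warc_response_record("http://example.com/", raw).decode("utf-8"))
```

Because the URL, the capture time, and the full response headers are stored next to the body, a replay tool can answer a later request for that URL with the recorded response.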
18:19 🔗 tfgbd is there any kind of browser plugin that will just automatically download everything you view and send it to archive.org without user intervention?
18:20 🔗 tfgbd the barrier to entry still seems too high for most people
18:20 🔗 yipdw web.archive.org/save can be turned into a bookmarklet
18:20 🔗 yipdw as for recording sessions, I'm not aware of one that isn't webrecorder.io, and that still requires a WARC download
18:20 🔗 yipdw as for barrier of entry, feel free to work on things to lower it
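A bookmarklet along those lines only needs to prefix the current page's address with the save endpoint. The sketch below builds both the save URL and a bookmarklet string; the helper names are made up, and the exact bookmarklet text is one possible form rather than an official one.

```python
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Return the Wayback 'save' URL for a page by prefixing its address."""
    return SAVE_PREFIX + page_url


def bookmarklet():
    """Return a javascript: URL that sends the current tab to the save endpoint."""
    return "javascript:location.href='%s'+location.href" % SAVE_PREFIX


if __name__ == "__main__":
    print(save_page_now_url("http://www.hpcfactor.com/"))
    print(bookmarklet())
```

This saves one page per click, so it lowers the barrier for single pages but does not record a whole browsing session.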
18:21 🔗 tfgbd There is Offline Explorer but unfortunately it's not the same standard
18:21 🔗 yipdw no reason it couldn't be, except for its proprietary nature and whether or not its developers care about standards
18:21 🔗 tfgbd When I seriously looked into the tools available a few years ago, I was kind of surprised there wasn't more
18:21 🔗 yipdw things have changed a lot in a few years
18:21 🔗 tfgbd i guess people just don't care
18:21 🔗 tfgbd i've heard the same happens at libraries
18:21 🔗 yipdw you're indirectly insulting a lot of people here, heh
18:22 🔗 tfgbd I don't mean you guys
18:22 🔗 tfgbd I just mean the world in general
18:22 🔗 tfgbd it's amazing you were able to get geocities
18:22 🔗 yipdw oh ok
18:22 🔗 Jonimus tfgbd: I know someone is working on a tool that eats firefox history and outputs warc
18:23 🔗 Jonimus after time and bandwidth
18:23 🔗 yipdw bookmarklet to webrecorder is probably one place to start
18:24 🔗 yipdw actually, it has one
18:28 🔗 tfgbd webrecorder.io?
18:29 🔗 yipdw yeah
18:29 🔗 tfgbd I tried it yesterday but it seems kind of counterintuitive to have to download everything on the server and then download the warc yourself
18:29 🔗 tfgbd doesn't archive.org have some kind of browser or something like that
18:29 🔗 yipdw the WARC is generated as you go
18:29 🔗 yipdw none that I've seen
18:30 🔗 yipdw anyway, if you have UI suggestions, much of the code for webrecorder is open-source and the guy who maintains it is also receptive to ideas
18:30 🔗 yipdw https://github.com/ikreymer
18:31 🔗 tfgbd oh, so I can host it myself?
18:31 🔗 yipdw yes
18:31 🔗 yipdw I haven't tried to set one up yet; it's on my list-of-things-to-try-for-ArchiveBot
18:31 🔗 yipdw but the code is out there
18:31 🔗 tfgbd is it possible to get it to automatically send everything to archive.org without the end user needing to download a warc at all
18:32 🔗 yipdw maybe
18:32 🔗 yipdw if you have a set of archive.org S3 keys you can upload
18:32 🔗 yipdw that option probably doesn't exist in the software yet
18:35 🔗 DFJustin you can use httrack etc. if you route them through a warc proxy
18:36 🔗 DFJustin ia will accept anything you upload but only ARC and WARC can be ingested into the wayback machine
18:36 🔗 DFJustin so that's where we've been focusing our efforts
18:41 🔗 tfgbd so if you used httrack you'd effectively have too copies of the site?
18:41 🔗 tfgbd do you guys offer any sort of VPS hosting or anything for people who don't have access to the storage space required?
18:42 🔗 DFJustin yes you would end up with two copies
18:43 🔗 joepie91 tfgbd: archivebot is generally used by people who can't or don't want to maintain their own wget+warc setup, for archiving stuff
18:43 🔗 joepie91 if you need more advanced stuff... *somebody* in here may be able to help out with a VM
18:43 🔗 joepie91 but there's no 'official' service of any kind afaik
18:43 🔗 joepie91 just random people going "oh yeah, sure, here, have a VM" occasionally
18:43 🔗 joepie91 :P
18:47 🔗 tfgbd two*
18:47 🔗 tfgbd cool
18:47 🔗 tfgbd who do I ask for a VM?
18:48 🔗 tfgbd word
18:50 🔗 joepie91 no idea, generally just post what you need in here with sufficient detail and see if anybody offers :P
18:50 🔗 joepie91 or, if you just need one temporarily, you can also consider trying digitalocean or so
18:50 🔗 tfgbd Okay, I made an account
18:51 🔗 tfgbd There is another completely free (for life supposedly) vps service but it kind of sucks and isn't a "real" vm
18:51 🔗 tfgbd and you don't get much disk space either
18:51 🔗 joepie91 which service is that?
18:52 🔗 tfgbd hold on
18:52 🔗 joepie91 host1free by any chance?
18:52 🔗 tfgbd they only offer ipv6 IPS, though
18:52 🔗 tfgbd never heard of that one
18:52 🔗 tfgbd will check it out
18:52 🔗 joepie91 don't, host1free is awful
18:52 🔗 joepie91 lol
18:52 🔗 joepie91 and very dodgy
18:53 🔗 joepie91 they want way too much info
18:53 🔗 tfgbd it's not a "real" vps as much as a container
18:53 🔗 joepie91 openvz?
18:53 🔗 tfgbd you still get root but can't install any drivers or anything
18:53 🔗 DFJustin http://archiveteam.org/index.php?title=Clown_hosting
18:53 🔗 tfgbd yes, openvz
18:53 🔗 tfgbd I hate that shit
18:53 🔗 joepie91 openvz is fine for most cases :P
18:53 🔗 tfgbd Yeah, I'm one of those weird people who wants to run VMs in VMs
18:53 🔗 joepie91 also, no point in installing drivers in a VM anyway
18:53 🔗 joepie91 well
18:53 🔗 joepie91 you can, theoretically
18:53 🔗 joepie91 under openvz
18:53 🔗 joepie91 using qemu
18:54 🔗 joepie91 it's just going to be horrifically slow
18:54 🔗 tfgbd exactly
18:54 🔗 joepie91 anyway
18:54 🔗 joepie91 realistically
18:54 🔗 tfgbd on Hyper-V, KVM or VMWare you can usually run virtualbox, VMware, Virtual PC, etc
18:54 🔗 joepie91 well
18:54 🔗 joepie91 yes and no
18:55 🔗 tfgbd it works
18:55 🔗 tfgbd I've signed up for tons of VPS trials just to try it out
18:55 🔗 joepie91 usually, on consumer VM services, there's not really a difference between openvz and kvm in terms of what nested virt you can run
18:55 🔗 joepie91 because no virt extensions
18:55 🔗 joepie91 afaik even KVM/Hyper-V/VMWare don't do that
18:55 🔗 joepie91 that is
18:55 🔗 joepie91 emulating it
18:55 🔗 joepie91 virtualbox requires a kernel module, but that's *technically* possible on openvz
18:56 🔗 joepie91 just requires a cooperating host
18:56 🔗 tfgbd But they still virtualize a full PC
18:56 🔗 joepie91 sure, but that doesn't mean you can suddenly run all kinds of virt
18:56 🔗 tfgbd which is enough to run Virtual PC or VirtualBox
18:56 🔗 joepie91 try running KVM without virt extensions :)
18:56 🔗 tfgbd they might be a little slower but it's not QEMU slow :P
18:56 🔗 joepie91 sure
18:56 🔗 joepie91 but again, theoretically you can run virtualbox under openvz
18:56 🔗 joepie91 you just need some cooperation from the host
18:56 🔗 tfgbd Right
18:56 🔗 joepie91 (which most won't give for security reasons, but that aside)
18:56 🔗 tfgbd ain't gonna happen ;P
18:57 🔗 tfgbd that's why it's easier to just cut the middle man
18:57 🔗 tfgbd plus, a full PC is better, anyway
18:57 🔗 tfgbd you can even dual boot
18:57 🔗 joepie91 tfgbd: the best option is to just get a cheap dedi with virt ext :)
18:57 🔗 tfgbd anything so cheap it's free
18:57 🔗 tfgbd and here is the host: http://www.vps.me/free-vps
18:58 🔗 tfgbd it's free but they don't give you much space and it is ipv6 only
18:58 🔗 tfgbd I doubt they would install vmware or virtualbox if I asked
18:59 🔗 joepie91 that's not bad
18:59 🔗 joepie91 but, indeed
18:59 🔗 tfgbd maybe add it to the FAQ?
18:59 🔗 SketchCow When I finally wade into this gamergateage, I see how people are going to just jump on me.
18:59 🔗 tfgbd And hey, it's cool to know I'm not the only one who loves to mess with nested virtualization
18:59 🔗 SketchCow It's designed to wreck anyone who touches it.
18:59 🔗 joepie91 tfgbd: I only know about it from a theoretical POV, I don't actually really run nested virt :) aside from an openvz testing setup on a KVM
19:00 🔗 tfgbd you mean openvz in KVM?
19:00 🔗 joepie91 yes
19:00 🔗 tfgbd I find it hard to consider openvz real virtualization
19:00 🔗 joepie91 that's what I said :P
19:00 🔗 joepie91 it is
19:00 🔗 tfgbd app virtualization, maybe.
19:00 🔗 joepie91 no
19:00 🔗 joepie91 it just virtualizes a different subset of things
19:00 🔗 tfgbd it's just an app
19:01 🔗 joepie91 it's really not
19:01 🔗 tfgbd not much different from BSD jails or Sandboxie
19:01 🔗 joepie91 they are effectively jails with virtualized kernel calls
19:01 🔗 joepie91 / devices
19:01 🔗 tfgbd still seems pretty neat that it even works on windows, though
19:01 🔗 joepie91 but it's absolutely virtualization
19:01 🔗 joepie91 also, this belongs in #archiveteam-bs
19:01 🔗 joepie91 perhaps we should move there
19:01 🔗 joepie91 :P
20:16 🔗 Arkiver2 SketchCow: each qwiki item is around 70 megabytes. The first batch has a bit less than 1500 items, so is around 100 GB.
20:17 🔗 Arkiver2 Can I start the project and use fos? or is it too busy/full?
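The estimate can be checked with a couple of lines, taking the quoted figures as upper bounds (1500 items at 70 MB each), using decimal units, and comparing against the 1.2T reported free earlier. The function name is illustrative.

```python
def estimate_batch_gb(item_count, avg_item_mb):
    """Return the estimated batch size in GB (decimal: 1 GB = 1000 MB)."""
    return item_count * avg_item_mb / 1000.0


if __name__ == "__main__":
    batch_gb = estimate_batch_gb(1500, 70)  # upper bound for the first batch
    free_gb = 1.2 * 1000                    # the 1.2T reported free
    print("batch: %.0f GB, fits: %s" % (batch_gb, batch_gb <= free_gb))
```

That gives 105 GB as the ceiling, which matches the "around 100 GB" figure for slightly fewer than 1500 items.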
20:23 🔗 SketchCow No, go ahead
20:57 🔗 Arkiver2 Ok, started
21:32 🔗 kyan Following up on this http://badcheese.com/~steve/atlogs/?chan=archiveteam&day=2014-10-01 Emijrp wrote http://pastebin.com/vGmtaZe2
21:33 🔗 kyan (he wrote it to me in an email... i'm the one that pastebinned it. So as not to mislead. Not that it's probably a big deal. But I like to keep good records, anyway. Aaah, you get the idea. I'll shut up now.)
21:54 🔗 Nemo_bis I don't have a copy
21:54 🔗 Nemo_bis But Platonides won't go anywhere, at some point the files will reappear if he still has them
21:54 🔗 Nemo_bis well, unless they were on toolserver.org :o
21:54 🔗 Nemo_bis let me check
21:56 🔗 Nemo_bis but seriously, there was nothing in knol ;)
22:05 🔗 kyan Mm, that's too bad
