[00:16] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[00:17] *** pizzaiolo has joined #archiveteam-bs
[00:18] *** jrra has joined #archiveteam-bs
[00:18] oops
[00:18] *** jrra has left part
[00:29] I've been running the Hyper-V warrior for a while now Soulflare, working great
[00:40] *** drumstick has quit IRC (Read error: Operation timed out)
[00:45] *** BlueMaxim has joined #archiveteam-bs
[01:26] *** Lagittaja has quit IRC (Quit: Leaving)
[01:26] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[01:49] *** drumstick has joined #archiveteam-bs
[01:59] *** odemg has quit IRC (Read error: Operation timed out)
[02:08] *** odemg has joined #archiveteam-bs
[02:27] oh boy Xfire was archived, I had thought it was lost
[02:38] I thought we grabbed all of geocities awhile ago o_O
[02:38] by no means all!
[02:39] we ran out of time, and we werent' very organized
[02:39] remember, geocities was our very first project
[02:41] one of the stranger things about geocities was that users who were subscribed to geocities plus at the time of the shutdown had their sites up for a few more years
[02:42] Was it maybe a separate cluster of servers? Seems odd
[02:42] hell, the geocities clipart directory is still up
[02:42] ex: http://www.geocities.com/clipart/pbi/backgrounds/Template_Pages/personalb.gif
[02:44] Is it a redirect unless exact path is found?
[02:45] yeah, used to be a directory listing before yahoo screwed around with their webhosting division
[02:45] That's a pain :/
[02:45] look at that, some guy's geocities page is still up http://us.1.p.geocities.com/@bei-tech.com/investor/corp_gov_compensation.htm
[02:48] the clear 10x10 g
[02:48] if that yahoo pagebuilder used is also still up http://www.geocities.com/clipart/pbi/c.gif
[02:52] *** drumstick has quit IRC (Read error: Connection reset by peer)
[02:56] lol
[02:59] *** drumstick has joined #archiveteam-bs
[03:05] *** Asparagir has joined #archiveteam-bs
[03:05] I don't think this is a geocities site... http://us.1.p.geocities.com/@/
[03:10] i'd guess they use yahoo business hosting
[03:12] *** hook54321 sets mode: +o Asparagir
[03:20] geocities might be like pre-internet aol where the servers are still running and the sysadmins have turned over so many times there's literally no one who knows where these servers are or how they work
[03:32] Ahem.
[03:33] There are a couple Geocities sites still up
[03:33] The reason for this is that they paid to subscribe to a certain other Yahoo! web hosting service, so it keeps the sites up in both domains
[03:43] I just discovered that you can run VirtualBox headless
[03:52] I haven't seen it mentioned here, but SketchCow recently started a Patreon for a weekly podcast: https://www.patreon.com/textfiles
[04:41] I'm not sure how likely this is, but we should ask a service like https://whois.domaintools.com/ if they would be willing to donate free access to us. They have reverse whois, historical whois, etc.
[04:49] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:55] *** Sk1d has joined #archiveteam-bs
[05:54] *** Asparagir has quit IRC (Asparagir)
[07:01] *** Soni has quit IRC (Ping timeout: 272 seconds)
[07:02] *** Soni has joined #archiveteam-bs
[07:07] *** kevinr has quit IRC (Read error: Operation timed out)
[07:15] I got a reply from the imgh.us guy
[07:15] https://www.irccloud.com/pastebin/Qx4iJWyw/
[07:49] *** kevinr has joined #archiveteam-bs
[07:54] xarph: cue bash.org quote?
[07:54] well, nearly
[07:54] *** trvz has joined #archiveteam-bs
[07:57] hook54321: looks like if you knew it before nov 2016, you're thinking about something else
[07:57] re: apollo
[07:57] what do you mean?
[07:58] oh
[07:58] that's when it change from xanax to apollo
[07:58] or there was/is overlap
[07:58] I remember it being called xanax.
[07:58] I don't think my account works anymore unfortunately though, I stopped using it for awhile.
[07:58] ah. i only migrated when good ole what went down :/
[07:59] I think I may have joined right after they changed their name to apollo
[07:59] i'm sure they'll reactivate you (unless you cheated or something or had terrible ratio)... big competition between them and RED
[08:00] Not sure how to ask them to reactivate me, also not sure what RED is.
[08:00] redacted
[08:00] ah. another torrent tracker?
[08:00] music-focused
[08:01] Apollo is music focused as well, right?
[08:01] JAA: See the response I got from the owner of imgh.us above.
[08:03] yep @ APL
[08:09] hook54321: Nice. I see that the imgh.us domain itself is back as well. Let's get him to send you the list and throw it into ArchiveBot.
[08:09] 2M images shouldn't take too long anyway.
[08:11] We can worry about the redirect to the archived version later. That shouldn't be hard, it's basically just 'redirect http://imgh.us/X to https://web.archive.org/web/date/http://imgh.us/X'. I think someone (jrwr?) set up something similar for Eroshare recently.
[08:18] It's only 699,927 images
[08:25] I'm hoping I don't get caught by his spam filter again.
[08:40] *** luckcolor has quit IRC (Read error: Operation timed out)
[08:40] *** MrRadar2 has quit IRC (Read error: Operation timed out)
[08:40] *** bluesoul has quit IRC (Read error: Operation timed out)
[08:43] *** luckcolor has joined #archiveteam-bs
[08:44] *** tsr has quit IRC (Read error: Operation timed out)
[08:48] *** bluesoul has joined #archiveteam-bs
[08:51] *** MrRadar2 has joined #archiveteam-bs
[08:53] *** Honno has joined #archiveteam-bs
[09:00] *** tsr has joined #archiveteam-bs
[09:06] *** Dimtree has quit IRC (Read error: Operation timed out)
[09:10] Ah right, confused it with the number of URLs at x.vu.
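The redirect idea from [08:11] is essentially a single URL rewrite. Here is a minimal Python sketch of it, assuming the Wayback Machine's usual `https://web.archive.org/web/<timestamp>/<original URL>` form. The function name `wayback_redirect_target` and the timestamp in the usage note are made up for illustration; this is not what was actually set up for Eroshare.

```python
from urllib.parse import urlsplit

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_redirect_target(original_url, timestamp, host="imgh.us"):
    """Return the Wayback Machine URL for original_url, or None if it is not on host.

    timestamp is a Wayback-style YYYYMMDDhhmmss string (hypothetical value
    chosen by whoever configures the redirect).
    """
    parts = urlsplit(original_url)
    if parts.hostname != host:
        # Only rewrite URLs that belong to the dead site.
        return None
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"
```

For example, `wayback_redirect_target("http://imgh.us/X", "20170901000000")` gives `https://web.archive.org/web/20170901000000/http://imgh.us/X`. A web server rule would do the same mapping; the function just makes it explicit.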
[09:53] *** Dimtree has joined #archiveteam-bs
[10:48] *** Soni has quit IRC (Ping timeout: 272 seconds)
[10:49] *** mls has quit IRC (Ping timeout: 250 seconds)
[11:01] *** mls has joined #archiveteam-bs
[11:03] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:18] *** drumstick has quit IRC (Ping timeout: 600 seconds)
[11:18] *** drumstick has joined #archiveteam-bs
[11:27] *** drumstick has quit IRC (Ping timeout: 255 seconds)
[11:30] *** pizzaiolo has joined #archiveteam-bs
[11:30] *** Soni has joined #archiveteam-bs
[12:05] *** Lagittaja has joined #archiveteam-bs
[12:13] *** Dimtree has quit IRC (Peace)
[12:20] *** mls has quit IRC (Ping timeout: 250 seconds)
[12:22] *** Dimtree_ has joined #archiveteam-bs
[12:23] so, here's a question (I'm not looking for a guide, I can figure this stuff out), how "difficult" is it to use the manual scripts instead of the warrior if I use my own debian install. is it just a matter of installing the dependencies, starting the script and let it rip? does it need a lot of micromanaging?
[12:25] Lagittaja: nope; it's pretty much run-and-forget other than needing to terminate/git-pull/restart for updates
[12:25] also there's no automatic project switching when using manual scripts
[12:25] so if you want to switch projects, you need to clone and run the new scripts yourself
[12:26] alrighty then, pretty much what I expected. thank you joepie91_
[12:37] *** mls has joined #archiveteam-bs
[12:38] *** Soulflare has quit IRC (Quit: http://drsclan.net)
[12:38] *** Soulflare has joined #archiveteam-bs
[12:39] *** Dimtree_ is now known as Dimtree
[12:48] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[12:48] *** pizzaiolo has joined #archiveteam-bs
[12:52] *** refeed has joined #archiveteam-bs
[13:12] I keep on getting temp banned from bitly
[13:13] Doesn't last for very long though, it's kinda weird.
[13:14] I was not aware bitly would even do that
[13:14] that's normal for me too
[13:14] but i run a warrior with 6 connections
[13:14] I am too, but I hardly ever have 6 jobs at once.
[13:15] When I'm banned it says:
[13:15] Forbidden
[13:15] Uh oh, Bitly can't show you the page you are trying to access.
[13:22] hmmm, btw can warrior handle thing like cloudflare ddos protection?
[13:24] not currently
[13:24] patches very welcome :)
[13:24] @ refeed
[13:25] specifically, this would need to be ported to Python: https://gist.github.com/joepie91/c5949279cd52ce5cb646d7bd03c3ea36 (assuming their algo hasn't changed in the meantime)
[13:27] euhm, that's javascript :/ , I still wonder how cloudflare javascript IUA challange works
[13:27] btw there's a python library that can be used for that
[13:27] https://github.com/Anorov/cloudflare-scrape
[13:28] Yes, but it relies on NodeJS.
[13:28] I've used it before, and I thought about implementing something in pure-Python instead.
[13:28] Nice code joepie91_, I might play around with porting that at some point.
[13:30] And no, I don't think anything has changed.
[13:30] refeed: hence 'porting to Pythoin' :)
[13:30] Pythoin *
[13:30] there are afaik currently no Python implementations of that
[13:31] they all just shell out to phantomjs or equivalent which is a terrible solution
[13:31] Yep, although it is probably more robust if CF decides to make subtle changes to the challenge.
[13:31] the code I linked is an implementation that operates directly on the page source without needing an additional JS runtime, so could be ported directly to Python
[13:31] JAA: sure, but at that point you can also just hard-fail
[13:31] and fix the code
[13:31] Indeed
[13:31] detecting cloudflare interstitials is much easier than breaking them :P
[13:44] okay, I didn't look to it deeper
[14:44] *** sep332 has joined #archiveteam-bs
[15:18] refeed: btw, the cloudflare-scrape library you linked is particularly dangerous, because it executes arbitrary JS in Node (which has access to things like the filesystem, process API, etc.)
[15:19] refeed: (their idea that it is now 'secure' because they're using Object.create(null) is wrong, btw)
[15:27] wew, okay, thanks for the heads-up, I just read the warning in its readme, I thought it was already secured, apparently not
[15:33] btw, currently I just use it to overcome with archive.is cloudflare challange, I'm pretty sure they (cloudflare and archive.is) will not doing evil things
[15:33] archive.is has a CF challenge?
[15:34] *** TheLovina has quit IRC (Ping timeout: 370 seconds)
[15:34] JAA: yes
[15:34] refeed: Example? I've never seen one...
[15:37] JAA: well, you can see it by yourself by running `$ curl -X GET "https://archive.is"`, or by visiting archive.is in your browser's incognito mode
[15:39] * refeed is taking a screenshot
[15:41] https://imgur.com/a/g1Dsq
[16:08] refeed: That must be geo-limited or something then. I can access archive.is directly, including with curl or in a private window.
[16:10] Interestingly, I get redirected to archive.fo in Firefox but not with curl.
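The "just hard-fail" approach from [13:31] (detect the interstitial, stop, fix the code) could look roughly like the Python sketch below. Everything specific in it is an assumption rather than something stated in the channel: the 503 status, the `Server` header prefix and the marker strings are guesses at what a challenge page contains, and the names `looks_like_cloudflare_challenge`, `check_response` and `CloudflareChallengeError` are hypothetical.

```python
class CloudflareChallengeError(RuntimeError):
    """Raised when a response looks like a Cloudflare interstitial."""


# Assumed marker strings; adjust them to whatever the challenge page really contains.
CHALLENGE_MARKERS = ("jschl_vc", "jschl_answer", "challenge-form")


def looks_like_cloudflare_challenge(status, headers, body):
    """Heuristic check: 503 from a Cloudflare server with challenge markers in the body."""
    server = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "server":
            server = value.lower()
    if status != 503 or not server.startswith("cloudflare"):
        return False
    return any(marker in body for marker in CHALLENGE_MARKERS)


def check_response(status, headers, body):
    """Return the body unchanged, or hard-fail instead of saving a challenge page."""
    if looks_like_cloudflare_challenge(status, headers, body):
        raise CloudflareChallengeError("Cloudflare challenge page received; refusing to archive it")
    return body
```

Failing loudly keeps challenge pages out of the archive without running any remote JavaScript. The trade-off is that the job stops until someone updates the code.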
[16:14] JAA: well, that's interesting
[16:16] accessing archive.fo in my place also still receives a cloudflare challange, but now with no https :/
[16:30] *** BartoCH has joined #archiveteam-bs
[16:31] so we may want to faraday cage the IA and its backups: http://worldif.economist.com/article/13526/electromagnetic-shock
[16:34] *** refeed has quit IRC (Quit: Leaving)
[16:34] Hm, visiting archive.is in Pale Moon incognito doesn't trigger anything (with and without addons)
[16:42] *** Mateon1 has quit IRC (Read error: Operation timed out)
[16:42] *** Mateon1 has joined #archiveteam-bs
[16:49] *** Aranje has joined #archiveteam-bs
[17:08] there's a pretty widespread belief that the `vm` module in Node provides secure sandboxing, but it really really does not and never will
[17:09] *** etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[17:10] *** etudier has joined #archiveteam-bs
[17:14] *** TheLovina has joined #archiveteam-bs
[17:41] *** kim__ has joined #archiveteam-bs
[17:43] Does anyone know if I as a user on reddit.com can setup a filter, and filter out all the bots, maby whitelist some of the bots, but aargh, the bots piss me off, and THAT pisses me off, because thats the botcreators whole idea.
[17:43] <-- new on reddit
[17:46] Using RES, yes.
[17:46] kim__, https://redditenhancementsuite.com/
[17:48] thankyou - will look into this :D
[18:18] *** Asparagir has joined #archiveteam-bs
[18:19] *** svchfoo3 sets mode: +o Asparagir
[18:19] *** svchfoo1 sets mode: +o Asparagir
[18:24] *** second has joined #archiveteam-bs
[18:25] What do you mean by "categorising data"?
[18:25] Is there a good tool to categorize data?
[18:26] Like, apply meta data?
[18:26] yes
[18:26] Automatically or manually?
[18:26] and then join them all together
[18:26] both
[18:26] For upload to IA, or just to the files in general?
[18:26] such that going to a page for a game would also mention related movies and other articles on the subject
[18:26] Both, mainly local though
[18:27] the archive is nice but its a bit lacking in joining data together
[18:31] Well, IA wise, you can mass edit metadata at upload time or after via the IA CLi tool, https://internetarchive.readthedocs.io/en/latest/. As for locally, the only one I have ever really played around with is GNOME Tracker, https://en.wikipedia.org/wiki/Tracker_(search_software)
[18:39] I like to keep it simple and use directories and symlinks (or hardlinks, depending on the circumstances). Every single software can handle that, unlike xattrs or separate metadata files.
[18:46] godane, see news at MySpleen... fuck me!! https://portland.craigslist.org/mlt/emd/d/trip-down-analog-lane-27/6283808412.html
[18:47] Number of tapes: approx. 23-24,000 (consists primarily of 6 & 8 hr Sony and TDK tapes).
[18:52] IA may need to email that guy to see if they can get a few of the tapes
[18:52] All of them.
[19:00] Is there anyway to download warcs from the IA?
[19:01] since it has news on alot of tapes i think IA would take it
[19:02] second: Depends. IA's own crawls and anything saved through web.archive.org/save/... isn't available. Our WARCs are.
[19:03] odemg: if anyone is willing to send me tapes i will digitize them
[19:03] (Those are just some examples. There are many more "types" of files on IA, e.g. those from the commercial (?) service Archive-It.)
[19:06] godane, make sure the right people at ia know, I'd rather see them deal with this than myspleen
[19:10] How do I check how a website was archived?
[19:11] *** TheLovina has quit IRC (Read error: Operation timed out)
[19:11] second: In the Wayback Machine, you mean? Click on "About this capture" in the top right. It doesn't help you with finding the actual IA item (where you might be able to download the WARCs) though.
[19:17] re: craiglist VHS hoard. I'm a 40 minute drive away, but transporting would be a bit of an issue for me, and I would have to rent a storage unit
[19:27] I know there is at least one other frequent ArchiveTeam member who lives near there too, but I don't want to out him here without his consent. But maybe he could help you with loading this stuff into a truck, if we can find someone to provide and drive the truck.
[19:28] I am going to cross-post the link to the Internet Archive Slack channel; maybe someone there will have an idea. And yes, the IA owns a truck.
[19:30] *** Mateon1 has quit IRC (Remote host closed the connection)
[19:30] *** Mateon1 has joined #archiveteam-bs
[19:32] Can we just stop and grasp... this guy has been recording tapes since 1986... and filled 24,000 tapes!!
[19:33] He deserves some kind of medal.
[19:33] Given his lowest conservative estimates that's 15 years of constant play video content
[19:35] Are there digitization machines for VHS that can run the tape faster than real time?
[19:35] A normal VHS player loses tracking as soon as you go faster than 1x.
[19:37] Yeah, a better solution would probably just be to buy as many decks as possible and run them in parallel
[19:37] Especially since they're relatively available and cheap right now
[19:38] This seems kind of like the Marion Marguerite Stokes collection, although fewer tapes. This guy ran twelve VCR's in Oregon for twenty-seven years, while Stokes ran eight VCR's in Philadelphia for thirty-five years.
[19:38] Stokes' tapes are now at the Internet Archive!
[19:38] I just posted about all this in the Slack channel. I hope they come back to us soon.
[19:38] SketchCow: this is something you need to see ^
[19:39] Asparagir, has IA documented what method they used for digitizing somewhere?
[19:40] I have who moving boxes with VHS I should digitize and throw away sooner rather than later.
[19:40] Dunno. I don't work there, I just drop by their main building a lot because I use their WiFi to upload big files to them.
[19:40] s/who/two/
[19:40] 100 GB/s upload speeds. It is glorious.
[19:40] Or was that MB/s.
[19:40] Anyway, something insanely fast, and free.
[19:41] Oh, too bad, I was going to ask what sorcery wifi you have over therte in the US. :)
[19:41] Hahaha
[19:41] Speaking of, does anyone have a good suggestion for a USB analog video capture device that plays nice with Linux?
[19:41] (NTSC if that matters)
[19:43] MrRadar, not a recommendation, but these seem commonly used on Linux: https://linuxtv.org/wiki/index.php/Easycap
[19:43] And they are cheap. I bought 6 diffrent variant of them this spring, but I haven't had time to test them yet.
[19:44] Thanks, I'll check them out
[19:45] Where's the Craigslist link?
[19:45] nvm
[19:47] oh gosh. free of charge? has someone contacted him?
[19:48] We need to make sure someone doesn't snatch it
[19:49] I know, right?
[19:49] *** K4k has quit IRC (Quit: WeeChat 1.6)
[19:49] But we can't contact him unless we have a definite plan for pick-up, load-in, destination (presumably the IA), and digitization.
[19:49] *** K4k has joined #archiveteam-bs
[19:50] Not fair to take the tapes without having a solid backing from the IA, or another institution.
[19:50] I smell an archive corps project coming up
[19:51] Me too. But we don't have our 501(c)3 designation yet. So it would probably be Fearless Leader asking for help from the public directly. And the IA has to sign off on being the home of the data.
[19:51] How much man work is involved in archiving this? For the past few days I've been thinking about having the robotics team that's at my old high school collect old software CDs and do the whole archiving process on them, but this could work too.
[19:52] Also, I'm close to this place
[19:52] *** K4k has quit IRC (Client Quit)
[19:53] Did someone in slack respond yet?
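The "15 years of constant play" figure from [19:33] and the run-decks-in-parallel idea from [19:37] can be checked with a little arithmetic. This rough Python sketch assumes the low end of the listing (23,000 tapes, 6 hours each), real-time playback only, and decks running around the clock, which is not realistic in practice. The deck count of 12 is borrowed from the "twelve VCR's" remark purely as an illustration, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25


def playback_hours(tapes, hours_per_tape):
    """Total recorded hours if every tape is full."""
    return tapes * hours_per_tape


def years_of_playback(tapes, hours_per_tape, decks=1):
    """Years needed to play everything at 1x, split evenly across decks running nonstop."""
    return playback_hours(tapes, hours_per_tape) / decks / HOURS_PER_YEAR


print(playback_hours(23000, 6))                          # 138000
print(round(years_of_playback(23000, 6), 1))             # 15.7
print(round(years_of_playback(23000, 6, decks=12), 1))   # 1.3
```

The single-deck result lines up with the estimate quoted in the channel. Since playback is stuck at 1x, the number of decks is the only thing that shortens the job.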
[19:53] *** K4k has joined #archiveteam-bs
[19:55] I'm going to notify them and ask if they would be willing to help.
[19:58] About how much space would this take up?
[19:59] hook54321, Asparagir Problem being MySpleen has this post in their news second, hopefully the guy doesn't had off to them before ia/at get their ass into gear and convince him the tapes are best off at ia
[19:59] second* (for 16 hours now)
[20:00] huh?
[20:02] I sent a message to Jason on twitter
[20:03] odemg: Is there a link to the MySpleen post? I'm not a member of that tracker...
[20:03] *** K4k has quit IRC (Quit: WeeChat 1.6)
[20:04] *** K4k has joined #archiveteam-bs
[20:04] Asparagir, https://i.imgur.com/nX4LEzD.png
[20:04] seems like a job for archive corps
[20:05] Yup, he want's the whole lot take as a collection, not to have it parted out
[20:05] What's MySpleen?
[20:05] odemg: Thanks.
[20:05] We need to contact this guy asap.
[20:05] hook54321, private tracker that has content much like this
[20:05] ah
[20:06] godane, this reminds me I need to give the other tape guy a nudge...
[20:07] We can't contact him without there being buy-in from an archive or institution first. Unless someone here has a boatload of money and we can hire our own truck, hire a storage unit, and pay people's salaries to do digitization/uploading/metadata on this, which could take a year or more.
[20:08] We could get volunteers.
[20:09] How many car loads would it take to transport them somewhere?
[20:09] For the load-in part, volunteers would be great. That's not so hard. But we then need to store the tapes, and work on systematically getting them turned into data. That's harder, and expensive. Remember how SketchCow was stuck with paying the fees on the storage unit for all the catalogs he rescued with ArchiveCorps, hundreds of dollars each month.
[20:13] We could crowdfund it through something like GoFundMe.
[20:14] IA took delivery of a truckload of videotapes a while ago iirc
[20:15] Where were the tapes located though?
[20:17] Jason responded to my message on Twitter.
[20:17] He said "That is not a life well spent"
[20:19] ouch
[20:23] https://blog.archive.org/2013/11/22/a-dream-to-preserve-tv-news-on-the-road-to-realization-with-your-help/
[20:23] "140,000 video cassettes"
[20:23] I think there's a good chance they'll accept it
[20:25] i think it was figured out to be 40k tapes not 140k
[20:25] SketchCow responded in the Slack channel and said he would get into it more when he gets home (from wherever he is right now). But he did tag the head of the Internet Archive (Brewster Kahle) into the conversation, so eyes are on the project.
[20:26] can anyone post the internet archive slack channel here?
[20:27] ^
[20:27] You have to have a archive.org email address, unless someone sends you an invite.
[20:27] It's internetarchive.slack.com -- but you need someone who works at the Archive to invite you to it. I don't work there.
[20:27] ok
[20:27] And there is only one channel open to non-employees like us.
[20:28] Apparently all the vhs tapes would take up 4800sqft
[20:28] Ask SketchCow for invites, or maybe try the #internetarchive IRC channel, although people are rarely in there.
[20:33] If this person took the time to record all of these, I would think that they would be glad to see it go to somewhere like the Internet Archive, if they haven't already given it away.
[20:33] *** Aranje has quit IRC (Ping timeout: 245 seconds)
[20:33] agreed
[20:35] I think if we contacted this guy and said that we're just trying to figure out the details, and we talk about the historical significance of them some, what IA plans to do with them, etc., then he might give it to us instead of someone else if they are still in his possession.
[20:43] SketchCow is writing back to the Slack channel right now -- he just got off a 30 minute phone call with the guy!
[20:43] quotes:
[20:44] WELL I JUST TALKED TO DON FOR 30 MINUTES, THANKS BROOKE
[20:44] There are several people who want this, to digitize
[20:44] He wants to give the tapes away
[20:44] He actually kept mentioning Marion stokes
[20:44] So he was impressed we were those people
[20:44] I'm mail him, he's going to make us all meet each other
[20:44] ...
[20:44] So...that sounds promising!
[20:47] nice!
[20:49] *** Aranje has joined #archiveteam-bs
[20:49] sweet :D
[20:54] Awesome! Are we supposed to meet each other in person, or?
[21:33] Well, there has been talk of having an ArchiveTeamCo one of these days...
[21:33] *ArchiveTeamCon
[21:49] hmm "It would probably be good to settle the main details by September 1st, so a possible January event could take place." http://www.archiveteam.org/index.php?title=Archive_Team_CONspiracy
[21:52] Yeah
[21:52] I got a little busy
[21:52] And people weren't stepping up
[21:52] If it gets shifted forward, it's forward
[21:52] Maybe Valentine's Day
[21:52] Archive Team Conspiracy: The Valentine's Day Massacre
[21:53] that sounds exactly the right amount of sketchy
[21:56] I look forward to explaining that to the friendly US customs gentleman
[21:57] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[21:57] lol
[21:58] *** drumstick has joined #archiveteam-bs
[21:58] DFJustin: probably further improved by "bring your old tape drives" day
[21:58] it seems that the older the technology, the more indistinguishable it is from homemade explosives
[22:07] that sounds about right
[22:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:19] *** dashcloud has joined #archiveteam-bs
[22:20] Old nitrate film, in particular!
[22:37] *** Honno has quit IRC (Read error: Operation timed out)
[22:42] *** BartoCH has quit IRC (Quit: WeeChat 1.9)
[22:46] *** dashcloud has quit IRC (Remote host closed the connection)
[22:50] *** Mateon1 has quit IRC (Remote host closed the connection)
[22:50] *** Mateon1 has joined #archiveteam-bs
[22:52] *** dashcloud has joined #archiveteam-bs
[23:06] eep ahahaha yes