[00:00] still funny though..I see some some even take it as real news :D
[00:00] shits gonna go viral :D
[00:01] *** jspiros has quit IRC (Read error: Operation timed out)
[00:03] 'Irish villagers" lol ...you know , the kind with one pub in town, one logal drunk; and massive Viagra factory :D
[00:04] sad thing is, these things might need archiving.. There's a propsoal for 5 years imprisonment for 'sharing fake news' in Ireland :/
[00:07] *** jspiros has joined #archiveteam-bs
[00:08] CoolCanuk: i kind of hope this a fake news paper as well (it's not though) https://www.independent.ie/irish-news/politics/five-years-in-jail-for-spreading-fake-news-under-ff-proposal-36375745.html
[00:08] nope
[00:09] 'biggest selling newspaper to Irish in Britain' might need archiving then
[00:09] lmao
[00:09] also, why cant the fake news sites just say theyre fake
[00:10] theyre allowed to be creative. parody and such is nice. But taking away their creativity is not
[00:10] it people know they are, it just exploded kind of account of that proposal :d
[00:10] i think*
[00:10] is fanfiction illegal too? conspiracy? predictions?
[00:10] :P
[00:10] aye
[00:11] writing a parody of a new article that catches on, 5 years in prison risk...that is bad
[00:11] news*
[00:12] I see what theyre trying to stop
[00:12] aye. But it's like throwing the baby out with the bath wather though :/
[00:13] but I hope it doesnt impact people from sharing news. Sharing news you wrote and know is fake is one thing, but people have a hard time knowing if it's fake
[00:13] yeah, but "sharing" it?
[00:13] like, people would be uncertain if the news is real, so newspapers die because no one shares and eventually everyone forgets
[00:13] I share news sometimes
[00:14] who doesn't ?
[00:14] I care about the environment, so if there's a report that the EPA head person isn't sure how much lead is safe in water (hint: NONE), I might share it
[00:15] well, if you reside in Ireland, you might want to make sure it's factual and real.. :D
[00:15] ye
[00:15] then
[00:15] there's the issue of accidental fake news
[00:15] like a news reporting a crash on a highway, but it's for a movie skit and they didnt know
[00:16] anyway, before i stray. There's no way that 'news site' is real..But, it's funny :D
[00:16] :P
[00:17] http://paste.nerds.io/ appears to be gone, and we link to some pastes. Nice -_-
[00:17] the world needs funny stuffs
[00:18] pastebin is no less prone than t.co :/
[00:19] some morning some nerd has the idea "Hey! Let's change subdomain!" ...
[00:20] https://archive.org/details/manga_library?and%5B%5D=addeddate%3A2017-12%2A&sort=-publicdate&page=1
[00:20] 1,000 in.
[00:20] 102,000 to go.
[00:20] *** Ing3b0rg has quit IRC (Ping timeout: 260 seconds)
[00:21] *** icedice has joined #archiveteam-bs
[00:22] *** Ing3b0rg has joined #archiveteam-bs
[00:28] in wget documentaion, anyone know what "--follow-tags=LIST" might mean? could that mean "alt=" in url links ?
[00:29] --follow-tags=LIST comma-separated list of followed HTML tags.
[00:30] not sure what is meant by HTML tags
[00:31] *** kristian_ has joined #archiveteam-bs
[00:33] *** bubblymic has joined #archiveteam-bs
[00:33] hi
[00:33] hello
[00:34] *** bubblymic has quit IRC (Client Quit)
[00:38] seems like --follow-tags is just , etc...
[00:40] i wish we had svg uploads so bad >:(
[00:40] png sucks
[00:50] what are the domains that twitter user internally? (apart from t.co) is it just twitter.com and twimng.com ?
[00:50] uses*
[00:51] twimg.com
[00:52] aye , ill trust this list https://github.com/dyne/domain-list/blob/master/data/twitter
[00:53] sick and tired of wget crawling where it shoudlnt
[00:55] --exclude-domains can even keep it from trying t.co etc..
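The `--follow-tags` and `--exclude-domains` questions above can be illustrated with a short Python sketch that assembles a wget command line. The seed URL, crawl depth and tag list are placeholders, and the domain list is only what was named in the chat (t.co, twitter.com, twimg.com), not a verified inventory of Twitter's domains. Both options take comma-separated lists; `--exclude-domains` only comes into play once `--span-hosts` lets wget leave the starting host.

```python
import shlex
from typing import List, Sequence

# Twitter-related domains named in the chat; an assumption, not a complete list.
TWITTER_DOMAINS = ("t.co", "twitter.com", "twimg.com")


def build_wget_args(
    seed_url: str,
    follow_tags: Sequence[str] = ("a",),
    exclude: Sequence[str] = TWITTER_DOMAINS,
    level: int = 2,
) -> List[str]:
    """Return an argv list for a recursive wget crawl that skips excluded domains."""
    return [
        "wget",
        "--recursive",
        f"--level={level}",
        # Let wget leave the seed host; without this, domain filters never matter.
        "--span-hosts",
        # Only extract links from these HTML tags (comma-separated).
        "--follow-tags=" + ",".join(follow_tags),
        # Never request anything on these domains (comma-separated).
        "--exclude-domains=" + ",".join(exclude),
        seed_url,
    ]


if __name__ == "__main__":
    # Placeholder seed URL; print the command instead of running it.
    print(shlex.join(build_wget_args("https://example.org/", follow_tags=("a", "img"))))
```

Passing the list to `subprocess.run(args)` would start the crawl; keeping it as a list avoids shell-quoting mistakes with long option values.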
[00:55] kids tv says "must be 18 years or older to log on" when mentioning a website
[00:55] wget is gold stuff it seems
[00:55] but 99% of the websites dont require you to login xD
[00:55] Compliance hack :-P
[01:00] CoolCanuk: just put in https://i.imgur.com/atGxO9o.jpg
[01:00] to the archive?
[01:00] internet archive asked if you are 18?
[01:01] CoolCanuk: you were asked to verify age on IA ?
[01:01] no.
[01:04] i missed the "kids tv" part
[01:04] *** ola_norsk has quit IRC (I blame beers!)
[01:15] *** icedice has quit IRC (Quit: Leaving)
[01:22] *** icedice has joined #archiveteam-bs
[01:25] *** icedice has quit IRC (Client Quit)
[01:26] *** icedice has joined #archiveteam-bs
[02:09] *** Stiletto has quit IRC ()
[02:24] *** Stilett0 has joined #archiveteam-bs
[02:27] *** kristian_ has quit IRC (Quit: Leaving)
[02:38] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:40] *** Stilett0 is now known as Stiletto
[02:56] *** icedice has quit IRC (Quit: Leaving)
[02:58] *** Ing3b0rg has quit IRC (Ping timeout: 248 seconds)
[03:04] *** Ing3b0rg has joined #archiveteam-bs
[03:15] why is this bot in this channel?
[03:16] someone thought it was a good idea
[03:16] not sure who
[03:19] *** icedice has joined #archiveteam-bs
[03:19] *** icedice2 has joined #archiveteam-bs
[03:22] *** icedice2 has quit IRC (Client Quit)
[03:28] *** icedice has quit IRC (Quit: Leaving)
[03:42] *** BlueMaxim has joined #archiveteam-bs
[04:02] *** bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
[04:26] *** qw3rty118 has joined #archiveteam-bs
[04:32] *** qw3rty117 has quit IRC (Read error: Operation timed out)
[04:41] it's usually a good idea
[04:42] nothing wrong with editing the wiki
[04:42] thanks for the work, CoolCanuk!
[04:42] hey who wants to work with me in the next few hours on compuswerve
[04:42] er, compuserve
[04:43] i'm gonna go do my laundry first. seeking: anyone who knows how to write a seesaw/pipeline script
[04:47] *** voidsta has joined #archiveteam-bs
[04:47] *** voidsta has quit IRC (Connection closed)
[04:52] *** voidsta has joined #archiveteam-bs
[04:54] Lol it was hardly any work but :) thx
[04:56] *** voidsta has left
[05:18] vidme backup starting tomorrow?
[05:20] hm?
[05:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:40] *** dashcloud has joined #archiveteam-bs
[05:43] *** voidsta has joined #archiveteam-bs
[06:11] *** voidsta has quit IRC (Quit: leaving)
[06:17] *** voidsta has joined #archiveteam-bs
[06:42] *** voidsta has quit IRC (Quit: leaving)
[06:53] *** bwn has quit IRC (Ping timeout: 260 seconds)
[07:02] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[07:08] *** bwn has joined #archiveteam-bs
[07:10] *** CoolCanuk has joined #archiveteam-bs
[07:33] that's what arkiver said before going to sleep
[07:50] *** Boppen has quit IRC (Ping timeout: 186 seconds)
[07:51] *** ReimuHaku has joined #archiveteam-bs
[07:52] *** Boppen has joined #archiveteam-bs
[07:55] JAA, ola_norsk: The code using Chrome seems to be working fine. Just needs some automated tests and a distributed task queue. I’m looking at Kue and Celery. Suggestions are welcome.
[07:58] *** ReimuHaku has quit IRC (Read error: Connection reset by peer)
[07:59] *** ReimuHaku has joined #archiveteam-bs
[07:59] *** tuluu has quit IRC (Quit: No Ping reply in 180 seconds.)
[08:02] *** tuluu has joined #archiveteam-bs
[08:12] *** dashcloud has quit IRC (Ping timeout: 260 seconds)
[08:18] *** wp494_ has joined #archiveteam-bs
[08:21] *** dashcloud has joined #archiveteam-bs
[08:25] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[08:25] *** wp494_ is now known as wp494
[08:51] *** jschwart has joined #archiveteam-bs
[09:12] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[09:36] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[09:47] *** nyaomi has quit IRC (Ping timeout: 250 seconds)
[09:57] *** nyaomi has joined #archiveteam-bs
[10:06] *** Stilett0 has joined #archiveteam-bs
[10:27] *** pizzaiolo has joined #archiveteam-bs
[11:32] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:44] The bot is an important part of this.
[13:10] SketchCow: How many requests is the wiki getting roughly? I think you mentioned 100k visitors (unique accesses, or requests?) per month some while ago; is that still accurate? If so, that's only one request per 25 seconds...
[13:12] (On average, obviously.)
[13:17] Just to avoid misunderstandings: I'm grateful that you're paying and running this! It's just really frustrating to do anything on the wiki currently, and I think we can do better.
[13:19] What I need to do is sit down and see how the whole thing is structured. Probably with Astrid.
[13:19] And then have something to figure out what to fix.
[13:19] I'm in SF all next week, likely then
[13:20] Ah, here we are.
[13:21] I have a cpanel login into the machine.
[13:21] Cpu Usage is 99/100 for some reason.
[13:21] Memory is 419/1048 so that's fine.
[13:21] Processes is 10/20 and Disk space is 2.62gb/9gb, also find.
[13:22] Bandwidth is at 13gb out of a terabyte a month.
[13:22] Looks like we shoot out about 100gb a month in bandwidth
[13:23] Linux Journal is going away https://twitter.com/linuxjournal/status/936679052370481154 get the archive here: https://secure2.linuxjournal.com/pdf/dljdownload.php
[13:23] how much are you paying SketchCow
[13:23] Something is blowing up memory usage.
[13:23] The wiki is
[13:23] YES THE WIKI IS
[13:23] mediawiki is a memory hog overall
[13:23] THANKS SHERLOCK
[13:23] :P
[13:23] LOOK HERE MISTER
[13:24] I can seriously stop talking about this.
[13:24] I'd rather, I don't know, go outside and sort leaves by color
[13:24] Or put a nail into my eye
[13:24] I'm giving you shit SketchCow, I've ran a few larger mediawikis, I can profile it and see whats going on
[13:25] Up to you
[13:25] I'm not in the mood for being given shit
[13:25] I'm mostly in the mood to report on stats people were asking for.
[13:25] <3
[13:25] Monthly Statistics for November 2017
[13:25] Total Hits 4034995
[13:25] Total Files 3478298
[13:25] Total Pages 2480175
[13:25] Total Visits 414542
[13:25] Total KBytes 102906318
[13:25] Total Unique Sites 271552
[13:25] Total Unique URLs 8952
[13:25] Total Unique Referrers 23884
[13:25] so, 271,000 unique visitors
[13:25] thats not too bad
[13:26] /month?
[13:26] that's november.
[13:26] Monthly Statistics for December 2017
[13:26] Total Hits 647812
[13:26] Total Files 521735
[13:26] Total Pages 431106
[13:26] Total Visits 85111
[13:26] Total KBytes 16705965
[13:26] Total Unique Sites 43119
[13:26] Total Unique URLs 2849
[13:26] Total Unique Referrers 5524
[13:26] Total Unique User Agents 5849
[13:26] So far it's been 43,000 this month.
[13:26] Note total kbytes.
[13:27] Yeah, I agree, that doesn't look too bad.
[13:27] I don't know much about MediaWiki though.
[13:27] But 1.5 hits per second on average should be quite easy to achieve.
[13:27] I'm sure something is set low.
[13:28] Now, my hosting bill
[13:28] odemg: Thanks for that, I'll throw it into ArchiveBot. (We're already grabbing the entire website through there as well.)
[13:29] $239.40 USD a year.
[13:29] only thing I can suggest is turning on php7.0+
[13:29] it will help a ton with mediawiki
[13:30] Speaking of stats, CloudFlare are trying to push their enterprise plan on me now: https://imgur.com/a/oljZt
[13:30] PHP version: 5.6.30
[13:30] $20 per month? That seems expensive. But I don't know US pricing too well.
[13:30] Hrm, a dreamhost account is 119
[13:30] Ah, see, this is where you are all still children
[13:31] you think $ + ? = $
[13:31] ya, if you can turn on php7.0 then it will help a ton
[13:31] I have a host here who hosts textfiles.com, as well as you
[13:32] When I was come after for having bomb info, and they blocked my entire ISP/host for me having bomb info, this host moved me to a new subnet, claimed he'd banned me (so all his other customers weren't affected), and won't blink.
[13:32] The australian government has called me a terrorist link
[13:32] He has stood firm
[13:32] I take loyalty seriously.
[13:32] So $239/yr is just fine
[13:32] We've not had anyone go after archiveteam to shut us down recently, but if they do, we have a sandbag levee in place
[13:33] I'll look into php 7.0
[13:33] See, that's why I wrote that it "seems" expensive. I wasn't aware of that.
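The traffic figures quoted above can be sanity-checked with a few lines of arithmetic. This sketch assumes a 30-day November, traffic spread evenly over the month, and that the stats tool's "KBytes" are 1024-byte units; the counts themselves are copied from the November 2017 summary in the log.

```python
# Figures copied from the "Monthly Statistics for November 2017" summary.
NOV_HITS = 4_034_995
NOV_KBYTES = 102_906_318
SECONDS_IN_NOVEMBER = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_rate(count: int, seconds: int = SECONDS_IN_NOVEMBER) -> float:
    """Events per second, assuming they are spread evenly over the period."""
    return count / seconds


hits_per_second = average_rate(NOV_HITS)
# Gap between requests if the wiki saw only 100k requests per month.
seconds_between_requests = SECONDS_IN_NOVEMBER / 100_000
# Outbound traffic, assuming the stats tool counts 1 KByte as 1024 bytes.
gib_served = NOV_KBYTES / 1024**2

print(f"{hits_per_second:.2f} hits/s")
print(f"one request every {seconds_between_requests:.1f} s at 100k/month")
print(f"{gib_served:.1f} GiB served")
```

This works out to roughly 1.56 hits per second, one request every ~26 seconds at 100k requests a month, and about 98 GiB served, in line with the "about 100gb a month" estimate. Averages hide peaks, so the real load at busy moments is higher.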
[13:33] You're not supposed to have to be
[13:33] A lot of delightfully dark forces have things to say to me a lot
[13:34] Also, I hate adminniing
[13:34] I'll choose one of you fucksticks to take it over, but I'd prefer to meet you first
[13:35] I'm still one of those oldschool "what am I dealing with" people
[13:35] This shit https://parazite.the-eye.eu/ gets me many angry emails, not quite been called a terrorist yet though
[13:35] The sex with dogs FAQ on textfiles.com gets a lot of hatemail but they don't know where to send the hate mail
[13:35] In other news, I went to a dentist for the first time since 2006 and that went well
[13:35] One cavity
[13:36] It's the sex with corpses that does it for us
[13:37] well if you are ever up here in NY, come down to interlock, I hang out there all the time, I hear you visited once
[13:37] Everyone has their waterloo
[13:37] Are you fucking down the street from me
[13:37] I'm in Beacon
[13:38] I guess we have the dog sex thing as well, I barely know what included, the site is much less organised and sane then textfiles: https://parazite.the-eye.eu/files.html#notpopsex
[13:39] Also, as a couple of you already know, I'm uploading 102,000 issues of manga.
[13:39] 2,663 issues so far
[13:40] oh man, thats a drive, but not too bad
[13:41] Oh, Rochester.
[13:41] ya, im in Rochester
[13:41] I always confuse interlock and the poughkeepsie guys
[13:41] lol
[13:42] I'm rebuilding their website for interlock since its a little bit in disarray and no one wants to take it up
[13:48] also SketchCow, I'm still open to anything you need digitized over at strong, wouldn't mind spending my weekends doing that
[13:50] *** Valentine has quit IRC (Read error: Operation timed out)
[13:50] I'll try and figure out how to get access to the machine via ssh
[13:50] Sounds like an update should happen, and I should hand admin keys over to someone or another
[13:51] *** Valentine has joined #archiveteam-bs
[14:01] sounds like a plan, also vid.me should be starting today as well
[14:02] Yep, and we should get Roblox going ASAP too.
[14:04] holy shit they are closing to
[14:04] Yeah
[14:04] hrm,
[14:04] that will be interesting to save
[14:05] We have the code, but wp494 noticed that the old posts (from the sections they "deleted" a few months ago) are still online as well, just not easily discoverable.
[14:05] If we wanted to grab everything, we'd have to try ~230 million post IDs.
[14:05] I'm not sure if that's feasible.
[14:05] maybe with a discoery warrior
[14:05] discovery
[14:06] There is no way to discover posts in the hidden sections though.
[14:06] is there anyway to get to them
[14:06] You can only access it by the post ID as far as I can see.
[14:06] ah
[14:06] so
[14:06] discovery would legit be, try block 10000-20000
[14:06] And if we iterate over all possible post IDs, we'd grab everything multiple times, too.
[14:07] ya, some dedupe would be needed
[14:07] Let's move this to #robloxd.
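The "try block 10000-20000" idea amounts to cutting the post-ID space into fixed-size work items and skipping IDs that were already grabbed. A minimal sketch, assuming half-open ID ranges, a block size of 10,000 and the ~230 million upper bound quoted above; `id_blocks` and `dedupe` are hypothetical helper names, and the in-memory set only stands in for whatever state a real tracker would keep.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def id_blocks(start: int, stop: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) ID ranges covering start..stop-1, `size` IDs each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for lo in range(start, stop, size):
        yield lo, min(lo + size, stop)


def dedupe(post_ids: Iterable[int], seen: Set[int]) -> List[int]:
    """Return IDs not grabbed before (in order) and record them as seen."""
    fresh = []
    for pid in post_ids:
        if pid not in seen:
            seen.add(pid)
            fresh.append(pid)
    return fresh


if __name__ == "__main__":
    # Hypothetical bound from the chat: ~230 million post IDs, 10,000 per item.
    print(sum(1 for _ in id_blocks(0, 230_000_000, 10_000)), "work items")
```

At 10,000 IDs per item, 230 million IDs come to 23,000 work items. Deduplication matters because, as noted in the chat, iterating over every ID would otherwise grab the same content more than once.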
[14:15] *** ZexaronS- has joined #archiveteam-bs
[14:15] *** ZexaronS has quit IRC (Read error: Connection reset by peer)
[14:17] *** ZexaronS- has quit IRC (Client Quit)
[14:19] *** superkuh has quit IRC (Remote host closed the connection)
[14:21] *** superkuh has joined #archiveteam-bs
[14:45] *** bithippo has joined #archiveteam-bs
[14:49] My postgres database of parsed Miiverse posts is now sitting at around 200 GBs. Will probably end up being 250~ or so when it's over.
[14:50] The DB should be done by the end of the week, and the web frontend should hopefully be done by the end of the month.
[14:51] so i made a linux journal archive 2017 zim file
[14:52] It should be pretty cool, since I can show all the posts in chronological order for all content, something you couldn't do on the site when it was online. And hopefully people will stop asking where they can find their posts; they can just search for them on this.
[14:58] *** MrDignity has quit IRC (Read error: Operation timed out)
[15:38] How does one join the Archiveteam?
[15:40] *** kimmer2 has joined #archiveteam-bs
[15:41] *** Stilett0 is now known as Stiletto
[15:42] DrasticAc: Sound great!
[15:42] sec0nd: By being here and sharing our common goals. There is no membership signup form. ;-)
[15:54] sec0nd: You're already in Archive team
[16:04] i'm at 3957 items right now for this month: https://archive.org/details/@chris85?and[]=addeddate:2017-12
[16:12] *** Dimtree has quit IRC (Ping timeout: 506 seconds)
[16:16] *** bithippo has quit IRC (Read error: Connection reset by peer)
[16:16] Unstoppable
[16:17] I need help from someone on an easy way to pull files
[16:18] Well, let me take one more shot at it.
[16:36] *** pizzaiolo has quit IRC (Ping timeout: 246 seconds)
[16:37] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[16:37] *** ranavalon has joined #archiveteam-bs
[17:24] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[17:24] *** ranavalon has joined #archiveteam-bs
[17:32] *** CoolCanuk has joined #archiveteam-bs
[17:33] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[17:34] *** ranavalon has joined #archiveteam-bs
[17:52] *** jschwart has quit IRC (Read error: Connection reset by peer)
[17:53] *** jschwart has joined #archiveteam-bs
[18:19] *** icedice has joined #archiveteam-bs
[18:19] *** icedice has quit IRC (Connection closed)
[18:35] *** w0rp has quit IRC (Read error: Operation timed out)
[18:37] *** w0rp has joined #archiveteam-bs
[19:24] *** ld1 has quit IRC (Quit: ~)
[19:36] *** ld1 has joined #archiveteam-bs
[20:06] *** ld1 has quit IRC (Ping timeout: 260 seconds)
[20:06] *** ld1 has joined #archiveteam-bs
[20:35] oh wow, look who's colleague made the news
[20:35] https://globalnews.ca/news/3897163/airbnb-hidden-camera-room/
[20:35] Hehe
[20:36] Global news reports everything
[20:37] Best free resource to learn Python?
[20:37] Id love to be able to help code warriors etc one day
[20:38] *** ZexaronS has joined #archiveteam-bs
[20:44] *** Arctic has joined #archiveteam-bs
[20:44] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[20:45] *** Arctic has quit IRC (Client Quit)
[20:45] *** Arctic has joined #archiveteam-bs
[20:45] *** Arctic has left
[20:48] *** Arctic has joined #archiveteam-bs
[20:48] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[20:49] o.O
[20:56] sorry Arctic - i just realised where that quote comes from! :-) it has been a long time....and i need a cup of tea
[20:57] one moment Arctic. Please stand by
[20:57] an op will be with you shortly
[20:57] What do you plan on editing Arctic
[20:57] check #archiveteam , jrwr :)
[20:59] *** schbirid has joined #archiveteam-bs
[21:00] I was actually going to propose a project.
[21:09] *** Arctic has quit IRC (Quit: Page closed)
[21:09] *** ola_norsk has joined #archiveteam-bs
[21:10] hopefully he comes back :(
[21:10] lolwat
[21:10] *** schbirid has quit IRC (Quit: Leaving)
[21:11] *** Mateon1 has quit IRC (Read error: Operation timed out)
[21:11] *** Mateon1 has joined #archiveteam-bs
[21:11] we cant do much since it's behind cloudflare anyway lol
[21:15] Eh, CloudFlare isn't too hard to defeat.
[21:15] ohoho
[21:15] :P
[21:15] We have the code, just need to port it to Python so we can integrate it into wpull and do fun things with it.
[21:15] orly
[21:16] You basically just need to evaluate a JSFuck string.
[21:16] I guess you haven't been introduced to this https://www.cloudflare.com/products/cloudflare-warp/ (:
[21:16] no direct access to origin that way, only via cloudflare. :(
[21:17] Yes, I'm talking about access via CF.
[21:17] o
[21:17] There might be other things in place, but the default "Just checking your browser" thing can be cracked easily.
[21:18] if you're suspected as a bot, you might receive a captcha.
[21:18] your IP can grow to have permanent captchas over all cloudflare sites unless whitelisted.
[21:18] *** BlueMaxim has joined #archiveteam-bs
[21:19] Yeah, I've seen those before. I haven't had any issues so far though.
[21:19] I'll set up a cloudflare environment we can test in ;) including warp
[21:30] CoolCanuk: I think that is where those 'earn bitcoin by solving captchas'-websites come in :d
[21:30] something like 2captcha is an option, yes.
[21:32] no offense, but it's kind of nasty business though, to put it mildly ;)
[21:33] literally often involving child labor..
[21:33] we can possibly use https://www.ghacks.net/2017/09/18/wave-goodbye-to-cloudflare-captchas-cloudflare-edge-pass-lands/ . (we'd need to re-code it into wput or something)
[21:33] data entry is not child labour :P
[21:34] if they can recognize letters, they can do the task of punching them in :/
[21:34] anyway, my joke got too dark
[21:35] a little lol
[21:35] plus I cant find any proof
[21:36] what do you think the 'tech support scam centers' do?
[21:36] while their having people on the phone i mean..
[21:38] there's no way it would be a thing unless there was profit in it..and the profit is fucking small
[21:39] anyway, this is unrelated to what you guys were talking about :D and my joke is now making me queezy :/
[21:41] ..it's an option though, albeit an unethical one ;)
[21:41] :P
[21:46] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[21:47] *** Arctic has joined #archiveteam-bs
[21:47] I'm here.
[21:48] So how are we going to do this?
[21:48] Projects are usually quite large. We might use archivebot to just crawl
[21:48] Okay.
[21:48] Is there a way to use that with Android?
[21:48] we do it on our side :)
[21:49] *** BlueMaxim has joined #archiveteam-bs
[21:49] Okay.
[21:51] archivebot might not work too well, especially if there's not an easy way to list all posts
[21:51] noticed today that ~12 hours of a twitter hashtag capture = ~250MB
[21:52] I'll see if Arian will hand over the data.
[21:52] you're right. there is infinite scroll.. ugh
[21:53] if anyone is good at math, what would that entail, back to to ~2014-2016?
[21:53] It actually goes back to sometime in September 2017.
[21:53] twitter usage might be less in 2014 ;)
[21:53] Oh, it's Twitter.
[21:53] aye
[21:54] 2 convos at once. Yours is still important :)
[21:54] Okay.
[21:54] I was just confused.
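The "~12 hours of a twitter hashtag capture = ~250MB" question can be answered with a linear extrapolation. This sketch assumes the capture rate stays constant, which, as the chat points out, probably overstates earlier years when Twitter usage was lower; `estimate_mb` is a hypothetical helper name.

```python
# Observation from the chat: ~250 MB per ~12 hours of hashtag capture.
MB_PER_SAMPLE = 250
HOURS_PER_SAMPLE = 12


def estimate_mb(days: int, mb: int = MB_PER_SAMPLE, hours: int = HOURS_PER_SAMPLE) -> float:
    """Linear extrapolation; assumes the capture rate never changes."""
    # Multiply before dividing so these round figures stay exact.
    return mb * 24 * days / hours


for label, days in [("per day", 1), ("per year", 365), ("three years", 3 * 365)]:
    print(f"{label}: {estimate_mb(days):,.0f} MB")
```

That is 500 MB per day, about 182.5 GB per year, and about 547.5 GB for three years such as 2014 to 2016, before any compression or deduplication.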
[21:54] forget about damn twitter, it's just bullshit there anyway
[21:55] It seems that you can access the posts through https://closed.pizza/posts/$postid, and $postid is only ~180k by now.
[21:55] What's the Twitter thing about anyway?
[21:56] I'll try to find user id
[21:56] CoolCanuk: btw, for infininte scroll https://github.com/webrecorder/webrecorder
[21:56] Communities are easy to find.
[21:56] Where are you finding this?
[21:56] CoolCanuk: sadly their only service are limited to 3 hours of scroll
[21:57] a lot of manual work lol
[21:57] no, press "Auto scroll", and it goes
[21:57] oh
[21:57] How can we use the $postid URL? It doesn't show all posts.
[21:57] https://closed.pizza/posts/1 https://closed.pizza/posts/2
[21:58] Arctic: Not literally $postid, but https://closed.pizza/posts/1, https://closed.pizza/posts/2, etc.
[21:58] (pretend 2 works)
[21:58] Oh, I see now.
[21:58] It might not grab all comments to a post though.
[21:59] can use offset
[21:59] https://closed.pizza/communities/7?offset=12 or https://closed.pizza/communities/7?offset=50
[22:00] Can you access closed.pizza/users/$userid ?
[22:00] Anyway, this is probably done best in a warrior project rather than ArchiveBot or wpull.
[22:00] But we're kind of busy in that area right now due to vid.me, Roblox, CompuServe, and Wine.Woot shutting down.
[22:00] Is there a way to use Warrior with Android or within a web browser?
[22:01] Roblox is shutting down?
[22:01] So unless this platform is in immediate danger, I'd say we postpone it until we have more time and resources.
[22:01] Only the forums, but yeah.
[22:01] 230 million posts or something.
[22:01] Woah.
[22:02] Okay.
[22:02] Arctic: if that Android device has the abilitiy to run virtual machines, it might
[22:02] dont forget sears @JAA :P
[22:02] I don't believe so.
[22:02] JAA: which of these are warrior projects? compuswerve at the least is an archivebot job
[22:03] Sears is shutting down?
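The Closedverse discussion above comes down to two URL patterns: one page per post ID under `/posts/`, and offset-paginated community listings under `/communities/<id>?offset=`. A sketch that generates both lists, assuming IDs start at 1 and run up to the ~180k mentioned in the chat; the listing page size is unknown, so `page_size` is a hypothetical parameter. As noted in the log, post pages alone may not capture every comment.

```python
from typing import List

BASE = "https://closed.pizza"


def post_urls(first_id: int, last_id: int) -> List[str]:
    """One /posts/<id> URL per ID, both ends inclusive."""
    return [f"{BASE}/posts/{pid}" for pid in range(first_id, last_id + 1)]


def community_page_urls(community_id: int, total_items: int, page_size: int) -> List[str]:
    """Offset-paginated listing URLs; page_size is a guess, not a known value."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        f"{BASE}/communities/{community_id}?offset={offset}"
        for offset in range(0, total_items, page_size)
    ]


if __name__ == "__main__":
    # Small ranges for illustration; one URL per line, e.g. for a URL list file.
    for url in post_urls(1, 3) + community_page_urls(7, 30, 12):
        print(url)
```

A list like this could be fed to a crawler as seed URLs; whether that belongs in ArchiveBot or a warrior project is the open question in the log.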
[22:03] https://sears.ca Canada only
[22:03] CoolCanuk: Well yeah, but I don't see what we can do in terms of warrior projects there. The ArchiveBot job is doing what it can.
[22:03] * https://www.sears.ca/
[22:03] Arctic: don't be so quick to dismiss though..
[22:04] astrid: Right. I'm still not convinced that the ArchiveBot job will capture everything though. Wine.Woot is also only ArchiveBot at the moment, but I'd like to make a project out of it because that job will likely not grab everything either.
[22:04] I won't. Once again, I'll contact Arian soon to see if he'll hand over an archive of Closedverse data.
[22:04] Arctic: https://github.com/limboemu/limbo
[22:04] JAA: i looked into it the other day and convinced myself that archivebot *would* get all of compuswerve
[22:04] => #compuswerve for further of this
[22:04] Arctic: maybe that is something?
[22:05] Is there a way to run programs in it?
[22:05] If so, it'd be a godsend.
[22:06] it's qemu, that's all i saw..so...yes...i think.. :D
[22:06] cant help any more than that :D
[22:06] Okay. Thanks!
[22:07] i'm not saying it's a good idea though :/
[22:07] Hm...
[22:08] I'll see how well it works.
[22:09] Yeah, probably not a good idea.
[22:10] You're probably right.
[22:10] Not much storage (some projects can have quite large item sizes), very energy-inefficient I'd guess, etc.
[22:11] it would depend on the device, wouldn't it? But, a phone or a tablet.. that's gonna hurt :/
[22:12] if anyone has a spare Oya box, maybe they could test it?
[22:14] i'm putting my chips on it not being a good idea though
[22:14] Ah true, Android is used everywhere nowadays. I always forget that.
[22:15] I could probably see if I can split the archive and put it up for download.
[22:18] I'll try Webcorder
[22:18] .
[22:19] Maybe this is a solution..a shitload of cheap, otherwise discarded' Ouya Boxes, placed all around the world.
[22:20] We'll call it... OuyaNet... Or something.
[22:21] aye.. There's playstation supercomputers, so why not? https://phys.org/news/2010-12-air-playstation-3s-supercomputer.html
[22:21] So, Ouya supercomputer?
[22:22] capture-puter
[22:22] all running Warrior..
[22:24] A peer to peer network of PCs all running Warrior and forming a supernetwork to store archives?
[22:24] there are some plans afoot to revamp archivebot to allow it to run on warrior-style disposable untrusted machines
[22:25] Interesting...
[22:25] The more we go on with this, the more confusing it gets!
[22:25] it's a lot of engineering effort and people are spread pretty thin
[22:27] It's still an interesting proposal. Maybe I can ask around on the Lost Media Wiki. They are interested in rediscovering lost media and archiving media before it becomes lost.
[22:28] We'll need a GUI frontend for each computer and code for the backend to connect each computer for the supernetwork.
[22:29] DHT?
[22:29] What's that?
[22:30] Arctic: https://en.wikipedia.org/wiki/Distributed_hash_table
[22:30] it's e.g how torrent clients find eachother without trackers (i think)
[22:31] Ah. Seems promising.
[22:31] probably fancier than we need
[22:31] Yeah.
[22:32] We'll probably still want something central to be able to keep track of what's running etc.
[22:32] there's nothing wrong with running a central node
[22:32] makes design a lot simpler
[22:32] Yep.
[22:32] Indeed
[22:33] all we need is some geniuses to put it into action! https://youtu.be/EPHPu4PV-Bw
[22:33] Decentralized warriors...how would that work?
[22:34] let's not waste time on that right now
[22:34] optional tasks? + other tasks? :D
[22:34] aye
[22:34] But where would we find some?...
[22:34] The Genius Bar maybe...
[22:35] Archiving Closedverse?
[22:36] For now, we need to focus on Roblox (6 days) and Vidme (10 days).
[22:36] Okay. What can I do?
[22:36] CompuServe might work through ArchiveBot, though I'm not sure yet if we'll manage to get everything in time.
[22:36] Not much at the moment I'm afraid.
[22:37] You can join the relevant channels, #robloxd and #vidmeh, if you want to follow the progress and discussions.
[22:37] Sure. I might soon.
[22:39] FWIW the vidme project is supposed to start up at some point today
[22:39] Would using Archive.org to archive work?
[22:40] We are pushing our data to IA, but we're doing the actual archival directly.
[22:40] that's where our stuff will end up anyway!
[22:40] wp494: that's good news seeing as i paid $1 dollar to boost this an hour ago :D https://www.minds.com/newsfeed/784890241298210818
[22:41] perfect timing, to say the least...hope someone notices it
[22:42] I would love to try to port Warrior to Android, but I'm still learning HTML and CSS.
[22:43] Before we port the warrior to new platforms, I'd say we need to upgrade it to a modern system (and ensure that it stays like that).
[22:43] Android is pretty much Linux is it not? But, i'm thinking even the majority of tools used would require rooted Android device
[22:43] Warrior on ARM would be pretty interesting but cellular carriers love to screw with stuff and restrict stuff too
[22:43] I believe the VM is still running on an unpatched six-year-old system.
[22:43] As in, unpatched kernel and everything.
[22:44] I have a 4th generation Kindle Fire HD 7. Would that work? Would you need the OS version?
[22:45] if it can run a virtual machine, it could do it..But, not a good idea i would say
[22:46] Ah.
[22:46] Is there anything else I can do?
[22:47] i'm not running Warrior, i can't answer that question. I just manually archive shit :D
[22:47] *** dboard2 is now known as dboard
[22:48] in 'who we are' on archiveteam.org .. I'd be classified as 'Loudmouth' :d
[22:48] How do you manually archive stuff?
[22:48] e.g web.archive.org/save/
[22:49] Oh.
[22:49] or uploading pdf's or using e.g youtube-dl
[22:49] ..or even just subtitling shit..
[22:50] I know this sounds stupid, but can you provide an example of the subtitling?
[22:50] aye..one sec
[22:51] Okay, thanks.
[22:51] Arctic: https://archive.org/details/Filmavisen_1941_08_25
[22:52] and no i did not steal the video, it belongs to norwegian people, and that includes me :d
[22:52] Oh, you mean subtitling videos!
[22:52] yeah
[22:53] SRT files
[22:54] Why are the Roblox forums shutting down?
[22:54] these are not even mine, sometimes i ask whoever i know that speaks the language to be translate https://github.com/DuckHP/archive_org_subtitles/tree/master/Misc
[22:55] Arctic: https://forum.roblox.com/Forum/ShowPost.aspx?PostID=228429979
[22:56] Arctic: i use this program to make the subtitles http://www.gnomesubtitles.org/
[22:57] Cool.
[22:57] So Roblox mods are re-building the forumd?
[22:57] Re-build as in massacre, right?
[22:57] More like adding different features and thus nuking the forums.
[22:57] The groups aren't publicly accessible, by the way.
[22:58] You need at least an account.
[22:58] Welp, as if Roblox wasn't already fucked.
[22:58] Arctic: when coming across people willing to help transcribing, this is valuable tool http://otranscribe.com/
[22:59] Sounds great!
[23:02] By the way, Archive.is ignores ROBOTS.TXT so we can archive websites Archive.org can't.
[23:03] We also ignore robots.txt in all our efforts.
[23:03] This is how I archived a lot of old Nintendo of Europe Flash promos in collaboration with members of the Lost Media Forums.
[23:04] There's a page on it on the wiki.
[23:04] I know, it's juat easier to archive that way.
[23:04] (If you get it to load.)
[23:04] I read the page.
[23:04] *just
[23:05] It's unfortunate that IA follows robots.txt, but there are good reasons for it I'm sure.
[23:05] wget -q --limit-rate=256k --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36" --delete-after --page-requisites -e robots=off "https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets"
[23:05] (Mostly legal reasons, I think.)
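The wget one-liner above asks the Wayback Machine's Save Page Now endpoint to capture a page by requesting `https://web.archive.org/save/` followed by the target URL. Roughly the same request can be made with Python's standard library. This is a sketch, not a replacement: it does not fetch page requisites (`--page-requisites`) or apply a rate limit (`--limit-rate`), and the user-agent string is simply copied from the command in the log.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"
# Browser-like user agent copied from the wget command in the log.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36"
)


def save_url(target: str) -> str:
    """Save Page Now URL: the endpoint followed by the target URL, as in the wget command."""
    return SAVE_ENDPOINT + target


def request_save(target: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture `target` and return the HTTP status.

    Only the /save/ response itself is fetched; page requisites are not.
    Non-2xx responses raise urllib.error.HTTPError.
    """
    req = urllib.request.Request(save_url(target), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # discard the body, much like wget's --delete-after
        return resp.status


if __name__ == "__main__":
    # Builds the URL only; call request_save() to actually trigger a capture.
    print(save_url("https://twitter.com/hashtag/netneutrality?f=tweets"))
```

The target is appended unencoded, exactly as in the wget command; if a target URL ever causes trouble, percent-encoding it is worth trying.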
[23:05] is it not the '-e robots=off' that prevents that? [23:05] Not sure. [23:06] Yes, but in this case it's about ignoring IA's robots.txt. [23:06] mhm [23:06] (Which prevents you from saving the page requisites with this technique.) [23:06] If you'd like, I can send you the link to the post about Nintendo of Europe's old flash promos if you want to see them. [23:07] There are several people in here dealing with those kinds of things, so sure. [23:07] Okay. Be right back. [23:07] archive.is seems kind of selfish .. http://web.archive.org/web/20171205230717/http://archive.is/ [23:08] What's most annoying re: robots.txt are the domain squatters. [23:08] oh, cloudflare [23:09] What do you mean by archive.is seeming selfish? [23:09] Parking domains often have a generic "disallow all" robots.txt, and that retroactively blocks all pages under that domain on IA... [23:09] trying to /save/ an archive.is fails [23:10] *** qw3rty119 has joined #archiveteam-bs [23:10] Arctic: ideally, they should be uploading their shit willingly.. [23:10] Yeah, I hope when archive.is shuts down, the owner will be open to donating the collection to IA directly or something. [23:10] Even though it's in a very different format, so integrating it into the Wayback Machine might prove challenging. [23:11] Here is the Nintendo flash promo link: http://forums.lostmediawiki.com/thread/1267/nintendo-flash-games-ca-2006 [23:12] What do you guys think?
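The parked-domain problem comes from how a robots.txt-respecting client interprets a blanket `Disallow: /` rule. The minimal sketch below uses Python's standard `urllib.robotparser` to show the decision such a client makes. Note the limits of the example:

- `is_allowed` and the sample rules are hypothetical.
- It models only the robots.txt check itself, not the Wayback Machine's retroactive exclusion policy.
- `wget -e robots=off`, as in the command above, simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# A typical blanket rule served by parked domains (illustrative sample).
PARKED_DOMAIN_ROBOTS = "User-agent: *\nDisallow: /\n"
# An empty Disallow value means nothing is disallowed.
ALLOW_ALL_ROBOTS = "User-agent: *\nDisallow:\n"


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://parked.example/old-forum/thread/1"
    print(is_allowed(PARKED_DOMAIN_ROBOTS, page))  # blocked by "Disallow: /"
    print(is_allowed(ALLOW_ALL_ROBOTS, page))
```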
[23:13] what in the world are you talking about [23:13] Arctic: http://web.archive.org/web/20170311070231/http://archive.is/ [23:13] there are entirely too many conversations here [23:13] and the loudest one wins, which makes me uncomfortable [23:14] astrid: i think it's just 2-3 topics going on [23:14] *** qw3rty118 has quit IRC (Read error: Operation timed out) [23:14] yes and it's all super eager people who don't understand how archiveteam works but really want to change things [23:14] it's overwhelming [23:14] robots.txt , and related to archive.is' selfishness..and some other thing [23:15] How did we go from archiving Closedverse to how Archive.is is selfish? [23:15] robots.txt and cloudflare [23:15] Sorry, I'm new to all of this. [23:15] we know :) [23:16] so am i, sorry for spamming [23:17] So... What now? [23:18] Arctic: if archive.is could be done by wget.. [23:19] or webrecorder.io [23:20] Sounds good. [23:20] ola_norsk: I'm sure that's possible. But be aware that archive.is is several hundred TB in size. Webrecorder doesn't actually store the stuff on their servers (long-term, I mean) as far as I know. [23:21] oh i just realized why it feels like everything is being lit on fire this month [23:21] JAA: so do so many use archive.is? :/ [23:21] JAA: why* [23:21] it's because it's the end of the calendar year and lots of more or less unprofitable shit is being lit on fire for tax reasons [23:21] Yeah [23:22] For some reason, a lot of websites are shutting down on December 15th. [23:22] it's before all the employees leave for christmas [23:22] christmas bonuses are expensive [23:22] and it's a friday [23:22] ola_norsk: Because archive.is can grab pages behind robots.txt, handles Google Cache very well, is capable of dealing with JavaScript, and various other reasons. [23:22] It's a quite good platform really. Just a shame that it isn't open source and that it isn't saved as WARCs (as far as we know).
[23:23] so far as we can tell, it saves the DOM of the page, not the resources used to construct it [23:23] JAA: check out webrecorder.io .. I'm pretty sure i saw 'java, flash' in one of the browsers they offer to use :/ [23:23] AOL is shutting down Plaxo by the way. [23:23] Yep, that's probably correct. [23:24] ola_norsk: I'm aware of Webrecorder. I was just explaining why so many people use archive.is. [23:24] I mean Comcast. [23:24] Arctic: hmm, thanks for the heads-up. does that still have any public content? [23:24] Webrecorder doesn't offer keeping the saved pages for others to see, so that's a big -1 for many people. [23:25] JAA: when registered they do..but yeah, i see that point [23:25] JAA: and no pdf download i think, only the WARC [23:25] What's WARC? [23:27] Arctic: https://en.wikipedia.org/wiki/Web_ARChive [23:27] Also http://archiveteam.org/index.php?title=The_WARC_Ecosystem [23:28] Sounds good. [23:28] why doesn't archiveteam.org utilize redirection to wayback when failing to load? [23:29] *** jschwart has quit IRC (Konversation terminated!) [23:29] Not sure. [23:29] Is there a way to use WARC files on Android? [23:30] ola_norsk: Why would it? The information there would be outdated for many pages anyway. [23:31] I have a channel similar to this on my satellite tv. wtf? http://teletext.mb21.co.uk/gallery/ceefax/bbc-world-210a-032000.gif [23:31] And stuff will be done to improve the performance of the wiki. [23:31] Cool. [23:31] Teletext, nice. [23:31] ya [23:31] I wonder if there are any archives of that. [23:32] JAA: it does present sensible text. I'm not sure how often the wiki changes. But good point. [23:32] is it actual text that comes to the tv? [23:32] this appears to be just images [23:32] https://en.wikipedia.org/wiki/Teletext [23:32] Short story: it's complicated. [23:32] it looks ecactly like this http://1.bp.blogspot.com/-MqF4BKkQ_mE/VOvxolBzuZI/AAAAAAAAGWI/bFWfZAFsOBw/s1600/retrotextnews.jpg [23:32] *exactly [23:33] Interesting.
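To make the WARC links above more concrete: a WARC file is a sequence of records. Each record is made of:

1. a `WARC/1.0` version line;
2. named header fields;
3. a blank line;
4. a payload of exactly `Content-Length` bytes;
5. two CRLFs.

The Python sketch below builds a single `resource` record by hand. It uses the fields WARC/1.0 requires (`WARC-Type`, `WARC-Record-ID`, `WARC-Date`, `Content-Length`) plus `WARC-Target-URI` and `Content-Type`. The function name is hypothetical, and the `date` argument is assumed to already be in UTC. This is only an illustration of the layout. It is not a substitute for real tooling such as wget's `--warc-file` option or the warcio library, which also record the HTTP request and response.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "application/octet-stream",
                         date: Optional[datetime] = None) -> bytes:
    """Build one WARC/1.0 'resource' record (hypothetical helper).

    `date` is assumed to be UTC; the current UTC time is used if omitted.
    """
    if date is None:
        date = datetime.now(timezone.utc)
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),   # globally unique record ID
        ("WARC-Date", date.strftime("%Y-%m-%dT%H:%M:%SZ")),  # UTC timestamp
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),               # payload size in bytes
    ]
    header_block = ("WARC/1.0\r\n"
                    + "".join(f"{name}: {value}\r\n" for name, value in headers)
                    + "\r\n")
    # Header block, then the payload, then two CRLFs to terminate the record.
    return header_block.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = warc_resource_record("http://example.com/note.txt", b"hello", "text/plain")
    print(record.decode("utf-8"))
```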
[23:34] I just read the convo in #archiveteam about Closedverse. Would be sad for the Miiverse community replacement to go down so soon. [23:34] *** Arctic_ has joined #archiveteam-bs [23:34] I'm back. [23:35] Accidentally closed the tab with the channel. [23:35] no problem :) [23:36] Is it possible to use WARC files on Android devices without a virtual machine? [23:36] Arctic: webrecorder.io does allow upload of WARC files. It might work in any browser of Android devices. [23:37] How do I make WARC files on Android? [23:37] webrecorder.io would be my best bet :D [23:37] Yeah, likely. [23:37] *** Arctic has quit IRC (Ping timeout: 260 seconds) [23:37] Basically every software we use is written for real computers. [23:38] Real computers? [23:38] ;-) [23:38] ...JAA being a devicivist.... [23:38] Mostly unixoid systems, really. Some stuff might also work on Windows to some degree. [23:38] :D [23:39] Welp... [23:39] there's debian for Wii...just sayin' :D [23:40] ...butwhy.gif [23:40] I don't have an SD card to use with my Wii U. RIP. [23:40] why not? :D [23:40] The stuff I wrote for the Miiverse stuff was in .NET, run on Windows, but it should work in Mono. [23:41] JAA: why not? [23:41] Arctic_: i meant 'old wii' [23:41] Sure, I guess. [23:41] There is a Wii mode on the Wii U that most Wii hacks work with. [23:42] debian write speed on wii and wii u are too slow [23:42] anyway, it's not a good idea even if it works..https://archive.org/details/iaCSS64_test .. 'two layer emulation' as someone here pointed out [23:42] I just don't see why you'd want to install a generic OS on a platform specifically designed for gaming, but whatever. [23:42] Any options for Android? [23:43] (Other than to figure out how to do it.) [23:43] I'm trying to figure that out. [23:44] I'm sure you can build something that kind-of works on Android. [23:44] Whether that's worth the time and effort is another question. [23:44] And whether you'll want to use it afterwards. [23:44] True.
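On the question of using WARC files without a virtual machine: an uncompressed WARC is simple enough to read in a few lines of code wherever a Python interpreter is available. The sketch below iterates over records and yields each header block and payload. Its function names are hypothetical, and it makes several simplifying assumptions:

- Records are well formed.
- Folded (multi-line) header fields are not handled.
- Header names are matched case-sensitively.

For `.warc.gz` files, the helper opens the file with `gzip.open`. That relies on Python's `gzip` module reading concatenated gzip members, which should be the case but is an untested assumption here. No claim is made about running this on any particular Android device.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if version.strip() == b"":
            continue  # tolerate extra blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line.strip() == b"":
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the CRLF CRLF that terminates each record
        yield headers, payload


def target_uris(path: str) -> List[str]:
    """List WARC-Target-URI values in a .warc or .warc.gz file (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return [h["WARC-Target-URI"] for h, _ in iter_warc_records(f)
                if "WARC-Target-URI" in h]


if __name__ == "__main__":
    sample = (b"WARC/1.0\r\nWARC-Type: resource\r\n"
              b"WARC-Target-URI: http://example.com/\r\n"
              b"Content-Length: 5\r\n\r\nhello\r\n\r\n")
    for headers, payload in iter_warc_records(io.BytesIO(sample)):
        print(headers["WARC-Target-URI"], payload)
```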
[23:44] Why do you want to? [23:44] Arctic_: i'm unsure what you mean by options. There are certainly options, but if it's a Kindle device, i'd say it's a bad idea to run a virtual machine on it that's running Warrior [23:45] You could always rent a small VPS, use some SSH app to connect to it, and then do whatever you want there. [23:46] I just want to find a way to help archiving efforts with my shitty Android device. [23:46] I doubt that's possible without a *lot* of work. [23:46] JAA: that's bollocks.. [23:47] Yeah, you can archive stuff, but not in the context of ArchiveTeam really. [23:47] as I'm sure you've all seen, Archive.org currently has someone matching donations 3 to 1. Which, I think is 3x or $3 per $1 [23:47] so you insert $1 and IA gets $4 [23:48] Is there a way to bulk archive pages on Android? Is there a website I can use? [23:48] so 4x? :O [23:48] 1, +3x [23:48] JAA: what about Manual projects ? [23:48] astrid: But it says "triple your impact"? [23:48] well maybe i'm misunderstanding [23:48] Arctic_: yes, webrecorder.io [23:49] Okay, thanks! [23:49] Arctic_: you'd have to register to share the captures though [23:49] ola_norsk: Do you mean running the warrior scripts manually or manually doing independent stuff? [23:50] the latter i guess [23:50] I registered earlier. [23:52] JAA: i'm not in any way being offensive or aggressive, but do i seriously have to be running Warrior software to be here in "BS" ? [23:53] Not sure. [23:53] Not at all. [23:53] skål! :D [23:54] Miiverse didn’t use Warrior. [23:54] Well, we tried to at one point, but it started slowing down/stopping their servers and they started banning ips [23:55] But our manual jobs went through fine. [23:55] i'm mainly here secretly watching for ways to record ENTIRE twitter hashtags :/ [23:55] But our tooling was done on Windows and Linux. [23:55] And I had a bunch of Azure VMs running with the stuff I wrote. [23:56] ola_norsk: That's something I'd like to look into at some point.
Also entire user accounts. Also Instagram and other services. [23:57] JAA: i'm just focused on '#netneutrality' at the moment, seeing as it's a big deal atm [23:57] astrid: On /donate, it explicitly says "That means for every dollar you donate right now, the Internet Archive will receive $4 in all." [23:57] I'm going to go contact Arian about sending us a Closedverse archive. [23:57] But yeah, it should be "quadruple your impact". [23:59] JAA: there's gonna be so much digging into 'netneutrality' , whatever way the voting goes, i figured it shouldn't just be a privilege of organizations that's got money to pay twitter for archive that should get to look into it [23:59] thumbs up emoji, ola_norsk
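The 3-to-1 matching arithmetic discussed above works out like this. Each donated dollar brings in $3 of matched money, so the archive receives 1 * (1 + 3) = $4. That is a fourfold total, not a threefold one. The small Python sketch below uses hypothetical helper names, and any cap or deadline on the match is not modeled.

```python
def total_received(donation: float, match_ratio: float = 3.0) -> float:
    """Amount the archive receives when every donated dollar is matched
    `match_ratio` times (3.0 for a 3-to-1 match). Hypothetical helper."""
    if donation < 0 or match_ratio < 0:
        raise ValueError("donation and match_ratio must be non-negative")
    return donation * (1 + match_ratio)


def impact_multiplier(match_ratio: float = 3.0) -> float:
    """How many times the original donation ends up at the archive."""
    return 1 + match_ratio


if __name__ == "__main__":
    print(total_received(1))    # 4.0
    print(impact_multiplier())  # 4.0
```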