[00:04] *** robbierut has quit IRC (Read error: Operation timed out) [00:11] *** LowLevelM has joined #archiveteam-ot [00:15] When this google+ thing is over I have a project idea [00:16] We already have at least three projects in the queue I think, but what's your idea? [00:17] What are the projects in the queue [00:17] ? [00:18] JamiiForums, Reddit, and HardForum (in no particular order) [00:19] JAA: The project is to re-archive Thingiverse. I have already made a Python script to download things, but it is broken, and the job is far too large for me to do on my own. [00:19] We really really need to get to Jamii Forums because it's still at risk [00:19] Reddit is massive and needs to be looked into further [00:19] LowLevelM: How large is it? [00:19] HardForum I have no idea what it is [00:19] 3.5 million things at the moment [00:19] plus the forum [00:20] It will be super easy to archive, as the ids are integers, and it has a JSON API. [00:20] hardforum is a background job by new standards and not high priority [00:21] Flashfire: https://hardforum.com/ Hardware discussions, 15 million posts, and at risk of disappearing because the owner no longer works at HardOCP. [00:21] *** marked has quit IRC (Quit: WeeChat 2.2) [00:22] https://www.hardocp.com/article/2019/03/19/goodbye_hardocp_hello_intel/ [00:23] LowLevelM: I meant more in terms of data size. 3.5 million doesn't sound too bad, but if each thing is measured in megabytes, it gets messy. [00:23] JAA he said no changes will be coming to the website/forum, will change the owner and keep going [00:23] VADemon: Ah, that's good to hear. [00:23] each thing plus the photos is a few megabytes [00:24] JAA: isn't reddit already archived by some guy? [00:24] VADemon: In that case, it might not even be worth a warrior project but just an independent long-term grab. [00:24] this one i mean: http://files.pushshift.io/reddit/ [00:24] phiresky: Yes, kind of, but not in a format that is accessible to most people. 
[00:24] We want to grab it such that it can be viewed in the Wayback Machine. [00:25] Thingiverse has gotten super slow in the past months, and shows signs of being forgotten by its parent company, MakerBot. [00:25] are the warcs from google+ etc accessible in the wayback machine? [00:25] Hmm, why are we having this conversation in -ot? Let's move this to -bs. [00:25] JAA it's a XenForo forum and I'd like to grab bukkit.org forums, it's years of Minecraft administration and we've seen what Wikia has done to MC Forums [00:25] *** marked has joined #archiveteam-ot [00:26] imho it's worth making a new grab script for XenForo-type forums alone [00:26] -> #archiveteam-bs [00:45] *** marked has quit IRC (Read error: Operation timed out) [00:49] *** marked has joined #archiveteam-ot [00:58] *** killsushi has joined #archiveteam-ot [00:59] *** Evie has joined #archiveteam-ot [01:30] *** LowLevelM has left [01:50] *** robbierut has joined #archiveteam-ot [02:07] *** Exairnous has joined #archiveteam-ot [02:46] *** ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) 
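Editor's sketch for the Thingiverse idea discussed above: because the ids are integers, the id space can be cut into fixed-size ranges that are handed out as work items, and the "few megabytes per thing" figure gives a rough total size. The batch size, the 3 MB average, and both function names are assumptions made for illustration; no real Thingiverse endpoint is used or implied.

```python
from typing import Iterator, Tuple


def id_batches(max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_id, last_id) ranges covering 1..max_id."""
    if max_id < 1 or batch_size < 1:
        raise ValueError("max_id and batch_size must be positive")
    for start in range(1, max_id + 1, batch_size):
        yield start, min(start + batch_size - 1, max_id)


def estimate_total_bytes(thing_count: int, avg_megabytes: float) -> int:
    """Rough size estimate, counting 1 MB as 10**6 bytes."""
    return int(thing_count * avg_megabytes * 10**6)


if __name__ == "__main__":
    THING_COUNT = 3_500_000  # figure quoted in the channel
    AVG_MB = 3.0             # assumption: "a few megabytes" taken as 3 MB
    BATCH_SIZE = 1_000       # assumption: arbitrary work-item size

    batches = list(id_batches(THING_COUNT, BATCH_SIZE))
    total_tb = estimate_total_bytes(THING_COUNT, AVG_MB) / 10**12
    print(f"{len(batches)} work items, roughly {total_tb:.1f} TB")
```

Under those assumed numbers the total lands around ten terabytes, which is the kind of figure the "how large is it?" question was after. Gaps in the id space (deleted or never-used ids) would lower the real number.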
[02:57] *** rnduser_ has joined #archiveteam-ot [02:58] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer) [03:01] *** rnduser has quit IRC (Ping timeout: 252 seconds) [03:09] *** DustinV has joined #archiveteam-ot [03:33] *** qw3rty115 has joined #archiveteam-ot [03:36] *** qw3rty114 has quit IRC (Ping timeout: 600 seconds) [03:47] *** IanR has joined #archiveteam-ot [04:02] *** Stiletto has quit IRC () [04:03] *** odemg has quit IRC (Ping timeout: 615 seconds) [04:04] *** ephemer0l has joined #archiveteam-ot [04:08] *** Stiletto has joined #archiveteam-ot [04:09] *** odemg has joined #archiveteam-ot [04:22] *** DustinV has quit IRC (Read error: Connection reset by peer) [04:32] *** m007a83_ is now known as m007a83 [04:50] *** dhyan_nat has joined #archiveteam-ot [05:08] *** DustinV has joined #archiveteam-ot [05:36] *** robbierut has quit IRC (Read error: Connection reset by peer) [05:37] any way to poke a warrior out of throttle redirect sleep? [05:50] tell me more about this soft ratelimit? a power outage deprived me of this chat [05:51] or if this isn't on topic enough t'll go back to main with this question ;-) [06:04] *** jut has quit IRC (Ping timeout: 252 seconds) [06:05] *** icedice has quit IRC (Quit: Leaving) [06:11] *** jut has joined #archiveteam-ot [06:20] *** cutepillo has joined #archiveteam-ot [06:20] is this where the anime is [06:21] anime and ignored questions about google soft rate limiting ips [06:24] easy answer is no [06:24] harder answer is I don't k(no)w [06:29] *** BlueMax has quit IRC (Quit: Leaving) [06:33] I appreciate the response [06:34] *** Exairnous has quit IRC (Read error: Operation timed out) [06:38] Damn how many upcoming projects do we have?! [06:39] well, I hear a live action big robot marathon is on the horizon [06:41] I hear the noise of rsync targets crying in a corner [06:42] under or over loaded? 
[06:42] I've lost my pile of graphana tabs in an outage [06:42] targets are doing very well, don't think we're hitting slot limits anywhere [06:45] *** deevious has joined #archiveteam-ot [06:59] *** julientm has joined #archiveteam-ot [07:07] *** BlueMax has joined #archiveteam-ot [07:16] *** robbierut has joined #archiveteam-ot [07:21] *** eythian has joined #archiveteam-ot [07:31] I remember when I did the tumblr project we made FoS cry [08:06] *** Joseph__ has joined #archiveteam-ot [08:07] *** VerifiedJ has quit IRC (Read error: Connection reset by peer) [08:17] *** julientm has quit IRC (Remote host closed the connection) [08:19] any thoughts and theories on the rate limiting are welcome [08:21] Is it still happening? [08:32] *** DustinVF has joined #archiveteam-ot [08:32] what's the nature of Google's blocking? [08:32] *** julientm has joined #archiveteam-ot [08:33] *** DustinV has quit IRC (Ping timeout: 252 seconds) [08:33] Google has an edge firewall that looks at HTTP request headers (and no-routes you if unhappy) and application firewalls with things like request-per-day limits [08:34] *** julientm has quit IRC (Remote host closed the connection) [08:36] *** DustinVF has quit IRC (Read error: Operation timed out) [08:37] *** julientm has joined #archiveteam-ot [08:39] *** IanR has quit IRC (Read error: Connection reset by peer) [08:39] *** DustinV has joined #archiveteam-ot [08:40] *** julientm has quit IRC (Remote host closed the connection) [08:40] *** IanR has joined #archiveteam-ot [08:40] *** julientm has joined #archiveteam-ot [08:56] *** IanR has quit IRC (Read error: Connection reset by peer) [08:56] *** IanR has joined #archiveteam-ot [08:58] *** julientm has quit IRC (Remote host closed the connection) [08:59] firefox and the long form of the google minus tracker seem to have conspired against my system, had to reset out of a swap storm [08:59] *** julientm has joined #archiveteam-ot [09:12] *** julientm has quit IRC (Remote host closed the 
connection) [09:12] *** julientm has joined #archiveteam-ot [09:19] *** MR9K has quit IRC (Read error: Connection reset by peer) [09:21] *** MR9K has joined #archiveteam-ot [09:24] IanR: I have had that happen also [09:47] *** ryry has quit IRC (Ping timeout: 260 seconds) [09:54] ye, that page seems to have an issue with memory leaking (perticularly if you hit show all) [10:00] *** julientm has quit IRC (Remote host closed the connection) [10:11] *** julientm has joined #archiveteam-ot [10:11] *** BlueMax has quit IRC (Quit: Leaving) [10:25] *** jesso has joined #archiveteam-ot [10:36] chrome hasn't nuked me yet, but also doesn't work as well, exciting choices! [10:44] *** Oddly has joined #archiveteam-ot [11:07] *** jesso has quit IRC (Quit: jesso) [11:10] *** jesso has joined #archiveteam-ot [11:31] *** killsushi has quit IRC (Quit: Leaving) [11:46] *** dhyan_nat has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** Mateon1 has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** SketchCow has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** ats has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** betamax has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** noirscape has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** argus has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** asie has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** Frogging has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** jodizzle has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** VoynichCr has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** t2t2 has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** wp494 has quit IRC (hub.efnet.us irc.efnet.nl) [11:46] *** Hintswen has quit IRC (hub.efnet.us irc.efnet.nl) [11:50] *** KoalaBear has quit IRC (Read error: Operation timed out) [11:51] Does anyone know what would be 
a good way to go about scraping a profiled media resource from a portal in my web browser? Opening the video media URL and saving it by numeric sequence is what I want to automate. Any idea how I can go about that? [11:53] *** dhyan_nat has joined #archiveteam-ot [11:53] *** t2t2 has joined #archiveteam-ot [11:53] *** Mateon1 has joined #archiveteam-ot [11:53] *** wp494 has joined #archiveteam-ot [11:53] *** Hintswen has joined #archiveteam-ot [11:53] *** SketchCow has joined #archiveteam-ot [11:53] *** ats has joined #archiveteam-ot [11:53] *** betamax has joined #archiveteam-ot [11:53] *** noirscape has joined #archiveteam-ot [11:53] *** argus has joined #archiveteam-ot [11:53] *** asie has joined #archiveteam-ot [11:53] *** Tenebrae has joined #archiveteam-ot [11:53] *** MrRadar2 has joined #archiveteam-ot [11:53] *** BnAboyZ has joined #archiveteam-ot [11:53] *** Frogging has joined #archiveteam-ot [11:53] *** jodizzle has joined #archiveteam-ot [11:53] *** VoynichCr has joined #archiveteam-ot [11:53] *** Fusl sets mode: +o SketchCow [11:59] https://i.imgur.com/9XuBgDg.png I need to always pull the same resource, a.mp4; it comes with a policy and a referral code in the URL. How can I automate this? [12:02] Well, where are those Policy and referrer values coming from? [12:04] JAA, they are not really important since I don't abuse any server loads, it's just automating a small batch that is the issue, [12:05] Right now, I am manually opening the link, and instead of the JS video player, I get the HTML5 Firefox player, and can click save-as [12:05] julientm: But you probably need those values to download the correct video. [12:05] okay well it is from my local library [12:06] I am using the online services to view some videos with Safari [12:06] Yeah I always use them [12:06] just looking to automate the flow instead of doing it manually [12:07] Mhm [12:08] Well, you need to figure out how to construct the URL that gives you the video file. 
[12:09] it randomly changes for every segment of video, and they split the video by topic, so one book can have 72 five-minute videos. I am looking to download them locally and then just add them to a VLC playlist. [12:09] here is what I am working with [12:10] Does the player download a .m3u or .m3u8 file? [12:11] let me get you some graphics 2 secs [12:26] JAA, https://youtu.be/ER3isecQ334 https://ghostbin.com/paste/6cg72/raw [12:26] *** julientm has quit IRC (Read error: Connection reset by peer) [12:34] *** julientm_ has joined #archiveteam-ot [12:34] JAA, sorry IRC rebooted [12:35] JAA, were you able to take a look at it? https://youtu.be/ER3isecQ334 https://ghostbin.com/paste/6cg72/raw [12:39] *** Despatche has joined #archiveteam-ot [12:48] *** julientm_ has quit IRC (Remote host closed the connection) [12:48] *** julientm_ has joined #archiveteam-ot [12:52] julientm_: Yeah, but I can't really help you much with that information. As I said, you need to figure out where that a.mp4 URL comes from since there's no way you'll guess it. Chances are it's either in a playlist file (.m3u or .m3u8) or somehow retrieved with JavaScript. But without direct access to the website, being able to see all requests with every detail, and playing around with them, there's [12:52] no way I can tell you how you have to do it. 
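Editor's sketch of the playlist route JAA mentions: if the player does fetch a .m3u or .m3u8 file, the media URLs are the non-empty lines that do not start with `#`, and relative ones resolve against the playlist's own URL. The sample playlist, the example.invalid hosts, and the function name are made up for illustration. If the portal builds the URL in JavaScript instead, this does not apply.

```python
from typing import List
from urllib.parse import urljoin


def media_urls(playlist_text: str, playlist_url: str) -> List[str]:
    """Return absolute media URLs listed in an M3U/M3U8 playlist body."""
    urls = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' (tags, comments) carry no URL.
        if not line or line.startswith("#"):
            continue
        # Relative entries are resolved against the playlist's own location.
        urls.append(urljoin(playlist_url, line))
    return urls


if __name__ == "__main__":
    # Hypothetical playlist body; a real one would be saved from the browser.
    SAMPLE = """#EXTM3U
#EXTINF:300,Chapter 1
chapter01/a.mp4?Policy=abc
#EXTINF:300,Chapter 2
https://cdn.example.invalid/chapter02/a.mp4
"""
    for url in media_urls(SAMPLE, "https://portal.example.invalid/book/index.m3u8"):
        print(url)
```

Query strings such as a policy token are kept as part of each entry, which matters when the server only serves the file with those values attached.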
[13:22] okay thank you JAA [13:25] can also try youtube-dl it has some logic for common hosting methods afaik [13:27] *** Wizzito has joined #archiveteam-ot [13:45] phiresky, JAA so I managed to get it down by manually, clicking on the videos and exporting .har file and then using bash to process text and curl to get videos [13:50] *** robbierut has quit IRC (Read error: Operation timed out) [13:50] *** robbierut has joined #archiveteam-ot [13:56] Ahaha, nice one: https://marc.info/?l=openbsd-tech&m=155407864604288&w=2 [13:56] (Context for those who missed it: https://twitter.com/RedTeamPT/status/1110843396657238016 ) [13:59] haha [14:00] *** julientm_ has quit IRC (Read error: Connection reset by peer) [14:04] *** julientm has joined #archiveteam-ot [14:09] *** robbierut has quit IRC (Ping timeout: 360 seconds) [14:09] *** robbierut has joined #archiveteam-ot [14:12] *** robbierut has quit IRC (Read error: Connection reset by peer) [14:13] *** robbierut has joined #archiveteam-ot [14:18] *** DustinVF has joined #archiveteam-ot [14:22] *** DustinVFP has joined #archiveteam-ot [14:22] *** deevious has quit IRC (Quit: deevious) [14:22] *** DustinVFP is now known as otherDust [14:23] *** DustinVF has quit IRC (Read error: Operation timed out) [14:27] *** DustinV has quit IRC (Read error: Operation timed out) [14:27] *** otherDust is now known as DustinV [14:33] *** deevious has joined #archiveteam-ot [14:40] *** Wizzito has quit IRC (Quit: Leaving) [14:50] *** DustinV has quit IRC (Remote host closed the connection) [14:51] *** DustinV has joined #archiveteam-ot [14:51] *** DustinV has quit IRC (Read error: Connection reset by peer) [14:52] *** DustinV has joined #archiveteam-ot [15:14] *** cutepillo has quit IRC (Read error: Operation timed out) [15:36] *** julientm has quit IRC (Ping timeout: 252 seconds) [15:40] *** julientm has joined #archiveteam-ot [15:44] *** julientm has quit IRC (Remote host closed the connection) [15:45] *** julientm has joined 
#archiveteam-ot [16:14] *** Dj-Wawa has joined #archiveteam-ot [16:23] *** dhyan_nat has quit IRC (Read error: Operation timed out) [16:27] *** robbierut has quit IRC (Read error: Operation timed out) [16:27] *** robbierut has joined #archiveteam-ot [16:54] *** Joseph__ has quit IRC (Read error: Connection reset by peer) [16:55] *** VerifiedJ has joined #archiveteam-ot [17:49] *** robbierut has quit IRC (Read error: Connection reset by peer) [17:50] *** robbierut has joined #archiveteam-ot [17:56] *** Oddly has quit IRC (Ping timeout: 257 seconds) [18:20] *** marked has quit IRC (Read error: Operation timed out) [18:22] *** Oddly has joined #archiveteam-ot [18:22] *** Exairnous has joined #archiveteam-ot [18:25] *** marked has joined #archiveteam-ot [18:47] *** robbierut has quit IRC (Read error: Connection reset by peer) [18:47] *** robbierut has joined #archiveteam-ot [18:50] *** Exairnous has quit IRC (Remote host closed the connection) [18:51] *** Exairnous has joined #archiveteam-ot [18:52] *** icedice has joined #archiveteam-ot [18:56] *** Oddly has quit IRC (Ping timeout: 255 seconds) [19:00] *** robbierut has quit IRC (Read error: Connection reset by peer) [19:02] *** Odd0002_ has joined #archiveteam-ot [19:02] *** julientm has quit IRC (Read error: Connection reset by peer) [19:02] *** robbierut has joined #archiveteam-ot [19:03] *** Despatche has quit IRC (Read error: Operation timed out) [19:04] *** Exairnous has quit IRC (Read error: Operation timed out) [19:06] *** Odd0002 has quit IRC (Ping timeout: 615 seconds) [19:06] *** Odd0002_ is now known as Odd0002 [19:12] *** Exairnous has joined #archiveteam-ot [19:17] *** dhyan_nat has joined #archiveteam-ot [19:22] *** Exairnous has quit IRC (Ping timeout: 615 seconds) [19:29] *** robbierut has quit IRC (Read error: Connection reset by peer) [19:30] *** DustinV has quit IRC (Ping timeout: 600 seconds) [19:31] *** robbierut has joined #archiveteam-ot [19:43] *** simon816 has quit IRC (Read error: Operation 
timed out) [19:43] *** dashcloud has quit IRC (Read error: Operation timed out) [19:45] *** ivan has quit IRC (Ping timeout: 246 seconds) [19:45] *** JAA has quit IRC (Ping timeout: 246 seconds) [19:45] *** ivan has joined #archiveteam-ot [19:46] *** logres133 has joined #archiveteam-ot [19:46] *** dashcloud has joined #archiveteam-ot [19:54] *** Stilett0 has joined #archiveteam-ot [19:57] *** julientm has joined #archiveteam-ot [19:58] *** julientm has quit IRC (Remote host closed the connection) [19:58] *** Stilett0 has quit IRC (Ping timeout: 252 seconds) [19:58] *** Stiletto has quit IRC (Read error: Operation timed out) [19:58] *** Stiletto has joined #archiveteam-ot [19:58] *** julientm has joined #archiveteam-ot [19:59] *** julientm has quit IRC (Remote host closed the connection) [20:01] *** julientm has joined #archiveteam-ot [20:03] *** Stilett0 has joined #archiveteam-ot [20:05] *** Stiletto has quit IRC (Ping timeout: 255 seconds) [20:10] *** robbierut has quit IRC (Read error: Operation timed out) [20:10] *** robbierut has joined #archiveteam-ot [20:27] *** robbierut has quit IRC (Read error: Connection reset by peer) [20:27] *** robbierut has joined #archiveteam-ot [20:29] *** Despatche has joined #archiveteam-ot [20:37] *** martini has joined #archiveteam-ot [20:43] *** simon816 has joined #archiveteam-ot [20:44] *** Stiletto has joined #archiveteam-ot [20:44] *** JAA has joined #archiveteam-ot [20:44] *** Fusl sets mode: +o JAA [20:45] *** bakJAA sets mode: +o JAA [20:48] *** Stilett0 has quit IRC (Ping timeout: 615 seconds) [20:49] *** tuluu_ has quit IRC (Ping timeout: 265 seconds) [20:49] *** dhyan_nat has quit IRC (Read error: Operation timed out) [20:59] *** tuluu has joined #archiveteam-ot [21:10] That's the future of Internet of Things. 
[21:25] *** robbierut has quit IRC (Read error: Connection reset by peer) [21:25] http://time.spacescience.tech/ this is for reddit's new april fools thing [21:25] /r/sequence/new/ auto timelapse [21:25] *** robbierut has joined #archiveteam-ot [21:27] *** kode54 has quit IRC (Quit: ZNC 1.7.2 - https://znc.in) [21:35] *** kode54 has joined #archiveteam-ot [21:49] jrwr: tl;dr this subreddit [21:49] ? [21:52] Kaz: April Fools event by Reddit I believe. [22:09] *** rnduser has joined #archiveteam-ot [22:09] *** BlueMax has joined #archiveteam-ot [22:12] *** rnduser_ has quit IRC (Ping timeout: 252 seconds) [22:32] *** Exairnous has joined #archiveteam-ot [22:43] *** Exairnous has quit IRC (Ping timeout: 615 seconds) [22:46] *** rnduser has quit IRC (Read error: Connection reset by peer) [22:46] *** rnduser has joined #archiveteam-ot [22:53] *** martini has quit IRC (Quit: No Reasson) [22:54] *** rnduser has quit IRC (Read error: Connection reset by peer) [22:54] *** rnduser has joined #archiveteam-ot