[00:00] *** DogsRNice has joined #archiveteam-ot
[00:08] *** sHATNER has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** Fusl has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** Dallas has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** jodizzle has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl)
[00:08] *** nyaomi has quit IRC (hub.efnet.us irc.efnet.nl)
[00:09] *** yano has quit IRC (Read error: Operation timed out)
[00:10] *** schbirid has quit IRC (Read error: Operation timed out)
[00:13] *** sHATNER has joined #archiveteam-ot
[00:13] *** Tenebrae has joined #archiveteam-ot
[00:13] *** Fusl has joined #archiveteam-ot
[00:13] *** Dallas has joined #archiveteam-ot
[00:13] *** jodizzle has joined #archiveteam-ot
[00:13] *** MrRadar2 has joined #archiveteam-ot
[00:13] *** BnAboyZ has joined #archiveteam-ot
[00:13] *** nyaomi has joined #archiveteam-ot
[00:13] *** irc.efnet.nl sets mode: +oo Fusl MrRadar2
[00:14] *** Fusl_ sets mode: +o Fusl
[00:15] *** yano has joined #archiveteam-ot
[00:18] *** schbirid has joined #archiveteam-ot
[00:18] *** Fusl sets mode: +o kiska1
[00:18] *** Fusl sets mode: +o kiskabak
[00:18] *** Fusl sets mode: +o kiska
[00:18] *** Fusl sets mode: +o JAA
[00:18] *** Fusl sets mode: +o svchfoo3
[00:18] *** Fusl sets mode: +o jrwr
[00:18] *** Fusl sets mode: +o astrid
[00:18] *** Fusl sets mode: +o dxrt
[00:18] *** Fusl sets mode: +o chfoo
[00:18] *** Fusl sets mode: +o svchfoo1
[00:18] *** Fusl sets mode: +o ivan_
[00:19] *** Fusl sets mode: +o dxrt_
[00:19] *** Fusl sets mode: +o AlsoJAA
[00:19] *** Fusl sets mode: +o arkiver
[00:19] *** Fusl sets mode: +o Kenshin
[00:19] *** Fusl sets mode: +o SketchCow
[00:19] *** Fusl sets mode: +o Fusl_
[00:19] *** Fusl sets mode: +o hook54321
[00:19] *** Fusl sets mode: +o Kaz
[00:19] *** Fusl sets mode: +o HCross
[00:24] *** Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
[02:18] *** qw3rty111 has joined #archiveteam-ot
[02:24] *** qw3rty119 has quit IRC (Read error: Operation timed out)
[02:59] *** Flashfire has quit IRC (Remote host closed the connection)
[02:59] *** kiska has quit IRC (Remote host closed the connection)
[02:59] *** Flashfire has joined #archiveteam-ot
[03:00] *** kiska has joined #archiveteam-ot
[03:00] *** Fusl sets mode: +o kiska
[03:00] *** Fusl_ sets mode: +o kiska
[03:29] *** qw3rty112 has joined #archiveteam-ot
[03:35] *** qw3rty111 has quit IRC (Read error: Operation timed out)
[03:37] *** killsushi has quit IRC (Quit: Leaving)
[03:51] *** odemg has quit IRC (Read error: Operation timed out)
[04:06] *** odemg has joined #archiveteam-ot
[04:07] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[05:48] *** godane has quit IRC (Ping timeout: 506 seconds)
[05:54] *** Dragnog2 has joined #archiveteam-ot
[06:01] *** killsushi has joined #archiveteam-ot
[07:08] *** schbirid has quit IRC (Read error: Operation timed out)
[08:20] *** bluefoo has quit IRC (Read error: Operation timed out)
[08:41] *** godane has joined #archiveteam-ot
[09:24] *** Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
[09:45] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:24] *** bluefoo has joined #archiveteam-ot
[13:02] *** VerifiedJ has joined #archiveteam-ot
[13:05] *** bluefoo has quit IRC (Ping timeout: 615 seconds)
[15:45] *** Dragnog2 has joined #archiveteam-ot
[16:27] *** schbirid has joined #archiveteam-ot
[17:36] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[17:41] *** systwi_ is now known as systwi
[17:49] *** bluefoo has joined #archiveteam-ot
[18:18] *** Dj-Wawa has joined #archiveteam-ot
[18:24] https://www.phoronix.com/scan.php?page=news_item&px=CVE-2019-1125-SWAPGS
[18:28] *** nataraj has joined #archiveteam-ot
[18:38] Is there anywhere I can find a list of network protocols? I'm trying to find ones like http:// https:// ftp:// etc. I'm writing a script & need to check if any of those have a . in them
[18:39] And before you mention it, I did look it up earlier
[18:40] I was just having a difficult time finding a list that explicitly stated if a protocol can or cannot have a "." in the name
[18:43] systwi: not really. you can probably find a list of common URL protocols, but since technically any arbitrary URL protocol can be registered and the database extended, there can always be others
[18:43] https://www.mediawiki.org/wiki/Manual:$wgUrlProtocols is a good list
[18:49] *** nataraj has quit IRC (Read error: Operation timed out)
[18:55] systwi: according to the spec it can be any ascii string https://url.spec.whatwg.org/#url-representation
[18:56] it cannot be null but it can be an empty string too
[19:00] Git can work with ssh+git://
[19:01] Better use a proper lib to parse the url and escape all chars you can't handle
[19:05] *** deevious has quit IRC (Ping timeout: 252 seconds)
[19:19] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[19:28] *** Stiletto has joined #archiveteam-ot
[19:28] Ugh, well that's going to be difficult. See, in my script I need it to accept urls like: https://www.example.com/ , but also urls like www.example.com
[19:29] And to filter out something that's junk, i.e. akjh.sf.dsfa://www.example.com/
[19:31] Technically that is not junk. But if you can't handle some schemes you should just ignore them. And when there is no scheme you have to define a default one for your program
[19:32] urls are hard to handle universally
[19:32] There might be a *nix program I can pass the URL into but I don't know of any
[19:33] I remember a video of a talk where 3 different python libraries parsed 3 different hosts from the same specially crafted url
[19:34] I guess I can just do that, blacklist all protocols that I know of that I don't want to use, and then for unrecognized ones tack on "https://" to the beginning and if it doesn't work, then report to the user that the URL is bad
[19:34] I don't quite understand your problem. A url parsing lib should be all you need.
[19:35] I'm sorry I don't understand that. This is a shell script btw
[19:37] Uh ok. Kind of hard to do that with shell. Maybe a small python script which parses a url and returns all parts (scheme, host, etc) line separated?
[19:37] Whatever, you will figure something out.
[19:37] Sounds like a good idea except for the fact I know nothing about python :-/ But yeah I will keep looking around, thanks
[19:50] systwi: This might help you: https://pastebin.com/N7xiCErW
[19:51] That's how you would parse a url in python. You can then print whatever properties you want and use them in your shell script
[19:55] *** VerifiedJ has quit IRC (Quit: Leaving)
[20:04] Thanks h3ndr1k I'll give that a try
[20:05] np
[20:13] So, I'm probably doing it wrong, but I saved that pastebin content into a .py file and ran it in my Terminal like so: $ ./url_test.py
[20:14] That gave an exit code of 1 which would make sense since there is no URL
[20:14] Trying: $ ./url_test.py irc://example.com
[20:14] Outputted some text to the screen giving an exit code of 0.
[20:15] The exact same result happens with http://example.com
[20:15] :-/
[20:24] Yeah it is just a template. At the moment there is a constant string passed to the parser instead of the argument
[20:25] You just have to replace 'ab...://......' with url
[20:26] systwi: ^ so that you have urlparse(url) in there
[20:27] Ok thanks let me try that. Is there a way I can pass argument 1 into the (url) place? e.g. in bash it would be "$1"
[20:28] sys.argv[1] is exactly $1
[20:28] And the line url=sys.argv[1] essentially assigns $1 to $url (In shell terms :) )
[20:29] Ohhh okay cool let me try that
[20:30] Thanks for the shell terms btw makes more sense to me :)
[20:31] :)
[20:48] It looks like it's working well so far, however I tried filling in a junk input ( $ ./url_test.py htt://shttp:example.com/ ) and it kinda killed python:
[20:49] $ ./url.py htt://shttp:example.com/
[20:49] htt
[20:49] shttp
[20:49] Traceback (most recent call last):
[20:49] File "./url.py", line 16, in <module>
[20:49] print(parsed.port)
[20:49] File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/parse.py", line 169, in port
[20:49] port = int(port, 10)
[20:49] ValueError: invalid literal for int() with base 10: 'example.com'
[20:49] That's okay, though, I can just check the user input beforehand to make sure a bad url like that isn't passed to url_test.py
[21:04] systwi: We could wrap the parsing in a try/except block and exit with 1 in case of such an error. But I'm about to go to bed now. I may find something tomorrow.
[21:05] The error says that the port (after the colon) is not a number btw.
[21:05] No worries, see you tomorrow, thanks!
[21:19] i found this in my irc logs from a few years ago; it seems to be a collection of over a billion reddit comments from 2007 to 2015 according to the readme, magnet:?xt=urn:btih:7690f71ea949b868080401c749e878f98de34d3d&dn=reddit_data
[21:20] yano: What format?
[21:20] "Each file is compressed with bzip2 compression. When uncompressed, each file
[21:20] is a series of JSON blocks delimited by new lines (\n). The name of each file
[21:20] follows the format RC_yyyy-mm.bz2 where yyyy is the year and mm is the month.
[21:20] RC stands for "reddit comments.""
[21:21] this is the readme file, https://0bin.net/paste/qzH7ebO+eKGBGwKn#6m7gsYDKUNcX0pbKg56Sfh4qLGWfuC2IEEBz26HnSva
[21:22] it's about 150 GB
[21:23] *** Jens has quit IRC (Remote host closed the connection)
[21:24] *** Jens has joined #archiveteam-ot
[23:54] *** Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
[23:57] *** Flashfire has quit IRC (Remote host closed the connection)
[23:57] *** kiska has quit IRC (Remote host closed the connection)
[23:57] *** Flashfire has joined #archiveteam-ot
[23:58] *** kiska has joined #archiveteam-ot
[23:58] *** Fusl sets mode: +o kiska
[23:58] *** Fusl_ sets mode: +o kiska
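
For reference, the URL helper discussed between 18:38 and 21:05 could look roughly like the sketch below. This is not the pastebin script (its contents are not in the log); it only combines the ideas from the conversation: a default scheme when none is given, ignoring schemes the caller cannot handle, and catching the ValueError that a non-numeric port raises. The names url_check.py, ALLOWED_SCHEMES, DEFAULT_SCHEME, parse_url and main are made up for illustration, and the allowed schemes are an assumption.

```python
#!/usr/bin/env python3
"""Hypothetical url_check.py: split a URL into scheme, host and port."""
import sys
from urllib.parse import urlsplit

# Assumption: the schemes the calling shell script is willing to handle.
ALLOWED_SCHEMES = {"http", "https", "ftp"}
# Assumption: scheme to use when the input has none (e.g. www.example.com).
DEFAULT_SCHEME = "https"


def parse_url(raw):
    """Return (scheme, host, port); raise ValueError for unusable input."""
    text = raw.strip()
    # Treat input without "://" as scheme-less and prepend the default.
    if "://" not in text:
        text = DEFAULT_SCHEME + "://" + text
    parts = urlsplit(text)  # may itself raise ValueError on malformed input
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("no host in %r" % raw)
    # Accessing .port raises ValueError when the text after the colon
    # is not a number, which is the crash shown in the traceback above.
    port = parts.port
    return parts.scheme, parts.hostname, port


def main(argv):
    """Print scheme, host and port on separate lines; return an exit code."""
    if len(argv) != 2:
        print("usage: url_check.py URL", file=sys.stderr)
        return 1
    try:
        scheme, host, port = parse_url(argv[1])
    except ValueError as err:
        print("bad URL: %s" % err, file=sys.stderr)
        return 1
    print(scheme)
    print(host)
    print("" if port is None else port)
    return 0
```

To call it from a shell script, adding sys.exit(main(sys.argv)) at the bottom of the file would let the script run ./url_check.py "$1" and check the exit code, with 0 meaning the three lines on stdout are usable and 1 meaning the URL was rejected.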